Members of the Rosaceae family typically grow in cold conditions and are often subjected to cold stress. It is therefore important to study the cold tolerance mechanisms of these plants and the genes involved. In this study, we used computational approaches to identify and analyse putative cold stress responsive genes, and the transcription factor binding sites in their promoters, in Rosaceae plants. We obtained information on genes differentially upregulated under cold stress in apple and strawberry and used it to identify putative genes in eight other Rosaceae species. Functional annotation of these differentially expressed genes (DEGs) showed a variety of gene families, such as transcription factors, cytochromes, kinases, transferases and membrane proteins. The majority of the genes were transcription factors, most of them from the AP2/ERF and MYB families. For the other species, genes that evolved from a common ancestor were traced using synteny analysis. A total of 1469 syntelogs from all ten species were analysed in detail. When we compared the number of syntelogs predicted from species of the Maloideae (2n = 34), Prunoideae (2n = 16) and Rosoideae (2n = 14) subfamilies, we noticed a direct correlation with genome size and chromosome number. More syntelogs were identified in Maloideae species, with both P. bretschneideri and P. communis showing higher numbers than the other species. Protein-coding gene families evolve through events such as whole-genome duplication (WGD) or segmental duplication, tandem duplication, and chromosomal and gene rearrangements. We observed more dispersed duplication events in M. domestica than in other plants, which could be due to the recent WGD in the Maloideae clade (30–45 MYA) [28]. Apart from WGD events, other duplication events (tandem, dispersed and segmental) have contributed to the repertoire of syntelogs in these species.
This suggests that the cold stress response gene pool has expanded over the course of evolution and contributes to cold survival among plants of the Rosaceae family.
Functional annotation and enrichment analysis of these candidate genes showed many transcription factors, which play a significant role in plant development and stress tolerance. They act as regulatory proteins, regulating a set of target genes in a coordinated manner and consequently enhancing the stress tolerance of the plant. AP2/ERF is an important transcription factor family with a major role in the response to cold stress. Many cold stress responsive genes and their gene regulatory networks have been reported in plants so far. The ICE-CBF-COR pathway is one of the most studied cold stress pathways in crop plants. CBFs, members of the AP2/ERF family of transcription factors, are expressed in response to cold temperatures and in turn activate many downstream genes, leading to cold acclimation and to chilling and freezing tolerance in plants. Apart from these key regulators, many other TF families, such as bHLH, WRKY, NAC and MYB, are also known to help regulate gene expression under cold stress.
A cis-element is required in the promoters of stress-responsive genes for expression under a specific stress. Promoter analysis using STIFAL identified and classified well-known abiotic stress transcription factor binding sites (TFBSs) for these putative cold stress response genes. STIFAL contains 19 such models of cis-elements, based on abiotic stress response transcription factor families, which were built as hidden Markov models (HMMs) and validated using the jack-knife method [48]. STIF predicted a total of 11145 TFBSs from the promoter sequences of 1408 syntelog genes. MYB is the largest and most diverse group of TFs and often co-occurs with other TFs. Hence, MYB classes were the most abundant, followed by the bHLH, WRKY and AP2/ERF TF families. This trend was nearly the same when TFBS occurrences were compared between Rosaceae species. CBF or DREB transcription factors, which belong to the AP2/ERF family, are the key regulators in the pathway and bind to the DRE or CRT cis-elements in the promoters of COR genes. The abundance of this important cold-regulated transcription factor family in the dataset was revealed by functional annotation and enrichment analysis. We found a few gene promoters enriched with binding sites for various TF families, which could play a role in multiple stress responses or in other functions. The 1000 bp promoter sequence of a PP2C-type protein phosphatase gene from P. dulcis (Prudul26A011712P1) was predicted to contain 34 TFBSs: MYB (20), NAC (2), AP2 (2), WRKY (2), ARF (2), bHLH (2) and HSF (4). The promoter of another P. dulcis gene, a serine/threonine-protein kinase (Prudul26A014996P1), was predicted to contain 32 cis-regulatory elements: MYB (7), NAC (4), AP2 (1), WRKY (4), ARF (6), bHLH (6), HSF (3) and bZIP (1). Apart from the highly abundant MYB binding sites, tandemly repeated AP2 binding sites were observed in many of the promoters. It will be interesting to investigate the role of these genes in the stress response.
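The per-family TFBS counts quoted above can be tallied with a few lines of Python. This is only an illustrative sketch: the dictionary simply transcribes the numbers given in the text, the name `summarize` is hypothetical, and nothing here reproduces the STIF or STIFAL prediction step itself.

```python
from collections import Counter

# TFBS counts per TF family, transcribed from the text for two P. dulcis promoters.
reported = {
    "Prudul26A011712P1": {"MYB": 20, "NAC": 2, "AP2": 2, "WRKY": 2,
                          "ARF": 2, "bHLH": 2, "HSF": 4},
    "Prudul26A014996P1": {"MYB": 7, "NAC": 4, "AP2": 1, "WRKY": 4,
                          "ARF": 6, "bHLH": 6, "HSF": 3, "bZIP": 1},
}

def summarize(counts):
    """Return (total sites, most abundant family, that family's share of sites)."""
    total = sum(counts.values())
    top_family, top_count = Counter(counts).most_common(1)[0]
    return total, top_family, top_count / total

for gene, counts in reported.items():
    total, family, share = summarize(counts)
    print(f"{gene}: {total} TFBSs, most abundant {family} ({share:.0%})")

# Average number of predicted TFBSs per promoter across the whole dataset,
# using the totals quoted in the text (11145 sites, 1408 promoters).
mean_sites_per_promoter = 11145 / 1408
print(f"mean TFBSs per promoter: {mean_sites_per_promoter:.2f}")
```

The per-gene totals computed this way agree with the 34 and 32 sites stated in the text, and MYB is the most frequent family in both promoters.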
Further, we noticed a few genes from different Rosaceae species that share highly similar promoter sequences. The TFBS pattern was conserved among those syntelogs. Closely related species showed greater conservation in the position and combination of TFBSs. Cytochrome P450 genes from the Maloideae species P. communis and P. bretschneideri showed a similar TFBS architecture (AP2 ~ AP2 ~ MYB ~ MYB ~ AP2 ~ AP2 ~ bHLH ~ MYB ~ MYB ~ MYB ~ HSF). Gene duplication events must have played a role in this conservation among closely related Rosaceae species. There are also instances of similarity between species of different subfamilies, such as a cytochrome P450 gene from P. communis (pycom09g00070) and a Hypostatin resistance gene from P. dulcis (Prudul26A022009P1). These genes from two different species showed the same promoter TFBS architecture (WRKY ~ MYB ~ MYB ~ MYB ~ MYB ~ AP2). The similarities and differences in the TFBS architecture of each syntelog were further studied using an in-house algorithm, ADASS. Overall, we found that most species had a higher percentage of association with M. domestica and R. occidentalis. In contrast, P. avium showed less association with M. domestica than the other species did. Even though P. persica had fewer syntelogs, it showed a higher percentage of association with M. domestica. This analysis suggests that the trend is broadly similar across Rosaceae species when we consider the percentage of similar gene promoter sequences within a threshold of 0.4.
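To make the idea of comparing TFBS architectures concrete, the sketch below scores two ordered lists of TF families with a length-normalised edit distance. This is not the ADASS algorithm, whose details are not described here; the function names are hypothetical, and the 0.4 cutoff is borrowed from the text purely as an illustrative value, on the assumption that it is applied to a distance-like score.

```python
def parse_architecture(text):
    """Split an architecture string such as 'AP2 ~ MYB ~ HSF' into a list of families."""
    return [token.strip() for token in text.split("~") if token.strip()]

def edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance divided by the longer length: 0.0 = identical, 1.0 = no overlap."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

# Architectures quoted in the text.
cyp450 = parse_architecture(
    "AP2 ~ AP2 ~ MYB ~ MYB ~ AP2 ~ AP2 ~ bHLH ~ MYB ~ MYB ~ MYB ~ HSF")
shared = parse_architecture("WRKY ~ MYB ~ MYB ~ MYB ~ MYB ~ AP2")

THRESHOLD = 0.4  # illustrative cutoff only; its exact role in ADASS is an assumption

for name, (a, b) in {"identical pair": (shared, shared),
                     "different genes": (cyp450, shared)}.items():
    d = normalized_distance(a, b)
    print(f"{name}: distance {d:.2f}, similar = {d <= THRESHOLD}")
```

An order-aware token distance is used here because the text emphasises both the position and the combination of binding sites; a set-based measure would ignore the order in which the sites occur.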
There have been recent WGD events in the Maloideae and Prunoideae clades, so similarity can be expected at the genome level. We noticed that a few TFBS patterns were similar among species within the same subfamily, whereas the patterns among syntelogs were divergent when compared across the other Rosaceae subfamilies. This indicates that the similarity in the promoter regions of syntelog genes could reflect the evolutionary distance between the species.