Identification and Molecular Characterization of MYB Gene Family in Sweet cherry
Genome-wide analysis was performed to identify the R2R3-MYB genes in P. avium genome. The MYB HMM profile (Pfam: 00249) was being utilized as a query in a BlastP search against the sweet cherry genome database. Finally, 69 PavMYBs proteins in sweet cherry were identified and their sequences were further examined for the presence of MYB_DNA-binding domains by using the Pfam, Interpro tool, and SMART database. Single MYB DNA-binding domain was identified in 51 PavMYB genes, 14 PavMYB genes contained two domain and two PavMYB genes had four domains. Only PavMYB16 contained three MYB_DNA-binding (Table S1). In meanwhile, we discovered that PavMYB genes from the same subfamily having identical domain distributions and compositions elaborated that these subfamilies have had a similar evolutionary history. PavMYB proteins had amino acid lengths ranging from 63bp to 1056bp with average of 272bp (Table S1). Moreover, the molecular weight of PavMYB genes varied from 7421.6 to 117469.94 kDa with an average of 30452.16 kDa and the approximate Isoelectric point (IP) ranged from 4.09 to 10.44 with an average of 6.78 (Table S1).
Phylogenetic relationship of the Sweet cherry MYB Gene Family
To investigate the evolutionary relationship of the MYB genes in sweet cherry, a maximum likelihood method (ML-M) phylogenetic tree were constructed by using the full-length R2R3-MYB protein sequences from P. avium and Arabidopsis thaliana. We classified the R2R3-MYB members of sweet cherry into 28 subgroups (C1-C28) according to the sequence similarity and topology with Arabidopsis (Fig.1).The defined clades in Arabidopsis were labeled in the evolutionary tree. Subfamily C9 contained the highest MYB members (18), while the subfamily C3, C12, C14 and C24 had the lowest two MYB members (Fig.1 and Fig.2). MYB protein sequences from sweet cherry contributed in all subfamilies with Arabidopsis except C1 subfamily. Our results proposed that they may have been gene gained or lost events that happened through the evolutionary process. The loss and gain of particular MYB gene members instigated functional divergence [60, 61]. This phenomenon revealed that certain evolutionary events happened within MYBs of different species.
Conserved motifs, domain and intron number evaluation
A total of 69 MYB proteins from Prunus avium were checked against the Pfam database to validate that they contain the MYB DNA-binding domain. Each MYB gene's protein domains were determined (Fig.1 and Table S1). At least one MYB DNA-binding domain was found in all MYB genes that encode proteins, while 14 genes encoded proteins with two MYB DNA-binding domains such as Pav_sc0000480.1_g870.1.mk, Pav_sc0000464.1_g250.1.br and Pav_sc0000143.1_g310.1.mk. Two genes including Pav_sc0000124.1_g210.1.br, Pav_sc0000872.1_g190.1.mk contained four MYB DNA-binding domains while only Pav_sc0000800.1_g1130.1.mk had 3 MYB DNA-binding domains (Table S1 and Fig.1). We revealed that MYB genes from the same subfamily had equal domain distributions and compositions, implying that these subfamilies had a same evolutionary history. The position of introns was retained throughout evolution and might be implemented to determine a gene's phylogenetic relationship [62, 63]. We used intron structure analysis to figure out genesis of MYB gene family in sweet cherry. Most genes in the same cluster had common exon/intron structures, especially in terms of the number of introns, like I, II, III, and so on (Fig.6). Nevertheless, there were a few notable exceptions. The members of V and VI, for example, had varying numbers of introns. Intron/exon numbers ranged from 1-11/1-11 in sweet cherry (Fig.6 and S1).
Current findings suggested that the PavMYB subfamily has a lot of strongly conserved structures with in subfamilies while a lot of sequence diversity between subfamilies was observed. Genomic and CDS sequences were evaluated in order to investigate the structural diversity of MYB genes. The majority of MYB genes were found in clade "I" which contained 1 to 2 introns, while two members was found in clade "II," who has only intron (Fig.6). Furthermore, the number of intron varied from 1 to 11, the clade "VIII" having the most (1 to 11) accordingly (Fig.6). These findings suggested that it intron loss or gain happened during the evolutionary process of the MYB genes family. The conserved motifs were identified by using MEME web server to evaluate the sequence characteristics of MYB members. Finally, each comparison revealed twenty motifs, which were classified into 1 to 20. The majority of MYB genes, as specified in Fig.5, included several motifs (1, 2, 3, 4, and 5). Clade II particularly, had only three motifs (1, 4 and 5), while clade III members contained 2 motif (1, 4) but other clades had several (Fig.5). The motif compositions of most highly associated identical MYB genes indicating that MYB genes within the same subfamily have some functional similarities. Furthermore, we discovered a subfamily-specific motif that could play a key role in subfamily-specific activities. Moreover, several motifs, such as motifs 1and 2, were identified in practically all subfamily, suggesting that these motifs are crucial for MYB gene expression with similar functions. The evolutionary study of the MYB gene family was supported by the homogeneity in the compositions of motif and intron structure of MYB genes within the same group, whereas differences between distinct groups indicated their special functionalities.
Cis-Acting Element Analysis
Two complementary regulatory components exist in the plant transcriptional mechanism: (1) trans-acting elements (2) cis-acting elements. Trans-acting factors are transcription factors as well as other DNA-binding proteins that attach to particular regions in cis-acting elements to boost or repress gene transcription. The DNA sequence in the non-coding or coding regions of the genome are defined as cis-acting elements. Cis-regulatory elements play an important role in the management of regulatory networks, particularly multi-stimulus responsive genes and defining the stress-responsive expression patterns or tissue-specific expression patterns of genes was deeply linked with cis-elements in their promoter regions. The cis-acting elements on the promoter regions were classified into three major classes: phytohormones responses, biotic/abiotic stress responses and plant growth/development. Moreover, growth and development, cis-acting elements are prevalent in promoter regions, containing the Skn-1_motif, GCN4_motif, MRE, Box-4, CAT-box, O2-site and circadian. We identified the CAT-box motifs who encompassed the highest proportion (33%), which are useful for meristem expression, followed by the O2-site (30%) and Box-4 (23%) (Fig.3B) that are responsible for zein metabolic control and plant growth in response to light respectively [64]. The TGA-element is engaged in auxin sensitivity, GARE-motif and P-box for gibberellin response and ERE for ethylene responsive expression. The most prevalent cis-element in the second group was CGTCA motifs and a TGACG _motif cis acting elements associated to methyl jasmonic acid (MeJA) responsiveness [65-67]. The phytohormones response related cis-elements like ABRE (23%), P-box (6%), TGACG-motif (26%), TCA-element (15%), GARE motif (4%) and CGTCA-motif (26%) were also revealed (Fig.3B), which are linked with SA, ABA, ethylene and MeJA responses, signifying that these played key role in dormancy [68, 69]. Numerous stress-responsive elements were determined, such as ARE (44%), LTR (14%), MBS (28%), which are associated to light stress, cold and drought stress response respectively. These findings suggested that members of MYB gene family have the ability to improve cold stress response.
Gene duplication events, Syntenic Analysis and Expansion Patterns of the Sweet cherry MYB Genes
Expansion of Gene family and the development of novel functionalities are known to be aided by gene duplication (tandem and segmental) and divergence [67]. Duplicated genes generally mutate to acquire new functions or to divide ancestral gene functions which is essential for plant adaptation [70, 71]. An ancient whole-genome duplication (WGD) event occurred around 140 million years ago in apple [72]. We used MCScanX and circos software to visualize duplications within the genome of sweet cherry genome to evaluate the effects of duplication on the sweet cherry MYB genes family. Furthermore, PavMYB genes, we revealed 14 dispersed duplication pairs, 7 WGD duplication pair, 5 transposed duplication, 3 proximal duplication and two tandem duplication pairs (Fig.7 and Fig.2). This indicates that DSDs and WGDs play important role for MYB gene family expansion. These results revealed that duplications events contribute vital role in MYB gene expansion. Moreover, DSDs and WGD duplications happened with higher frequency for expansion.
To further understand the evolutionary relationship of MYB members in different plant species, the syntenic relationship was traced between PavMYB’s and homologous in other species including P.mume, M. domoestica, P. persica, P. bretschneideri and A. thaliana. These four plants belong to the Rosaceae family and shared a similar ancient. Consequently, the collinearity relationships were also analyzed between all MYB genes in four Rosaceae genome (Table S4). Total 359 collinear genes pair events were found between four Rosaceae species along with A. thaliana, included 57 orthologous pairs among sweet cherry and Japanese apricot, 106 orthologous pairs among sweet cherry and apple, 65 orthologous pairs among sweet cherry and peach, 91 orthologous pairs among sweet cherry and Chinese pear, and 40 orthologous pairs sweet cherry and A. thaliana (Fig.4) suggesting the closer relationship among four Rosaceae genomes (Fig.4b). These results recommend that there are collinearity connections between the sweet cherry and the other Rosaceae genomes, signifying a potential evolutionary process between them.
For better visualizing the evolutionary history, Ka/Ks-values was calculated to estimate evolutionary evidence of duplication occurrences in general (whole genome duplication or segmental duplication). Previous research has also been shown that the apple genome was formed by two phases of ancient whole-genome duplications around 140 million years ago (Ks 1.5–1.8) and a recent whole-genome duplication around 30–45 million years ago (Ks 0.15–0.35) [73]. The Ks values were determined to investigate whole or segmental genome duplication events with in MYB gene family (Table S5). The Ks values of duplicated gene pairs were determined. Ks average value was 0.2813 which revealed that MYB genes duplications might well have resulted from a recent whole-genome duplication. Furthermore, the Ka/Ks ratios are commonly utilized to reflect gene selection pressure and rate of evolution. Positive selection with faster evolution is indicated by Ka/Ks > 1, purifying selection with the functional constraint is indicated by Ka/Ks <1, and Ka/Ks = 1 shows that the genes are migrating naturally [74]. In sweet cherry, the Ka/Ks ratios of duplicated gene pairs <1 (Pav_sc0000625.1_g100.1.mk, Pav_sc0000464.1_g250.1.br, Pav_sc0002360.1_g910.1.mk, Pav_sc0001051.1_g240.1.mk) suggesting that MYB genes developed beneath severe purifying selection (Table S5). Moreover, in sweet cherry gene pairs like Pav_sc0000103.1_g410.1.mk (Ka/Ks ~3.37), Pav_sc0001938.1_g500.1.mk (Ka/Ks ~1.13), Pav_sc0001181.1_g940.1.mk (Ka/Ks ~1.85) Pav_sc0000464.1_g130.1.br (Ka/Ks ~1.28) and Pav_sc0000069.1_g100.1.mk (Ka/Ks ~1.63) had higher Ka/Ks ratios implying that this family might have a complex evolutionary history. For sweet cherry, the mean Ka/Ks calculated values for the TD, TRD, WGD, PD, and DSD gene pairs, which were 0.72, 1.05, 1.00, 0.70 and1.06, correspondingly (Fig.6, Table S5). The DSD and PD duplication events had a higher Ka/Ks ratio as compared to other mode of duplications, representing that that MYB genes have expansion and complicated evolutionary history (Fig.6). Positive selection occurs in certain sections of protein-coding genes at the same time, implying the emergence of novel gene functions. Strongest evolutionary constraints were engaged in the evolution history of the MYB gene which may contribute to gene function stability.
Chromosomal distribution analysis
We construct a map of chromosomal locations based on the genomic information of sweet cherry to illustrate the dispersal of the MYB members throughout the chromosomes (Table. S3). The MYB genes were found in a random pattern on 8 chromosomes, with the majority of them clustered near the tail intermediate end of a single chromosome. The highest MYB genes (15%) were found on Chr1, while, 14% were found on Chr2 and Chr3 which were organized into gene clusters (Fig. S1). Furthermore, chromosome 5 and 6 have 11.5% MYB genes, while chromosome 4 had 5% of total MYB genes. Moreover, the least MYB genes (2.8%) were found on chromosomes 8. Furthermore, on chromosomes 7, 5% MYB genes was traced and 13% genes were found on scaffold (Fig. S1).
Go annotation analysis
Go ontology annotation analysis was performed to predict subcellular localization and evaluated different functions of MYB protein in sweet cherry. 69 PavMYB proteins were categorized into 50 functional groups based on protein sequence similarities and categorized into four main ontologies, namely Biological process, Molecular functions, cellular component and subcellular localization. In the molecular process, we analyzed that more than 38.89% of annotated proteins functions in the activity of DNA binding, followed by nucleic acid binding (35.99%), protein-binding (11.59%) and ion binding (3.38%) (Fig.8). In the biological process, PavMYB members involved in the cellular nitrogen compound metabolic processes (11.66%) and biosynthetic processes (11.66%), stress response process (9.92%) followed by anatomical structure development(9.59%), signal transduction (9.36%), cell differentiation (8.98%) and reproduction (8.61%) (Fig.8). Subsequently, predicted PavMYB proteins are also annotated with cell cycle (4.99%), immune system process (4.51%), and cell morphogenesis (>1%) in biological process annotation .The cellular component annotations showed that the PavMYB proteins annotated with the intracellular, nucleus, organelles and having the same percentage (19.17%) and nucleoplasm (5.50%) (Fig.8). Furthermore, in subcellular localization, these results showed that most of the PavMYB proteins were contained in the nuclear, mitochondrial, extracellular, cytoplasmic and plasma membrane (81.10%, 10.80%, 4.10%, 2.70%, 1.40 respectively).
Transcriptomic analysis
RNA-seq datasets were used to assess the expression patterns of the MYB genes in different phases of dormancy under normal growth conditions. The amount of Transcripts Per Kilobase Million (TPM) were counted and compared across the investigated samples to obtain standardized RNA-seq data, which are visualized in form of heat map (Fig.9). Based on the current findings, MYB genes showed significant expression patterns on particular stage as shown in Fig.9. According to RNA-seq data MYB genes showed three types of expression level, each of which had genes that were predominantly expressed during distinct phases of bud dormancy (fig). The expression of nine MYB genes (Pav_sc0001807.1_g120.1.mk, Pav_sc0000659.1_g090.1.mk, Pav_sc0001968.1_g010.1.br, Pav_sc0000678.1_g150.1.br, Pav_sc0000028.1_g520.1.mk, Pav_sc0000877.1_g610.1.mk, Pav_sc0001181.1_g940.1.mk, Pav_sc0000910.1_g810.1.mk and Pav_sc0000354.1_g490.1.mk) were considerably higher across all stages of dormancy. The findings also revealed that MYB genes were seldom expressed in paradormancy than in endo dormancy (Fig.9). In all dormancy phases, majority of genes were showed highest expression level, this phenomenon revealed that these genes play key role for dormancy regulation. During data analyzing it was also found that some genes (Pav_sc0000009.1_g210.1.mk, Pav_sc0000464.1_g230.1.mk, Pav_sc0000308.1_g480.1.mk, Pav_sc0000638.1_g100.1.mk, Pav_sc0001576.1_g360.1.mk, Pav_sc0000886.1_g620.1.mk, Pav_sc0000110.1_g160.1.br, Pav_sc0000192.1_g040.1.mk, Pav_sc0000396.1_g530.1.mk, Pav_sc0001756.1_g010.1.mk) were remain silent during all dormancy stages. This phenomenon demonstrated that these genes are not involved in the dormancy initiation or dormancy elimination. These group of genes might be involve some other important functions in later stages of floral development in sweet cherry. Some genes like Pav_sc0000009.1_g270.1.mk remain silent in whole dormancy phases but show the aggression at dormancy release while some other genes like Pav_sc0000872.1_g190.1.mk and Pav_sc0000143.1_g310.1.mk had indicated peak expression in ecodormancy phase. These results illustrated that PavMYB genes also had stage specific expression patterns too.
Expression analysis
Moreover, for further investigate the expressional behavior of PavMYB’s in dormancy and in other later developmental stages, we chose 28 genes (one gene from each clade) to explore the expression levels in bud, flower, and fruit. The qRT-PCR results validated the RNA-seq data, indicating that these genes had different expression profiles at the bud, flower and fruit. Pav_sc0002433.1_g080.1.mk, Pav_sc0002360.1_g910.1.mk, Pav_sc0012310.1_g010.1.br, Pav_sc0001335.1_g540.1.mk, Pav_sc0001051.1_g240.1.mk, Pav_sc0004890.1_g010.1.mk, Pav_sc0000591.1_g010.1.mk, Pav_sc0001807.1_g120.1.mk, Pav_sc0000124.1_g210.1.br, Pav_sc0000254.1_g020.1.mk and Pav_sc0000311.1_g1150.1.mk expressions were considerably upregulated at bud stage, revealing that these genes are engaged in the dormancy. Pav_sc0001756.1_g010.1.mk, Pav_sc0000396.1_g530.1.mk, and Pav_sc0001576.1_g360.1.mk expression levels were identical to RNA-Seq data, their expression being elevated only at flower and fruit. No expression was observed on bud stage. Results demonstrated that these genes (Pav_sc0001756.1_g010.1.mk, Pav_sc0000396.1_g530.1.mk, and Pav_sc0001576.1_g360.1.mk) were remained silent in bud dormancy and show their expression in further stages. Some genes like Pav_sc0001938.1_g500.1.mk and Pav_sc0000800.1_g1130.1.mk had identical expression profiles all dormancy stages while remaining other genes had variable expression levels.