Genome-wide identification of R2R3-MYB genes and classification in S. spontaneum genome
Based on the functional annotation of the Myb_DNA-binding domain (PF00249), a total of 418 MYB genes (695 alleles) were identified in the S. spontaneum genome by combining the HMMER program and NCBI-CDD database (Figure 1). The SsMYB gene family was classified into four distinct subfamilies, including 207 MYB-related (329 alleles), 202 R2R3-MYB (356 alleles), 3 R1R2R3-MYB (3 alleles), and 5 Atypical MYB (7 alleles) genes (detailed data presented in supplementary Table S1). Total 122 SbMYB-related, 125 SbR2R3-MYB, 3 SbR1R2R3-MYB, and 2 Atypical MYB genes were also identified to increase the understanding of SsR2R3-MYB genes (Table S3).
To analyze the plant MYB genes thoroughly, twenty species in 11 lineages were screened to construct a plant phylogenetic tree with S. spontaneum, including Green algae, Bryophyta, Gramineae, Cruciferous, Leguminous, Rosaceae, Solanaceae, and others. The tree topology reflected the phylogenetic relationship of these species and divergence time (Figure 1). Plant phylogeny showed that the higher plants possessed more MYB genes than the lower plants, such as green algae (e.g., Ostreococcus lucimarinus, Volvox carteri, and Chlamydomonas reinhardtii). A significant expansion of MYB genes was observed after the Cambrian (about 540~480MYA), demonstrating an explosive biological diversification episode near the early period [26]. Most of the phylogenetic nodes of plant species were observed in the Cretaceous, a geological period when a typical global warming climate contributed to the diversity of the terrestrial species [27]. Compared with the other four kinds of grasses, S. spontaneum had one of the largest MYB genes as predicted by PlantTFDB. One reason is the tetraploid nature of the autopolyploid S. spontaneum (mainly octoploid). However, when corrected for ploidy level, the number of SsR2R3-MYB genes in S. spontaneum was still significantly higher than most of the species, including Arabidopsis and other grass species. From green algae to bryophyte and land plants, the number of MYB genes increased. The phylogenetic analysis of the plant species using the number of MYB genes indicated the extending of MYB genes from lower to higher plants, consistent with previous reports [28].
A neighbor-joining phylogenetic tree of R2R3-MYB genes from O. sativa and S. spontaneum showed that the sugarcane genome contained 15 subgroups (G1-G15) (Figure 2, Table S2) with OsR2R3-MYB genes [29]. Sugarcane and rice diverged in the Paleogene (67-26MYA) (Figure 1); the short divergence time indicated relative conservatism of the ortholog genes. As expected, two species of R2R3-MYB genes were evenly distributed in the tree, and most genes in rice clustered with sugarcane, except for LOC_Os03g14100. However, the number of genes in each clade varied greatly; for instance, the biggest group, G4, contained 26 genes while the group G13 comprised just one SsMYB gene Sspon.02G0044740-1B. Twenty SsMYB genes from three unique subgroups, G7, G10, and G15, did not contain rice genes, indicating the species' genetic divergence. Besides, the clusters depicted that the sugarcane MYB family exhibited a greater number of genes than that in rice, showing a significant expansion of the SsMYB family.
Analysis of genomic location, gene structure, and regulatory elements
A total of 202 SsR2R3-MYB genes were named in turn according to their physical position on the chromosomes. MYB genes were distributed throughout all 32 chromosomes (Figure 3C); the autopolyploid S. spontaneum genome comprised of 8 homologous groups of 4 members each [3]. The chromosome distribution map showed that the location of the MYB genes was not evenly distributed. Most of the SsMYB genes were located on Chr3A and Chr7A, encompassing 19 and 16 genes, respectively. About 11 enrichment clusters, tiny fragments on genomic regions containing 3 MYB genes, were detected, and half of these genes contained MYB-binding sites (MBS) depicting potential interaction among each cluster. However, some chromosomes only contained a few MYB genes. For instance, five chromosomes, including Chr2C, Chr2D, Chr6C, Chr8B, and Chr8D, had only one MYB gene.
S. bicolor is one of the closest lineages of sugarcane, possessing relatively perfect genome data [30-31]. Total 125 SbR2R3-MYB genes were identified from the available sorghum genome using a similar method (Figure 1, Table S3). The diversity of the gene structure might be a shred of evidence regarding the evolution of gene families. The phylogenetically and gene structure analysis were performed by the Neighbor-Joining method using diverse gene information (Figure 3A and Figure S1). The distribution of the tree branches was basically consistent with the structural features of the genes. In many clusters, various sorghum genes were clustered with highly similar SsR2R3-MYB genes, e.g., SbMYB92 clustered with SsMYB149 and SsMYB156 while SbMYB27 was clustered with SsMYB30 and SsMYB44. These results sharpened our understanding of the evolution of gene events during sugarcane polyploidization. A total of 19 SsR2R3-MYB genes did not show the presence of intron, including SsMYB154, SsMYB188, SsMYB194, SsMYB170, SsMYB122, SsMYB182, and SsMYB189. Many MYB genes demonstrated a domain with a cross-intron structure.
Cis-elements in promoter regions play an essential role in controlling transcription and expression, and hence they can deepen the understanding of the regulatory function of MYB genes. Total 2000 bp upstream of transcription initiation site (ATG) was regarded as MYB gene promoters and submitted to the PlantCARE for predicting the motifs. Various motifs from 202 SsR2R3-MYB gene promoters were involved in various plant bioprocesses (Figure 3B). These diversified cis-regulatory elements could be divided into four main categories in terms of function: stress response, hormone response, light response, and plant growth and metabolism. A high percentage of MYB genes in the anaerobic induction (92%) and drought elements (58.9%) indicated that the MYB genes were more likely to function under these stresses. Moreover, a notable gene, MYB88, was found to have 10 LTR motifs, which is a cis-acting element involved in low-temperature responsiveness. The significantly enriched LTR elements (5'-CCG AAA-3') suggested that the MYB88 gene might be involved in plant metabolic response to cold stress. Many of the MYB genes regulate the plant hormone response, especially methyl jasmonate (MeJA) and abscisic acid (ABA) responsiveness. A total of 75 genes promoters enriched regulatory elements TGACG-motif (5'-TGACG-3') and CGTCA-motif (5'-CGTCA-3') involved in MeJA-responsiveness, while 38 gene promoters enriched regulatory elements ABRE involved in abscisic acid responsiveness. These MYB genes were predicted to regulate MeJA and ABA signaling in plants and function in plant defense and leaf abscission. Furthermore, more than thirty light response-related elements were predicted; for instance, conservative light element G-box was widely present in the upstream sequence of genes. Several regulatory elements were also associated with other functions in plant growth and development and regulation of seed growth and meristem development. Genes involved in seed-specific regulation contained the same RY-element (5'-CATGCATG-3'), and the elements involved in meristem expression demonstrated CAT-box (5'-GCC ACT-3') and NON-box (5'-AGATCGACG-3') in promoter regions. Finally, 119 genes were detected to be scattered on MYB binding sites, and 49 genes showed more than one binding site, suggesting that these genes probably interacted with other MYB genes. Four MYB binding elements were found in 202 SsR2R3-MYB promoters, including CCAAT-box (5'-CAACGG-3'), MBS (5'-CAACTG-3'), MBSI (5'-aaaAaaC(G/C)GTTA-3'), and MRE (5'-AACCTAA-3'). There was only one base difference between the former two elements, which accounted for 80% of the total MYB binding elements, suggesting the conservative nature of the sequence CAACG/TG of the MYB binding site. The autoregulation of plant transcription factors is common in one family, which showed sequence-specific interactions of the family [32-33]. Dof1 binds the PEPC1 promoter, but Dof2 blocks the transactivation of Dof1 [34]. Hence, these MYB genes with MYB binding site indicated the potential interaction effects.
Pervasive gene duplications
Duplication is a striking feature of the plant genome. Gene duplication in the R2R3-MYB gene family occurred during earlier evolution in land plants and contributed to its amplification [35]. We estimated gene duplication events in the S. spontaneum genome by collinearity analysis. A total of 274 collinearity pairs of SsR2R3-MYB genes were identified by Blastp for all protein sequences and evaluated with MCScanX, including 144 allelic pairs and 130 non-allelic pairs (Figure 4, Table S4). The collinearity relationships revealed that over half of the collinearity genes were concentrated in Chr 3 and Chr 7. The duplication events for MYB genes were predicted. Total 91 (25.84%) genes were tandem repeats, of which one-quarter of genes were located on Chr 7. Furthermore, 146 (39.88%) genes were identified to derive from segmental duplication events; 28.1% genes on Chr 2 and 33.5% on Chr 3 evolved from segmental duplication (Figure 4, Table S5). Segmental duplication played a critical role in the evolution of S. spontaneumMYB genes, similar as in the other species. Totally, 66.5% of the R2R3-MYB genes derived from gene duplication events, driving the MYB gene family expansion.
Temporal and spatial expression of the R2R3-MYB gene family
To characterize the expression profiles of MYB transcription factors, the temporally and spatially expression profiles of 202 SsR2R3-MYB genes were analyzed using a total of 50 RNA-seq data among three transcriptome models, including tissue and developmental stages, leaf developmental gradient, and circadian rhythm. The expression heatmap showed that most of the MYB genes had low expression levels, but 71% of gene expression values were greater than 1 (FPKM) in at least one RNA-seq sample (Figure 5A, Table S6). Expression values of 15 MYB groups were presented in Table S6, and G14 genes seemed to be expressed greater than the other groups.
Five different expression patterns, i.e., C1-C5, were investigated on the tissue and developmental stages transcriptome by K-means (Figure 5B). A total of 85 SsR2R3-MYB genes belonging to the C1 and C3 clusters had low expression value, particularly C1 genes with almost no expression. On the contrary, C2 cluster genes displayed a relatively higher expression level in all developmental periods of leaf and stem. Interestingly, 37 genes of the C4 cluster were highly expressed in the stem during the seedling stage, the early stage of the stem formation (Figure S2A). Moreover, in the C5 cluster, 35 genes were highly expressed in the stem during each period, probably playing a regulatory role in the stem development (Figure S2B). The clusters indicated that the gene expression levels in the stem as a whole were significantly higher than those in the leaves, suggesting SsR2R3-MYB genes might play an important role in stem tissue. The relative expression of SsMYB43, SsMYB52, SsMYB65, SsMYB78, and SsMYB99 were quantified by qPCR, verifying the results of RNA-seq data (Figure S3B); additionally, SsMYB3, SsMYB15, and SsMYB157 predominant expressed in the early stage of stem formation depicted as prophase of the stem (Pro-stem), which was much higher than other stem nodes and leaf tissues (Figure S3A).
Sugarcane is a typical C4 plant with high light use efficiency. The developmental gradient model of grass leaves could be used to study C4 photosynthesis and its regulatory factors [36-38]. The regulatory role of SsMYB genes on C4 photosynthesis was investigated on the developmental dynamical transcriptome of sugarcane leaf. As suggested by the C4 photosynthetic development model, leaves are gradually differentiated for active photosynthesis [36]. A total of 27 differentially expressed SsR2R3-MYB genes were detected by the leaf developmental gradient alone, and most of the genes (class I) showed an expression profile, illustrating high value in the early stage of leaf development (Figure S4). Only three genes SsMYB169, SsMYB181, and SsMYB192 in class II (Figure 5C, Figure S4), were identified as putative C4-related transcription factors using the method that associated the co-expression pattern with the photosynthetic activity [37]. The expression increased with the development of C4 photosynthesis and displayed the highest accumulation at the leaf mature zone. Interestingly, SsMYB181 and SsMYB192 shared one haplotype gene Sspon.07G0015250 with SsMYB169, as the tandem genes SsMYB181 and SsMYB192 derived from a gene duplication event. Circadian rhythm is another module to study photosynthesis, in which previously identified C4-related regulators could also be verified. Nine SsR2R3-MYB genes showed a significant association of expression profile with the light-dark cycle (Figure 5D). These genes were divided into three types, containing three genes each type. The expression level of SsMYB169, SsMYB159, and SsMYB153 tailed off during the daytime until around 6:00 pm, and then it gradually recovered till the next cycle. However, the expression profiles of SsMYB48, SsMYB57, and SsMYB158 were just opposite to the expression pattern of the former, rising during the day and falling at night. Unexpected but reasonable, the preliminarily identified three C4-related regulators, SsMYB169, SsMYB181, and SsMYB192, also showed daylight expression pattern, hinting at their involvement in the regulation of circadian rhythm. This strong evidence showed that the three candidate MYB transcription factors were associated with C4 photosynthesis.
MYB genes involved in response to drought and disease-induced stress.
The expression patterns of SsR2R3-MYB genes were evaluated under environmental stress (biotic and abiotic stress). Six SsR2R3-MYB genes with significantly differentially expressed genes (SDEGs) were responsive to drought induction (Figure 6A, Table S7). The transcripts of four genes, SsMYB54, SsMYB36, SsMYB61, and SsMYB48, rapidly accumulated after drought treatment, but their expression reduced to normal after rewatered. On the other hand, SsMYB29 and SsMYB166 showed the opposite trend. Further, the upstream regulatory elements of these six genes contained the MBS element (5’-CAACTG-3’), which was identified as MYB binding site involved in drought-inducibility. Half of these genes retained more than one MBS.
Pokkah boeng disease of sugarcane (PBD) is one of the most serious and devastating diseases caused by the Fusarium species complex, a fungal pathogen [39-40]. Nineteen19 different MYBs were associated with sugarcane PBD-infection and response (Figure 6B, Table S7). According to the gene expression trends, these genes could be divided into 14 genes with increased expression in defense response and the other 5 genes with reduced expression.
Sugarcane mosaic disease is a highly transmissible viral disease present in the cane-growing regions worldwide. Sugarcane mosaic virus (SCMV), belonging to the positive-sense single-stranded RNA viruses, reduces yields by damaging chloroplast and blocking photosynthesis [41-42]. After SCMV infection, 10 SsR2R3-MYB genes expression increased, and one gene, MYB176, decreased, suggesting that these MYB genes were involved in defense against SCMV infection (Figure 6C, Table S7). We discovered that these MYB genes were unique to sugarcane diseases, indicating the defense specificity of MYB genes for conferring the resistance of sugarcane pokkah boeng and mosaic disease.
Functional characterization
The potential function of SsR2R3-MYB genes was predicted on the identified genes with significantly specific expression. Fifty-six SsMYB genes were involved in seven plant bioprocesses (Figure 6D), of which six MYB genes only expressed during seeding stem and were possibly involved in stem differentiation and formation (Figure 6A). Three MYB genes were identified as candidate C4 photosynthesis regulators, and nine genes responded in the circadian clock. Under diverse stresses, it was seen that six, nineteen, and ten SsR2R3-MYB genes responded to drought, pokkah boeng disease, and mosaic disease, respectively. Notably, SsMYB51 and SsMYB162 illustrated different expression changes between two sugarcane diseases (pokkah boeng and mosaic disease). SsMYB162 significantly accumulated, actively responding to the infection of two diseases (Table S7). However, SsMYB51 showed a different expression pattern, negatively responding to pokkah boeng but positively answering SCMV. Moreover, 13 MYB genes had more than one putative function, indicating their role in diverse plant bioprocesses (Figure 6D).
Allelic expression dominance drove SsMYB to function in stem
The transcriptional levels of R2R3-MYB allelic genes were compared among different tissues and different developmental stages to investigate the transcriptome dynamics of R2R3-MYB genes in the allopolyploid across eight homoeologous chromosome pairs, of which 25% of the R2R3-MYB genes displayed allelic expression dominance in all samples. The number of expression dominant genes in the A, B, C, and D genomes was 84, 93, 82, and 79, respectively. Further, the allelic genes were compared in pairs, including A-B, A-C, A-D, B-C, B-D, and C-D (Figure 7a). Both the number of dominant genes in a single set of homoeologous chromosomes and the pairwise comparison of alleles showed no significant allelic dominance. Captivatingly, the number of dominant genes in the stem was more than that in the leaf in each allelic pair comparison. For four sets of homoeologous chromosomes, the percentages increase was 46.5%, 90.6%, 10.2%, 143.4%, corresponding to A, B, C, and D genomes, respectively, and the overall average rise was 64.5%. The transcriptional expression of allelic genes in the stem tissues showed significant differences among different alleles than those in the leaf tissues. Allelic expression dominant genes derived predominantly from stem transcriptomes. Selective pressure analysis demonstrated that Ka/Ks values of expression dominance MYB genes in the stem were higher than in the leaf, indicating tissue specificity (Figure 7B). In contrast with the neutral genes, the Ka/Ks values of differential expression genes were higher, while the subordinate genes exhibited top Ka/Ks values (Figure 7C).