2.1 Identification 161 R2R3-MYB genes with conserved domains in the genome of sea buckthorn
In order to accurately search genome-wide R2R3-MYB genes in sea buckthorn, we used Hmmsearch to search for gene sequences containing MYB conserved domain (PF00249). Finally, 230 MYB-related proteins were preliminarily identified in the sea buckthorn genome. According to the number of conserved domains, there were 161 R2R3-MYB proteins are selected. Among them, R2R3-MYB protein accounts for more than half of MYB protein, forming the largest number of MYB subfamilies [21]. According to the previous Arabidopsis MYB nomenclature[22], the 161 R2R3-MYB proteins were named HrMYB1 – HrMYB161 (Additional file 1: Table S1). R2R3-MYB subfamily comprised 70% of sea buckthorn MYB genes and revealed that it was the largest MYB subgroup, which same with previous studies on cabbage[21].
Meanwhile, we used same methods to collected R2R3-MYB genes of 20 species from lower plants(algae, mosses) to higher plants(Angiospermae),such as Populus trichocarpa, Pyrus bretschneideri, Physcomitrella patens, Coccomyxa subellopsoidea and other plants in Additional file 2: Table S2. Among these species, higher plants have more R2R3-MYB genes than lower plants. All these results indicated that the R2R3-MYB family members experienced a huge expansion in the process of higher plants evolution.
2.2 The classification, conserved domains, gene structure and motif composition of sea buckthorn R2R3-MYB gene family
In order to investigate the characteristics of the homology domains, the R2 and R3 amino acid domains of 161 HrMYB proteins were analyzed their multiple sequence alignment (Fig. 1; Additional file 3: Table S3.). In general, the two conserved domains are approximately 97 amino acid residues and are highly conserved. According to previously reported, insertions or deletions in the two conserved domains were barely seen[21]. Consistent with other species, there are characteristic amino acids in the R2 and R3 amino acid repeats of sea buckthorn. The R2 repeat included three Trp (W) residues and the R3 repeat contained two Trp (W) residues. It was shown that those continuous Trp (W) residues play a vital role in ensuring the integrity of the helix-turn-helix structure of the R2R3-MYB gene [23]. Some hydrophobic amino acids, such as Phe (F), Ile (I) and Leu (L), often replaced the first Trp residue of the R3 repeat in MYB,. Similarity, this substitution were also uncommonly observed at the third Trp residue of three HrMYBs (HrMYB2、HrMYB60 and HrMYB118) proteins R3 repeat. This change may affect HrMYB protein ability to recognize and directly combine DNA[24]. Not only contiguous Trp (W) residues in the MYB gene, but also many highly conserved amino acid residues in R2 and R3 repeats, such as Glu-8, Asp-9, Cys-44, Asn-50 in R2 repeat, and Glu-8, Gly-20 in R3 reprat. We also observed that Leu residue inserted into the R2 repeat of overwhelming majority of HrMYB proteins, which was different from MYB homologs in animal. This is an important step for the R2R3-MYB protein origin[25].
To gain the evolutionary relationship, we used MEGA7 to construct the ML phylogenetic tree for the amino acid sequence of 161 HrMYB proteins (Figure. 2A). According to sequence similarity and topology, the HrMYBs were divided into 28 subgroups, of which ten genes appeared alone and didn’t classified into any subgroup. The 28 subgroups were named A1-A28, and the number of genes in each subgroup was 2–12. Those with higher homology in the same subgroup.It is worth noting that there might be two or more HrMYB genes at the terminal nodes of each subgroup, which have high homology. This highly homologous protein might have similar functions[26]. In addition, there will also be low bootstrap support values in internal nodes, it’s could be due to the large number of taxon and few available features. The phenomenon that internal nodes in the MYB gene family phylogenetic tree had low bootstrap support values had also been found in many animal and plant gene families. It was speculated that this might be due to the number of genes and the relatively small information
The gene structure of sea buckthorn R2R3-MYB genes were analyzed by TBTools. The results indicated that with the exception of 8 genes, the coding sequences of most HrMYB genes were separated by introns (Fig. 2B). The number of introns in the coding region of the MYB gene ranges from 0 to 12. The length of introns of some genes is much larger than coding region. Overall, approximately 70% of the DNA-binding domain of the sea buckthorn MYB gene contained three exons and two introns. According to the grouping information in the phylogenetic tree, the genes in the same group all showed similar gene structures. For example, all no introns genes in 20 group, and in the 28 group, the intron and exon distribution patterns are the same. Analysis of gene structure also provides important evidence for the classification of phylogenetic trees. The online software MEME was used to analyze the motifs of sea buckthorn R2R3-MYB. Fifteen conserved motifs were finally identified from the HrMYB gene, which were designated as motif 1 to 15 (Fig. 2B; Additional file 4: Table S4; Additional file 5: Table S5). The results displayed that in the part outside the conserved domain, most MYB proteins sharing one or more motifs will be distributed in the same subgroup, showing that the structure of proteins in a specific subgroup was conservative (Fig. 2C). It is similar to the results of phylogenetic analysis, further indicating that the functions of those MYB genes clustered in the same subgroup might be similar[22]. As can be seen in Fig. 2B and 2C, it is indicated that the MYB gene in the same subgroup exhibited similar gene structures and conserved motif, which might have similar functions in the regulation of downstream target genes, and provided strong support for the possibility of MYB subfamily classification.
2.3 Chromosome localization, replication events and collinearity analysis among sea buckthorn R2R3-MYB genes
Chromosomal localization analysis of genes revealed that 161 R2R3-MYB candidate genes of sea buckthorn were distributed to 12 chromosomes, as shown in Fig. 3 and Additional file 6: Table S6. However, genes were not randomly distributed on all chromosomes and had a certain degree of bias. Among them, the largest number of MYB genes were distributed on chromosome 11, followed by chromosome 9 and chromosome 1, containing 27 and 22 MYB genes, respectively. In comparison, only two MYB genes are contained on chromosome 8. From the distribution pattern of genes in staining, the MYB genes of sea buckthorn were more concentrated on longer length chromosomes than shorter length chromosomes, such as chromosome 0, 1, 2, 9, 11. In addition to the different numbers of distributions, the uniformity of distribution across chromosomes was also different. In some chromosomes, the region with high MYB gene density was found, which often form lots of clusters. For example, HrMYB137 and HrMYB138 located on chromosome 9 were contained on a 24 kb segment.
Gene duplication occurred during plant evolution, helping to derive genes with new functions, which is consistent with previous studies[27]. Multi-gene families formed were caused by the occurrence of genome-wide polyploidization or region-specific replication of chromosomes. Figure 4 showed total collinear relationship among all HrMYB genes were obtained using MCScanX. In the end, we found a total of 123 pairs of highly similar paralogous genes with highly similar protein sequences (Additional file 7: Table S7). Collinear gene pairs appeared on all chromosomes of sea buckthorn, simultaneously, the sea buckthorn genome also included the intrachromosomal duplication. If the distance between the two or more genes was within 100 kb, which can be defined as a gene cluster[28]. Eight tandem duplications were identified in sea buckthorn (Additional file 6: Table S6). The number of segmental duplications in sea buckthorn was significantly higher than the number of tandem duplications, which had been found in many plants. This suggested that the occurrence of segment duplications might play a more critical role in the expansion of the sea buckthorn R2R3-MYB gene family[29].
2.4 Comparative phylogenetic analysis of R2R3-MYB gene family between sea buckthorn and Arabidopsis
To gain a deeper understanding for potential function of HrMYB gene family, 126 Arabidopsis and 161 sea buckthorn R2R3-MYB proteins were used to construct a combined ML phylogenetic tree (Fig. 5). Five HrMYB genes did not belong to any subgroups. This means that these MYB genes might have a special role in the evolution of the sea buckthorn genome[30]. The remaining MYB genes are divided into 38 subgroups with the number of MYB genes ranging from 2 to 18. We also used the Arabidopsis R2R3-MYB gene classification model obtained in previous studies[7, 22] to classify the subgroup of the sea buckthorn R2R3-MYB genes. In the phylogenetic tree, some Arabidopsis genes classified in the same group in are still in the same group (e.g. C10, C13 and C21). The HrMYB genes was also mixed into the other Arabidopsis MYB gene groups (e.g. C1, C2 and C3). Counting the members of each subgroup, the number of HrMYBs was significantly more than AtMYBs. However, a small number of subgroups contain more AtMYBs than HrMYBs (e.g. C5, C7 and C8). It was worth noting that subgroups C16, C18, C32, C35 and C36 contain 3, 4, 2, 6 and 2 AtMYBs, respectively, and didn’t contain HrMYBs. This might be due to a partial loss of function in evolution. In contrast, subgroups C15, C22 and C28 contained 6, 5 and 2 HrMYBs, respectively, and didn’t contain AtMYBs. This might be part of the special role that sea buckthorn gained during evolution. These findings suggested that R2R3-MYB genes in sea buckthorn experienced not only duplication but also losting some gene functions after the differentiation of sea buckthorn and Arabidopsis[30]. In addition, the genomic drift may also cause the gene loss and lineage-specific amplification[31].
2.5 Expression of HrMYB family genes in sea buckthorn
In order to further understand the expression level of HrMYB gene, the spatial-temporal expression pattern analysis results of sea buckthorn R2R3-MYB gene family were obtained using transcriptome data of 7 different tissue and hierarchical clustering methods (Fig. 6, Additional file 8: Table S8). The results of the spatial-temporal expression pattern analysis showed that the expression profiles of HrMYB gene were significantly different in different tissue samples. Among them, 158 HrMYBs were detected in at least one tissue, although some genes had low transcriptional abundance. Of these, 33 HrMYB genes were detected in 7 samples (FPKM > 1) indicating that they might play a regulatory role in multiple developmental stages. At the same time, the transcriptional abundance of the three HrMYB genes (HrMYB72, HrMYB116 and HrMYB126) were not detected in either sample, which may indicate that the genes were pseudogenes of sea buckthorn in different tissues. Some genes tended to be expressed in specific tissues. For example, HrMYB6, HrMYB20 and HrMYB89 were only expressed in root nodule, stem and fruit, respectively. These genes might regulate special biological processes that occur only in specific tissues, and they were ideal candidates for subsequent functional analysis. In addition, the expression levels of some HrMYB genes in sea buckthorn fruits at different developmental stages are similar (e.g. HrMYB53, HrMYB76, HrMYB83 and HrMYB152), while others were different at different developmental stages, with high expression at one or two developmental stage and low or no expression at other stages. For example, HrMYB63, HrMYB73 and HrMYB89 were expressed in fruit1 and fruit2, but the expression level in fruit3 was very low. The genes HrMYB139, was higher expressed in fruit2 than in fruit1 and fruit3. Therefore, the results indicated that the expression of the sea buckthorn MYB gene family was spatiotemporal specific.
According to the phylogenetic analysis of sea buckthorn and Arabidopsis (Fig. 5), the function of sea buckthorn MYB genes could be inferred by comparing with the MYB gene that has been verified in model plants. AtMYB69 could reduce SCW thickening both in terfascicular fibers xylaryfibers[32]. HrMYB60 and AtMYB69 were closely related in the phylogenetic tree, and only had high expression in root, which indicated that this gene might also be involved in regulating the development of SCW in root. Interestingly, there were high expression of HrMYB42, HrMYB63, HrMYB120 and HrMYB160 in different tissues. The four genes were from subgroup C1 which is known that this subgroup was involved in the response of Arabidopsis to abiotic stress [33]. This indicated that subgroup C1 might play a role in responding to abiotic stress during the whole growth and development of sea buckthorn. In addition, the transcriptome data of HrMYB genes could also provide more information for further study of gene function.
2.6 Putative functions of the sea buckthorn R2R3-MYB gene
Genes with similar functions usually belonged to homologous genes, and often clustered in the same clade or subclade in the phylogenetic tree[34]. However, paralogous genes usually exhibit different functional roles, which might be due to the fact that genes with similar functions can recognize similar target genes, and tihere might be also functional redundancy among them[35]. One of the most significant methods to predict the function of gene family was to identify the homologous HrMYB gene by phylogenetic analysis. By reason of Arabidopsis was a model species in plants, and all the gene of MYB had almostly studied. In particular, all 27 evolutionary branches of R2R3-AtMYBs had annotations[22, 36]. Therefore, in order to predict the function of the HrMYBs, a co-constructed ML phylogenetic tree by R2R3-MYB proteins of Arabidopsis and Hippophae rhamnoides were utilized in our study (Fig. 5). By comparing with the known functional model plant Arabidopsis MYB gene, the potential function of each HrMYB gene can be further predicted. Genes from the same subgroup sharing the same motif might play similar roles during plant growth and development[22].
The first category of MYB genes mainly play an important role in controlling of tissue traits, morphogenesis and organogenesis during plant growth and development, such as plant cells, anthers, stomatal cell development and embryogenesis, including six clades (respectively S9, S14, S15, S18, S19, S21, S25). Different types of Arabidopsis epidermal cells were affected by the MYB gene. However, the MIXTA-like transcription factor AtMYB106 of the S9 subgroup had a negative regulatory effect on the trichome branch of Arabidopsis roots[37]. HrMYB34 which belong to the same group as S9(C35) were also presumed to have similar functions. In addition, AtMYB21 and AtMYB24 (S19) and AtMYB33 and AtMYB65 (S18) jointly control the development and function of Arabidopsis anthers and pollen[38–40]. Based on the functions of these two clades of At-MYBs, it is reasonable to assume that HrMYB77, HrMYB123, HrMYB53, HrMYB3Which were in the same branch might have similar functions.
The second category is mainly responsible for regulating the stress response of plants to biotic and abiotic stresses, and helping plants to adapt to environmental unfavorable conditions, mainly distributed in seven subgroups (S1, S2, S11, S15, S18, S20, S22). The three proteins in subgroup 1 (AtMYB30, AtMYB60 and AtMYB96) have a certain stress response to both biotic and abiotic stresses and protect plants to grow and develop normally [41–44]. AtMYB15 in S2 was involved in allowing plants to achieve low temperature tolerance[45]. Two MYB genes (AtMYB102 and AtMYB41) located in S11 enhanced Arabidopsis resistance to insects[46, 47]. Many HrMYB proteins belonging to these six groups may also help plants to resist biotic and abiotic stresses and better adapt to the environment.
The third type of MYB gene mainly regulated the synthesis of plant secondary metabolites, including eight subgroups (S2, S3, S4, S5, S6, S7, S12, S31). Sea buckthorn was an important economic tree species, which fruits and leaves were rich in many secondary metabolites, among which flavonoids compounds had the most nutritional value. Flavonoids is a kind of crucial secondary metabolites, which is important for people's health and has potential medicinal value. The members of the three subgroups S2, S4,S6, and S7 played a regulatory role in the synthesis of different flavonoid compounds. AtMYB123 in S2 regulates the biosynthesis of proanthocyanidins (PAs) in Arabidopsis seed coats[48], and four genes in S6 are involved in the regulation of anthocyanin synthesis in vegetative tissues[49]. The synthesis of flavonols can be regulated in all tissues by AtMYBs in the S7[50]. The MYB proteins of the sea buckthorn clustered in the phylogenetic tree with the three subgroups have similar conserved motifs to the Arabidopsis MYB protein, suggesting that these HrMYB genes had the possibility involved in regulating flavonoid synthesis. AtMYB4 in subgroup 4, a transcriptional repressor, is regulated by aUV to regulate the biosynthesis of intracellular gluconate[51]. HrMYB22 and AtMYB4 were on the same terminal branch and had similar genetic relationships, so it is speculated that they may have similar functions. There were many HrMYB genes in these clades to regulate the synthesis and accumulation of secondary metabolites in itself.
In conclusion, according to the R2R3-MYB gene in Arabidopsis, it is speculated that R2R3-MYB of sea buckthorn also has a wide range of functions. They jointly affected the expression of tissue traits, morphogenesis and organogenesis, the production and accumulation of secondary metabolites in the growth and development of sea buckthorn, and helped plants cope with various biotic and abiotic stresses. So that plants could better adapt to the environment and maintain normal growth and development. The specific functions of the different sea buckthorn R2R3-MYB genes need to be further verified later.
2.7 Verification of HrMYB gene expression by QRT-PCR
The fruit of sea buckthorn is a kind of berry, rich in secondary metabolites, which is the most nutritious and economic value part. Therefore, to further verify the expression pattern of R2R3-MYB genes related to flavonoid biosynthesis during different fruit development, 14 HrMYB genes that might be closely related to flavonoid biosynthesis were selected. Then, their expression levels in five different developmental stages of fruits were determined by real-time quantitative PCR. Further explore the changes of transcript abundance of 14 genes in different developmental stages of fruits, and provide reference for subsequent functional verification. The results were obtained using QRT-PCR showed that the expression pattern of 14 HrMYB genes is closely associated with the change of flavonoids during the fruit development of sea buckthorn (Fig. 7; Additional file 9: Table S9). The results showed that some genes peaked only early in fruit development, such as HrMYB8, HrMYB9, HrMYB23, HrMYB27, HrMYB114 peaked at first stage and HrMYB110 which peaked at second stage. The expression levels of gene HrMYB157 peaked at first stage and gene HrMYB104 peaked at second stage, subsequently, their expression levels gradually decreased with fruit development. This trend was in line with the change of flavonoids content during sea buckthorn fruit development[52]. In addition, gene HrMYB11, HrMYB14, HrMYB19, HrMYB107 and HrMYB58 had higher expression levels at the first stage and lower at the second or third stage. Subsequently, the expression levels significantly up-regulated, expect for the HrMYB58, which the expression level had declined at the fifth stage. Consequently, we guessed that they might be negative regulators of flavonoids synthesis during sea buckthorn fruit development.[53]. In particular, the expression level of gene HrMYB80 showed a tendency of increasing at first stage and reaching a peak at the third stage and then decreasing with fruit development. Thus, we speculated the gene HrMYB80 may play a special role at the third stage of sea buckthorn fruit development.
Because the MYB transcription factors between herb and xylophyta have complex functions, the functional predictions performed in this paper may not be completely correct, and subsequent functional verification is needed to further determine their specific functions during plant growth and development.