Evolutionary Significance of the RAV Family in 25 Plants from Chlorophyta to Angiospermae

Background: Plants are readily affected by environmental factors such as temperature, high salinity, and drought, which impair plant development and reduce crop yield. Transcription factors (TFs) contribute to stress resistance in many plant species and play vital roles in plant growth and development. Among them, the RAV (related to ABI3/VP1) gene family, a plant-specific type of transcription factor containing an AP2 (APETALA2) DNA-binding domain and a B3 DNA-binding domain, is the focus of this research. Results: More RAV genes/proteins (RAVs) were present in Gymnospermae and Angiospermae, a few RAVs were present in Bryophyta and Lycophyta, and no RAV gene was predicted in Chlorophyta. Based on our phylogenetic analysis, RAVs could be divided into three main groups. The members of Clade I included all plant stages from Chlorophyta to Angiospermae except monocots. Clade II contained only eudicot species. Clade III, a branch consisting of monocot species and Marchantia polymorpha, may help us explore the evolutionary relationship of RAVs between Bryophyta and monocots. Our results show that RAVs first appeared in bryophytes. As the vasculature developed and evolved, the RAV gene family also underwent functional differentiation and acquired new functions through gene duplication events. Conclusions: Our findings indicate that the RAVs originated in bryophytes and that the RAVs of these plant species form clearly separated divisions in the evolutionary tree, which reveals the evolution of RAVs. These results provide guidance for further research.


Background
As plants continue to evolve, their gene families are constantly evolving as well. In particular, the number of TF families in plants usually grows explosively in response to rapid environmental change, which leads to family members with new functions or to new transcription factor families that help plants adapt to the new surrounding ecological environment [1][2][3]. Algae, as the ancestors of plants, are usually aquatic, are closely related to the land plants, and play an extremely important role in plant evolution [4]. Because of ecological and environmental conditions, the transition of plants from aquatic to terrestrial life is extremely challenging in terms of physiological and genetic adaptation. Studying the evolution of plants provides an important basis for understanding their growth and reproduction [5].
The research on TFs in the plant kingdom is relatively rich, but is mostly limited to the study of seed plants [6].
To advance our knowledge of the evolution of TFs in plants, research on early plants is essential. For example, compared to other land plants, the early plant Marchantia polymorpha has different levels of TF diversity [7]. With the release of whole-genome sequences for an increasing number of plant species, including algae, mosses and other species [6], we can use these data to study and infer the evolution and gene function of the transcription factor families in plants.
Plants are continuously affected by environmental factors such as temperature variation, high salinity, and drought during their growth. The study of transcription factors (TFs) that improve the stress resistance of plants by regulating the expression of multiple stress-responsive genes has become a popular research topic. The members of the AP2/ERF superfamily, one of the largest TF families, are widely involved in regulating plant growth and development as well as abiotic and biotic stress responses [7,8], and they share one feature: each contains at least one AP2 domain. The AP2/ERF superfamily can be divided into 4 types: ethylene response factor (ERF), AP2, RAV, and soloists. The RAV gene family is a family of TF genes with two characteristic domains, AP2 and B3, where the AP2 domain is at the N-terminus and the B3 domain is at the C-terminus [8][9][10]. The RAV family, which is unique to plants, also belongs to the B3 superfamily because it contains a B3 domain of approximately 110 amino acids [11]. RAV1 and RAV2 in Arabidopsis thaliana were the earliest members of the RAV family to be identified [12].
RAVs are present in Arabidopsis [13], and this family has been studied in many plant species. AtRAV1 plays an important part in actively regulating Arabidopsis leaf senescence [14]. In soybean (Glycine max), GmRAV is related to photosynthesis and regulates plant development, especially senescence [15]. Under short-day (SD) conditions, soybean GmRAV and Arabidopsis RAV1 negatively regulate flowering and hypocotyl elongation, and overexpression of soybean GmRAV causes the leaves, roots and stems of soybean to grow slowly, implying that these genes act in controlling plant growth [15][16][17]. RAVs in tomato (Solanum lycopersicum) may act as intermediate TFs that link AtCBF1 to pathogenesis-related genes, thereby enhancing tolerance to bacterial wilt (BW) [18]. RT-PCR results under Verticillium dahliae stress indicated that RAVs are related to the stress response in cotton (Gossypium hirsutum) [19]. Overexpression of AtRAV1/2 may enhance drought resistance in cotton [20]. Overexpression of pepper (Capsicum annuum) CARAV1 in Arabidopsis improved resistance to high salt stress [21]. In addition, the expression levels of AtRAV1 and RAV1L in Arabidopsis decreased under 24-epibrassinolide (epiBL) and ABA treatments, respectively, and transgenic Arabidopsis overexpressing AtRAV1 is not sensitive to ABA [16,22].
These results reveal that RAVs may take part in plant growth and development and in resistance to stress. In view of the importance of RAVs in plants, in this study we identified the RAVs in 25 plant species (covering each evolutionary stage) and analyzed their gene structures, protein features, evolutionary relationships, gene duplications, and Gene Ontology (GO) annotations to reveal the evolution of RAVs in the plant kingdom.
We compiled information about the RAVs, including gene information (gene ID, gene location, and numbers of exons and duplication events; Additional file 1a), protein information (protein length, molecular weight, pI, etc.; Additional file 1b), and protein features (conserved domain location, numbers and diagrams of motifs, predicted subcellular localization, transmembrane domains, and signal peptide; Additional file 1c). These data are important for understanding and studying these proteins.
We analyzed the locations of motifs and of the B3 and AP2 domains, and the number of exons in each RAV. Most RAVs contained 1-3 exons, whereas CbRAV4 in C. rubella contained 11 exons, which was clearly different from the other genes (Additional file 1a). Generally, the number of exons among members of the same family in any plant is relatively stable and is not related to the length of the sequence [23].

Investigation of gene duplication events
The formation of new functional genes and the functional differentiation of any gene are inseparable from gene duplication events [25]. We analyzed the 97 RAVs across all plant species and identified gene duplication events in 10 species (Fig. 3). There were no duplication events from Bryophytes to Gymnosperms. A. trichopoda, C. sativus, S. tuberosum, and T. aestivum exhibited no duplication events either. Six plant species had only tandem or only segmental duplications, and 4 species had both. The highest numbers of total duplication events and of segmental duplication events were found in B. rapa, and the highest number of tandem duplication events occurred in G. raimondii (Table 1, Additional file 1a). Duplication events occurred only in Eudicots and Monocots, and their number was higher in Eudicots than in Monocots. Tandem duplications were fewer than segmental duplications. Additionally, the eudicot species in Clade II had more duplication events than those in Clade I, which suggests that RAV proteins in Clade II may play a pivotal role in the growth and development of eudicot species.
Calculating the ratio of non-synonymous to synonymous nucleotide substitutions (Ka/Ks) is necessary and useful for a deeper understanding of the evolution of gene families. In total, 47 gene pairs were analyzed using DnaSP6 [26]. The Ka/Ks values were well below 1.0 for all gene pairs (Additional file 2). We also calculated the Ka/Ks values of all RAVs in each plant species (Additional file 3). Overall, apart from the Ka/Ks values that could not be calculated, the highest Ka/Ks value was 2.70, found between PaRAV13 and PaRAV16 in P. abies (Additional file 3), and the values of all other gene pairs were less than 1.0.
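The conventional reading of a Ka/Ks value can be illustrated with a minimal Python sketch. It is not part of the DnaSP6 workflow used in this study; the function name `classify_selection`, the tolerance parameter, and the input numbers are assumptions made for illustration only. It labels a pair as under purifying selection when Ka/Ks < 1, neutral when Ka/Ks = 1, and under positive selection when Ka/Ks > 1, and reports pairs with Ks = 0 as not calculable.

```python
def classify_selection(ka, ks, tol=1e-9):
    """Return (ratio, label) for one gene pair.

    ka, ks: non-synonymous and synonymous substitution rates.
    The ratio is None when it cannot be calculated (missing values or Ks <= 0).
    """
    if ka is None or ks is None or ks <= 0:
        return None, "not calculable"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        label = "neutral"
    elif ratio < 1.0:
        label = "purifying"
    else:
        label = "positive"
    return ratio, label


# Hypothetical Ka and Ks inputs, chosen only to exercise each branch.
for ka, ks in [(0.05, 0.50), (0.27, 0.10), (0.10, 0.0)]:
    print(ka, ks, classify_selection(ka, ks))
```

Pairs reported as "not calculable" correspond to the cases excluded from the comparison above.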
Ninety-seven proteins were annotated with specific GO terms covering biological process (BP), molecular function (MF), and cellular component (CC). According to the annotation results, eight BP terms were annotated to the 97 RAVs, which were linked to (negative) regulation of transcription, DNA-templated. Six proteins in Arabidopsis thaliana participated in the ethylene-activated signaling pathway. AtRAV1, BrRAV9, CbRAV1, GrRAV3, MpRAV1, OsRAV3, PhRAV2, SmRAV2, and StRAV1 were related to response to brassinosteroid, negative regulation of flower development, leaf development, and lateral root development.
AtRAV3, BdRAV1, BrRAV13, GmRAV1, and SlRAV3 were associated with photoperiodism and flowering. In addition, CbRAV4 in C. rubella was predicted to be related to proteolysis. Six MF terms were annotated to these RAVs. Among them, all 97 RAVs were associated with transcription factor activity (sequence-specific DNA binding) and DNA binding (or transcription regulatory region DNA binding). Further, CbRAV4 was also related to metal ion binding, metalloexopeptidase activity, and aminopeptidase activity. Only one CC term was annotated to the 97 RAVs: forty-nine RAVs, distributed across various plant stages, were predicted to be located in the nucleus (Fig. 4, Additional file 5).

Analysis of RAVs
RAVs belong to the AP2/ERF superfamily [27,28]. We took several representative plants at different evolutionary stages as research objects to analyze the evolution of RAVs. The RAV family has been identified in many species, such as sweet orange, Vitis vinifera, rice, maize, wheat, and soybean [13][29][30][31][32][33]. We integrated these results and then searched for and identified the RAV family again using the methods described in this article. Our results were almost the same as those of other articles. However, of the 13 RAVs identified in soybean [32], several genes that did not contain the AP2 domain should be excluded. In maize, 4 RAVs were screened by Zhou et al. using a wider range of databases, which differed from the method we used [13].
RAVs are widely associated with plant growth and development and with responses to adverse conditions. Overexpression of AtRAV1 and AtRAV2 in cotton differentially enhances fiber length under drought stress and delays flowering [34]. Overexpression of pepper CARAV1, which responds to pathogen infection, increases tolerance to drought and salt and enhances sensitivity to ABA in transgenic Arabidopsis [35]. A RAV-like protein from soybean is involved in photosynthesis and senescence. Transgenic Nicotiana tabacum (tobacco) plants overexpressing the RAV-like gene show a slow growth rate, delayed root elongation, delayed flowering time, and a significantly reduced chlorophyll content in the leaves [15]. The RAV family has been identified and studied mainly in eudicot and monocot species, especially in Eudicots, but rarely in other plant stages. In Bryophytes and Gymnosperms, no research related to RAVs has been found, while the genome sequences of S. moellendorffii in Lycophytes and of A. trichopoda have been reported [36,37]. Although RAVs have been identified in several other early plants, there has been no further research on them either [38]. Nevertheless, we can predict the functions of more RAVs using the database of the Phytozome website together with previous conclusions [40]. Because the RAV protein structures and features in Bryophyta did not vary greatly (Fig. 1, Additional file 1b,c), we inferred that intron losses during plant evolution were the reason for the differences in exon number among species. This phenomenon has been confirmed in a previous study [41]. However, CbRAV4 had 11 exons, which is inconsistent with this evolutionary pattern, and we have not been able to find relevant studies to confirm this phenomenon.
Combined with the results of the conserved domain analysis, we suggest that CbRAV4 may have evolved into a protein of another family (the MetAP family) or acquired new functions [42], so that it now differs significantly from the RAV family.

Evolution of RAVs
RAVs are divided into 3 main groups, which is consistent with previous research [32]. Phylogenetic results showed that Clade I comprised RAVs from most non-vascular plants (Bryophytes), A. trichopoda, P. abies and some eudicot species. Clade II contained only eudicot species. The only RAV gene in A. trichopoda was located in Clade I. A possible explanation is that in A. trichopoda, which belongs to the Angiosperms and is sister to the other flowering plants, the mitochondrial genome was constructed from six exogenous genomes: one from moss, two from other flowering plants, and three from green algae (Chlorophytes did not contain RAVs) [43]. The RAVs of eudicot species may be derived from the Bryophytes and may have formed the Clade II proteins with special functions during evolution. However, the RAV members of the monocot species and M. polymorpha belong to Clade III, which indicates that the RAVs in Monocots may also have evolved from Bryophytes and share common ancestors with those of Eudicots, but that the protein functions differentiated during the evolution of Angiosperms. Moreover, Gymnosperms contained the most RAVs, which may mean that the family evolved more markedly in Gymnosperms than in Angiosperms. In general, the phylogenetic trees suggest that vascular plants may have a common ancestor, but that the ancestor of the Monocot RAVs derives from the MpRAV1 protein of Clade III in M. polymorpha, while the ancestor of the RAVs in other seed and non-seed plants may derive from the Bryophyte proteins located in Clade I (Fig. 2). With the development of vascular plants, the number of RAV members increased dramatically (Fig. 3). In addition, the use of different protein databases, the selection of different species, and the use of different websites or programs for analysis all affect the results, so the above views need to be confirmed by more evidence.
Different groups of RAVs may have different functions in different plant species. Overexpression of AtRAV1 in Arabidopsis retards lateral root and rosette leaf development, and inhibition of its expression leads to an earlier flowering phenotype, indicating that AtRAV1 may function as a negative regulator of growth and development [16]. The mRNA level of AtRAV1 in Arabidopsis increased at late maturity, peaked at early senescence, and decreased at late senescence, which suggests that AtRAV1 has a significant role in regulating leaf senescence [14]. In transgenic tomato plants, overexpression and silencing of SlRAV2 increased and decreased the expression levels of SlERF5 and PR5, which enhanced and weakened BW tolerance, respectively [18]. The BnaRAV-1-HY15 TF in B. napus is highly similar to AtRAV2 in Arabidopsis. The expression of BnaRAV-1-HY15 is induced by cold, salt and PEG treatments and is insensitive to ABA [44]. The closer genes of the same family are on the phylogenetic tree, the more similar their functions are in different plants [45]. Membership of the same class probably implies similar characteristics within the RAV family. Therefore, we can infer the function of a RAV gene in any species from existing research results, thus providing useful ideas and directions for further research [39].

Gene duplication events, Ka/Ks ratio, and GO annotation
Gene duplication events are very common in all living organisms and provide a basis for the formation of new functional genes and the functional differentiation of genes. They also lead to species differences and species specificity [46,47]. Our findings indicated that, over the course of plant evolution, gene duplication events were present only in Eudicots and Monocots. There was no gene duplication in other species because they had few or no RAVs. However, no gene duplications were identified among the 16 RAVs of P. abies either. The evolution of RAVs was also related to gene duplications. The increase in RAV duplications over the course of plant evolution provides evidence for the role of this family in plant evolution. In addition, the fact that gene duplication events exist only in Eudicots and Monocots is consistent with the appearance of functional differentiation among RAVs.
The Ka/Ks values calculated with DnaSP indicated that all gene duplication events were under purifying selection, meaning that these plants eliminated deleterious mutations during evolution and kept the proteins unchanged [48]. We calculated the Ka/Ks values of the RAVs in each plant species and found that, except for PaRAV13 and PaRAV16 in P. abies with a Ka/Ks value of 2.70, all other genes were stable and under purifying selection (Additional file 3). Overall, the RAV gene family has been relatively stable throughout plant evolution.
GO annotation is one way to predict the function of genes in terms of BP, MF and CC [49]. Based on the GO annotation results, about half of the RAVs were located in the nucleus, the main molecular functions of RAVs were transcription factor activity and DNA binding, and the main biological process was regulation of transcription, DNA-templated (Fig. 4, Additional file 5). In addition, CbRAV4 in C. rubella was predicted to be related to proteolysis, metal ion binding, metalloexopeptidase activity, and aminopeptidase activity, which agreed with the results of the conserved domain and exon analyses and showed the specificity of the CbRAV4 gene.

Conclusion
In recent years, with the completion of genome sequencing of various species, abundant data have become available to support the study of basal lineages and the evolution of gene families. In this study, we first identified the RAVs and predicted their structures and functions in 25 plant species using these existing databases. The results showed that RAV proteins did not exist in Chlorophytes, which implied that RAV proteins were not formed until land plants appeared. Secondly, the phylogenetic relationships among the members of the whole RAV family showed that the 97 identified RAVs were clearly divided into three clades.

Phylogenetic analysis of RAVs
The phylogenetic inference was carried out using the MEGA7 software. Twenty-five plant species were included in the tree. The Neighbor-Joining (N-J) method was used to calculate genetic distance with the Jones-Taylor-Thornton (JTT) model, the pairwise deletion option to handle missing data, and 1000 replicates for bootstrap analysis [50]. The N-J method is especially well suited to datasets comprising lineages with widely varying rates of evolution. It can be used in combination with methods that allow for correction of superimposed substitutions [51].
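Two of the options named above, pairwise deletion and bootstrap resampling, can be illustrated with a short Python sketch. This is not the MEGA7 implementation: for simplicity it uses the uncorrected p-distance rather than the JTT model, and the function names, the gap symbols, and the toy sequences are hypothetical. Pairwise deletion drops a site only for the pair in which one of the two sequences has a gap, and each bootstrap replicate resamples alignment columns with replacement.

```python
import random

GAP_CHARS = {"-", "?"}  # assumed symbols for gaps and missing data


def p_distance_pairwise_deletion(seq_a, seq_b):
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else float("nan")


def bootstrap_alignment(alignment, rng):
    """Return one replicate built by resampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Hypothetical toy alignment, not taken from the RAV dataset.
toy_alignment = {"seqA": "MKV-LLA", "seqB": "MKVQLIA", "seqC": "MR--LIA"}
rng = random.Random(0)
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(p_distance_pairwise_deletion(toy_alignment["seqA"], toy_alignment["seqB"]))
```

In a full analysis, a distance matrix would be computed and a tree built for every replicate, and the bootstrap support of a branch would be the fraction of replicate trees that contain it.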

Investigation of gene duplication events, Ka/Ks values and annotation information
Tandem gene duplication events of RAVs were analyzed following the methods of Gu et al. (2002) [60]; the major criteria were that the length of the alignable sequence covers > 75% of the longer gene and that the similarity of the aligned regions is > 75%. Two genes were regarded as tandem pairs if they were located on the same chromosome and were separated by no more than 10
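The two alignment criteria and the same-chromosome condition can be written as a minimal Python sketch. The class `GenePairAlignment`, its field names, and the example numbers are hypothetical and serve only as illustration. The separation limit between the two genes is deliberately not modeled, because its full value and unit are not given in the passage above.

```python
from dataclasses import dataclass


@dataclass
class GenePairAlignment:
    chrom_a: str        # chromosome of gene A
    chrom_b: str        # chromosome of gene B
    len_a: int          # length of gene A
    len_b: int          # length of gene B
    aligned_len: int    # length of the alignable region
    similarity: float   # percent similarity of the aligned region


def passes_alignment_criteria(pair, min_coverage=0.75, min_similarity=75.0):
    """Alignable region covers > 75% of the longer gene and similarity is > 75%."""
    coverage = pair.aligned_len / max(pair.len_a, pair.len_b)
    return coverage > min_coverage and pair.similarity > min_similarity


def is_tandem_candidate(pair):
    """Same chromosome plus the alignment criteria; separation is not checked."""
    return pair.chrom_a == pair.chrom_b and passes_alignment_criteria(pair)


# Hypothetical pair: coverage 800 / 1000 = 0.80, similarity 82%.
example = GenePairAlignment("chr1", "chr1", 1000, 900, 800, 82.0)
print(is_tandem_candidate(example))
```

A pair that passes `is_tandem_candidate` would still need to satisfy the separation limit before being counted as a tandem pair.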

Availability of data and materials
All data generated or analyzed during this study are included in this article.

Competing interests
The authors declare that they have no competing interests.

Phylogenetic tree of RAVs. Ninety-seven RAVs from Chlorophyta to Angiosperms were distributed in the unrooted tree. The RAVs from different stages are shown in different colors. The three main parts are indicated by pink, orange and green curved lines, respectively.