Analysis of AAP proteins
AAP proteins belong to the AAAP family, and some members function in absorbing amino acids from roots and leaves and transporting them to other organs through the phloem. These findings were based only on vascular plants, although Tegeder and Ward's research showed that this protein family was also predicted in Bryophyta. In the present study, we expanded the range of plant species investigated in predicting the function of AAP proteins. We searched for the target proteins in Chlorophytes using BLAST; these results have not been reported previously. We then selected representative plants from various evolutionary stages to explain the evolution of AAP proteins.
The FPKM protein families with biased distribution in Coccomyxa reported by Blanc et al. showed that all 9 chlorophytes they studied contained the Aa_trans domain. However, in the present study, AAPs were found only in C. subellipsoidea, which belongs to the class Trebouxiophyceae. From this finding, we inferred that AAPs might have originated in Chlorophyta, but we could not find further supporting evidence. On the other hand, the study of Tegeder and Ward showed that AAPs might only trace back to Bryophyta, and Bowman et al. indicated that the GH3 protein from M. polymorpha could belong to group I of Zhang's research, although that protein proved not to have related functions. Thus, these hypotheses depended only on protein prediction and structure analysis. Although Chlorophyta are single-celled aquatic eukaryotes with no vascular structure, Blanc presented several protein families that were overrepresented in C. subellipsoidea, including those involved in lipid metabolism, transporters, cellulose synthases, and short alcohol dehydrogenases. Both the work by Tegeder and Ward and the present study identified AAP proteins in Bryophyta. Because we used the database from the Phytozome V12 website, we were able to predict the function of more proteins than Tegeder and Ward (2012). For example, we found 12 and 15 proteins in P. patens and S. moellendorffii, respectively, considerably more than the number identified by Tegeder and Ward.
We predicted 154 AAP proteins and analyzed the Aa_trans and transmembrane domains in each protein. Every species contained the Aa_trans domain; the only difference was that in some proteins the domain appeared as 2 or 3 segments. In early plants as well as in other plant species, transmembrane domains could be located within the Aa_trans domain. This condition was more common in A. trichopoda, S. fallax and C. subellipsoidea. We labeled these proteins as 'Beyond' in Additional file 10. Additionally, we used the MEME website to obtain the distribution of motifs in each protein. Non-vascular and vascular plants all contained the same 10 motifs in the same position and order (Additional files 2, 11 and 12). This structural information supported the potential existence of the predicted proteins.
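The positional comparison between transmembrane segments and Aa_trans domain segments described above can be sketched as a simple interval check. This is a minimal illustration, not the procedure used in the study: the function name, the containment rule (a transmembrane segment counts as inside only if it lies fully within one domain segment), and all coordinates are hypothetical.

```python
from typing import List, Tuple

# (start, end) residue positions, 1-based and inclusive (assumed convention)
Interval = Tuple[int, int]


def split_tm_by_domain(
    tm_segments: List[Interval], domain_segments: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Partition transmembrane segments into those lying fully inside
    some domain segment and those that do not."""
    inside: List[Interval] = []
    outside: List[Interval] = []
    for tm_start, tm_end in tm_segments:
        contained = any(
            d_start <= tm_start and tm_end <= d_end
            for d_start, d_end in domain_segments
        )
        (inside if contained else outside).append((tm_start, tm_end))
    return inside, outside


# Hypothetical protein: an Aa_trans domain split into 2 segments
# and four predicted transmembrane helices.
aa_trans = [(30, 250), (300, 460)]
tm_helices = [(40, 62), (240, 262), (310, 332), (470, 492)]

inside, outside = split_tm_by_domain(tm_helices, aa_trans)
print("inside Aa_trans:", inside)
print("not fully inside:", outside)
```

A stricter or looser rule (for example, any overlap rather than full containment) would change which proteins are flagged, so the rule should be matched to whatever criterion the annotation actually used.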
Exons and introns make up a gene sequence, and exons, which are retained in the transcript, play an important role in gene function. From the number of exons in each plant's AAPs, it could be inferred that some introns have been lost since Chlorophyta in subsequent evolutionary stages. Introns may be lost or gained over evolutionary time, as shown by many comparative studies of orthologous genes. Because the AAP genes in Chlorophyta all displayed the same transcript sequences, the structure of the proteins did not vary greatly. Thus, we suggest that the differences in the number of introns/exons between species are due to a large number of intron losses during plant evolution. This phenomenon has been confirmed by Roy and Penny.
Evolution of AAP proteins
The evolution of AAP proteins and the relationships among them are depicted in the phylogenetic tree. These AAPs can be divided into 2 main groups. Interestingly, AAP proteins from the majority of non-vascular plants (Chlorophyta and Bryophyta) and from the sister group of flowering plants (A. trichopoda) made up group II. The presence of A. trichopoda in group II can be explained by the fact that six foreign genomes contributed to the A. trichopoda mitochondrial genome: one from moss, three from green algae, and two from other flowering plants. Interestingly, we could not find any AAP proteins belonging to group II in angiosperms. Group II could be divided into 2 closely related clusters. The phylogenetic tree also suggests that chlorophytes could be the origin of this protein. Because the proteins in this group belonged mostly to non-seed plants, it is likely that the function of this group is unrelated to amino acid transport in seeds. This suggests that the function of this protein group may have been lost during evolution; the reason needs to be verified before the function of these genes can be further explained. On the other hand, the duplication events of these plant genes occurred mostly in this group, which could mean that some functionally redundant proteins were also predicted.
In group I, the P. patens and S. moellendorffii AAP proteins were identical to those identified by Tegeder and Ward. Group I contained proteins from every plant evolutionary stage, and this main group was divided into 5 clades (Fig 3). Clade 1 contained non-seed plants and Gymnospermae and separated into 2 clusters based on the bootstrap values. The other 4 clades comprised seed plants, with Gymnospermae located in clades 3, 4 and 5. This indicates that functional differentiation of the AAP proteins might have occurred in Gymnospermae, and the distribution of A. thaliana AAP proteins across the clades also supports this supposition. This group might contain the primary proteins associated with amino acid transport. The phylogenetic tree of group I also indicated that clade 2 was closely related to clades 3 and 4 (Fig 6). However, some of our proteins were assigned to different groups than in Tegeder and Ward's phylogenetic tree. These differences might be due to various factors, including the use of a different website to download the protein sequences, the addition of Gymnospermae and A. trichopoda AAP proteins to the analysis, and the use of a different website/program to analyze phylogenetic relationships. From our tree, we could infer that the functional AAP proteins originated from Chlorophyta.
The phylogenetic tree indicated that bryophytes and vascular plants might have had a common ancestor derived from the C. subellipsoidea AAP protein in group I (Additional file 1, Fig 3). All non-vascular plants and mosses clustered together, and the familial division started from P. abies. In addition, we found one duplication event in both S. fallax and S. moellendorffii. The evolutionary history of gene duplication events in mosses and lycophytes was independent of that in seed plants. It was not until A. trichopoda that duplications appeared that were conserved in angiosperms. Two additional duplication events were inferred before or early in the evolution of flowering plants, since they were already present in the genome of A. trichopoda, which is considered a basal flowering plant. Angiosperm proteins were absent from clade 1 in our analysis (Fig 6), and an NCBI BLAST search for these proteins returned no angiosperm matches. Furthermore, this clade was not closely related to the other 4 clades, and the specialization of P. abies AAPs might have led to the divisions. Based on these phylogenetic inferences, we concluded that AAP group I genes have a complex evolutionary history with several specific duplication and loss events. Gene duplication increased with plant evolution, as the AAP genes went from one copy in Chlorophyta to dozens in eudicots. With the development of vascular plants, the number of AAP members increased drastically (Fig 4).
Gene duplication events, Ka/Ks values, and GO annotation information
Gene duplication is a common phenomenon in all life forms and provides resources for novel gene functions. The most obvious contribution of gene duplication to evolution is the provision of new genetic material for mutation, which leads to specialized or new gene functions and contributes to species divergence and the origin of species-specific features. Our analysis showed that AAP family gene duplications were already present in bryophytes. As plants evolved, duplication events appeared at each evolutionary stage except in P. abies (Gymnospermae). We searched other gymnosperms with BLAST in the NCBI database and obtained no results. It is possible that few sequences are available for gymnosperm species; their duplication events could be analyzed in future research. Analysis of duplication events in group I revealed that the evolution of AAPs was also based on gene duplication. As plants evolved, duplication of AAPs gradually increased, providing evidence for the increasingly important role of this family in plant evolution. There was one duplication event in non-vascular plants, and following the development of vascular plants the number of duplication events increased drastically, which is consistent with the important role of AAPs as transport-related proteins.
By calculating the Ka and Ks of duplicated gene pairs in S. fallax, we found that the Ka/Ks values of 3 gene pairs were close to 1, suggesting that these genes were not under natural selection pressure. The Ka/Ks values of the other duplicated genes were all less than 1, consistent with purifying selection. This is because a mutation that changes a protein is much less likely to be different between two species than one which is silent; that is, most of the time selection eliminates deleterious mutations, keeping the protein as it is. In general, AAP duplications did not change the protein within a species, as suggested by Arcadi and Barton. None of the collinear gene pairs came from group II, and the Ka/Ks values also indicated that evolution was stable (Additional file 3). Group I and group II had no significant evolutionary relationship.
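The interpretation of Ka/Ks ratios used here can be illustrated with a short sketch. It assumes Ka and Ks have already been estimated by a dedicated tool; the function name, the tolerance of 0.1 for treating a ratio as "close to 1", and the example values are all hypothetical and are not taken from this study.

```python
def classify_ka_ks(ka: float, ks: float, tolerance: float = 0.1) -> str:
    """Label the selection regime implied by a Ka/Ks ratio.

    tolerance is an assumed cut-off for 'close to 1'; the study
    does not state a numeric threshold.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"      # nonsynonymous and synonymous rates similar
    if ratio < 1.0:
        return "purifying"    # amino acid changes are selected against
    return "positive"         # amino acid changes are favored


# Hypothetical gene pairs: (Ka, Ks)
pairs = {"pair_A": (0.05, 0.50), "pair_B": (0.48, 0.50), "pair_C": (0.90, 0.50)}
labels = {name: classify_ka_ks(ka, ks) for name, (ka, ks) in pairs.items()}
print(labels)
```

Because the boundary between "close to 1" and "less than 1" is a judgment call, reporting the ratio itself alongside any label keeps the classification transparent.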
The Ka/Ks ratios of each species showed that most genes were stable and were under purifying selection or neutral evolution (Additional file 6). Although some species exhibited distinct Ka/Ks values, the majority did not; the distinct values may have been affected by variable sequence alignment. To eliminate these distinctions, we separately compared the CDS sequences to calculate their Ka/Ks ratios. However, this produced results very similar to the original analysis. In general, the AAPs were a relatively stable gene family throughout plant evolution.
Functional annotation of sequences is a key requirement for successful functional genomics in biological research. GO annotation is one way to predict gene function in terms of cellular components, molecular functions and biological processes. Many plant species in our study were not model organisms, so some GO information could not be acquired from online databases; Blast2GO software helped to address this problem. Based on the results, many proteins were localized to the plasma membrane, and the main molecular function of the AAP proteins was transmembrane transporter activity. These results validated that AAPs are integral membrane proteins involved in the transport of amino acids into the cell. Interestingly, OsAAP13, ZmAAAP09, and ZmAAAP69 responded to stress, and only 2 proteins participated in transmembrane transport. The protein structure and phylogenetic tree confirmed that these proteins belonged to the AAP family (Fig 5).