Identification, phylogenetic analysis, and classification of 364 AP2/ERF TF family memebers in Salix matsudana
By HMM profile search against the Salix matsudana protein database, a total of 364 full-length AP2/ERF family proteins containing at least one AP2/ERF domain were identified as AP2/ERF superfamily members of Salix matsudana (Fig. 1). The name, protein length, molecular weight, and isoelectric point of individual genes are listed in Supplementary Table S1.
The phylogenetic relationships of SmAP2/ERF proteins were inferred by multiple sequence alignment of the AP2 domain, which included approximately 50–60 amino acids. The sequence alignment of all AP2/ERF genes showed some conserved amino acids at specific positions, as previously reported (Fig. S1). For example, the WLG element (58th–60th amino acids; 58–60AA) was highly conserved in in the ERF and RAV families; alternatively, in the AP2 family, the conserved sequences from 58–60AA were converted into YLG elements [24]. In many species, these conserved amino acid profiles contribute to convincing classification of AP2/ERF genes. Basing on multiple sequence alignments of 48 AP2/ERF proteins from Arabidopsis and Populus trichocarpa with known categories and 364 Salix matsudana AP2/ERF proteins, we constructed a phylogenetic tree using the neighbor-joining method to explore the phylogenetic relationships of Salix matsudana AP2/ERF proteins. The phylogenetic tree showed that there were 55 AP2/ERF genes that belong to the AP2 family, with 47 genes that encode proteins with two AP2 domains and eight genes (SmAP2-20, SmAP2-25, SmAP2-29, SmAP2-35, SmAP2-36, SmAP2-40, SmAP2-41 and SmAP2-55) that encode proteins with a single AP2 domain (Fig. 1). Additionally, 301 genes that were predicted to encode proteins with a single AP2 domain were members of the ERF family. The ERF family could be further classified into two subfamilies, ERF and DREB. Of the 301 members, 166 and 135 genes belonged to the ERF and DREB subfamilies, respectively. The ERF family genes from Salix matsudana were distributed in B1–B6 subgroups; the DREB family genes from Salix matsudana were classified into A1–A6 subgroups. The gene number and percentage of each subgroup are listed in Fig. 2 and Table S2. Six putative genes were classified as RAV subgroup genes that encode proteins containing one AP2/ERF domain and one B3 domain (Fig. 1). Two genes were designated as Soloist genes, whose AP2/ERF-like domain sequences had lower homology compared with other AP2/ERF genes (Fig. 1).
The AP2/ERF genes number, classification and percentage of different subgroups from five plant species, including the model plant Arabidopsis, Populus, and two Salix plants, are listed in Table S2. The gene name of AP2/ERF genes from Populus trichocarpa and Salix purpurea are listed in Table S3. As a tetraploid plant, the total number (364) of AP2/ERF genes was much larger in Salix matsudana than in the other four species. The number of AP2/ERF genes in Salix matsudana was 2.5-, 1.8-, 1.9-, and 2.1-fold higher than those in A. thaliana [1], Populus trichocarpa [3], Salix purpurea, and Salix arbutifolia [23], respectively. For DREB and ERF subfamilies, the percentage of all AP2/ERF genes in Salix matsudana was similar to those of A. thaliana, Populus trichocarpa, and Salix purpurea, and the percentages of DREB and ERF subfamilies were 38% and 45%, respectively. In Salix arbutifolia, the percentage of DREB (33%) was lower than that of the other four species, whereas the percentage of ERF (50.8%) was higher. In Salix matsudana, the percentage of the AP2 subgroup was highest among all five species (15%) and the numbers of most of gene sub-classifications were doubled, including the Soloist gene; there were two Soloist genes in the Salix matsudana genome. However, no duplications were observed in the RAV subgroup, and only six RAV genes were found in the Salix matsudana genome.
Gene Structure And Conserved Motif Analysis
To understand the structural diversity of SmAP2/ERF genes in different clades, the intron and exon structures of SmAP2/ERF genes were revealed by inputting Gff3 files into TBtools (Fig. 3b). A total of 55 genes of the AP2 subfamily had more exons than ERF and other subfamilies. Apart from three exons in the SmAP2-29 and four exons in the SmAP2-20, other members of the AP2 subfamily contained more than seven exons. The intron number was less than three in many members of the ERF and RAV subfamilies. In total, 215 gene members did not have introns (Fig. 3b). The exon/intron structures of genes that were classified in the same clade were similar. Many gene pairs were found in the phylogenetic tree that potentially originated from allotetraploid or autotetraploid evolution of Salix matsudana. Many gene pairs (approximately 70%) maintained the same or similar gene structure during Salix matsudana formation, which indicated that the SmAP2/ERF genes were conserved at the DNA level during polyploidization.
TF proteins always contain many conserved motifs to activate gene expression. A total of 10 conserved motifs were detected in 364 SmAP2/ERF proteins using the online MEME software, and a block diagram was constructed to characterize SmAP2/ERF protein structure (Fig. 2c). Motif-4, Motif-1, Motif-2, Motif-3, Motif-5, Motif-7, and Motif-9 were found in the AP2 domain regions. The Motif-5 region covered the region of Motif-4 and Motif-1, whereas Motif-7 included Motif2 and Motif3. Motif-9 is a specific motif that is only found in the second AP2 domain of the AP2 subgroup. Motif-1, Motif-2, Motif-3, and Motif-4 were detected in 90% percent of the ERF subfamily proteins. Thirty proteins of the ERF subfamily lacked one or two motifs of Motif-1–4. An extreme example is SmERF B2-13, which only had Motif-2. Motif-6, Motif-8, and Motif-10 are motifs located outside of the AP2 domain. Motif-6 was primarily found in the AP2 subfamily with only one exception, SmERF B4-4, which was in the ERF-B4 clade. In the AP2 subfamily, members with two AP2 domains had Motif-6 located between the two AP2 domains. Motif-8 was found in 69 proteins of the AP2/ERF family, and its location was adjacent to the carboxyl terminal of Motif-3. Many proteins from the DREB-A1, DREB-A4, DREB-A5 clades had Motif-8. Motif-10 was found in 62 proteins of the AP2/ERF family, with 61 proteins from the ERF subfamily and only one from the AP2 subfamily. Motif-10 was mostly distributed on the proteins from the ERF-B3, DREB-A2, and DREB-A4 clades. The functions of these three motifs need to be elucidated by further experimental analysis.
Besides protein SmAP2-20, the entire AP2 domain was distributed in the amino terminal or in the middle position of the proteins. In the two Soloist genes, only one motif, Motif-2, was found.
The conserved motif composition and gene structure of the same subfamily were similar, thus verifying the reliability of the phylogenetic tree classification.
Chromosome distribution and duplication of SmAP2/ERF superfamily genes
The chromosome location of the identified SmAP2/ERF genes was constructed using TBtools. In total, 310 genes from the AP2/ERF superfamily were unevenly distributed on 38 chromosomes (Fig. 4); 54 other genes located on scaffolds were not illustrated in Fig. 4. The chromosome with the largest number of AP2/ERF genes was Chr21, which had 22 genes. Only one AP2/ERF gene each was located on Chr14 and Chr36. On the four chromosomes Chr1, Chr3, Chr22, and Chr27, only two AP2/ERF genes were found. In 38 chromosomes, most of the AP2/ERF genes from different subgroups were arbitrarily distributed, such as five of six RAV genes located on Chr15, Chr37, Chr34, Chr31, and Chr11. Moreover, the two Soloist genes distributed on Chr29 and Chr5. However, SmERF B3 subgroup members clustered together with three genes as a cluster unit. We found 12 clusters in 12 chromosomes (Fig. 4), which accounted for 62% of the whole SmERF B3 subgroup.
In addition, we also analyzed the tandem duplication events (TDs) of the AP2/ERF genes located within in the 200-kb range of chromosomal regions of the Salix matsudana genome. Eleven TD regions, which included 23 SmAP2/ERF genes, clustered into 11 linkage groups (LGs) of the Salix matsudana genome (Fig. 4). LGs that contained cluster repeat genes were Chr7, Chr8, Chr10, Chr13, Chr17, Chr19, Chr21, Chr24, Chr28, Chr31, and Chr33. All genes of the repeat clusters were localized within a genomic segment of less than 20 Kb; for example, SmDREB A4-20 and SmDREB A4-19 were localized on a 3.6-Kb segment of Chr24. On Chr8, three genes clusters (SmERF B3-6, SmERF B3-7, and SmERF B3-8) located on a less than 12-Kb segment. In 11 tandem repeats, six came from the SmERF B3 subgroup, two came from the SmDREB A4 subgroup, and one each came from the SmDREB A1 and AP2 subgroups. SmERF B3-40 and SmERF B3-39 tandem repeat pairs had 97% protein sequence identity.
In addition to tandem duplications, many segmental duplication events (SDs) were found in Salix matsudana by MCScanX (Fig. 5, Table S4). We found a total of 28,348 collinear gene pairs (not shown) in the Salix matsudana genome, from which 298 AP2/ERF collinear gene pairs were identified. Then, Ka, Ks, and Ka/Ks ratios of these 298 AP2/ERF collinear gene pairs were calculated to estimate the divergence time (T value) and selection pressure among duplicated SmAP2/ERF gene pairs (Table S5). All of the Ka/Ks values were below 1, which indicated that these genes might have experienced strong purifying selective pressure during evolution. Among the 298 AP2/ERF collinear gene pairs, 198 were located on duplicated segments on 38 chromosomes in Salix matsudana (Fig. 5 and Table S4). The collinear gene pairs in the Salix matsudana genome were visualized by Circos, and the gene pairs were linked by lines (grey lines indicated all gene pairs, red lines indicated AP2/ERF collinear gene pairs.
Synteny analysis of AP2/ERF genes between Salix matsudana and two related Salicaceae species, Populus trichocarpa and Salix purpurea
To further infer the phylogenetic mechanisms of the SmAP2/ERF family, we constructed two comparative syntenic maps of Salix matsudana with two related species, Populus trichocarpa and Salix purpurea (Fig. 6).Collinear AP2/ERF genes pairs between Salix matsudana and other two species are listed in Supplementary Table S6. A total of 263 SmAP2/ERF genes showed syntenic relationships with 183 genes from Populus trichocarpa, and 248 SmAP2/ERF genes showed syntenic relationships with 144 genes from Salix purpurea. The number of orthologous pairs between Salix matsudana and Populus trichocarpa, and Salix matsudana and Salix purpurea were 423 and 292, respectively (Supplementary Table S6). Some PtAP2/ERF and SpAP2/ERF genes were found to be associated with at least four syntenic gene pairs. Interestingly, the number of collinear gene pairs identified between Salix matsudana and Salix purpurea were less than that between Salix matsudana and Populus trichocarpa.
In the comparative syntenic map between Salix matsudana and Populus trichocarpa, syntenic links were found between all 19 Populus trichocarpa chromosomes and all 38 Salix matsudana chromosomes. Alternatively, in the comparative syntenic map between Salix matsudana and Salix purpurea, there were no syntenic links between Chr1, Chr12, and Chr36 from Salix matsudana, and Chr15Z and Chr15W from Salix purpurea.
Specific Expression Of AP2/ERF Superfamily Genes Under Salt Stress
To investigate the physiological roles of SmAP2/ERF genes in salt stress tolerance, we identified the expression patterns of SmAP2, SmERF, and SmDREB subgroup genes from the RNA sequencing data. By inputting the FPKM values (Fragments Per Kilobase of transcript per Million fragments mapped) of these genes in TBtools, three heatmaps were constructed to demonstrate the expression pattern change under salt stress (Fig. 7).
The expression patterns of 286 genes are illustrated in Fig. 7, and included 48 AP2, 108 DREB, and 130 ERF subgroup genes. In the AP2 subgroup, the FPKM values of 31 genes were < 3, which indicated lower expression in the root and no response to salt stress. Five genes had differential expression patterns. The expression levels of four genes (SmAP2-38, SmAP2-4, SmAP2-3, and SmAP2-33) were induced by salt stress, whereas the expression of gene SmAP2-15 decreased after salt stress. In the DREB subgroup, 108 genes were present in the heatmap, and expression levels of 10 genes, such as SmDREB A1-10, SmDREB A1-9, and SmDREB A1-7, were induced by salt stress and remained higher. In the ERF subgroup heatmap, 130 genes were included. The expression levels of 13 genes were upregulated by salt stress, but only the expression of SmERF B4-1 was higher. Three genes were downregulated by salt stress, including SmERF B3-52. In many paralog gene pairs, we found one gene with higher expression, whereas the other gene had lower expression, such as SmDREB-A9/ SmDREB-A10, SmAP2-33/SmAP2-39 and SmERF-9/ SmERF-10 gene pairs. Thirteen Genes with upregulated expression patterns was verified by qRT-PCR (Real-time Quantitative PCR) (Fig. 8).