Characterization and subcellular localization of MeAMY and MeBAM genes in cassava
The whole-genome was scanned to identify protein-coding genes containing the specific domain by using both BLASTP and HMMER search. In total, six MeAMY genes and ten MeBAM genes were identified and renamed. Their characteristics and subcellular localization information were all listed in Table 1. Homology analysis was conducted on 13 transcripts of MeAMY and 20 transcripts of MeBAM (Fig. 1). The size of MeAMY genes ranged 1227 ~ 2691 bp, and the similarities among sequences ranged from 0.53 to 1.00 (Fig. 1a). The length of MeAMY protein sequence ranged from 408 to 896 aa, and the similarities between protein sequences ranged 0.44 ~ 1.00, while their theoretical pIs ranged from 4.71 to 8.22 (Table 1). The largest molecular mass was MeAMY1.1(101.705kDa) and the smallest one was MeAMY5 (46.962 kDa). The range of aliphatic index was 27.07 ~ 45.39. Proteins were all hydrophilic with a negative GRAVY (− 0.576 ~ − 0.226). The range of instability index of proteins was 27.07 ~ 45.39. MeAMY1.1 ~ MeAMY3.5 were unstable with a instability index which larger than 40, while other four sequences were showed as stable proteins because their instability index were all less than 40.
The size of MeBAM genes ranged 1227 ~ 2691 bp, and the similarities among sequences were 0.41 ~ 1.00 (Fig. 1b). The length of MeBAM protein sequence ranged from 429 to 701 aa, and their similarities among protein sequences were 0.27 ~ 1.00, while their theoretical pIs ranged from 5.3 to 8.92. The largest molecular mass was MeBAM5.1 ~ MeBAM5.5 (79.132 kDa) and the smallest one was MeBAM4.3 (48.937 kDa). The range of aliphatic index was 69.6 ~ 84.41. Proteins were all hydrophilic with a negative GRAVY (− 0.485 ~ − 0.249). The range of instability index of proteins was 31.44 ~ 52.03. MeBAM1, MeBAM2, MeBAM5.1 ~ MeBAM5.6, MeBAM8, MeBAM10.2, and MeBAM10.3 were unstable with a instability index which larger than 40, while other eight sequences were showed as stable proteins.
The subcellular localization of 13 MeAMY and 20 MeBAM family genes was predicted using CELLO v.2.5 and the results were listed in Table 1. MeAMY1 and MeAMY2 located in the cytoplasm, while other MeAMY genes located in the Extracellular. The subcellular localization results for MeBAM was divided into five positions. Among them, MeBAM1 and MeBAM2 were located in chloroplast, MeBAM3 and MeBAM9 located in mitochondria, MeBAM4 ~ MeBAM7 in the cytoplasm, MeBAM8 in the plasma membrane, and MeBAM10 in the nucleus.
Table 1
Characteristics and subcellular localization of the MeAMY and MeBAM proteins
Gene name | protein Accession No. | Length of AA | PI | MW(Da) | GRAVY | Aliphatic index | Instability index | Subcellular localization |
MeAMY1.1 | XP_021603448.1 | 896 | 6.3 | 101705.99 | -0.502 | 74.08 | 41.72 | Cytoplasmic |
MeAMY1.2 | XP_021603456.1 | 812 | 6.32 | 91803.7 | -0.529 | 74.45 | 44.78 | Cytoplasmic |
MeAMY2.1 | XP_021611463.1 | 896 | 5.67 | 101574.29 | -0.523 | 75.29 | 41.35 | Cytoplasmic |
MeAMY2.2 | XP_021611464.1 | 812 | 5.63 | 91815.24 | -0.538 | 76.49 | 45.39 | Cytoplasmic |
MeAMY3.1 | XP_021612774.1 | 408 | 8.22 | 46560.08 | -0.576 | 68.85 | 42.85 | Extracellular |
MeAMY3.2 | XP_021612775.1 | 408 | 8.22 | 46560.08 | -0.576 | 68.85 | 42.85 | Extracellular |
MeAMY3.3 | XP_021612776.1 | 408 | 8.22 | 46560.08 | -0.576 | 68.85 | 42.85 | Extracellular |
MeAMY3.4 | XP_021612777.1 | 408 | 8.22 | 46560.08 | -0.576 | 68.85 | 42.85 | Extracellular |
MeAMY3.5 | XP_021612778.1 | 408 | 8.22 | 46560.08 | -0.576 | 68.85 | 42.85 | Extracellular |
MeAMY4 | XP_021625997.1 | 429 | 5.1 | 48230.1 | -0.311 | 84.13 | 31.49 | Extracellular |
MeAMY5 | XP_021631425.1 | 424 | 4.71 | 46962.8 | -0.226 | 83.99 | 29.81 | Extracellular |
MeAMY6.1 | XP_021601526.1 | 762 | 4.89 | 84901.46 | -0.358 | 76.19 | 31.03 | Extracellular |
MeAMY6.2 | XP_021601463.1 | 430 | 4.92 | 48130.31 | -0.279 | 79.63 | 27.07 | Extracellular |
MeBAM1 | XP_021605553.1 | 569 | 8.59 | 64184.29 | -0.303 | 73.88 | 42.52 | Chloroplast |
MeBAM2 | XP_021608087.1 | 581 | 5.67 | 64970.59 | -0.388 | 71.2 | 41.1 | Chloroplast |
MeBAM3 | XP_021608216.1 | 535 | 6.04 | 59219.83 | -0.344 | 76.02 | 37.5 | Mitochondrial |
MeBAM4.1 | XP_021611624.1 | 545 | 5.69 | 61669.84 | -0.281 | 75.69 | 37.54 | Cytoplasmic |
MeBAM4.2 | XP_021611626.1 | 517 | 5.59 | 58563.38 | -0.315 | 77.72 | 35.62 | Cytoplasmic |
MeBAM4.3 | XP_021611627.1 | 429 | 5.52 | 48937.51 | -0.346 | 75.94 | 32.57 | Cytoplasmic |
MeBAM5.1 | XP_021613139.1 | 701 | 5.63 | 79132.18 | -0.47 | 76.5 | 40.8 | Cytoplasmic |
MeBAM5.2 | XP_021613140.1 | 701 | 5.63 | 79132.18 | -0.47 | 76.5 | 40.8 | Cytoplasmic |
MeBAM5.3 | XP_021613141.1 | 701 | 5.63 | 79132.18 | -0.47 | 76.5 | 40.8 | Cytoplasmic |
MeBAM5.4 | XP_021613142.1 | 701 | 5.63 | 79132.18 | -0.47 | 76.5 | 40.8 | Cytoplasmic |
MeBAM5.5 | XP_021613143.1 | 701 | 5.63 | 79132.18 | -0.47 | 76.5 | 40.8 | Cytoplasmic |
MeBAM5.6 | XP_021613144.1 | 697 | 5.63 | 78691.58 | -0.485 | 75.97 | 40.7 | Cytoplasmic |
MeBAM6.1 | XP_021630045.1 | 594 | 6.04 | 67689.21 | -0.432 | 84.41 | 37.18 | Cytoplasmic |
MeBAM6.2 | XP_021630046.1 | 521 | 5.3 | 59078.17 | -0.413 | 82.59 | 31.44 | Cytoplasmic |
MeBAM7 | XP_021594996.1 | 582 | 6.05 | 64837.42 | -0.42 | 69.6 | 36.57 | Cytoplasmic |
MeBAM8 | XP_021595073.1 | 522 | 8.92 | 59418.2 | -0.249 | 82.2 | 52.03 | Plasma Membrane |
MeBAM9 | XP_021594434.1 | 546 | 8.7 | 61339.7 | -0.459 | 70.04 | 34.79 | Mitochondrial |
MeBAM10.1 | XP_021594905.1 | 691 | 5.51 | 77653.08 | -0.446 | 71.74 | 40.13 | Nuclear |
MeBAM10.2 | XP_021594906.1 | 689 | 5.44 | 77325.77 | -0.402 | 73.5 | 40.35 | Nuclear |
MeBAM10.3 | XP_021594907.1 | 686 | 5.5 | 77103.67 | -0.394 | 73.67 | 39.9 | Nuclear |
Analysis of multiple sequence alignment, gene structure, conserved motifs, and alternative splicing of MeAMY and MeBAM
Similar conserved areas were identified of MeAMY and MeBAM protein sequences by comparing the domains of the AtAMY1 and AtBAM1 proteins. Two carbohydrate binding sites and three catalytic residues were found in MeAMY gene family (Fig. S1a), and these amino acid sites were perfectly conserved. Additionally, the MeBAM gene family has two catalytic residues: Glu-186 and Glu-380. The amino acid at Glu-186 was highly conserved, while Glu-380 showed poor conservation of amino acids. A mutation at Glu-380 occurred in five different sequences and it was changed to glutamine (Gln), arginine (Arg), and valine (Val) in MeBAM3, in MeBAM8, and in MeBAM10.1 ~ 10.3, respectivelya9(Fig. S1b). The MeBAM gene family contained 20 carbohydrate binding sites, and two of them had highly conserved amino acids, while 18 carbohydrate binding sites had amino acid mutations. The amino acids in the flexible loop and inner loop showed poor conservation, and there were a large number of mutations.
To better understand the structural evolution of MeAMY and MeBAM, the unrooted evolutionary trees were constructed based on the protein sequences for MeAMY and MeBAM, respectively. Generally, MeAMY and MeBAM that clustered in the same group showed similar gene structures (Fig. 2). For MeAMY, phylogenetic analysis classified 13 MeAMY into three clusters, termed I ~ III, and a total of 15 conserved motifs of MeAMY were identified using MEME software (Fig. 2a). Cluster I contain four MeAMY genes which are alternative splicing of one gene, and they shared the same conserved motifs. The CDS and UTR of MeAMY3.3 and MeAMY3.5 were arranged in the same order. And MeAMY3.1, MeAMY3.2, and MeAMY3.4 were also arranged in the same order, but they had a longer UTR than others at the 5' end (Fig. 2b). The four genes in Cluster II shared the same conserved motifs. MeAMY1.1 and MeAMY2.1 belonged to alternate donor site (ADS) and all had a CDS sequence of 252 bp longer than MeAMY1.2 and MeAMY2.2, respectively. Motif 6, 7, 12, 13, and 15 were unique to Cluster II. Cluster III contains MeAMY4, MeAMY5, MeAMY6.1, and MeAM6.2. MeAMY6.1 has the largest conservative motif and it looks like repeat motifs, while motif 3, 9, and 10 were missing in the repeat (Fig. 2a). The CDS of MeAMY6.1 also repeated and a CDS sequence was also found missing in the gene structure diagram (Fig. 2b). In addition, the 5' and 3' ends of MeAMY4 have the longest UTR in cluster III.
MeBAM gene family can be divided into five clusters (Fig. 2c). The distribution of conserved motifs at the 5' end of the BAM genes in cluster I was inconsistent, and their positions and arrangements of CDS and UTR were also different (Fig. 2d). MeBAM7 contains two motif 9. In cluster II, MeBAM6.1 and MeBAM6.2 belonged to alternative splicing of the same gene and can be treated as alternate promoter (AP). At the 5' end of these two genes, MeBAM6.1 had a motif 9 and a longer CDS sequence with a size of 219 bp than MeBAM6.2. The conserved motifs of MeBAM10.1 ~ MeBAM10.3 genes in cluster III were consistent and belonged to alternate terminator (AT). The gene structure of UTR and CDS sequences of their 3' end were different. In cluster IV, MeBAM4.1, MeBAM4.2, and MeBAM4.3 belonged to AP. The conserved motif of MeBAM4.3 at 5' end had no motif 9 and CDS missing than MeBAM4.1 and MeBAM4.2. The conserved motifs of the six genes in cluster V have the same arrangement and distribution, which CDS are all consistent. MeBAM5.6 lacked a 12 bp CDS sequence, while the UTR sequence at the 5' end are different.
Chromosomal location and collinearity analysis of MeAMY and MeBAM gene families
The location information of MeAMY and MeBAM gene families were obtained from the GFF (Generic Feature Forma) file of cassava genome (Fig. 3). The MeAMY and MeBAM genes were found in eight chromosomes and most of them are distributed at the end of these chromosomes. In the MeAMY gene family, only MeAMY2 is located in the ‘+’ strand, with opposite orientation to the other MeAMY genes. MeBAM1, MeBAM4, MeBAM5, MeBAM7, and MeBAM10 are located in the ‘+’ strand, while MeBAM2, MeBAM3, MeBAM6, MeBAM8, and MeBAM9 are located in the ‘−’ strand.
In order to reveal the expansion and evolution mechanism of MeAMY and MeBAM gene families, the potential gene duplication events in cassava genome were analyzed. Combined with the Arabidopsis genome and the rubber tree genome of euphorbiaceae plants, the collinearity of coding genes was analyzed (Fig. 4, Table S1). There were linear relationships between MeAMY1 and MeAMY2, and MeBAM2 and MeBAM7 genes in cassava genome. MeBAM4 and MeBAM5 were tandem repeat genes. There were 8 pairs of genes with collinearity between cassava and Arabidopsis genome, including 3 pairs of AMY genes and 5 pairs of BAM genes. There were 23 pairs of genes, 10 AMY genes and 13 BAM genes, in collinearity between cassava and rubber tree (Hevea brasiliensis) genome.
Calculation of the gene Ka/Ks value is an effective method to evaluate homologous protein sequence variations in species or taxa with unknown evolutionary status. Based on the collinearity analysis results of MeAMY and MeBAM families in cassava, the Ka/Ks values of homologous gene pairs were calculated. The results showed that the Ka/Ks values of the flanking homologous genes of those three pairs of MeAMY and MeBAM gene families listed in Table 2 ranged from 0.181848632 to 0.30120533. The ratios of these duplicates were all < 1, suggesting that their divergence was driven by purifying selection.
Table 2
homologous gene ka/ks values of MeAMY and MeBAM gene families
Seq-1 | Seq-2 | ka | ks | Ka/ks |
MeAMY1.1 | MeAMY2.1 | 0.080411456 | 0.266965581 | 0.30120533 |
MeBAM2 | MeBAM7 | 0.072314402 | 0.39766261 | 0.181848632 |
MeBAM4.3 | MeBAM5.6 | 0.231696907 | 1.021698702 | 0.226776159 |
Phylogenetic analysis of MeAMY and MeBAM gene families
The AMY and BAM protein sequences of other species were downloaded from NCBI database, and nine rice BAM Protein Sequences from Rice Genome Annotation Project (Table S2). A neighbor-joining tree was constructed based on the alignments of proteins of AMY and BAM families from cassava, rice, Arabidopsis and other species. A total of 54 AMY proteins from 12 species were used to construct the phylogenetic tree and could be classified into eight groups, including monocotyledonous plants (Group A ~ C) and dicotyledonous plants (Group D ~ H) (Fig. 5). The clustering results of monocotyledonous plants and dicotyledonous plants in the eight groups were significantly different. It is speculated that there are similar groupings in rubber trees and castors (Ricinus communis), and all the cassava AMYs were classified into Group A, B, and C and closely related with other Euphorbiaceae plants. Group D, G, and H are the AMY protein sequences of rice, among which Group G and H have only one rice sequence, Group F is the AMY protein sequence of wheat, and two AMY protein sequences of Group E are from rice and wheat, respectively.
The BAM protein sequences can be divided into nine groups (Fig. 6), and each group contains the BAM protein sequences of monocots and dicots, indicating that some BAM protein sequences in monocots and dicots have similar origins and evolutionary relationships. The BAM protein sequences of Arabidopsis, rice, and cassava were evenly distributed in all groups, while the BAM protein sequences of rubber trees and castors, and Jatropha (Jatropha curcas) were closely related to MeBAM among the five groups. It was speculated that these proteins might come from the same ancestors. The evolution occurred when euphorbiaceae species evolved into different plant lineages, and then evolved independently in each genus.
Cis-acting elements in the promoter regions of MeAMY and MeBAM genes
In order to understand the role of MeAMY and MeBAM gene families in plant resistance to stress, the cis-acting elements in the promoter regions of MeAMY and MeBAM genes were analyzed (Fig. 7). Remove core promoter elements CAAT-box, TATA-box, A-box, and AT ~ TATA-box, the cis-acting elements in the promoter regions of MeAMY and MeBAM genes were classified according to Yue method [1].
The number of cis-acting elements responding to abiotic stress is the largest and most widely distributed among all the cis-acting elements, such as MYB, TGACG-motif, ERE, MYC, as-1, ARE. In the promoter region of MeAMY gene, the number of MYB elements was the largest (Fig. 8 and Fig. 9). Nine MYB elements were enriched in MeAMY6.1 and seven ERE elements were enriched in both MeAMY1.1 and MeAMY1.2, while six MYC elements were enriched in MeAMY4 (Fig. 8). In the promoter region of MeBAM gene, the number of ABRE elements was up to 14 and 13 in MeBAM2 and MeBAM7, respectively. In addition, seven MYB e and six MYC elements were enriched in MeBAM5.1 ~ MeBAM5.6. ERE elements are enriched in the promoter regions of MeBAM1, MeBAM2, MeBAM4.1 ~ MeBAM4.2 and MeBAM6.1 ~ MeBAM6.2 genes (Fig. 9). On the other hand, the number of WUN-motif and CGTCA-motif are highest among the cis-acting elements induced by biotic, while TGA-box is the lowest. In the promoter region of MeAMY gene, 76.9% of the sequences contain 2 ~ 3 CGTCA-motifs, and TGA-box only exists in MeAMY6.1. In the promoter region of MeBAM gene, 80% of the sequences contain WUN-motifs, of which five WUN-motifs in MeBAM4.3, while there is only one AT-rich sequence in each of MeAMY6.1, MeAMY7, and MeAMY9.
The promoter regions of MeAMY and MeBAM genes also contain a large number of cis-acting elements related to light response, such as Box 4 and G-box (Fig. 7). Box 4 is the most abundant element, containing 50 elements in the promoter region of MeAMY gene, and 61.52% of the sequences have this element, which enriched in MeAMY1.1 ~ MeAMY1.2, MeAMY2.1 ~ MeAMY2.2 and MeAMY5. And 90% of MeBAM genes contain 91 Box 4 elements, which are mainly enriched in MeBAM1, MeBAM2, MeBAM4.1, MeBAM4.2, and MeBAM4.3. In addition, there are 92.3% of MeAMY promoter region sequences have G-Box elements, while 80% of sequences contain G-Box elements, which are mainly enriched in MeBAM2 and MeBAM7. GA-motif was only found in MeAMY4.
Furthermore, abundant O2-site and CAT-box were also found in the promoter region of MeAMY and MeBAM(Fig. 8 and Fig. 9), while few GCN4-motif, AC-I and MSA-like were found. GCN4-motif associated with the expression of storage protein gene were only found in MeAMY6.2. AC-I were found in MeAMY4 and correlates with vascular tissue gene expression. MSA-like is involved in cell cycle regulation and only exists in MeBAM2. The circadian cis-acting element is found in MeBAM9.
Gene ontology classification analysis of MeAMY and MeBAM genes
GO annotation of MeAMY and MeBAM proteins were separated into two categories: biological processes and molecular functions (Fig. 10). All proteins were involved in plant metabolic process. MeAMY proteins were involved in both single-organism and cellular processes, while only MeAMY6 was involved in biological regulation. MeAMY and MeBAM proteins were further classified and then annotated to 111 GO IDs (Table S3). Their molecular functions were primarily enriched in amylase activity, hydrolase activity, and catalytic activity, according to GO enrichment analysis (Fig. 11a). All MeAMY and MeBAM proteins were concentrated in carbohydrate metabolic process, polysaccharide catabolic process, polysaccharide metabolic process, macromolecule catabolic process, and organic substance catabolic process in biological processes (Fig. 11b).
Expression analysis of MeAMY and MeBAM gene families
The expression patterns of MeAMY and MeBAM gene families were analyzed using RNA-seq data from various cassava tissues (Fig. 12). MeAMY2, MeAM3, and MeBAM3 were found high transcription abundance in all tissues, while MeAMY5 was only detected in 100-day-old stems. MeBAM2, MeBAM6, and MeBAM9 had the highest transcription abundance in 100-day-old stems, while MeBAM2 and MeBAM7 both displayed higher transcription abundance in leaves and MeAMY6 was highly expressed in roots. MeAMY1 and MeBAM2 both had high transcription abundance in 180-day-old roots, while MeAMY1 was highly expressed in 300-day-old roots, MeAMY6 was highly expressed in 300 and 340-day-old roots, and MeBAM7 was highly expressed in 140, 180, and 220-day-old roots.
qRT-PCR was used to analyze the expression of MeAMY and MeBAM genes in cassava during the seedling stage without drought stress (CK). All of the genes were detected in all samples (Fig. 13). MeAMY5 and MeBAM6 showed low expression levels in all tissues, while MeBAM3 showed higher expressions. MeBAM2 showed high expression in leaves and stems, while MeBAM7 showed high expression only in stems and MeBAM9 was mainly expressed in leaves.
The expression patterns of MeAMY and MeBAM genes in cassava under different drought stress were also analyzed. Compare to CK, the expression levels of MeAMY1, MeAMY2, MeAMY4, MeBAM1, MeBAM3, MeBAM5, MeBAM8, and MeBAM10 genes in leaves were considerably up-regulated in the moderate drought (MD) treatment, whereas MeAMY3, MeAMY5, and MeBAM9 genes showed down-regulated (Fig. 14a). The expressions of MeAMY1, MeAMY2, and MeBAM3 genes in leaves of the severe drought treatment (HD) were significantly higher than in the MD treatment, while the expression of MeAMY4, MeBAM1, MeBAM5, MeBAM6, MeBAM8, and MeBAM10 genes decreased under HD treatment. Expression levels of MeAMY1, MeAMY2, MeAMY4, MeAMY5, MeAMY6, MeBAM3, and MeBAM8 in stems increased with drought stress level (Fig. 14b). MeAMY5 in MD treatment increased significantly and had the highest expression level among all genes, and then decreased in HD treatment in stem, whereas MeAMY2 and MeBAM3 in HD treatment had the highest expression levels. MeAMY5 showed similar trend in root as in stem (Fig. 14c). The expression of MeAMY1, MeBAM6, and MeBAM7 genes in roots showed a downward trend among all treatments, while the expression of MeAMY2 and MeBAM10 was significantly increased. MeAMY3, MeAMY4, MeAMY5, MeAMY6, MeBAM1, MeBAM2, MeBAM3, MeBAM4, MeBAM5, MeBAM8, and MeBAM9 genes were highly up-regulated in root under the MD treatment, but down-regulated under the HD treatment.
Variation of carbohydrate contents and amylase activity in cassava
The soluble sugar content and amylase activity in cassava leaves under HD treatment was higher than MD and CK, while its starch content was lower than CK and close to MD (Fig. 15a). Carbohydrates and amylase activity in stems varied similar with leaves (Fig. 15b). In roots, the soluble sugar content of HD treatment was significant higher than MD and CK, while its starch content was lower than CK and slightly higher than MD (Fig. 15c). In contrast, the amylase activity of HD treatment was significant lower than MD and CK.