ABC Transporter and HMA Genes in Flax and Their Physicochemical Properties
A total of 198 ABC transporter and 12 HMA genes were identified in the flax genome reference sequence of CDC Bethune [31]. The ABC transporter and the HMA genes were classified into eight and four subfamilies, respectively. These genes were denoted as LuABCA1-LuABCA8, LuABCB1-LuABCB48, LuABCC1-LuABCC19, LuABCD1-LuABCD5, LuABCE1-LuABCE2, LuABCF1-LuABCF9, LuABCG1-LuABCG85, LuABCI1-LuABCI22, and HMA1-HMA12. The basic information on these genes based on their subfamilies, including the protein identifier, coding sequence (CDS) length (bp), and protein properties such as the number of amino acid (aa) residues, molecular weight (kDa), isoelectric point (pIs), and grand average of hydropathicity (GRAVY), is listed in Table S1. The CDS length ranged from 663 to 7,313 bp. The proteins had from 220 to 2,438 aa and a molecular weight of 25.54 to 273.69 kDa. The pIs ranged from 4.93 to 11.67 while the GRAVY values varied from -0.606 to 0.619. Most genes (132/210) had positive GRAVY values, indicating hydrophobic properties.
LuABC and LuHMA proteins were predicted to localize to many subcellular compartments such as the plasma membrane, vacuole, endoplasmic reticulum, nucleus, cytoplasm, chloroplast, Golgi apparatus and mitochondrion. Further, several studies have also confirmed almost an identical localization of ABC transporter and HMA genes [7, 44].
Annotation and Phylogenetic Analysis of ABC Transporter and HMA Genes in Plant Species
The gene annotation analysis of LuABC and LuHMA provides putative function(s) for each gene products (Table S2). In brief, based on the predicted function(s), most of the LuABC and LuHMA genes confirmed the presence of ABC transporter, ATP binding, and heavy metal ATPase domain functions. In addition, the LuABC subfamily were also involved in other important functions. For example, the subfamily LuABCC had multidrug resistance functions; the LuABCE encode to RNAse l inhibitor protein, whereas LuABCG were involved in regulating pleiotropic drug resistance. Other functions are also assumed to be assisted by LuABCs (Table S2). Thus, the annotation clearly shows the functional diversity of ABC transporters and HMAs in flax.
Unrooted phylogenetic trees were constructed using the protein sequences for each of the eight ABC transporter and four HMA gene subfamilies of Linum usitatissimum, Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, and Brachypodium distachyon (Figure 1 and Table S3). The phylogenetic relationships within LuABC, AtABC, PtABC, VvABC, and BdABC were highly conserved. Based on the phylogenetic relationships of flax and other species, the ABC transporter proteins were divided into eight subfamilies: ABCA-ABCG and ABCI (Figure 1a-h). Subfamilies ABCB and ABCG were the largest across all species, while ABCD and ABCE were the smallest based on the number of gene members per subfamily. With 81 members, ABCG was the largest subfamily and the dominant ABC transporter gene subfamily in flax.
The HMA genes of different species were, like flax, divided into four subfamilies based on their phylogenetic relationships (Figure 1i and Table S4). Subfamily IV was the largest one with 20 members, of which five belonged to flax. Subfamily I was the smallest with only six members across all the species studied. In short, the ABCG and HMA IV subfamilies had the highest number of genes in flax compared to other species except Populus trichocarpa in HMA. The distribution patterns of both ABC transporter and HMA genes and their subfamilies in five species are given in Table 1.
Based on previous reports on Arabidopsis, rice, and maize, several ABC transporter and HMA genes are associated with Cd tolerance, including AtABCC1, AtABCC2,, AtABCG36, AtHMA3 and AtHMA4 in Arabidopsis [29, 45-47] , OsABCG31, OsABCG36 and OsHMA2 in rice [20, 48], as well as ZmHMA2 and ZmHMA3 in maize [34]. We identified seven ABC transporter genes (LuABCC9-LuABCC10, LuABCG58-LuABCG59, and LuABCG71-LuABCG73), and two HMA genes (LuHMA3-LuHMA4) which were orthologous with the above Cd-related genes (Figure 1c, g and h). These genes are the most likely candidates for Cd accumulation in flax.
Chromosomal Localization, Syntenic Relationships, and Duplication of ABC Transporter and HMA Genes
A total of 196 LuABC and 11 LuHMA genes were located on the 15 chromosomes of flax and three of these genes (LusABCG10, LusABCG15, and LuHMA5) were found on scaffolds that have not been positioned onto any of the chromosomes (Table S1). LuABC gene subfamilies and genes per se are distributed unevenly across flax chromosomes. The largest number of ABC transporter and HMA genes was on Chr3 (23), followed by Chr11 (18), and Chr1 (17). The nine predicted Cd-accumulation candidate genes were scattered on multiple flax chromosomes: LuABCG71 on Chr1, LuABCG73 on Chr3, LuABCG59 on Chr6, LuABCC9 and LuHMA3 on Chr7, LuABCC10, LuABCG58 and LuHMA4 on Chr12, and LuABCG72 on Chr14 (Figure 2). The gene collinearity analysis revealed high conservation among subfamilies of both ABC transporters and HMA with Arabidopsis orthologues (Figure 2).
Four different types of gene duplications were observed from the identified ABC transporter and HMA genes, including 162 WGD (segmental), 29 dispersed, 14 tandem, and 4 proximal duplications. Only one ABC transporter gene (LuABCI16) was a singleton. Eight of the nine flax Cd candidate genes were of the segmental type and one (LuHMA4) was a tandem duplication. Thus, segmental duplications played a dominant role in the expansion of the ABC transporter and HMA gene families in flax and confirmed our hypothesis.
Synonymous and Non-synonymous Substitution Rates, Gene Structure Analysis and Motif Composition
The synonymous (Ks) and non-synonymous (Ka) values were estimated based on the duplicated pairs of genes across the flax genome. The Ka/Ks ratios of five pairs (LuABCG71/LuABCG72, LuABCG61/LuABCG64, LuABCG80/LuABCG69, LuABCG4/LuABCG3, and LuHMA6/LuHMA8) exceeded one, suggesting positive selection. The remaining gene pairs underwent purifying selection with a Ka/Ks ratio of less than one. The estimated duplication time of LuABC and LuHMA gene pairs ranged from 1.53 to 28.27 million years ago (MYA), with an average of 8.59 MYA (Table S5).
Conserved motifs and gene structure organization of LuABCs and LuHMAs were analyzed to better understand the global conservation and diversification of these two gene families. A total of ten distinct conserved motifs were identified. Several motifs were highly conserved; for instance, motifs 2 and 5 commonly occurred among subfamilies LuABCA-LuABCI members as well as in HMA proteins (Figure 3a). Of the ten motifs, motif 6 was prevalent in both ABC transporter and HMA proteins except in ABCB, ABCD, and ABCF subfamilies. Of the nine flax Cd candidate genes, three (LuABCG71, LuABCG72, and LuABCG73) consistently exhibited 9-10 of these motifs and similar gene lengths. However, distinct motif compositions existed among most of the subfamilies.
The exon structure analysis based on the coding sequences of LuABCs and LuHMAs showed diversification between and within subfamilies. Specifically, 19 ABC transporter genes, including LuABCC7, LuABCC9-11, LuABCD1, LuABCF4, LuABCG59-62, LuABCG70-71, LuABCG75, LuABCG78-81, LuABCG88, and LuABCG85, contain a variable number of exons ranging from 20-31 (Figure 3b). The number of exons was highly conserved within ABC transporter gene subfamilies. The large number of exons observed in LuABCA6 (38) and LuABCA5 (40) was unique because the remaining ABC transporters and HMA had 1-19 exon(s).
Gene Ontology (GO) and Expression Profiling
To predict the regulatory functions of the LuABC and LuHMA genes in flax, we performed GO analyses. The GO terms were categorized into three subgroups: molecular function (MF), cellular component (CC), and biological process (BP), as described in Table S6. The LuABC and LuHMA proteins were enriched for MF such as ATP binding (GO:0005524), ATPase activity (GO:0016887), transporter activity (GO:0005215), catalytic activity (GO:0003824), kinase activity (GO:0016301), and metal ion binding (GO:0046872). CC GO terms associated with LuABC included the integral component of membrane (GO:0016021), intracellular (GO:0005622), membrane (GO:0016020), and integral component membrane (GO:0016021). BP terms comprised transport (GO:0006810), transmembrane transport (GO:0055085), signal transduction (GO:0007165), GTPase-mediated signal transduction (GO:0007264), and cation transport (GO:0006812). Taken together, GO terms indicated roles in central processes involving ATP binding and metal ion transport but also a wide range of other processes and activities in flax.
The expression patterns of the LuABC and LuHMA genes in the root, seed, ovary, and five different stages of embryo development (heart, globular, torpedo, mature, and cotyledon) from RNA-Seq data are presented in a heatmap (Figure 4). In general, the highest number of genes up-regulated for both LuABC and LuHMA genes was in seed (84/152), followed by root (75/152), and ovary (70/152) tissues. Among the five different embryo development stages, both LuABC and LuHMA genes showed a relatively weak expression considering that only 47, 41, 39, 37 and 36 of the 152 expressed genes were up-regulated in mature, cotyledon, torpedo, globular, and heart stages of embryo development, respectively. The remaining LuABC and LuHMA genes were not expressed or displayed a low-level expression in these different organs (Figures 4a and b). The high expression level of LuABC and LuHMA genes in root and seed suggests that these genes might play a ubiquitous (housekeeping) transport role in these tissues in flax.
Eight of the nine flax Cd candidate genes were highly expressed in different tissues, including LuABCG71, LuABCG72, and LuABCG73 in root and seed (Figure 4c). Additionally, multidimensional scaling (MDS) also revealed differential gene expression between organs and high consistency of expression data between biological replicates (Figure S1).
Functional Evolution of Duplicated Genes and Interaction Network
Expression profiling can be mined to predict the functional fate of genes, and here, investigation of the mode and tempo of duplicated genes was performed to assess their functional evolution. We utilized and took advantage of RNA-Seq data to calculate the Pearson correlation coefficient (r) of the syntenic pairs across the eight different tissues used in our research. The significance of each expression level was tested based on the r values. Positive expression correlation was inferred when r values exceeded 0.156 (at a significance level of α=0.05). Using this cut-off value, 42 of the 51 pairs had positive expression correlations, likely indicating functional conservation or sub-functionalization after duplication. The remaining nine pairs had correlation values below 0.276 or a negative correlation, suggesting putative neo-functionalization of at least one of the syntenic pairs (Table S7).
The protein-to-protein interaction networks of LuABC and LuHMA were examined and a dense network formed among the protein subfamilies (Figures S2a and b). However, a few of the ABC proteins did not interact: LuABCG6, LuABCG66, LuABCG68, LuABCG80, and LuABCG83 (Figure S2a). When the different subfamilies were compared, the LuABCG genes were preferentially retained in flax throughout evolution.