Identification of ABC Transporter, HMA Genes and Gene Duplication
Two methods were used for identification of ABC transporter genes in flax. First, the HMA genes were identified based on the eight reference gene sequences (Table S4) of Arabidopsis by BLAST search (BLASTP) method. Similarly, we performed the same Blast search method for ABC using the 129 reference sequences [54] from the Arabidopsis genome (version 10.0, http://www.arabidopsis.org/) against the flax genome at an E value of 1.0E-10.
Then, we performed a Hidden Markov Model (HMM) search against the flax genome to further confirm the presence of ABC genes with default option using HMMER (version 3.2.1) [55]. The different ABC transporter domains includes, ABC transporter (PF00005), ABC-2 transporter (PF01061), ABC transporter transmembrane region (PF00664), cytochrome c polymerization (CYT) (PF01458) or mammalian cell entry (mce) related protein (PF02470). These domains were downloaded from the Pfam (version 32.0) database (http://pfam.xfam.org/) [56]. After the results of both methods were merged, ABC and HMA genes were further screened on the basis of domains composition. The duplicated results between two methods were eliminated. A total of 745 ABC transporter and HMA protein were identified among all species except Arabidopsis thaliana.
The Cd associated genes in flax were identified based on previously reports in Arabidopsis, rice, and maize using phylogenetic relationships. Several ABC and HMA genes are responsive to Cd stress including AtABCC1, AtABCC2, AtABCG32, AtABCG36, AtHMA3 and AtHMA4 in Arabidopsis, OsABCG31, OsABCG36 and OsHMA2 in rice, ZmHMA2 and ZmHMA3 in maize.
The flax genome sequences (version 2.0) were obtained from NCBI [45] and protein sequences were downloaded from Phytozome (version 12.1) [57]. Genomic sequences of other species, Populus trichocarpa (version 3.1), Vitis vinifera (version Genoscope12X), and Brachypodium distachyon (version 1.2), were obtained from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html) [57]. The obtained protein sequences of ABC transporter and HMA genes were further verified for ABC/HMA domain compositions in the NCBI-Conserved Domain database (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) [58] and SMART (http://smart.embl-heidelberg.de/) [59]. The protein sequences with errors, short length (<100 aa), and without ABC and HMA domains were removed before further analysis.
Phylogenetic Characterization of ABC Transporter and HMA Genes, and Synonymous (Ks) and Non-synonymous (Ka) Substitution Rates for Duplicated Genes
Multiple sequence alignment for ABC transporter or HMA protein sequences for five species including Linum usitatissimum, Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, and Brachypodium distachyon were performed using MUSCLE (version 3.8.1551). Phylogenetic trees were then constructed using the PhyML software [60] with the maximum likelihood (ML) method and the Jones, Taylor, and Thornton amino acid substitution model (JTT model).
The different types of gene duplications in the flax genome were identified by using MCScanX [61]. Synonymous (Ks) and non-synonymous substitution (Ka) rates were also calculated for duplicated gene pairs as previously described [47]. Also, a substitution rate of 1.5 × 10−8 substitutions per synonymous site per year [62] was used to estimate divergence time of duplicated genes.
Conserved Motifs, Gene Structure, and Physicochemical Parameters
Multiple Em for Motif Elicitation (version 5.1.0) [63] was used for scanning of conserved motifs in LuABC and HMA proteins. The maximum number of motifs 10 with a minimum width of 50 aa and a maximum of 100 aa was set as parameters. Moreover, TBtools (version 0.66) [64] was used to visualize both the motif composition and gene structure. The ExPASY PROTPARAM tool (http://web.expasy.org/protparam/) was accessed for various physicochemical properties, i.e., molecular weight (MW), isoelectronic points (pI), and GRAVY values for each gene. The subcellular localization of genes was predicted using the web application WOLF PSORT (https://wolfpsort.hgc.jp/).
Chromosomal Location, Gene Synteny Analysis, and Protein-Protein Interaction (PPI) Analysis
The gene synteny between flax and Arabidopsis was analyzed based on gene annotation data of both species and illustrated using shinyCircos [65]. The PPI analysis for all the ABC transporter and HMA proteins was carried out using an online server STRING (version 11.0) (https://string-db.org/) [66] with the following parameters: medium score (0.400), the number of K means clustering (3), and default values for the remaining options. The results of the interaction network were further visualized using Cytoscape (version 3.4.0) [67].
Plant Materials, RNA Sequencing and Read Data Analysis
Flax cultivar CDC Bethune was planted in greenhouse under growth conditions previously described [68]. Tissues of five different organs (root, seed, anther, ovary, and embryo) and five different stages of embryo development (heart, globular, torpedo, mature and cotyledon embryo) were collected for RNA extraction with two biological replicates for each tissue. Total RNA was extracted from each collected sample following the RNAqueous kit protocol (Ambion, Catalog# 1912) and RNAqueous-Micro kit protocol (Ambion, Catalog# 1931). The samples were homogenized in lysis buffer with polypropylene pestles in 1.5 ml Eppendorf tubes on ice. For RNA-seq profile analysis, Illumina mRNA-seq libraries were prepared using the TruSeq RNA kit (ver. 1, rev A) according to the manufacturer’s instructions. An Agilent 2100 Bioanalyzer was used for quantification and quality determinations of sample libraries. For Illumina HiSeq 2000 sequencing, four indexed libraries were pooled per sequencing lane and paired-end sequencing were performed.
The raw reads were initially trimmed by trimmomatic [69] and the trimmed reads were aligned to the genomic sequences of the flax using the kallisto [30, 70]. The reads with fewer than 5 rpm in at least one library were filtered out. Normalization was performed at trimmed mean of M-values (TMM) using an R package edgeR [71]. A general linear model (GLM) was used with glmLRT function to identify differentially expressed genes with false discovery rate (FDR) less than 0.05 [71]. The anther of the nine organs was used a reference for expression analysis. Thus, expression results were presented in the remaining eight organs.
Heat maps were drawn based on normalized Log2 scale read counts using ClustVis [72]. Bidirectional cluster analysis was conducted using maximum distance and complete linkage method. Pearson correlation coefficient (r) based on expression data among the syntenic pairs of flax was calculated using Rstudio (version 3.6.0).
The GO analysis for ABC transporter and HMA genes in flax was conducted using Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html) with keyword search options against the flax genome.