Deep expression scrutiny of juxtaposed wild and cultivated lentil furnishes new insight into aluminium tolerance mechanism

Background: Aluminium (Al) stress hinders crop productivity in acidic soils. Lentil contains rich source of protein and micronutrients and cultivated in different parts of world. To enhance knowledge about Al toxicity tolerance, present study emphasizes on mechanistic analysis of genes associated with Al stress through de novo transcriptomic analysis of tolerant (L-4602), wild (ILWL-15) and sensitive (BM-4) genotypes. Result: Illumina HiSeq 2500 platform evaluated contigs ranging from 15,305 to 18,861 for all the samples with N 50 values of 1795 bp. Four annotation softwares revealed differential regulation of several genes where 30,158 genes were specically up-regulated for combinations under Al stress conditions alone. Top up-regulated Differentially Expressed Genes (DEGs) in tolerant cultivar when compared to the sensitive one were found to be involved in protein transport as well as degradation, defences, cell growth and development. Wild v/s cultivar comparison revealed upregulation of wild DEGs that are involved in regulation of transcription in differentiating cells, pre-mRNA splicing, catalysis and protein ubiquitination. Based on assembled Unigenes, 89,722 high-quality SNPs and 39,874 SSRs were detected. Twelve selected genes were validated using qRT-PCR. KEGG pathway analysis extracted 8,757 GO annotation terms within molecular, cellular and biological processes. Pathway analysis indicated that organic acid synthesis and their transportation along with detoxication of ROS, an alternate pathway involving metacaspase-1,4,9 for programmed cell death were also signicantly induced due to Al stress. Conclusion: Present study unveils the characterization of differential transcripts generated under Al stress indicating Al tolerance as a multiplex phenomenon which will directly widen crop improvement programmes for Al toxicity utilizing molecular approaches.

per the growth response method of Singh et al. (2016) with minor modi cations at National Phytotron Facility, IARI, India [7]. Seedlings emerged in hydroponic medium were placed in growth chamber that was programmed for relative humidity (%) and temperature of 55.7/68.2 ± 2, 27/16 °C, respectively. Hydroponic medium containing 148 µM of AlCl 3 .6H 2 O with a pH of 4.8 was considered as treated whereas, its control was deprived of any AlCl 3 .6H 2 O and its pH was maintained at 6.5. Roots of lentil seedlings were exposed to these two conditions for 6 h following a completely randomised design with 3 replicates each having 12 seedlings.

RNA isolation and library preparation
Total RNA was sequestered from roots of control and Al exposed samples of resistant and sensitive genotypes with nomenclature 'C' and 'T' representing 'Control' and 'Al treated' samples and 1, 2 & 3 representing genotypes L-4602, BM-4 & ILWL-15, respectively using TRIZOL reagent (Takara, Japan) where each replicate having twelve seedlings were pooled together. It was followed by RNA integrity test using Agilent Bioanalyzer 2100 and quantitation through Nanodrop. cDNA library was prepared using TruseqRNA sample prep Kit (Illumina, Inc. San Diego, CA, USA) abiding the prescribed directives. Firstly, messenger RNA (mRNA) was fragmented into shorter sequences using a magnetic bead that contained poly-T molecules. Selected small inserts were converted to cDNA using IlluminaTruSeq™ mRNA Library preparation kit. For clusters generation, pair end adapters (Illumina) were ligated to ends of selected fragments followed by cDNA enrichment by fragments having adapter molecules on both ends using polymerase chain reaction (PCR). Constituted cDNA libraries were sequenced using Illumina HiSeq 2500 platform (Illumina Inc. San Diego, CA, USA). Fragments of 120 to 200 bp with a median of 150 bp were produced from puri ed mRNA using TruSeq RNA fragmentation protocol. For proper selection of reads, these fragments were eluted corresponding to bead volume and incubation time at successive stages which resulted in paired end reads of 100 bp from each sample arising from total raw reads ranging between 19,500,000 to 32,185,339.
2.3 Quality control, de novo assembly and differential gene expression Low quality reads (phred quality < 20) from raw Fastq les were ltered using FastXToolkit. Read quality after pre and post ltering was checked using FASTQC tool and the quality ltered reads were assembled using Trinity de novo assembler generating de novo contigs. In course of this process, unigenes i.e. non-redundant transcripts that cannot be extended further at their pair end and had passed redundancy exclusion course using sequence clustering software were identi ed. Mapping of clean reads to assembled transcripts was performed using Bowtie software. Differential gene expression was conducted using different packages like Deseq, Deseq2, Cuffdiff and EdgeR. Statistically signi cant differentially expressed genes (DEGs) in different comparison groups viz.
1C v/s 1T, 2C v/s 2T, 3C v/s 3T, 1T v/s 2T v/s 3T were identi ed based on FDR value < 0.05, respectively so that in ation caused by type-1 errors can be avoided. Most signi cant DEGs obtained from EdgeR package were sorted based on descending absolute log 2 fold change. Circos plots were plotted to display comparison between contigs of different comparison groups. For graphic visualization of top signi cant DEGs in different comparison groups, heat maps were prepared.

Gene ontology (GO) and pathway analysis
Different Public reference databases including National Centre for Biotechnology Information (NCBI) non-redundant (nr), Swiss-Prot and (SWISSPROT), UniProt Reference Clusters (UNIREF) were used to x unigene identity through Basic Local Alignment Search Tool (BLAST) that has sequence similarity index up to an E value < 1.0E − 5 . Annocript program was employed for Unigene classi cation and assignment of GO terms to assembled transcripts. Functional categories were based on cellular components, biological processes and molecular functions. Web Gene Ontology Annotation Plot (WEGO) tool was used for constructing GO plot representing sorted transcripts. Enrichment analysis of top 50 DEGs was conducted using Shinygo database. Cluster of Orthologous Groups (COG) database was employed to achieve operational cataloguing of unigenes and their role in different metabolic pathways were deduced through Kyoto Encyclopedia of Genes and Genomes (KEGG) database. DEGs of different comparison assemblies were represented by means of numerous plots like HeatMap, Volcano Plot, Circos and MapMan (v3.51R2).

Quantitative Real Time PCR (qRT-PCR) validations
RNA initially extracted for library preparation from all the samples was also used for qRT-PCR validation of gene speci c primers. cDNA was synthesized from RNA using verso cDNA kit (Thermo scienti c). Ampli cation was done using One Step SYBR PrimeScript RT-PCR Kit II (Takara Biotechnology Co. Ltd, Dalian, Japan). Each sample was run in 4 biological replicates. Total cDNA of 100 ng was used in 10 µl reaction mixtures and CFX96 RT-PCR (Bio-Rad) was used for analysis of differential gene expression by calculating and calibrating expression levels of targeted genes by 2 −(∆∆CT) method. PCR reaction of volume 20 µl was prepared using template cDNA of 6.8 µl, 2x SYBR Green mix of 10 µl, 5 µl each of forward and reverse primers and nally making the volume with nuclease free water. To normalize expression data, β tubulin was selected as reference gene for internal expression. Sequence information of primers used for qRT-PCR which were designed using primer 3 plus (https://primer3plus.com/cgi-bin/dev/primer3plus.cgi) is presented in Addition le 1. Table 1.  Total  number  of contigs   44991  46229  45163  52160  47855  46168  28469  28428  28300  44991  44991  42795  47175  44870   Maximum  length of  contigs   15764  16470  15764  17511  14061  14099  15305  15305  15305  16705  18861  16569  18569  18569   Minimum  length of  contigs   500  500  500  500  500  500  500  500  500  500  500  500  500

Differential gene expression annotation
Signi cant DEGs with FDR/q value/adjusted p value < 0.05 and log 2 fold changes above 1.5 were found to differ for different comparison groups. Table 2 highlights the number of DEGs found for different combinations using different packages. A total of 39,515 up-regulated and 37,890 down-regulated transcripts were detected in different comparison groups using EdgeR. Sharing of contigs between different combinations along with individual contigs is presented graphically by Venn diagrams (Fig. 1). When tolerant, sensitive cultivars (L-4602; BM-4) and wild genotype (ILWL-15) were matched with their corresponding controls, 2329, 2380 and 5130 up-regulated and 1609, 1562 and 2420 down-regulated DEGs were acknowledged, respectively. Table 2 also represents total as well as signi cantly up and downregulated DEGs deduced from four different softwares. Circos plots representing distribution of overall DEGs for comparison group 1T-2T-3T is represented in Fig. 2.

qRT-PCR endorsement
Validation of differential gene expression was done using a total of top 12 DEGs, 4 from each comparison group viz. 1C-1T, 2C-2T and 3C-3T (Fig. 6). IRX9, UPBEAT1 were checked for 3C-3T where rst two were up-regulated and rest two were downregulated. qRT-PCR expression of these genes is represented in Fig. 6. When log2 transformed data from qRT-PCR analysis was compared with RNA-seq data, a close similarity was revealed which further endorses differential expression of genes underneath Al stress condition.

Functional allocation of DEGs
GO terms for transcripts were extracted for functional characterization of DEGs. A total of 2857, 3403 and 2872 GO annotation terms were found in combination 1T-2T, 1T-3T and 2T-3T (Fig. 7). When all the three genotypes viz. L-4602, BM-4 and ILWL-15 were matched to their respective controls, signi cantly enriched GO annotation classes for DEGs which were found in all the three comparison groups were in cell, organelle, membrane, protein containing complex, supramolecular complex, membrane enclosed lumen, extracellular region, cell junction and symplast within cellular component category.
In molecular function group, it was for catalytic activity, transporter activity, binding, molecular function regulator, molecular transducer, transcription regulation, antioxidant and structural molecule activity. Metabolic process, biological regulation, cellular process, localization, response to stimulus, developmental process, multi-organismal process, reproductive process, signalling, biogenesis, immune system process, growth and rhythmic process were signi cantly enriched in biological process category. Apart from these biological processes, cell proliferation was found to be enriched only in 2C-2T and 3C-3T comparisons. Similarly, in molecular function category, obsolete signal transducer activity was found to be lacking in 1C-1T group. Also in wild, it was present only in Al treated ILWL-15 genotype and not in its control. Further enrichment analysis was also conducted separately for top 50 DEGs in all the comparison groups and GO annotation tree was prepared (Fig. 8) which revealed three major separate enrichment categories viz. Category I included 4 interconnected processes viz. developmental processes (1666 genes), Cell wall macromolecule metabolic process (144 genes), carbohydrate metabolic process (1233 genes) and anatomical structure development (1639 genes). Category II was involved with localization (2373 genes), establishment of localization (2343 genes) and vesicle mediated transport (283 genes). Category III had most of the interconnected functional processes with a total of 66,289 genes. This category included several metabolic processes including cellular nitrogen compound, protein, RNA, heterocycle, nucleobase-containing compound, cellular protein, cellular amide along with several biosynthetic processes related to cellular, cellular nitrogen compound, cellular macromolecule, aromatic compound, heterocycle, nucleobase-containing compound, RNA, Amide and Peptide together with regulations of gene expression, cellular biosynthetic process, macromolecule biosynthetic process and nucleobase-containing compound metabolic process. Also, gene expressions, response to stimulus and macromolecule modi cations were found to be enriched in this category.
3.8 Pathway analysis for aluminium stress 3.8.1 Overall pathway analysis software (Fig. 9). BINs or sub-BINs are visual representation of genes pertaining to a functional pathway.

Discussion
To understand Al tolerance mechanisms in lentil, present study sheds light on differential gene expression among cultivars as well as between cultivars and wild genotypes of lentil via de novo transcriptomic examination. Contigs were sequence aligned to publically available databases and functional characterization of DEGs along with pathway prediction was done. When differential gene expression among cultivars was accomplished, top DEGs which were up-regulated between tolerant and sensitive genotypes under Al stress condition were found to be involved in protein transport, protein degradation, defence, cell growth and development. Speci cally, PX domain-containing protein EREX is engaged in vacuolar transport of storage proteins. It controls membrane tra cking to protein storage vacuoles (PSVs) and attaches explicitly to phosphatidylinositol 3-monophosphate [21]. Protein Light-Dependent Short Hypocotyls 1 is a plausible transcription regulator that performs as a developmental regulator by stimulating cell growth in reaction to continuous red, far-red and blue light in a phytochrome-dependent mode [22]. BTB/POZ domain-containing protein NPY1 operates as a substrate-speci c adapter of an E3 ubiquitinprotein ligase complex (CUL3-RBX1-BTB) which facilitates the ubiquitination and consequent proteasomal degradation of target proteins. It regulates cotyledon development through control of PIN1 polarity and was found to be involved in root gravitropic responses in Arabidopsis [23][24][25]. ATP-dependent zinc metalloprotease FTSH 7 chloroplastic is a part of a complex that functions as an ATP-dependent zinc metallopeptidase. It is engaged in thylakoid formation and in exclusion of damaged D1 in the photosystem II, averting cell death under high-intensity light conditions [26,27]. Signi cantly down-regulated DEGs also have different functions which were suppressed in tolerant cultivar as compared to sensitive one. Among down-regulated DEGs, Zinc-nger homeodomain protein 4 is a putative transcription factor and is implicated in the regulation of oral induction [28]. Carboxylesterase and Aldehyde dehydrogenase family member have catalytic functions whereas protein kinase SD1-13 represses disease resistance signalling pathways [29].
Top up-regulated DEGs in tolerant control v/s treated group have catalytic activity (TNT 1-94) and antifungal activity (Defensin-like protein 39). DEMETER is involved in gene imprinting and catalyzes the discharge of 5-methylcytosine from DNA by a glycosylase/lyase process. It allows expression of the maternal copy of the imprinted MEA gene before fertilization and acts by nicking the MEA promoter. This transcriptional activator is entailed for stable reproducible patterns of oral and vegetative development [30][31][32]. Threonine synthase chloroplastic catalyzes removal of phosphate from L-phosphohomoserine and addition of water to yield L-threonine whereas ABC transporter A family member 2 helps in nucleotide binding. DEGs whose functions were receded have roles in catalysis (TNT 1-94, ATX4), RNA and ubiquitin binding (Protein RAE1), disease resistance (Protein RPP5) and plant defence signalling (protein kinase PBL23).
When sensitive cultivar was compared to its control, topmost DEGs whose functions were elevated have roles in cell expansion, root meristem patterning, auxin-transport (SAUR72 protein). Ty5-1 protein YCL074W has been reported as a truncated part of POL protein in mutated non-functional yeast transposon [33]. Sodium Potassium Root Defective 2 is a key up-regulated protein which is involved in metal binding thereby affecting the growth of sensitive genotype under Al stress. Down-regulated DEGs has function in DNA repair and mitotic recombination (CHROMATIN REMODELING 25) [34], chloroplast biogenesis and cell division (Amido phosphoribosyl transferase 2 chloroplastic) [35]. Calcium uniporter protein 2 mitochondrial constitutes a pore-forming and calciumconducting subunit which is involved in calcium uptake into mitochondria whereas UDP-galactose/UDP-glucose transporter 3 is an essential sugar transporter required for pollen development and embryo sac progress [36].
When wild (ILWL-15) and tolerant cultivar (L-4602) was compared, DEGs which were signi cantly up-regulated in wild are involved in regulation of transcription in differentiating cells; pre-mRNA splicing; catalysis and protein ubiquitination. For example, Ycf49-like protein is a part of cell membrane protein.
Similarly, when expression pro les of BM-4 (sensitive cultivar) and ILWL-15 (wild) were compared, signi cantly up-regulated DEGs in wild have their functions in catalysis [37], cellular copper and redox homeostasis [38], processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins and in resistance to the arginine analog canavanine [39], cell surface adhesion along with endopeptidase activity. As plants lacks centrosomes, so they have acentrosomal microtubule arrays. Protein TPX2 which is down-regulated in wild genotype under Al stress has microtubule binding capability and regulates pro-spindle assembly during late prophase and at the onset of mitosis [40,41]. Other down-regulated DEGs involves EXO70B1, which is encompassed in Golgiindependent membrane tra c to the vacuole and is a positive regulator of both abscisic acid (ABA)-promoted and mannitol (drought)-endorsed stomatal closure [42,43] Similarly, Vacuolar protein sorting-associated protein 54 chloroplastic is involved in pollen tube elongation and other polar growth [4].
Octanoyltransferase is essential for de novo plastidial protein lipoylation during seed development [45]. Protein WEAK Chloroplast Movement Under Blue Light 1 is requisite for chloroplast avoidance response under high intensity blue light. This eschewal response results in the repositioning of chloroplasts on the anticlinal side of exposed cells. It sustains the rate of chloroplast photo relocation movement via cp-actin laments adjustments [46] DEGs which were up-regulated in wild genotype when compared to its control involves U-box domain-containing protein 13 which acts as a E3 ubiquitin ligase and 60S acidic ribosomal protein P1-2 which is involved in protein synthesis. Another up-regulated DEG was Auxin response factor 6 which is a type of transcriptional activator that binds precisely to the DNA sequence 5'-TGTCTC-3' located in the auxin-responsive promoter elements (AuxREs). Con guration of heterodimers with Aux/IAA proteins alters their aptitude to modify initial auxin response genes expression. It synchronizes both stamen and gynoecium maturation and fosters jasmonic acid stimulation [47,48] Top down-regulated DEGs included TMV resistance protein N which is a disease resistance protein and Cyclin-dependent kinase F-1 that modulates CDKD-2 and CDKD-3 activities by phosphorylation of the T-loop [49]. Thought-provokingly, in all the comparison groups involving cultivars as well as wild genotypes, top DEGs were associated with down-regulation of disease resistance genes.
In present study, result for sequence alignment with databases is way ahead of Kaur et al. (2011), which suggest that advancement in the sequencing chemistry has increased the quantity and quality of acquired data that has improved de novo assembly of non-model crop plants [50]. Further, generation of huge number of molecular markers from acquired data will help in improving the development of lentil reference genomic pool i.e. knowpulse.usask.ca. Top 12 DEGs from different comparison groups when evaluated using qRT-PCR, data were validated precisely. ABR 18 and SAUR-72 was up-regulated in 1C-1T comparison group whereas CSA-UDP and Expansin were down-regulated. Study on precocious germination of cultured immature embryos of Pisum sativum has showed that addition of ABA increases production of ABR-18 protein and is accumulated in testa during early seed development [51]. SAUR-72 has role in regulation of cell expansion, root meristem patterning and auxin transport. In a study conducted on tissue-speci c and developmentally regulated expression patterns of SAUR genes, it was found that SAUR72 was highly expressed in the steles of Arabidopsis roots and hypocotyls [52]. In 2C-2T comparison group all the four genes which were selected for qRT-PCR validation, viz. 30S5, Isocitrate Lyase, PCBP3 and Peroxidase-15 were up-regulated. Isocitrate lyase is an enzyme of the glyoxylate cycle which accelerates the cleavage of isocitrate to succinate and glyoxylate. Bradyrhizobium japonicum isocitrate lyase has been reported to have an important functional role in desiccation tolerance [53]. Poly(C)-binding proteins (PCBPs) are commonly acknowledged as RNA-binding proteins that interact in a sequence-speci c manner with single-stranded poly(C). PCBP1-4 including PCB-3 proteins are concerned chie y in innumerable posttranscriptional regulations [54], have reported that transcriptional regulation of PCBPs might itself be standardized by their localization within the cell and it can work as a signal-dependent and coordinated regulator of transcription in eukaryotic cells. In previous studies, impediment of peroxidase was linked with Al tolerance as it helps in maintaining H 2 O 2 levels desirable for non-enzymatic wall loosening [55,56]. But similar to our study, peroxidases were up-regulated during transcriptomic pro ling of wheat near isogenic lines under Al stress and were reported to be connected to decreased growth rate [6]. Since, in our study this has been up-regulated in sensitive genotype under Al stress, we cannot associate it with Al stress tolerance. Instead similar to study conducted in wheat NILs, here also it might be due to reduced growth rate under Al stress. In 3C-3T comparison group, CYP45081E8, HSP 17.1, were up-regulated and IRX9, UPBEAT1 were downregulated. CYP45081E8 is a probable monooxygenase which exhibit activity with iso avones such as formononetin, biochanin, pseudobaptigenin, daidzein, genistein, isoformononetin and prunetin, or with avonoids including naringenin, liquiritigenin, apigenin, luteolin, or kaempferol [57]. Up-regulated HSP 17.1 is a part of small heat shock protein (HSP20) family. These genes signify the highly profuse class amid the HSPs in plants. Hsp20 genes have been reported to be connected with stress triggered by HS and other abiotic in uences in soybean [58]. Down-regulated IRX9 gene encrypts a putative family 43 glycosyl transferase. It was reported to be co-ordinately articulated with the cellulose synthase subunits in the course of secondary cell wall formation [59]. Cell wall examination exposed a reduction in the richness of xylan in the irx9 mutant, demonstrating that IRX9 is essential for xylan production acts as channels for organic acid exudation. These include members of Al activated Malate transporter (ALMT) and multidrug export protein AcrE (MATE) families which encrypt membrane proteins that expedite organic anion e ux crosswise cell membrane. ALMT genes are involved in organic acid transportation which were signi cantly up-regulated in tolerant genotype when compared to its control. These genes are classi ed in the Fusaric acid resistance protein like superfamily, CL0307 and it was the rst Al tolerance gene to be recognized in case of plants represented by wheat [70]. ALMT 9 is a chloride channel which is activated by physiological concentration of cytosolic malate and mediates vascular malate uptake [71]. VvALMT 9 has facilitated the accumulation of malate and tartrate in the vacuole of grape berries [72]. ALMT 12 functions as quickly activating anion channel mainly for chloride and nitrate ions [73]. The action of AtALMT12 is controlled by phosphorylation through kinase Open Stomata 1, which in turn is activated by ABA under stress conditions to regulate stomata closure [74]. . Al stress also induces production of callose, which is a cell wall associated polysaccharide whose synthesis is not only an early sign of Al stress in plants, but it also occupies a central position in toxicity pathway that leads to hampering of root growth [86]. Al induces callose sedimentation adjacent to plasmodesmata that could obstruct cell to cell transportation and communication [87]. Callose synthesis was also increased in tolerant genotype. Callose synthase is multi subunit membrane associated enzyme complex whose activity is primarily under post-translational control but transcriptional regulation can also ensue [88]. Most of the Al tolerance pathways described above is under control of one or the other regulation system, one being sensitive to Proton rhizotoxicity 1 (Stop 1) regulation system, which in this study has signi cantly contributed to Al stress response in tolerant genotype by substantial upregulation of its protein. have isolated and characterized VuSTOP 1 gene from rice bean and found that it was mainly involved in H + tolerance [90].
When wild genotype was compared to tolerant cultivar, ROS mediated antioxidants signalling pathway gene were signi cantly upregulated. APX and GPX are key antioxidant enzymes of scavenging systems which in uences maximally to hydrogen peroxide detoxi cation. APX is one of the highly regulated enzymes [91]. In our previous studies also, we have identi ed signi cant increase in GPX activity in wild lentil genotypes [7]. For organic acids synthesis pathway, several genes were signi cantly up-regulated in wild genotype, one being that of Aconitatehydratase (Aco). Aco is an enzyme highly sensitive to irreversible oxidative inactivation by H 2 O 2 . This enzyme catalyzes isomerization of citrate to isocitrate via cis-aconitate in the TCA cycle. Aco have been shown to be involved in regulating tolerance to oxidative stress and cell death in Arabidopsis and Nicotiana benthamiana [92].

Conclusion
Transcriptome analysis reveals several differentially expressed genes in tolerant, sensitive as well as wild lentils in response to Al stress. De novo assembly and annotation of these DEGs has helped in elucidating prominent pathways involved in rendering tolerance toward Al toxicity in tolerant and wild genotypes.
DEGs involved in ROS detoxi cation, exudation of organic acids, callose synthesis, phytohormones signalling were identi ed to be involved in stress regulation. Further, MapMan plot showed that the role of cell wall and production of secondary metabolites in tolerant genotype and wild accession, respectively has an edge for Al tolerance over the sensitive genotype under Al stress conditions. Insights of these genes, pathways and identi cation of different markers involved in Al tolerance will help in strengthening breeding programs for Al tolerance.

Data Availability
Transcriptomics data reported in this article has been deposited in Sequence Read Archive (SRA) repository of NCBI database with accession number SAMN08211543.

Consent for publication
Not applicable.

Figure 2
Top up and down regulated DEGs represented through heat map in combination 1T-2T.     Blast hits of overall contigs using a. Species b. Database.