Transcriptome analysis of two contrasting genotypes of pearl millet to gain insight into heat stress responses

Background Pennisetum glaucum (L.) Br. is mainly grown in arid and semi-arid regions. Being naturally tolerant to various adverse condtitions, it is a good biological resource for deciphering the molecular basis of abiotic stresses such as heat stress in plants but limited studies have been carried out till date to this effect. Here, we performed RNA-sequencing from the leaf of two contrasting genotypes of pearl millet (841-B and PPMI-69) subjected to heat stress (42 °C for 6 h). Results Over 274 million high quality reads with an average length of 150 nt were generated. Assembly was carried out using trinity, obtaining 47,310 unigenes having an average length of 1254 nucleotides, N50 length of 1853 nucleotides and GC content of 53.11%. Blastx resulted in annotation of 35,628 unigenes and functional classification showed 15,950 unigenes designated to 51 Gene Ontology terms, 13,786 unigenes allocated to 23 Clusters of Orthologous Groups and 4,255 unigenes distributed into 132 functional KEGG pathways. 12,976 simple sequence repeats were identified from 10,294 unigenes for the development of functional markers. A total of 3,05,759 SNPs were observed in the transcriptome data. Out of 2,301 differentially expressed genes, 10 potential candidates genes were selected based on log2 fold change and adjusted p -value parameters for their differential gene expression by qRT-PCR.

include, reduction in chlorophyll content, changes in membrane fluidity and protein stability, reactive oxygen species (ROS) production, changes at transcriptome level etc. In some crops occurrence of heat stress during the flowering period leads to poor grain setting (3,4). Increased water stress due to heat throughout the growing cycle can reduce the crop yield (5). However, the impact of temperature beyond the optimum required for growth and underlying heat tolerance mechanisms are not clearly understood in many crops.
Pearl millet [Pennisetum glaucum (L.) R. Br.], is widely grown in African and Indian subcontinent, since prehistoric times. The crop's main center of diversity is known to be Sahel zone of West Africa. Pearl millet is a C4 species having diploid number 2n = 14, genome size around 1.79 Gb (6) and is mostly grown under drought-prone semi-arid and arid tropics in the regions with 200 to 800 mm of annual rainfall. Pearl millet is grown in an area of about 31 mha worldwide (http://exploreit.icrisat.org/profile/Pearl%20Millet/178) with 27.83 mtonnes production accounting for 50% of global millet production. Optimum temperature required for its growth is about 30-35 °C (7). It is well known for its tolerance to extreme environmental conditions and limited genomic resources are available as compared to other crop species.
Next-generation sequencing (NGS) based technology for analysis of transcriptome is more powerful and accurate as compared to, the Sanger based EST sequencing, suppression subtractive hybridization and hybridisation based microarrays (8,9). Moreover, over the last decade several NGS platforms including the Illumina are becoming more affordable and efficient for transcriptome sequencing. Additionally, transcriptome sequencing has emerged as an alternative core technology for discovery and understanding of genes associated with desired traits, where full genome sequencing is not economically feasible especially in case of non-model plants. RNA-sequencing (RNA-Seq) has become a benchmark tool for whole transcriptome gene expression quantification and identification of differentially expressed genes (DEGs). It provides scope for the identification of probable candidate genes involved in abiotic and biotic stress tolerance and further for the development of molecular markers (10,11,12).
Enormous amounts of ESTs generated from various transcriptomic studies and other genomic sequences are available in public databases for many model plant species. However, limited research emphasis has been given to the non-model crops including pearl millet, as evidenced by the presence of only 75,493 ESTs (https://www.ncbi.nlm.nih.gov/nuccore/?term=pearl+millet+ESTs) of this crop in GenBank (13). It is now possible to assemble transcripts without reference genome via de novo assembly using trinity (14) and/or one of several other available software tools. Recently, genome wide expression profiling in various non-model plant species growing under abiotic or biotic stresses, for various biosynthetic pathways or developmental stages has been carried out using RNA-Sequencing as a tool. These plants includes Raphanus sativus, (Scrophularia ningpoensis, Camellia sinensis, Lilium, Brassica rapa, Sesamum indicum, Asparagus officinalis and Agrostis (15)(16)(17)(18)(19)(20)(21)(22).
Flag leaf acts as an immediate source for panicle development during reproductive stage in plants (23), RNA-Sequencing of flag leaf subjected to heat stress during flowering stage was carried out using Illumina sequencing platform. De novo assembly resulted 47,310 unigenes and further, functional annotation (gene ontology, corresponding metabolic pathways) was carried out. The aim of the present study was to unravel the gene pool responsible for conferring heat tolerance to pearl millet. Being the first transcriptome report of pearl millet in response to heat stress, the data presented here will be a primary source of information for the research on genomics and functional genomics in this orphan crop.

Results And Discussion
Determination of physio-biochemical properties of P. glaucum genotypes The MDA (Malondialdehyde) content and MSI (Membrane Stability Index) analysis as observed in P.
glaucum genotypes 841B and PPMI69 revealed that genotype 841B was more heat tolerant as compared to genotype PPMI69 (Supplementary Figs. 1 and 2, Additional File 1).

Illumina Sequencing And Raw Data Pre-processing
The whole transcriptome sequencing was performed using Paired end (PE) 2 × 150 bp library on Illumina HiSeq 2500. The sequencing run produced a total raw data of 288,876,956, details are shown in Table 1. After removal of low quality sequences, ambiguous bases and adapter sequences by Trimmomatic tool (24), a total of 274,721,009 high quality clean reads, containing 39,782,593,275 nucleotides (nts) were generated having an average length of 150 nt and GC content of 57.17%. The sequencing data has been deposited to NCBI (National Centre for Biotechnology Information) Sequence Read Archive (SRA) database under the accession number SRP151237.    1A). 83.35% of the aligned unigenes showed more than 80% of similarity distribution (Fig. 1B). Blast top hits analysis showed that 57.07% of the annotated sequences correspond to Setaria italica, followed by Sorghum bicolor (6.61%), Cronobacter sakazakii (4.72%), Zea mays (4.33%) and Oryza sativa japonica (2.74%) (Fig. 2). The  Identification of heat responsive genes involved in biological pathways during flowering To identify potential heat responsive genes and understand their role in various biological pathways, KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis was performed with a cut-off E-value of 1.0E-05. In total, 4,255 unigenes (11.94% of total unigenes) were categorised into 132 KEGG pathways (Table 5; Fig. 4A). The most represented pathways were the ones related to "biosynthesis of antibiotics" (10.86%), "purine metabolism" (6.46%), "starch and sucrose metabolism" (3.45%), "pyrimidine metabolism" (3.2%), "phenylpropanoid biosynthesis" (2.70%) and "glycolysis/gluconeogenesis" (2.63%). A total of 147 transcripts of starch and sucrose metabolism (3.45%) and 112 transcripts of "glycolysis/gluconeogenesis" (2.63%) were identified. Our study will provide, better understanding of molecular mechanisms that are prevailing during flowering stage of pearl millet under heat stress. Interestingly, this analysis may help in specifying pathways related to synthesis and turnover of compounds, which have favourable effects in grain filling and yield. Identification of simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) from P.
glaucum transcriptome Assembled transcriptome of P. glaucum was used for identification of SSRs. Based on criteria with a minimum of (5-10) repetitions of mono to hexa-nucleotide motifs, MISA software was used to search SSR markers in all unigenes (27). A total of 12,976 SSRs from 10,294 unigenes were detected, of which 2,116 unigenes had more than one SSRs (Table 6). Among all the identified SSRs, 50.88% fall under tri-nucleotide repeats, followed by mono-nucleotides (31.88%) and di-nucleotides (13.7%) but tetra-nucleotide, penta-nucleotide and hexa-nucleotide were represented only as a small fraction ( Fig. 4B). SSRs are simple motifs of nucleotides (1-10 nucleotides), which may occur as tandem or interspersed repeats and are abundant within genome of prokaryotes and eukaryotes (28). Genetic variability for heat stress tolerance in P. glaucum is still unexplored. Therefore mining the SSR markers from P. and rice (29) and in Gossypium raimondii (30). It has been reported that occurrence of GC-rich trinucleotides SSRs were frequent in exon regions, whereas distribution of AT-rich trinucleotides SSRs were found throughout the genome (coding sequences, untranslated regions, introns and intergenic spaces) (31). SSRs are codominant, multi-allele genetic markers that are highly reproducible and transferable among related species (32). As a result, it has been the most widely used marker for genotyping and other various breeding purposes (33 (Fig. 5).
The hierarchical cluster analysis of two different genotypes with heat stress treatment shows that the differential gene expression pattern of the transcriptome readily differentiates P. glaucum genotype PPMI-69 6 h from others (841-B control, 841-B 6 h and PPMI-69 control) (Fig. 5), possibly indicating the variation in differential gene expression occurring between genotypes in response to heat stress.
The maximum number of differentially expressed genes (1934 genes) was observed between PPMI-69 6 h and PPMI-69 control (Table 7). Differential gene expression patterns across the treatments and comparison of differential expression patterns between the two different genotypes, were analysed by cluster analysis with hierarchical clustering method (Fig. 5). These differentially expressed genes between samples 841-B 6 h and PPMI-69 6 h are grouped into two clusters, representing that genes involved in thermotolerance have different level of differential gene expression. This suggests that 841-B 6 h has better ability to maintain homeostasis during heat stress as compared to PPMI-69 6 h. A comparative analysis between differential expressed genes in 841-B and PPMI-69 genotypes was performed to identify common differentially expressed genes in both the genotypes under heat stress and those that are unique to each genotype of the P. glaucum. A total of 2,301 genes were differentially expressed, 20.99% (483 genes) of which were shared common by both genotypes. 15.95% (367 genes) of differentially expressed genes were unique to 841-B genotypes and 63.06% (1451 genes) of differentially expressed genes were unique to PPMI-69 (Fig. 5). However, the differential expression of genes between the two genotypes cannot be attributed exclusively to the treatments alone, as the two genotypes have inherently different levels heat stress tolerance and therefore could have different gene expression profile. The DEGs were also visualized by volcano plots (Fig. 6) In order to investigate the temporal expression patterns by qRT-PCR analysis, 10 target genes were selected based on the fold changes and adjusted p-values. The selected genes displayed varying patterns in response to different durations of heat stress (Fig. 7). Validation of selected 10 genes (7 known and 3 uncharacterized) by qRT-PCR deciphered their significant role in heat stress management in P. glaucum. These genes might play an essential role in amelioration of heat stress in P. glaucum.

DnaJ
Heat stress induces changes in protein conformation and increase improper folding of native proteins.
As a result, accumulation of many heat shock proteins is triggered to counter balance the negative effect of heat stress in plants (35). Among many heat shock proteins, DnaJ or Hsp40 is known to play  (42,43). Among these NAC genes, NAC67 has a role in imparting tolerance to multiple abiotic stress such as drought, salt and cold stresses. NAC67 has been reported to be involved in conferring tolerance to abiotic stresses in rice (44). However, so far there has been no report indicating its role in heat stress tolerance. In this study, PgNAC67 expression was found to be significantly upregulated (27 folds

EXP
PgEXP expression shows 5 folds significant upregulation in 841-B in response to 6 h of heat stress.
Association of expansin genes and heat stress tolerance in some plants has been reported (45,46).
Over-expression of the EXP1 gene exhibits low electrolyte leakage, decrease in membrane lipid peroxidation but higher chlorophyll content, net photosynthetic rate, relative water content, activity of antioxidant enzyme in transgenic plants (47).

Hd1
Heading 1 (Hd1) and early heading 1 (Ehd1) are mainly known for the regulation of flower development and flowering, leading to either induction or suppression corresponding to the particular photoperiod (48). Environmental factors such as day length, abiotic and biotic stresses regulate the expression of these genes. Earlier studies show that inhibition of early heading 1 (Ehd1), in response to drought stress delays flowering in rice (49). In our studies, PgHd1 expression shows significant downregulation in both genotypes in response to 30 min and 6 h heat stress.

LTP
PgLTP expression shows significant (19 folds) upregulation in response to 6 h of heat stress in PPMI-69 but significantly downregulated in 841-B indicating involvement of high activity with regard to transfer of lipid molecules in cell. This shows active regulation of membrane fluidity in PPMI-69 in response to heat stress. Lipid Transfer Protein (LTP) are reported to be involved in growth and development, response to abiotic and biotic stresses but their functions remain unclear (50)(51)(52)(53).
Moreover, LTPs has ability to facilitate the transfer of phospholipids between membranes in vitro (54).

Uncharacterized genes
Among differentially expressed contigs, uncharacterised ORFs share maximum proportion in our study. Uncharacterised genes with a predicted protein domain associated with zinc fingers (ZnF),

Conclusion
This study investigated the transcriptome profile of pearl millet flag leaves in response to heat stress.
In this study, high quality 47,310 unigenes were generated and annotated. This EST data will provide the foundation for research on gene expression, genomics, and functional genomics in pearl millet improvement program. Further, the SSRs obtained in this study shall facilitate the research on genotyping, and diversity studies of this important crop. The candidate genes whose expression patterns were validated by qRT-PCR shall serve as important resource for their effective utilization in the development of transgenic crops tolerant to heat stress.

Determination of physio-biochemical characteristics of plants
Malondialdehyde (MDA) content was determined according to Heath and Packer (58). 0.5 g of leaf tissue were taken, and homogenised in 10% trichloroacetic acid (TCA) and 0.65% thiobarbituric acid (TBA). These are incubated at 95°C for 30 min, allowed to cool down to room temperature, and centrifuged at 10,000 g for 10 min. The absorbance was measured using Shimadzu UV-vis

De novo assembly of flag leaves transcriptome
The next generation sequencing run for whole transcriptome was performed using Paired end (PE) 2x150 bp library on Illumina HiSeq 2500. Using FastQC tools (60), quality check was performed.
Trimmomatic was used for pre-processing of the raw reads to eliminate adapter sequences and poor quality reads. Trinity program was used for de novo assembly with default parameters (14,61).
Cluster Database at High Identity with Tolerance (CD-HIT) program (62) was run to remove the similar short sequences based on 90% alignment coverage to longer sequence and produces a set of 'nonredundant' (nr) representative sequences and eliminating short redundant sequences. Sequences were clustered using TGICL tools (63) with default parameters to produce longer, more complete consensus sequences. Gene construction was carried out using EvidentialGene tools with default parameters, to retain the biologically significant transcripts.

Annotation of transcriptome and identification of SSRs and SNPs
The transcriptome structural annotation was performed using TransDecoder tools. The functional annotation was performed using BLAST+ tools, with BLASTx using a translated nucleotide query (unigenes). Gene Ontology mapping was performed using Blast2GO (64), to specify all the annotated unigenes to various categories such as biological processes, molecular functions and cellular components. Pathway mapping of unigenes was performed using KEGG database (65

Expression analysis
Fragments per kilobase of transcript per million mapped reads (FPKM) unit was used to calculate the expression level of unigenes. Read count for each unigenes were calculated and then converted to FPKM using formula (Read count * 10 9 )/(Sum of read count * Length). Differential gene expression was determined using DEGseq (67), a R package. The significant differentially expressed unigenes were filtered based on adjusted p-value < 0.005 and log ratio > 1 and -1 between the samples.
Heatmaps of the significant genes were generated with the heatmap package (68), in R package.

Quantitative RT-PCR
Total RNA was extracted from the treated and control flag leaves as described above to study the differential expression patterns under heat stress (42°C for 30 min and 6 h) of the few selected genes. A total of 10 genes (PgDnaJ, PgGST, PgNAC67, PgTIL, PgEXP, PgHd1, PgLTP, PgUCP1, PgUCP2, PgUCP3) involved in heat stress response were selected for differential expression analysis, from the generated pearl millet transcriptome data based on log2 fold change ≥ 2 (for upregulated transcripts) and adjusted p-values. cDNA was synthesised for selected 10 DEGs with samples of flag leaves originally used for RNA-seq, using SuperScript® III First-Strand Synthesis System (Invitrogen, USA).
qRT-PCR was performed using LightCycler ® 480 System (Roche, Switzerland) and KAPA SYBR ® FAST qPCR Kits (Kapa Biosystems, USA) was used as reaction components. Gene specific primers were designed using PrimerQuest Tool (Integrated DNA Technologies (IDT), USA) ( Table 8). PCR programme was set as: 94°C for 5 min and 40 cycles of 94°C for 10 sec , 60°C for 10 sec. and 72°C for 10 sec.
PgActin was used as internal control to normalize all the data. 2 −ΔΔCt method was used to calculate relative fold change expression (70). Significance level was calculated using two-tailed unpaired ttest. Table 8 List of primers used for qRT-PCR analysis Gene Sequence (5ʹ to 3ʹ)     Real time PCR validation of 10 target genes in A) 841-B and B) PPMI-69 genotypes. Two tailed unpaired t-test was used to calculate p value, *p < 0.05, **p < 0.01, ***p < 0.001

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. Supplementaryinformation.pdf