RNA-Seq based exploration of differentially expressed genes (DEGs) in cotton(Gossypium hirsutum. L) upon infection with Aspergillus tubingensis

Differentially expressed genes help in exploring plant defense mechanism under variable stress conditions. In current investigation, RNA sequencing was executed to explore the differential gene expression in resistant and susceptible varieties of Cotton ( Gossypium hirsutum ), upon infection with Aspergillus tubingensis . Comparative RNA-Seq of control and infected plants was performed using Illumina HiSeq 2,500. Overall 79.84 G clean data was generated and 6,558 DEGs were identified in both varieties, in response to pathogen inoculation. Differentially expressed genes were found to be involved in defense, antifungal response, signaling pathways, oxidative burst and transcription. Genes involved in defense responses, MAPK signaling, cell wall fortification and signal transduction were highly induced in resistant variety. Real time PCR also revealed the up regulation of MAPKKK YODA like, L-ascorbate oxidase, late embryogenesis abundant protein (At1g64065) and flavonoid 3',5'-hydroxylase-like, in resistant variety. Elevated accumulation of such DEGs in resistant variety could function as the source for identifying biomarkers for breeding and these can be used as potential candidate genes for transgenic manipulation. Their study also helped in understanding complex plant-fungal interaction and advanced the understanding of plant-microbe interaction. Inclusively, our findings provide an indispensable foundation for advanced understanding of the plant resistance mechanisms of cotton. esterase-like Concurrent up-regulation of genes of phenylpropanoid pathway in response to fungal inoculation coordinated production.


Introduction
Cotton is considered as one of the world's most treasured crops. It provides a significant part of planet's regular fiber to the international yarn production. Cotton fiber signifies about 90% of entire cotton economic worth, while further economic value is secured from seed and other With the developments in the field of transcriptomics, RNA-sequencing has emerged as a major "-omics" technology for reviewing inclusive expression of genes under various stresses 3 . In cotton, stress mechanism had been studied under several biotic and abiotic factors. Currently, in Pakistan, transcriptomic analysis of G. hirsutum was carried out to investigate the DEGs in susceptible G. hirsutum variety in response to whiteflies attack 4 . RNA-seq analysis of infested cotton floral buds has exhibited the differential expression of plant hormone signaling genes, transcription factors and kinase cascades 5 . RNA-seq of G. hirsutum root tissues has also displayed differential genes expression in response to drought condition 1 . Comparative transcriptomic and reverse genetics have been employed for the identification of differentially expressed genes under the stress of Verticillium dahliae 6 . RNA seq analysis divulges variances in the mechanisms of origination and elongation among long-fiber and short-fiber cotton lines 7 . Transcriptomic analysis of G. hirsutum has also been applied for the detection of candidate genes in response to Aspergillus flavus 8 . Innate immune system of plants comprises of cells population and signaling pathways which constitutively function to provide quick response to pathogens, at diseased area 9 . Both pathogen triggered immunity and effector triggered immunity respond to stress by using a joint set of signaling components, which includes ROS, Ca 2+ and multiple protein kinases. These signaling machineries regulate downstream regulatory protein activities including transcription factors, which then lead to vast transcriptional reprogramming and initiates the induction and accretion of various pathogenesis-related proteins, enzymes and pathogen infection-responsive metabolites 10 .
Rapid phosphorylation response against pathogens also involves some downstream mitogenactivated protein kinases (MAPKs) 11 and calmodulin protein kinases 12  This study was aimed to see the differential expression of different genes in tetraploid cotton, in response to A. tubingensis inoculation. Infected cotton cultivars were subjected to RNAseq analysis and obtained sequences were subjected to data analysis, gene ontology, functional annotation and pathway enrichment analysis. We intended to distinguish key DEGs of cotton in response to Aspergillus infection to interpret its resistance mechanisms.

Disease severity analysis
Infected cotton leaves showed symptoms at 4 dpi. Leaves of susceptible variety exhibited infection spots, which later spread on the whole leaf surface. Initial symptoms appeared on leaf veins. The tolerant variety showed reduced leaf spots (2 mm average size) which did not spread rapidly ( Fig.1A, B). Diseased area of fungal inoculated leaves of both varieties was measured in cm  value box line (F) FPKM box-plot for each sample. The X-axis is the sample name, Y-axis is log10 (fpkm+1). The different colors in the figure represent different ranges of fpkm values, the abscissa is the sample, and the ordinate is the number of protein-coding genes.

Illumina Sequencing and Quality Control
The complete analysis was comprised of 12 transcriptomic sequencing samples and a total of 79.84 G clean data was obtained. The effective data volume of each sample was 6.19~6.82 G with 44.37% average GC content. The Q30 base distribution was found to be 91.62~93.46% (Table 1).

Alignment to the Reference Genome
By comparing reads to the reference genome, the genomic alignment of each sample was obtained and the alignment rate was found to be 87.08~94.86%. Based on the alignment results, proteincoding gene expression levels were successfully analyzed (Supplementary Table 1).

FPKM density distribution
The FPKM density distribution reflected protein-coding gene expression pattern of each sample (Fig.1D). In FPKM value Box-whisker plot, five statistics i.e. minimum, first quartile (25%), median (50%), third quartile (75%) and maximum value were used to define information. It efficiently showed the symmetry of data and the degree of dispersion of distribution (Fig. 1F).

Assessment of variation among samples
PCA showed discrete behavior of resistant and susceptible variety and 74% total variance was contributed by all three PCs. PC1 showed 42.95% variance, while the variance of PC2 and PC3 was recorded to be 21.39% and 9.82 %, respectively (Fig. 2). Unsupervised hierarchical clustering of differentially expressed genes also provided useful information ( Figure S1).
Generally, the same types of samples were clustered together with similar biological functions.
Constant with the prospects of the experimental conditions, pathogen infected samples in both varieties were clustered together. The closer was the correlation coefficient to 1; the elevated was the degree of similarity in expression patterns between samples ( Figure S2).

Differential gene expression statistics and profiling
According to the expression levels, protein-coding genes were found in four differential groups (Fig. 4). Numbers of differential protein-coding genes were detected to be 2,409 in group 1, 1998 in group 2, 302 in group 3 and 1,849 in group 4.
Based on the alignment of the obtained cotton reads with reference genome (G.

Validation by Real Time PCR
The transcriptomic data was confirmed using nine genes (with FC expression greater than 5), under inoculated conditions in both varieties. The results exhibited substantial up-regulation of mRNA, in response to A. tubingensis inoculation (Fig. 5).  (Table S2).  . The X-axis is the enrichment score. The larger the bubble, the more the number of differential protein-encoding genes contained. The bubble color changes from purple-blue-green-red. With the color transition the scope of protein encoding genes become more well defined. Smaller enrichment p value represents the greater degree of significance.

Discussion
Plants use sophisticated constitutive and inducible defense mechanisms to cope with the attacks of pathogens 18 . Different transcriptomic analyses have helped us to understand defense mechanism of cotton plant after the inoculation of V. dahliae 19 . Presently, the whole gene profiling of three cotton species (G. hirsutum, G. raimondii and G. arboretum) is accessible 20 . This makes transcription profiling possible and the advancement of RNA-Seq provides insight into the cotton defense mechanism against pathogen infection and also aids in the detection of selected genes, imparting role in disease tolerance.
In the present investigation, defense response of two cotton (G. hirsutum) varieties was compared, following the infection of A. tubingensis. Obtained data was adequate for the quantitative analysis of gene expression. Over-all 6,558 genes were differentially expressed.
DEGs involved in "plant hormone signal transduction" and 'plant-pathogen interaction' pathways were considerably enhanced in the leaves of tolerant cotton variety upon fungal inoculation.
Consistent with our metabolomics data, in which primary metabolites (sugar, amino acids, fatty acids) and secondary metabolites (alkaloids, terpenoids, pheylpropanoid) exhibited higher accumulation in resistant cultivar 17 .

Genes involved in defense signaling
In this study, 7 receptor-like protein genes were highly induced in resistant variety post fungal inoculation. In susceptible cultivar, LRR receptor-like serine/threonine-protein kinase RCH1, cysteine-rich repeat secretory protein 55-like (2 genes) and receptor-like protein 12 (

Genes involved in stress response
In this study, 10 genes of putative disease resistance protein RGA3 were highly induced in resistant cultivar and disease resistance protein RGA2-like was also up regulated in resistant variety. In response to the infection of Phytophthora infestans, resistant genes like TMV and RGA3 have been reported to be upregulated in potato 24 . RPP13 locus of A. thaliana has been described for wide range of resistance against Peronospora parasitica (downy mildew pathogen) 25 . In this study, hypersensitive-induced response protein (1-like) was up regulated in resistant cultivar.
Hypersensitive Induced Reaction (HIR) plays role in development of impulsive HR in leaves, following pathogen attacks. During the spontaneous HR lesions, accumulation of development related genes was noticed in plant leaves 26 .
Late embryogenesis abundant (LEA) proteins were found to be expressed in vegetative tissues and these proteins have been reported to be induced by numerous stresses and abscisic acid 27 . Among them, calcium dependent signaling triggers LEA-type genes, which then possibly functions in plant defense and repair 28

Genes involved in phytohormone signalling
In response to pathogen invasion, differential gene expression of phytohormones had been observed in previous studies 5

Genes involved in Phenyl propanoid pathway
Plants accumulate proline-rich glycoproteins, lignins and phytoalexins, as a part of their defense mechanism against fungal pathogen infection 35 . Lignification strengthens plant cell-wall and thus confers resistance to pathogen invasion 36  Studies have shown that numerous CYPs function in phenypropanoid metabolism 40 .

Genes involved in Oxidative burst
In plants, generation of reactive oxygen species (ROS) is a hallmark of positive recognition of infection and instigation of defense genes by signaling mechanism. Production of reactive oxygen species has been described under both plant triggered immunity and effector triggered immunity 46

KEGG enrichment analysis of DEGs
The alkaloid biosynthetic pathways (isoquinoline alkaloid biosynthesis) was highly enriched in resistant cultivar. Expression of various alkaloids is stimulated by fungal elicitors, UV radiation, osmotic shocks and heavy metal ions. The phenyl alanine metabolic pathway was highly enriched in resistant variety and it was also enriched in susceptible variety, in response to Aspergillus infection. The amino acid phenylalanine plays vital role in the interlinkage of primary and secondary metabolism in plants 49 . In tolerant variety Linolenic acid metabolism pathway was also found to be enriched.
Jasmonic acid and its products, produced from alpha-linolenic acid pathway, are primary modulators in plant resistance against necrotrophic pathogens 50 . Up-regulations of sugars, lipids and amino acids metabolisms in plants control signal transduction cascades in plant defense strategies (51) . Plant hormone signal transduction pathway was also highly enriched in tolerant cultivar. Phytohormones control complicated signaling networks and responses to stresses including biotic and abiotic stresses 52

Preparation of fungus culture and disease assessment
Strain of A. tubingensis was grown on potato dextrose agar (PDA) at 30 °C for 6 days. Conidia were harvested by scrapping the mycelium from PDA after adding Czapek broth in the petri plates.
The suspension was collected in a sterilized flask and adjusted to our desired concentration (10 4 conidia/ml). Cotton varieties (CIM-573 and NIA-Sadori) were grown in the greenhouse.
Greenhouse grown samples were collected in compliance with relevant institutional, national, and international policies and legislation. Leaves of both varieties were used to make small holes and

RNA Seq quality assessment and genome mapping
Raw data (raw reads) was processed and converted to clean reads via Trimmomatic software (57) .
Raw reads and the obtained results were stored in FASTQ (referred as fq file format), which included sequencing. Reads having poly-N, adopter sequence and low quality reads were eliminated via Trimmomatic, to attain clean reads. By using FastQC, quality of both trimmed and untrimmed reads was assessed. Mapping and alignment of the obtained clean reads to the G.
hirsutum reference genome was done using HISAT2 aligner 58 .

Gene-Level quantification
Cufflinks was employed for computing the expression level of protein coding gene (FPKM reads) 59 . Using htseq-count, the read counts of each protein coding gene were attained 60 .

Analysis of differentially expressed genes (DEGs), GO and KEGG enrichment
Differential expression of genes was evaluated using DESeq. Base mean value was used to determine levels of the expression of gene. The difference multiple was calculated and the significant difference test was performed on the number of reads using NB (negative binomial distribution test). For the screening of differentially expressed genes, threshold of P value (less than 0.05) and fold change (more than 2) was set. In order to explore expression pattern of genes,

Real time PCR analysis
Using q-PCR, expression of selected candidate genes was determined for the authentication of our transcriptomic data. Ubiquitin was used as a housekeeping gene. Using Primer 3 software tool, forward and reverse primers were designed (

Data analysis
Results were described as mean ± standard deviation (SD) and investigated by one way analysis of variance (ANOVA, P < 0.05).         The X-axis is the enrichment score. The larger the bubble, the more the number of differential protein-encoding genes contained. The bubble color changes from purple-blue-green -red. With the color transition the scope of protein encoding genes become more well defined. Smaller enrichment p value represents the greater degree of significance.

Conflicts of Interest
We confirm that all the authors in our manuscript have no conflict of interest. FPKM box-plot for each sample. The X-axis is the sample name, Y-axis is log10 (fpkm+1). The different colors in the gure represent different ranges of fpkm values, the abscissa is the sample, and the ordinate is the number of protein-coding genes.

Figure 2
3D PCA exhibiting the variance of DEGs in all four data sets, with an overall variance of 74% contributed by all the three PC.  Differential expression by volcano map in four data sets. The gray color represents genes with a nonsigni cant difference, while red and green are showing signi cantly up and down regulated genes. The horizontal axis is the display of log2 fold change, and the vertical axis is -log10 P value.   KEGG enrichment top20 bubble diagram of RT vs RC (A)ST Vs SC (B) and RT vs ST (C). The X-axis is the enrichment score. The larger the bubble, the more the number of differential protein-encoding genes contained. The bubble color changes from purple-blue-green-red. With the color transition the scope of protein encoding genes become more well de ned. Smaller enrichment p value represents the greater degree of signi cance.