Transcriptome Analysis of Sugarcane in Response to Colletotrichum falcatum Infection Reveals Repertoire of Transcription Factors and Host Targets

Background - Sugarcane (Saccharum spp. hybrid) is an important C4 perennial plantation crop grown globally for white sugar and ethanol production. Red rot, caused by Colletotrichum falcatum, is one of the most important threats to sugarcane productivity in many countries, including India. Materials and Methods - A comprehensive understanding is needed to define the transcription-level differences and the key regulatory genes involved in the interaction of sugarcane with C. falcatum. To evaluate the molecular mechanism in sugarcane, the transcriptome of sugarcane challenged with C. falcatum was sequenced on the Illumina HiSeq 2500, and gene expression profiles were generated by qRT-PCR assays in both compatible and incompatible interactions after challenge inoculation with C. falcatum. Results - A total of 15,728,914 reads were aligned to 48,935 unigenes using Bowtie 2. The unigenes were annotated using BLASTX: 39,895 unigenes were annotated; 22,025 unigenes matched host species, 8,830 matched Colletotrichum spp. and 9,040 were found to be novel genes. A total of 243 transcription factors (TFs) were predicted in sugarcane challenged with C. falcatum, and these TFs were divided into 45 specific families. WRKY, MYB, NAC, bHLH and AUX/IAA transcription factors were abundant; they are considered key regulators controlling a wide range of molecular events such as defense response, oxidative stimuli, host signalling and triggering of disease resistance. In addition, many stress-related genes and genes assigned to Gene Ontology categories and KEGG pathways were significantly affected by C. falcatum infection. Quantitative real-time PCR assays were carried out to validate the reliability of the observed expression patterns in sugarcane in response to C. falcatum infection. This study presents the first transcriptome-wide in planta identification and analysis of the TF repertoire in this host-pathogen interaction.
Conclusion - The results of this study provide a benchmark for finding host targets, a tissue-specific data set of genes expressed in response to C. falcatum in sugarcane, and a complete analysis of the main groups of genes significantly enriched under this condition. This first comprehensive work provides a basis for further studies to dissect the role of TFs at the molecular level in sugarcane defense against fungal pathogens.


Introduction
Sugarcane (Saccharum spp.) belongs to the family Poaceae and is an important cash crop cultivated worldwide for sugar and ethanol. Biotic stresses, especially those caused by plant pathogens such as bacteria, fungi and viruses, are the major threat to sugarcane production across the globe. Among these diseases, red rot is considered a serious threat; it is caused by the ascomycete pathogen Colletotrichum falcatum Went (teleomorph: Glomerella tucumanensis [Speg.] Arx & Muller). The emergence of red rot has challenged sugarcane production and cultivation in India, resulting in considerable economic losses (1). The fungus is responsible for many severe disease outbreaks and is one of the most destructive pathogens of sugarcane in India. Red rot outbreaks are especially destructive on this tall grass because the pathogen spreads through the sugarcane setts used as seed material (2). Several studies conducted on the pathogen flora during epidemics identified emerging isolates of C. falcatum that are responsible for the breakdown of resistance, leading to varietal failure (3,4). Unlike most Colletotrichum spp., which infect foliar tissues, twigs and fruits, C. falcatum infects stalk tissue in sugarcane and degrades the hard nodal tissue, which is made of cellulose, pectin and lignin. Several attempts were made to understand the host-pathogen interaction, initially through the induction of pathogenesis-related proteins in response to C. falcatum (5,6).
To understand the molecular basis of the host-pathogen interaction, differential display RT-PCR was initially used to identify transcripts differentially regulated during the interaction in sugarcane and suspension cells (7,8). Further, subtractive libraries were established to identify time-specific and initial defense responses of sugarcane during C. falcatum pathogenesis (9). In addition, to understand the pathogen biology in detail, the genome of C. falcatum was sequenced and found to be 48.16 Mb in size with 12,270 genes (10). The in vitro transcriptome of C. falcatum was reported to be 31 Mb, with 23,136 predicted CDS (11). Recently, we identified candidate secretory effector proteins (CSEPs) from the genome and transcriptome of C. falcatum and a virulent-strain-specific SSR marker from the genome of C. falcatum (12,13). A proteomics-based core proteome analysis and classification was performed on sugarcane challenged with C. falcatum (14). The molecular basis of the breakdown of resistance and the emergence of virulent pathotypes remains unexplored because of the complexity of screening them on a large scale. To elucidate the mechanism underlying the host-pathogen interaction, it is important to determine the host targets during sugarcane interaction with C. falcatum using next-generation sequencing (NGS) technology. Sugarcane is a polyploid crop with a complex genome that has not been completely sequenced even after the advent of high-throughput sequencing technology. Against this background, transcriptome sequencing helps to elucidate host-pathogen interactions, especially the interaction between sugarcane and C. falcatum.
In this study, we used an NGS platform (Illumina HiSeq 2500) to sequence both host and pathogen transcripts in order to identify host targets in sugarcane during interaction with C. falcatum. For the first time, a cross-platform analysis was performed to find the host targets in sugarcane during C. falcatum interaction using the Plant Transcription Factor Database, which showed that 243 transcription factors were induced during this interaction. These results were compared with our previous findings, in which we attempted to identify genes using traditional sequencing platforms.

Plant material and pathogen inoculation
A red rot resistant (Co 93009) and a susceptible (CoC 671) sugarcane cultivar were raised following standard cultural practices for tropical sugarcane. The pathogenic C. falcatum pathotype Cf671 was challenge inoculated in both cultivars as reported earlier. Sterile Milli-Q water was used in place of the conidial suspension in mock samples. C. falcatum-inoculated sugarcane stalks, along with un-inoculated healthy controls and mock-inoculated controls of both cultivars, were harvested in triplicate (three independent biological samples), frozen in liquid nitrogen and stored at -80 °C until RNA extraction.

RNA extraction and library preparation for transcriptome sequencing
Total RNA was extracted from CoC 671 challenged with Cf671 using TRI Reagent (Sigma-Aldrich, USA) at 2, 4 and 6 dpi and treated with RNase-free DNase I (Promega, USA). Subsequently, RNA quality was checked by 1% denaturing agarose gel electrophoresis for the presence of intact 28S and 18S bands, and RNA was quantified using a NanoDrop 8000. The paired-end cDNA sequencing library was prepared using Illumina TruSeq SBS Kit v3 as per the manufacturer's protocol. Library preparation started with mRNA fragmentation, followed by reverse transcription, second-strand synthesis and paired-end adapter ligation, and ended with index PCR amplification of the adaptor-ligated library.

Library cluster generation and sequencing run
Paired-end sequencing allows the template fragments to be sequenced in both forward and reverse directions. Cluster generation was carried out by hybridization of template molecules onto the oligonucleotide-coated surface of the flow cell. Immobilized template copies were amplified by bridge amplification to generate clonal clusters. Cluster generation was performed on a cBot using the TruSeq PE Cluster Kit v3-cBot-HS. The kit reagents were used to bind samples to complementary adapter oligos on paired-end flow cells. The adapters were designed to allow selective cleavage of the forward cDNA strand after resynthesis of the reverse strand during sequencing. The copied reverse strand was then used to sequence from the opposite end of the fragment. TruSeq SBS v3-HS kits were used to sequence the cDNA of each cluster on a flow cell using sequencing-by-synthesis technology on the HiSeq 2500 at Nucleome Informatics Pvt. Ltd, Hyderabad, India.

Transcriptome sequencing, assembly and annotation
The prepared cDNA library was sequenced on the Illumina HiSeq 2500 by Nucleome Informatics Pvt. Ltd, Hyderabad, India, and both ends of the cDNA were sequenced. Clean reads were obtained by removing empty reads, adaptor sequences and low-quality sequences (reads with unknown bases 'N') using Trimmomatic v0.35. The clean reads were then assembled into contigs and transcripts based on paired-end information using Trinity and CD-HIT with default parameters. TGICL was run on the 161,701 CD-HIT transcripts; this tool, designed for the analysis of large expressed sequence tag (EST) and mRNA databases, first clusters sequences based on pairwise sequence similarity and then produces longer and more complete consensus sequences, which yielded 116,422 transcripts. Evidential Gene was then applied to the 116,422 transcripts to retain the biologically significant transcripts.
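As an illustration only, the read-cleaning criterion described above (discarding empty reads and reads with unknown bases 'N') can be sketched in Python. This is not Trimmomatic's implementation, which also performs adapter clipping and quality trimming; the function names and the in-memory read representation are assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory read: (name, sequence, quality string).
Read = Tuple[str, str, str]


def is_clean(read: Read) -> bool:
    """Return True if the read is non-empty and has no unknown base 'N'."""
    _, seq, _ = read
    return len(seq) > 0 and "N" not in seq.upper()


def filter_pairs(pairs: Iterable[Tuple[Read, Read]]) -> Iterator[Tuple[Read, Read]]:
    """Keep a read pair only when both mates pass the cleanliness check."""
    for fwd, rev in pairs:
        if is_clean(fwd) and is_clean(rev):
            yield fwd, rev


if __name__ == "__main__":
    demo = [
        (("r1/1", "ACGT", "IIII"), ("r1/2", "TTGA", "IIII")),  # kept
        (("r2/1", "ACNT", "IIII"), ("r2/2", "TTGA", "IIII")),  # dropped: contains N
        (("r3/1", "", ""), ("r3/2", "TTGA", "IIII")),          # dropped: empty read
    ]
    print(len(list(filter_pairs(demo))))
```

Dropping the whole pair when either mate fails keeps the forward and reverse files in step, which paired-end assemblers generally expect.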

Gene enrichment analysis and protein family classification
GO sequence distribution helped in specifying all the annotated nodes comprising the GO functional groups. CDS of transcripts associated with similar functions were assigned to the same GO functional group. The GO sequence distributions were analysed for all three GO domains, i.e., biological processes, molecular functions and cellular components.

KEGG pathway identification
Orthologous assignment and mapping of the transcripts to the biological pathways were performed using the KEGG Automatic Annotation Server (KAAS) (15). All the transcript contigs were compared against the KEGG database (16, 17) using BLASTx with a threshold bit-score value of 60. The mapped transcript contigs represented metabolic pathways of major biomolecules such as carbohydrates, lipids, nucleotides, amino acids, glycans, cofactors, vitamins, terpenoids, polyketides, etc. The mapped contigs also represented the genes involved in genetic and environmental information processing and cellular processes.
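The bit-score threshold above can be illustrated with a short Python sketch. KAAS applies its threshold internally, so this is not the server's code; the example assumes BLAST tabular output (the 12-column "outfmt 6" layout, with the bit score in the last column), and the function name and sample identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str], min_bitscore: float = 60.0) -> Dict[str, Tuple[str, float]]:
    """Return the best-scoring subject per query among hits with bit score >= threshold.

    Assumes tab-separated BLAST output with 12 columns (qseqid, sseqid, ...,
    evalue, bitscore); this layout is an assumption, not stated in the text.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if bitscore < min_bitscore:
            continue  # below the threshold used for pathway mapping
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


if __name__ == "__main__":
    rows = [
        "tx1\tK00001\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-20\t95.5",
        "tx1\tK00002\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-05\t61.0",
        "tx2\tK00003\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.5\t32.0",
    ]
    print(best_hits(rows))
```

Only transcripts with at least one hit at or above the threshold appear in the result, so the mapped set is a subset of the submitted contigs.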

Distribution of transcription factors
The transcripts were annotated against all protein sequences in the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/) to identify the transcription factors in the transcriptome samples.
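Once each transcript has been matched to a database entry, the family-level distribution is a simple tally. The following Python sketch shows one way to do it, assuming the annotation step yields (transcript, family) pairs ordered with the preferred hit first; that input format, the function name and the sample identifiers are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def family_distribution(hits: Iterable[Tuple[str, str]]) -> Counter:
    """Count transcripts per TF family, counting each transcript once.

    `hits` is assumed to list the preferred hit for a transcript first;
    any later hits for the same transcript are ignored.
    """
    family_of: Dict[str, str] = {}
    for transcript_id, family in hits:
        family_of.setdefault(transcript_id, family)
    return Counter(family_of.values())


if __name__ == "__main__":
    demo_hits = [
        ("unigene_1", "WRKY"),
        ("unigene_1", "MYB"),   # secondary hit, ignored
        ("unigene_2", "WRKY"),
        ("unigene_3", "NAC"),
    ]
    for family, count in family_distribution(demo_hits).most_common():
        print(family, count)
```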

Detection of simple sequence repeats (SSR)
SSRs (microsatellites) were mined using the MIcroSAtellite identification tool (MISA; http://pgrc.ipk-gatersleben.de/misa/) with a set of criteria chosen to avoid interruption of compound microsatellite localization. Sugarcane and C. falcatum transcripts were analysed to mine potential SSRs; compound microsatellites were identified, and SSR motifs were classified as mono-, di-, tri-, tetra-, penta- and hexanucleotides.
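The motif classes above can be illustrated with a small regular-expression search in Python. This is a simplified sketch and not the MISA tool itself: it finds perfect repeats only and does not detect compound microsatellites. The minimum repeat counts (10 for mono-, 6 for di-, 5 for tri- to hexanucleotides) are assumed values, since the criteria actually used are not stated here.

```python
import re
from typing import List, Tuple

# Assumed minimum repeat counts per motif length (not taken from the text).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str) -> List[Tuple[str, str, int, int]]:
    """Return (class, motif, repeat count, 0-based start) for perfect SSRs."""
    seq = seq.upper()
    found = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // k
                found.append((CLASS_NAMES[k], motif, repeats, match.start()))
    return found


if __name__ == "__main__":
    demo = "TT" + "AG" * 7 + "CC" + "A" * 12 + "TT" + "ACG" * 5 + "T"
    for ssr in find_ssrs(demo):
        print(ssr)
```

The primitive-motif check prevents a mononucleotide run from being reported again as a di- or trinucleotide repeat.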

qRT-PCR validation for the key genes
Based on the transcriptome data, the transcription factors WRKY, MYB, NAC and bZIP expressed in sugarcane challenged with C. falcatum were chosen for quantitative real-time polymerase chain reaction (qRT-PCR) on a StepOnePlus Real-Time PCR System (Applied Biosystems, USA). Each reaction was carried out in triplicate. The primers used for qRT-PCR, except those for actin, were designed based on the transcriptome data and are listed in Table S3. For the qRT-PCR experiments, cDNA concentration was standardized for each sample, and dissociation curve analysis was performed to check primer specificity. The qRT-PCR reaction mixture contained 50 ng cDNA, 2.5 pmol primers and 10 µl SYBR Green Master Mix, made up to a total volume of 20 µl with nuclease-free water. The reaction was run for 40 cycles (denaturation for 10 min at 95 °C, followed by annealing and extension for 1 min at 58 °C).
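The method used to convert Ct values into relative expression is not stated here. A commonly used choice is the 2^-ΔΔCt calculation with a reference gene; the Python sketch below assumes that method, with actin as the reference and triplicate Ct values averaged per sample. The function name and the numbers in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_treated: Sequence[float],
    ct_ref_treated: Sequence[float],
    ct_target_control: Sequence[float],
    ct_ref_control: Sequence[float],
) -> float:
    """Fold change of a target gene by the 2^-ddCt method (assumed, not from the text).

    Each argument holds the replicate Ct values for one gene in one sample;
    the reference gene (e.g. actin) normalizes for cDNA input.
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    return 2.0 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values for one TF in inoculated versus mock samples.
    fold = relative_expression(
        ct_target_treated=[22.1, 21.9, 22.0],
        ct_ref_treated=[18.0, 18.1, 17.9],
        ct_target_control=[24.0, 24.1, 23.9],
        ct_ref_control=[18.0, 17.9, 18.1],
    )
    print(f"fold change: {fold:.2f}")
```

A value above 1 indicates up-regulation relative to the control and a value below 1 indicates down-regulation.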

Results
Transcriptome sequencing and assembly
The 15,728,914 clean reads of the sugarcane sample were assembled using the Trinity transcriptome assembler with default parameters, resulting in 168,960 transcripts. The CD-HIT-EST module of CD-HIT was run on the 168,960 assembled transcripts to remove similar short sequences with 100% alignment coverage to a longer sequence, resulting in 161,701 transcripts. TGICL, run with default parameters, resulted in 116,422 transcripts. Evidential Gene was applied to the 116,422 transcripts to retain the biologically significant transcripts; run with default parameters, it reported 48,935 unigenes (Table 1). The sequence E-value distribution (Fig S1) and the top species from annotated unigenes were evenly distributed (Fig 1).
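The successive reduction in transcript number can be summarized by computing how much each step retains. The Python sketch below uses only the counts reported above; the function name and the output layout are illustrative assumptions.

```python
from typing import List, Tuple

# Transcript counts reported after each assembly or filtering step.
STAGES: List[Tuple[str, int]] = [
    ("Trinity", 168_960),
    ("CD-HIT-EST", 161_701),
    ("TGICL", 116_422),
    ("Evidential Gene", 48_935),
]


def retention(stages: List[Tuple[str, int]]) -> List[Tuple[str, int, float]]:
    """Return (stage, count, percent retained relative to the previous stage)."""
    rows = []
    previous = stages[0][1]
    for name, count in stages:
        rows.append((name, count, 100.0 * count / previous))
        previous = count
    return rows


if __name__ == "__main__":
    for name, count, pct in retention(STAGES):
        print(f"{name:<16} {count:>8,} {pct:6.2f}% of previous step")
    overall = 100.0 * STAGES[-1][1] / STAGES[0][1]
    print(f"overall: {overall:.2f}% of Trinity transcripts retained as unigenes")
```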

Annotations and species distribution
Structural annotation of the transcripts was performed using the TransDecoder tool, which identifies candidate coding regions within transcript sequences and reports whether each CDS is complete or partial. The 48,935 unigenes were submitted to TransDecoder, which detected ORFs in 33,661 (68.78%) unigenes, 8,622 of which had complete ORFs. Of the 48,935 unigenes identified in the transcriptome of sugarcane and C. falcatum, 39,895 were annotated using BLASTX; 8,540 of the annotated unigenes belonged to the fungus, while 9,040 unigenes had no significant BLAST hits. In the species distribution, 21,169 genes matched Sorghum bicolor, followed by Zea mays (5,000), C. graminicola (3,250), C. sublineola (2,980), Setaria italica (2,400), C. incanum (1,960), Saccharum cultivar R570 (1,100), C. tofieldiae (980) and C. higginsianum (940) (Fig 1).

GO analysis and distribution
The GO sequence distribution for sugarcane in response to C. falcatum infection was analysed for all three GO domains, i.e., biological processes, molecular functions and cellular components (Fig 2). In the transcriptome of sugarcane inoculated with C. falcatum, 16,576 unigenes were involved in biological processes, with 12,445 unigenes belonging to sugarcane (Fig S2) and 4,921 to C. falcatum (Fig S3). Molecular functions were assigned to 17,428 unigenes, with 13,531 and 3,777 unigenes related to sugarcane and C. falcatum, respectively. Cellular components were assigned to 13,745 unigenes, with 11,638 and 3,685 related to sugarcane and C. falcatum, respectively.

KEGG Pathway analysis and distribution
Sugarcane and C. falcatum unigenes were mapped and screened for biological pathway predictions using KAAS. The mapped unigenes represented metabolic pathways of major biomolecules such as carbohydrates, lipids, nucleotides, amino acids, glycans, cofactors, vitamins, terpenoids, polyketides, etc. The mapped unigenes also represented the genes involved in genetic information processing, environmental information processing, organismal systems and cellular processes (Fig 3).

Distribution of transcription factors
A total of 243 TFs were predicted in sugarcane challenged with C. falcatum, and these TFs were divided into 45 specific families. WRKY, MYB, NAC, bHLH and AUX/IAA transcription factors were abundant and are considered key regulators controlling a wide range of molecular events such as defense response, oxidative stimuli, host signalling and triggering of disease resistance (Fig 4). Among the TFs, WRKY, MYB and NAC were found to be present in abundance and also up-regulated during the interaction of sugarcane and C. falcatum when compared with other hemibiotrophs and necrotrophs ( were up regulated during sugarcane interaction with C. falcatum. MYB transcription factors are known for their decisive role in the up-regulation of glucose, laying the foundation for increased glucose release and abiotic stress tolerance. Several MYB TFs were found in the sugarcane transcriptome, such as MYB 1, 4, 7, 11, 15, 21, 35, 38, 43, 46 and 108. These WRKY, NAC and MYB genes control a wide range of molecular events and have been putatively identified in various hemibiotrophic and necrotrophic fungi during interaction with their respective host systems (Table 5).

Stress related genes
Several stress-related transcription factors were expressed during the interaction of sugarcane and C. falcatum. AP2/ERF transcription factors are involved in functional divergence during drought and salt stress and in response to plant hormone treatment. Reactive oxygen species (ROS) signalling is known to be an early event in response to microbial invasion. This stress-related gene was found to be expressed during the interaction of sugarcane and C. falcatum, which is not surprising because ROS signalling genes are mostly triggered during the biotrophic-necrotrophic switch (BNS), where they act as antioxidative enzymes. Along with ROS, several other genes were found to be expressed, including those for secondary metabolites, defense JA-ET signalling, SA signalling, Ca2+ signalling and chalcone synthase. Pathogenesis-related (PR) proteins are pathogen- or stress-induced plant proteins. PR 1, a primary protein that mediates the SA defense response, was expressed; the PR 2 family (endo-1,3-β-D-glucanase) and PR 5 family (thaumatin-like), associated with pathway regulation, and PR 6 (proteinase inhibitor) and PR 9 (peroxidases), closely associated with the JA-ET pathway, were also expressed during the interaction. Several stress-related genes were manually annotated based on results generated through other platforms such as SAR priming efficacy, DD RT-PCR (Table S1) and 2D-based defense proteome analysis (Table S2), which suggested that fewer TFs were recorded on these platforms than in transcriptome-based screening. Comparative screening revealed that large numbers of Zn2Cys6 TFs were expressed during this interaction (Table 3).

Functional characterization of transcription factors during compatible and incompatible interactions in sugarcane
The switch from biotrophy to necrotrophy in the pathogen evokes a differential response in resistant and susceptible cultivars of sugarcane. C. falcatum interaction with sugarcane triggers several signalling pathways at different phases. Transcriptional screening using qRT-PCR revealed that C. falcatum switches phases easily in the susceptible cultivar but strategically in the resistant cultivar (Fig 8). Most of the TFs were up-regulated during compatible interactions, but WRKY, MYB 2 and bHLH 2 were down-regulated during C. falcatum interaction with the susceptible cultivar (cv. CoC 671). Similarly, the incompatible interaction showed down-regulation of multiple TFs such as WRKY 26, MYB, bZIP, bZIP2 and bHLH-2, which may be due to non-responsiveness of PAMP- or effector-triggered immunity (ETI). In the resistant cultivar, which tailors its defense strategy, WRKY 26 and MYB were down-regulated during incompatible interactions (Fig 9). NAC and MYB genes were expressed in both compatible and incompatible interactions of sugarcane and C. falcatum. bZIP and bHLH transcription factors, putatively involved in the regulation of cell wall integrity, were down-regulated during incompatible interactions.

Discussion
Since sugarcane is a long-duration crop, it is highly vulnerable to a number of abiotic and biotic stresses that affect crop growth and productivity. The complex defense response in sugarcane is mainly due to genome complexity, which gives rise to different signal perception and transduction networks. Transcription factors (TFs) are known for their regulatory role in signalling pathways related to development or disease resistance in sugarcane. TFs activate different pathways directly or indirectly by targeting key genes involved in plant defense responses, which are considered to be the host targets of the pathogen (18) (Table 3). Our previous host-pathogen interaction studies revealed many pathogenesis-related (PR) proteins expressed in compatible and incompatible interactions of sugarcane with C. falcatum; early and prominent induction of chitinase, β-1,3-glucanase and thaumatin-like proteins was documented as a defense and induced defense response against C. falcatum (5,6). Further, accumulation of 3-deoxyanthocyanidin phytoalexins at the infection site was documented as a marker for red rot resistance (19,20). The differential expression of the chitinase gene in red rot resistant and susceptible sugarcane cultivars was monitored through qRT-PCR (21). Subsequently, molecular tools such as differential display, subtractive libraries and NGS platforms were deployed to understand the differential expression of defense/resistance genes and the up-regulation of genes during the biotrophic phase (7,8,9,22). Further, the involvement of jasmonic acid (JA), ethylene (ET), reactive oxygen species (ROS), phosphoinositide (PI) and calcium (Ca2+) signals in red rot resistance was hypothesized (9,23).
The distribution of TFs in response to fungal pathogen interaction with different host systems revealed that the sugarcane interaction with C. falcatum expressed a large number of TFs (Table 5). WRKY TFs are effective in mediating hormones involved in pathogen-associated molecular pattern (PAMP)-triggered immunity, effector-triggered immunity (ETI) or systemic acquired resistance (SAR) (24). WRKY 100 is involved in the regulation of biotic and abiotic stress in interactions involving C. gloeosporioides and M. grisea (25,26). WRKY 1 negatively regulates resistance in fruit during infection caused by C. acutatum (27). Several reports in monocot and dicot plants show that WRKY TFs are involved in inducing multiple signal transduction pathways (Table 2). Moreover, around 39 WRKY TFs are reported to be available in Saccharum spp. (18), but during interaction with C. falcatum 26 different types of WRKY TFs were found to be expressed under this biotic stress, which is considered to be the first consolidated and novel report in sugarcane. TFs were comparatively screened in sugarcane challenged with C. falcatum using various platforms, namely transcriptome analysis, SAR priming efficacy, DD RT-PCR and 2D-based defense proteome analysis, and the results suggested that the major TFs were properly screened and consolidated in this study (Table 4). WRKY 51 was expressed during the interaction of sugarcane suspension cells with C. falcatum (Table S1). No WRKY TFs were reported in the 2D-based defense proteome analysis (Table S2).
Another equally important TF family is NAC; NAC TFs play a dual role in triggering the innate or hypersensitive response and also ETI. Plant transcription factor databases suggest that Saccharum spp. contain 44 NAC TFs, and our transcriptome study revealed that 21 NAC TFs are involved in the interaction of sugarcane with C. falcatum. Around 11 NACs were found to be expressed during SAR priming in sugarcane with C. falcatum (28). Over-expression of a NAC TF in Oryza sativa (OsNAP) controlled temperature-related stresses and also played a significant role in increasing resistance to cold stress at the vegetative stage (29). MYB TFs are considered to be major and functionally diverse proteins that are primarily involved in protein-protein interactions, DNA binding and protein regulation (18). Around 36 MYB TFs are reportedly present in Saccharum spp., whereas 44 MYB TFs were found to be expressed in the sugarcane transcriptome during interaction with C. falcatum (Table 4). MYB 1 was down-regulated during the incompatible interaction with C. falcatum, whereas MYB 2 showed lower expression during the compatible interaction (Fig. 8 & 9). bZIP and bHLH TFs are mainly reported as stress tolerance factors during Colletotrichum infection of the respective host systems. bZIP TFs in interactions with C. gloeosporioides and C. coccodes are putatively predicted or reported to be involved in oxidative stress tolerance, cell wall integrity, host signalling and PAMP-triggered immunity (Table 2). C2H2 zinc finger and winged helix repressor DNA-binding TFs were detected in the 2D-based defense proteome screening and were expressed in large numbers during the compatible interaction; probable TLP TFs were expressed more during SAR priming, but fewer were found in the transcriptome analysis (Table 4).
Several reported PR proteins were screened in sugarcane (Table 4). PR 1, a primary protein that mediates the SA defense response, was expressed, along with the PR 2 family (endo-1,3-β-D-glucanase) and PR 5 family (thaumatin-like), associated with pathway regulation, and PR 6 (proteinase inhibitor) and PR 9 (peroxidases), closely associated with the JA-ET pathway (28, 30). The FAR-1 transcription factor, a key component of phytochrome signalling and a regulator of ABA signalling, was up-regulated under both biotic and abiotic stress (31). FAR-1 remained up-regulated even after the pathogen reached the necrotrophic phase. Earlier, 85 stress-associated NAC genes, encoding plant transcription factors involved in various biological and molecular processes, were predicted in sugarcane (32). Several reports implicate NAC in plant growth, development and flowering (33). An R2R3-MYB from the MYB family was shown to be triggered during drought tolerance and to be alternatively spliced (34). ScMYBAS1, encoding a putative transcription factor, regulates salinity and water deficit stress (35).
bZIP transcription factors are known for their responses during pathogen attack; bZIP genes were expressed in velvetleaf during interaction with C. coccodes, with temporal expression patterns (36). A bZIP transcription factor is involved in oxidative stress and cell wall integrity during C. gloeosporioides infection (37). A WRKY gene triggers the defense response in strawberry fruit during interaction with C. acutatum (27). WRKY 100, which encodes a group I WRKY, regulates resistance in Malus domestica (apple) during C. gloeosporioides infection (25). Tea plant (Camellia sinensis), during interaction with C. gloeosporioides, expresses several transcription factors such as WRKY, MYB75, ARF, NAC and NFY, which are considered host targets (38). Overall, comparative studies suggested that the transcriptome of the sugarcane and C. falcatum interaction gave encouraging results in identifying host targets such as WRKY, MYB, NAC and AP2-ERF. Further work will establish the spatial and temporal expression of these genes in sugarcane during biotic stress.

Conclusion
Sugarcane TFs are known for their pivotal role in a wide range of molecular events such as triggering plant defense, host signalling and regulation of developmental processes. Here we report the first comprehensive dataset of TFs expressed during biotic stress in the interaction between sugarcane and the hemibiotrophic pathogen C. falcatum. As a hemibiotrophic pathogen, C. falcatum relies on successful invasion of the host system during the biotrophic phase, which subsequently leads to programmed cell death. The transcriptome of the sugarcane interaction with C. falcatum points to regulation of genes at the transcription level, which brought the new insight that TFs are mediated through signal transduction pathways at different growth stages of sugarcane. Hundreds of TFs were triggered or expressed during biotic stress caused by C. falcatum in sugarcane. Among them, WRKY, NAC, MYB and AP2-ERF play a major role in biological processes, and the transcriptome study suggested that TFs activated in both phases (BNS) can be considered host targets. To the best of our knowledge, this is the first consolidated report of expressed TFs in a holistic way. This finding opens new areas of research that will allow a better understanding of host responses during C. falcatum interaction.

Table 4
Comparative transcription factors screened in sugarcane challenged with C. falcatum using various platforms like transcriptome analysis, SAR priming efficacy and 2D based defense protein analysis

Transcription factor (TF) distribution from the transcriptome of sugarcane interaction with C. falcatum

Figure 5
Distribution of transcription factors in response to fungal organisms in different host systems

List and distribution of different simple sequence repeats identified from sugarcane and C. falcatum transcripts

Figure 8
qRT-PCR validation of transcription factors during compatible interactions in sugarcane

qRT-PCR validation of transcription factors during incompatible interactions in sugarcane
