Transcriptomic Profiling of Cotton Leaves in Response to Cotton Aphid Attack

Main conclusion The molecular mechanism of the interaction between cotton and cotton aphids remains unclear currently. The RNA-Seq study of cotton leaves was performed in response to cotton aphid damage at different time points. The transcriptome analysis revealed that a lot of cotton gene transcripts were regulated by cotton aphid damage. Cotton aphids (Aphis gossypii Glover) are regarded as one of the most harmful insect pests for cotton production. They are usually capable of causing severe yield loss through sucking cotton liquids, secreting honeydews and transmitting plant viral diseases. However, the molecular mechanism of the interaction between cotton and cotton aphids remains unclear currently. Therefore, the RNA-Seq study of cotton leaves was performed in response to cotton aphid damage at different time points (0 h,6 h,12 h, 24 h, 48 h and 72 h). A total of 9, 103 new genes were identified, and 7, 510 of them were annotated functionally. Based on the comparison results, the gene expression was analyzed according to the expression amount of genes in different samples. 24,793 differentially expressed genes were authenticate in all and their functional annotation and enrichment analysis were conducted. Compared with 0 h (without aphid damage, CK), the amount of down-regulated DEGs was largely more than that of the up-regulated genes at different time points under cotton aphid attack except for 48h. As revealed by the functional annotation of DEGs, these genes were involved in all kinds of plant biological process, including various resistance to abiotic and biotic stress, hormone metabolism, signaling transduction and transcriptional regulation. These results established a firm foundation for the study of the molecular mechanism of the interaction between cotton and cotton aphids and would facilitate the development of plant aphid resistant cultivars. ribosome carbon metabolism (307), biosynthesis of amino acids pyruvate metabolism (125), carbon fixation in photosynthetic organisms (123), endocytosis (114), glutathione metabolism pyrimidine metabolism (117), pentose and glucoronate interconversions RNA degradation fatty acid metabolism (93), arginine and proline metabolism (98), mRNA surveillance pathway


Background
Aphids are one of the common pests on almost all crops. They feed plant phloem sap with their piercing-sucking mouthparts, which can have a direct impact on the normal growth and development of crops and eventually cause a severe loss of crop yield and quality. Compared with chewing insects, aphid feeding only causes relatively little mechanical injury to crops. However, the everlasting feeding of a large quantity of aphids usually consume a large amount of plant resources. Secondly, a lot of 3 aphid species have the potential to spread plant viruses. For them, the detrimental effects caused by virus transmission are likely to outweigh the direct effects of aphid feeding. In addition, Honeydew secreted by aphids also gives rise to sooty mould on the leave surface to constrain leaf photosynthesis [1,2]. To counter these damages, plants have developed a series of intricated mechanisms to combat against aphids. Plenty of studies have indicated that various changes occur in plants damaged by aphids, such as protein phosphorylation, calcium flux, reactive oxygen species (ROS) generation and phytohormone changes, which result in relevant transcriptional regulation in the early response to aphids. Finally, plants are capable of producing various defense compounds, including nutrient compounds, glutathione S transferases (GSTs), peroxidases, and secondary metabolites for the sake of self defense [3]. Overall, plant susceptibility or resistance to aphids is reliant on their ability to recognize aphid feeding and rapidly initiate defense response [4].
Plant resistance is considered as one of the most effective ways of achieving plant health management at the present time and in the near future [5]. Therefore, constantly exploring new aphid resistant genes has been a long-standing hot topic for scientists. Mi-1.2 from the tomato was discovered to contribute to resistance to Macrosiphum euphorbiae and Bemisia tabaci [6,7]. The melon gene Vat was reported to safeguard against virus aphid transmission and Aphis gossypii Glover [8]. Both of them are the earlier known aphid resistance genes and fall into NBS-LRR family, indicating that the NBS-LRR family members perform a significant part in plant aphid resistance. Lectin was also proved to perform resistance to aphids. For instance, The Galanthus nivalis agglutinin (GNA) gene encoding a monocot mannose-binding is extensively and clearly documented to confer resistance to Myzus persicae [9]. The Amaranthus caudatus agglutinin (ACA) gene confers resistance to Aphis gossypii and Myzus persicae [10,11]. In addition, the other aphid resistant genes, including Rag, chitinase, ɑ-amylase and on the likes, are researched actively and widely [12].
RNA-Seq is a robust technology for analyzing transcriptome and excavating new functional genes.
Therefore, it is taken advantage of for many fields including plant defense response to phytophagous insects [13]. For instance, the changes in gene expression in the whole genome of the cucumber aphid resistant cultivar 'EP6392' were monitored using an Illumina Genome Analyzer platform [14]. Furthermore, a comprehensive insect resistance response mechanism in cotton infested by the phloem feeding insect Bemisia tabaci was discovered through Transcriptome analysis [15]. In addition, the transcriptome analyses of Gossypium hirsutum at 24 h and 48 h in response to aphid and whitefly were conducted comparatively [16]. However, compared to other research areas, RNA-Seq analyses of plant defense response to phytophagous insects have been relatively sporadic up to now.
As an essential economic crop, the recent research on cotton transcriptome characterization is focused on comparative transcriptomic analysis at different growth stages and under all sorts of stresses. For instance, many new genes of cotton fiber were extensively analyzed from different angles using RNA-Seq method [17][18][19][20][21]. Furthermore, cotton RNA-Seq data were also analyzed under various abiotic stress, such as salt stress [22][23][24][25], dry stress [26-27] and high temperature stress [28]. Cotton is also attacked by all kinds of biotic factors, such as Cotton Verticillium wilt. As one of the most destructive cotton diseases, the study on Cotton Verticillium wilt using RNA-Seq is concerned widely. For example, RNA-Seq transcriptional analysis and histo-chemistry have illustrated that lignin metabolism plays an important role in the cotton resistance to Verticillium dahlia [29]. The gene families of resistance gene analogues in cotton and their response to Verticillium wilt is also studied [30]. The analysis of sea-island cotton and upland cotton in response to the damage of Verticillium dahliaewas was conducted using RNA-Seq [31]. However, RNA-Seq has rarely been applied to research the interaction between cotton and herbivorous insects.
It is known that using cotton aphid resistance for aphid control is both most effective and eco-friendly.
Therefore, the analysis of the transcriptome dynamics of cotton defense response to cotton aphid damage was conducted using the RNA-Seq method for a better understanding of cotton aphid resistance mechanism. Our research will lay a solid foundation for further research into the molecular mechanism of the interaction between cotton and aphids, searching for new cotton aphid resistant genes and developing new cotton aphid resistant varieties.

5
Results RNA-Seq outline of cotton leaves in response to cotton aphid attack The damage caused by cotton aphids is rated as the serious at seedling stage during the growth and development of cotton. Therefore, the cotton leaves at the four-leaf stage were selected as the research material. To excavate the crucial genes at the transcriptional level in cotton defense response to cotton aphid damage, the whole-transcriptome RNA sequencings of cotton leaves at 0 (without aphid damage, CK), 6 Table S3). The mapped reads were spliced with Cufflinks software, and compared with the original genome annotation information to explore the original unannotated transcription area and excavate the new transcripts and new genes. A total of 79,581 unigenes were identified by filtering out sequences that were comprised of less than 50 amino acid residues or only a single exon. To obtain the annotation information of the new genes, the databases was analyzed using a combination of BLAST [32], NR, GO, KEGG, COG and Swiss-Prot. A total of 9,103 new genes were discovered, of which 7,510 were functionally annotated (Tab. 1). The normalized FPKM was used to quantify the gene expression level [33]. The transcript expression level (FPKM) in this study ranged from 10 -2 to 10 4 , which was found to be coherent with most of the RNA-Seq results (Fig. 1A). FPKM boxplot analysis demonstrated that the gene transcript expression levels varied in the six RNA-Seq data to some extent (Fig. 1B).
Differentially expressed genes from cotton leaves under cotton aphid attack As gene expression has time specificity, it is quite significant to study the cotton differentially expressed genes (DEGs) at different stages under cotton aphid attack. Each sample damaged by cotton aphids was compared with the control (0 h) to identify differentially expressed genes (6 h vs.  Table S4). The differentially expressed genes were analyzed by means of hierarchical clustering, and the genes with the same or similar expression patterns were clustered (Fig. 2B). In addition, it was discovered that the DEGs between cotton aphid attack (48 h) and CK 0 h is distinct from the other treats. The amount of up-regulated genes was higher than that of down-regulated genes between cotton aphid attack (48 h) and CK 0 h  Supplementary Table S5). Additionally, it is noteworthy that two genes in the "response to insect (GO: 0009625)" and "defense response to insect (GO: 0002213)" was enriched during the biological process. To carry out a further investigation into the biological functions performed by these DEGs, pathway-based analysis was conducted using KEGG. Totally 126 pathways were identified that were  Table   S6). A large majority of them encoded the members of the bHLH, MYB, WRKY, ERF, bZIP, TCP and NAC TF families (Fig. 5A). The bHLH family with 199 DEGs was rated as the largest TF family in cotton leaf response to cotton aphid stresses. In addition, a total of 152 DEGs were identified to belong to the MYB family. The expression levels for many of the differentially expressed TFs were up-regulated under cotton aphid attack, especially at 48 and 72 h under cotton aphid attack. However, the transcripts of some differentially expressed TFs were down-regulated, indicating that the functions carried out by cotton TF were complex in cotton response to cotton aphid attack (Fig. 5B).
Differentially expressed aphid-resistant genes involved in response to cotton aphid stress Aphid damage often triggers the differentially expressed genes related to plant resistance, involving plant morphological resistance, phytoprotective enzyme and SA, JA and ethylene signal pathway.
Plenty of plant aphid resistance genes were identified at first in this study. For example, a total of 422 LRR genes were identified at first, of which 19 belonged to NBS-LRR proteins associated with the detection of bacteria, viruses, fungi, nematodes, insects and oomycetes. Besides, 90 Leucine-rich repeat receptor-like protein kinase were founded (Supplementary Table S7). 140 lectin genes were discovered in our RNA-Seq data and 10 genes of them were regarded as cotton new genes. Compared to the expression levels without cotton aphid attack 0 h), the expression levels of most lectin genes 8 were up-regulated at 6, 12, 24, 48 and 72 h, and only transcription of two lectin gene were downregulated (Fig. 6A B). There were some other aphid resistant genes involved in this study, including 72 form aphid resistant genes related to callose and trichomes, as well as 81 plant protective enzyme gene relative to peroxidase phenylalanine ammonialyase and polyphenol oxidase.
Verification of RNA-Seq data by qPCR To validate the results of the RNA-Seq, ten random DEGs including five up-regulated and five downregulated genes were selected to perform qPCR. The selected genes were successfully amplified and the products were of the expected size, indicating the reliability of the assembly work. The qPCR results demonstrated that the relative expression levels of ten selected genes were consistent with the results of RNA-Seq despite the differential expression folds, suggesting that the RNA-Seq data were highly reliable (Fig. 7).

Discussion
RNA-Seq presents an effective way for the identification of DEGs and their regulatory mechanisms at the transcriptome level [34]. It could provide new insights into the molecular basis for plant response to all kinds of abiotic and biotic stresses [35][36][37].
RNA-Seq in cotton response to cotton aphid damage Aphids are the largest group of the phloem-feeding insects and often cause severe economic losses in crop production. However, the mechanism for the interplay between plant and aphids remains unclear at the moment [1]. Therefore, there is great significance for understanding their interaction mechanism to study the cotton response to cotton aphid damage using RNA-Seq method. Dubey et al.
(2013) carried out the comparative transcriptome analysis of Gossypium hirsutum in response to aphids and whiteflies [38]. However, only the transcriptomes of cotton infested with aphids and whiteflies for 2 h and 24 h were sequenced and analyzed. In this study, the RNA-Seq data of cotton leaves at 0, 6, 12, 24, 48, and 72 h under the attack of cotton aphids was analyzed. The sequencing results suggested that a total of 9, 103 new genes were discovered and 7, 510 were annotated 9 functionally at different time points. Based on the comparison results, the gene expression was analyzed in accordance with the expression amount of genes in different samples. 24,793 DEGs were authenticated in all and their functional annotation and enrichment analysis were performed. The number of down-regulated DEGs was found out to be largely higher than that of the upregulated genes at different time under cotton aphid attack, which was in line with the previous studies that infestations of whiteflies and aphids drove transcriptional suppression over induction [39,38].
However, the number of down-regulated DEGs was slightly smaller than that of the upregulated genes at 48 h under cotton aphid attack in this study. Functional annotation of DEGs led to a finding that these genes were involved in a lot of plant biological processes, including all sort of resistance to abiotic and biotic stress, hormone metabolism, signaling transduction and transcriptional regulation.
However, there were only two genes in the "response to insect (GO: 0009625)" and the "defense response to insect (GO: 0002213)" was enriched in biological process, which suggested that study on plant response to insects should be enforced.

Plant aphid resistant genes
Plant defenses against aphids are initiated at various levels of their interplay with aphids [40]. Firstly, plant surface is the first barrier of aphid feeding and activities, such as thorn and glandular trichomes on the surface of plant [41][42][43]. In our study, 12 DEGs (Protein WAX2 or Protein WAX2 -like protein) were identified, and these DEG expressed levels varied at 6, 12, 24, 48 and 72h after cotton aphid attack in comparison with the CK Control (0 h). Trichomes on the surface of plant could also exert impact on aphid activities by hindering aphid movement. Furthermore, glandular trichomes are usually an origin of sugar esters and secondary metabolites that are harmful to insects [44,45]. Some glandular trichomes even have the potential to release the aphid alarm pheromone, such as (E)-βfarnesene [46]. Our data analysis showed a total of 51 differentially expressed genes on trichomes were identified and 3 DEGs of them were reported at first, suggesting that trichomes were crucial to plant aphid resistance. In addition, plants inhibit the afflux of phloem inclusion into the aphid by facilitating sieving element occlusion (SEO). SEO includes two processes: a rapid formation of proteinaceous plugs that transiently seal sieve plates and a slower callose deposition that causes long-term occlusion of sieve tubes [47]. A total of 20 DEGs on callose deposition were also identified in this study. Three of them were reported at first. These results indicated that callose was involved in cotton response to cotton aphid attack.
Some loci with specific aphid resistance have been stated in many plants, but only the Mi-1.2 gene from tomato and the Vat gene from melon have been cloned. They contribute to resistance against Macrosiphum euphorbiae and Aphis gossypii respectively [6,48,49]. They belong to coiled-coil nucleotide-binding site -leucine-rich repeat type proteins (CC-NBS-LRR). In our study, a total of 422 LRR genes were identified, 19 of which were classed into NBS-LRR proteins. Besides, 90 Leucine-rich repeat receptor-like protein kinase were identified. The carbohydrate-binding protein lectins exist in many plants and perform important functions in guarding plants against insect pests. At the moment, many lectin genes from plant have been transferred to plants to improve resistance to many phloem insects [50]. The agglutinin gene is the most widely studied for plant resistance to aphid as the carbohydrate-binding proteins currently. 140 lectin genes were discovered in our RNA-Seq data and Chrysanthemum by facilitating lignin production [51]. Similarly, three MYB genes were related to the wheat defense against English grain aphid [52]. In addition, overexpression of GsMYB15 from the wild Soybean R2R3 enhanced resistance to Helicoverpa Armigera in Arabidopsis [53]. However, myb102 from Arabidopsis increased plant susceptibility to aphids by substantial activation of ethylene biosynthesis [54]. A total of 152 DEGs belonging to the MYB family were identified in this study. The functions of them need to be further verified through relative experiments. Some members of the 11 WRKY family are reported for aphid resistance. For instance, the overexpression of a chrysanthemum WRKY transcription factor enhanced aphid resistance [55]. However, AtWRKY22 from Arabidopsis increased Arabidopsis susceptibility to aphids and modulated SA and JA signaling [56]. In addition, the rice WRKY53 suppressed herbivore-induced defenses by acting as a negative feedback modulator of map kinase activity [57]. Ca 2+ functions as a secondary messenger in eukaryotes. It is universally thought that Ca 2+ could influence callose deposition, promote phloem protein aggregation and result in phloem occusion [62][63][64]. 253 DEGs on Ca 2+ were identified in this study, and 93 DEGs were associated with inorganic ion transport and metabolism, suggesting that ca 2+ was essential in cotton response to cotton aphids.
PAD4 is obligatory for restricting insect feeding from the sieve elements and hindering callose deposition [65,66]. Furthermore, PAD4 facilitates premature leaf senescence in aphid-damaged leaves, which is harmful for the insect in term of leaf nutrition [67]. PAD4 is relative to plant defense reponse to pathogens. However, PAD4 functions in plant defense to aphids are distinct from their ones in defense to pathogens [68][69][70].

Conclusions
In the present study, RNA-Seq was applied for detection of the global transcriptional changes in seedling cotton leaves in response to cotton aphid attack. A total of 9, 103 new genes were identified and 7, 510 were annotated functionally. Crucially, 24,793 DEGs were commonly identified in response to cotton aphids, suggesting that there were plenty of common and unique molecular mechanisms in relation to the response to cotton aphid stresses in cotton. This study extends the understanding of the molecular mechanisms for cotton leaf resistance to cotton aphids in the seedling stage and facilitates the development of plant aphid resistant cultivars.

Plant materials
Red-leaf cotton (Gossypium hirsutum), an aphid-resistant cotton cultivar, was provided by the Institute of Cotton from Chinese Academy of Agricultural Sciences. At first its seeds were sown directly in 30-cm-diameter plastic pots filled with nutrient soil until the seedling grew to 4 leaves in a growth room (25℃, 16 h light/8 h dark). The leaves were collected respectively without cotton aphid damage (0 h, CK) and at 6, 12, 24, 48 and 72 h under the attack of cotton aphids (Aphis gossypii), transferred quickly into liquid nitrogen for 1 min and stored in the ultra-low temperature freezer.

Library construction and sequencing
Total RNA from each sample was isolated using Trizol method. Transcriptome assembly Raw reads were pre-processed for quality filtering where reads including adapter or ploy-N, and low 13 quality reads were cleaned. According to the reference genome, the mapped reads were spliced using Cufflinks software. The original unannotated transcription area was searched, and new transcripts and new genes were excavated in order to supplement and perfect the original genome annotation information. The sequences that are less than 50 amino acid residues or that contain only a single exon were filtered out.
Differentially expressed gene analysis Gene FPKMs (fragments per kilo-base of exon per million fragments) were calculated by summing the FPKMs of transcripts in each gene group. FPKMs of coding genes in each sample were computed using StringTie (1.3.1). EBSeq was employed for difference analysis [71]. During the course of differentially expressed gene detection, Fold Change ≥2 and False Discovery Rate FDR < 0.05 served as the screening criteria. Fold Change represents the ratio of expression quantity between two treats. FDR is determined through the p-value correction of different significance.

Functional annotations of differentially expressed genes
The functions of differentially expressed genes were annotated with NR [72], swiss-prot [73], GO [74], COG [75] and KEGG [76]. In order to obtain the annotation information about the new genes, the reference Gossypium hirsutum genome and the annotation files were downloaded from the CottonGen database (http://www.cottongen.org).

Quantitative real-time PCR
To ascertain the expression level of ten genes, quantitative real-time PCR (qPCR) was carried out based on the manufacturer's guides for the Bio-Rad CFX 96 Real-Time Detection System (Bio-Rad, Hercules, CA, USA) and the SYBR premix ex Taq II system (Takara perfect real time). Cotton leaves were firstly collected without cotton aphid damage (0 h, CK) and at 6, 12, 24, 48, and 72 h under cotton aphid attack at the four-leaf stage. Total RNA was isolated from each sample and reversely transcribed into cDNA as qPCR template. The expression profiles of the relative genes were examined through qPCR method. All reactions were conducted in triplicate, and controls were included. The 2 -△△Ct method was applied to the calculation of relative gene expression values [77]. The prime 14 sequence was presented in Supplementary Table S1.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.