RNA-Seq outline of cotton leaves in response to cotton aphid attack
The damage caused by cotton aphids is rated as the serious at seedling stage during the growth and development of cotton. Therefore, the cotton leaves at the four-leaf stage were selected as the research materials. To excavate the crucial genes at the transcriptional level in cotton defense response to cotton aphid damage, the whole-transcriptome RNA sequencings of cotton leaves at 0 (without aphid damage, CK), 6, 12, 24, 48 and 72 h under sustained cotton aphid attack were performed respectively. After filtering out reads containing adapter or ploy-N, and low-quality sequences, there were 60.05, 58.70, 57.74, 65.73, 59.80, and 60.26 million clean reads with pair-end (Q30> 85.03%, GC%=41.3-42.9%) respectively produced by RNA-Seq in all the six libraries (Supplementary Table S2). For the clean reads according to the single-end, 89.94–90.78% of them were mapped to the cotton reference genome sequences (Supplementary Table S3). The mapped reads were assembled with Cufflinks software, and compared with the original genome annotation information to explore the original unannotated transcription area and excavate the new transcripts and new genes. A total of 79,581 unigenes were identified by filtering out sequences that were comprised of less than 150 bp or only a single exon. To obtain the annotation information of the new genes, the databases were analyzed using a combination of BLAST , NR, GO, KEGG, COG and Swiss-Prot. A total of 9,103 new genes were discovered, of which 7,510 were functionally annotated (Tab. 1). The normalized FPKM was used to quantify the gene expression level . The transcript expression level (FPKM) in this study ranged from 10-2 to 104, which was found to be coherent with most of the RNA-Seq results (Fig. 1A). FPKM box plot analysis demonstrated that the gene transcript expression levels varied in the six RNA-Seq data to some extent (Fig. 1B).
Differentially expressed genes from cotton leaves under cotton aphid attack
As gene expression has time specificity, it is quite significant to study the cotton differentially expressed genes (DEGs) at different time points under cotton aphid attack. Each sample damaged by cotton aphids was compared with the control (0 h) to identify differentially expressed genes (6 h vs. control, 12 h vs. control, 24 h vs. control, 48 h vs. control and 72 h vs. control). Fold change ≥ 2 and FDR < 0.05 were taken as the screening criteria. From these pairwise comparisons drawn with the control sample, we identified 5,790 (2,580 up- and 3,210 down-regulated) , 9,726 (3,452up- and 6,274 down-regulated), 5,279(1,836up- and 3,443 down-regulated), 4,549(2,477 up- and 2,072 down-regulated) and 8,683(2,769 up- and 5,914 down-regulated) DEGs respectively after 6 h, 12h, 24, 48h, and 72h under cotton aphid attack（Fig. 2A, Supplementary Table S4). The differentially expressed genes were analyzed by means of hierarchical clustering, and the genes with the same or similar expression patterns were clustered (Fig. 2B). In addition, it was discovered that the DEGs between cotton aphid attack (48 h) and CK（0 h）is distinct from the other treats. The amount of up-regulated genes was higher than that of down-regulated genes between cotton aphid attack (48 h) and CK（0 h）(Fig. 3A, B).
DEG functional classification
The DEGs between cotton aphid attack (6 h, 12 h, 24 h, 48 h, and 72 h) and CK（0 h）were subjected to GO term enrichment analysis, involving cellular component, molecular function and biological function. The results indicated that plenty of DEGs were associated with biotic or abiotic stress, including response to salt stress (1734), response to chitin (1245), defense response to bacterium (1230), response to cold (1199), response to fungi (1196), response to wounding (1194), response to water deprivation (1349), response to abscisic acid (1292), and response to cadmium ion (1496) (Fig. 4A，Supplementary Table S5). Additionally, it is noteworthy that two genes in the “response to insect (GO: 0009625)” and “defense response to insect (GO: 0002213)” were enriched during the biological process. To carry out a further investigation into the biological functions performed by these DEGs, pathway-based analysis was conducted using KEGG. Totally 126 pathways were identified that were significantly enriched in comparisons of cotton aphid attack (6 h, 12 h, 24 h, 48 h, and 72 h) versus CK (0 h), such as plant hormone signal transduction (378)，starch and sucrose metabolism (244), ribosome (236), carbon metabolism (307), biosynthesis of amino acids (275), pyruvate metabolism (125), carbon fixation in photosynthetic organisms (123), endocytosis (114), glutathione metabolism (119), pyrimidine metabolism (117), pentose and glucoronate interconversions (99), RNA degradation (94), fatty acid metabolism (93), arginine and proline metabolism (98), mRNA surveillance pathway (97), glycerophospholipid metabolism (98), RNA transport (103), cysteine and methionine metabolism (100), glyoxylate and dicarboxylate metabolism (106), phagosome (104), glycine, serine and threonine metabolism (104), peroxisome (103), photosynthesis (191), plant−pathogen interaction (176), Purine metabolism (169), amino sugar and nucleotide sugar metabolism (168), phenylpropanoid biosynthesis (141), oxidative phosphorylation (155), protein processing in endoplasmic reticulum (156), and glycolysis / gluconeogenesis (155) (Fig. 4B). In conclusion, these results suggested that cotton's defense response to the damage of cotton aphids involved multiple genes and metabolic pathways, and cotton plants end up showing a comprehensive defense potential against the attack of cotton aphids.
Differentially expressed transcription factors involved in response to cotton aphid stress
Transcription factors (TFs) are a specific DNA-binding protein that has regulatory functions at transcription level and performs an essential role in plant growth and development. In this study, a total of 945 differentially expressed transcription factors were identified from cotton leaf response to cotton aphids. They were classed into 27 TF families respectively (Supplementary Table S6). A large majority of them encoded the members of the bHLH, MYB, WRKY, ERF, bZIP, TCP and NAC TF families (Fig. 5A). The bHLH family with 199 DEGs was rated as the largest TF family in cotton leaf response to cotton aphid stress. In addition, a total of 152 DEGs were identified to belong to the MYB family. The expression levels for many of the differentially expressed TFs were up-regulated under cotton aphid attack, especially at 48 and 72 h under cotton aphid attack. However, the transcripts of some differentially expressed TFs were down-regulated, indicating that the functions carried out by cotton TFs were complex in cotton response to cotton aphid attack (Fig. 5B).
Differentially expressed aphid-resistant genes involved in response to cotton aphid stress
Aphid damage often triggers the differentially expressed genes related to plant resistance, involving plant morphological resistance, phytoprotective enzyme and SA, JA and ethylene signal pathway. Plenty of cotton aphid resistance genes were identified at first in this study. For example, 39 differentially expressed genes of NBS-LRR family were identified and 10 genes of them were reported at first. They were associated closely with the detection of bacteria, viruses, fungi, nematodes, insects and oomycetes (Supplementary Table S7). 106 DEGs of leucine-rich repeat receptor-like protein kinase were also found and 6 genes of them were novel (Supplementary Table 8). Besides, 10 genes of 140 DEGs related to lectins were regarded as cotton new genes. Compared to the expression levels without cotton aphid attack（0 h), the expression levels of most lectin genes were up-regulated at 6, 12, 24, 48 and 72 h, and only transcription of two lectin genes were down-regulated (Fig. 6A，B, Supplementary Table 9). Additionally, in the functional analyses of differentially expressed genes, we also obtained many new other genes related to cotton aphid resistance, including 3 genes related to callose (Supplementary Table S10), 3 genes related to trichomes (Supplementary Table S11), 3 genes related to waxes (Supplementary Table S12), 7 genes related to ca2+ (Supplementary Table S13), 1 genes related to peroxidase (Supplementary Table S14) and 1 related to Phenylalanine ammonialyase (Supplementary Table S15).
Verification of RNA-Seq data by qPCR
To validate the results of the RNA-Seq, ten random DEGs including five up-regulated and five down-regulated genes were selected to perform qPCR. The selected genes were successfully amplified and the products were of the expected size, indicating the reliability of the assembly work. The qPCR results demonstrated that the relative expression levels of ten selected genes were consistent with the results of RNA-Seq despite the differential expression folds, suggesting that the RNA-Seq data were highly reliable (Fig. 7).