3.1. Metabolome profiling of Alsophila
After collecting fresh leaves from G. metteniana var. subglabra and A. spinulosa (Fig. 1, A), meta-bolomics was used to analyze the metabolites. To improve the accuracy of analysis, each group designed three replicates. We identified 373 and 399 metabolites in positive and negative modes, respectively (Table S1). In A. spinulosa, the top 10 metabolites with highest metabolite abundance were "2-Caffeoyl-L-tartaric acid", "Isocitric Acid", "Kaempferol-3-O-glucoside-7-O-rhamnoside", "Kaempferol-3-O-neohesperidoside", "Skimmin ", "Luteolin-6-C-glucoside ", "Demethyl coniferin", "γ-Linolenic Acid", "Methylmalonic acid", and "Aromadendrin-7-O-glucoside". In G. metteniana var. subglabra, the top 10 metabolites with highest metabolite abundance were "Skimmin", "Isocitric Acid", "6-Methylmercaptopurine", "γ-Linolenic Acid", "α-Linolenic Acid", "Spermine", "Histidinol", "N-Benzylmethylene isomethylamine", "L-Glutamine" and "L-Lysine" (Fig. 1, B). The quantity of differential accumulated metabolites (DAMs) was statistically analyzed (Table S2). Clustering heat maps of DAMs showed that significant differences in 111 metabolites were found. Of these, 82 DAMs were up-regulated and 29 metabolites were dramatically down-regulated (Fig. 1, C). To verify the function of the DAMs, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment was conducted. The main metabolic pathways involved include “valine, leucine and isoleucine degradation”, “pyruvate metabolism, 2-Oxocarboxylic acid metabolism”, “valine, leucine and isoleucine biosynthesis”, “citrate cycle (TCA cycle)”, “biosynthesis of amino acids”, “biosynthesis of secondary metabolites”, “aminoacyl-tRNA biosynthesis”, “glucosinolate biosynthesis”, “glyoxylate and dicarboxylate metabolism”, “ABC transporters”, and so on (Fig. 1, D). Notably, the pathway with the most DAF enrichment pathways was the biosynthesis of secondary metabolites, which had 21 DAFs.
3.2. Transcriptomics analysis of Alsophila
Six cDNA libraries from fresh leaves of G. metteniana var. subglabra (H) and A. spinulosa (S) were constructed. This generated 231.57 Mb of clean data, of which 38.60 Mb was obtained for each sample (Q30 ≥ 92.6%). The sequencing raw data has been deposited into the Short Reads Archive (SRA) database under the accession number SRR24032773, SRR24032774, SRR24032775, SRR24033065, SRR24033066, and SRR24033067. Sequence comparison between the reads of each variety and the reference genome indicated a similarity level of 67.54-89.30%. The mRNA abundance of each gene in each sample was profiled as fragment per kilo bases per million reads (FPKM). A total of 29,060 genes had an FPKM expression value above 1.0 in at least one sample (Fig. S1). The hierarchical cluster dendrogram showed the expression pattern of biological repeats be clustered together (Fig. S2). Furthermore, a total of 14,639 DEGs were identified in the H_vs_S comparison groups, with 7,764 upregulated genes and 6,875 downregulated genes (Fig. 2, A). Regarding 5 of the most-differentially upregulated genes, three genes had defined functions: indole-3-pyruvate monooxygenase YUCCA1 (YUCCA1), pleiotropic drug resistance protein 1 (PDR1), and cholesterol 22-hydroxylase CYP90B27 (CYP90B27). Correspondingly, all the 5 most-differentially down-regulated genes had defined functions: 14 kDa zinc-binding protein (ZBP14), expansin-A10 (EXPA10), protein MEN-8 (MEN-8), methyl jasmonate esterase 1 (MJE1), and mannose-specific lectin (dfa). In addition, results of KEGG analysis indicated that 38 pathways were enriched (Fig. 2, B, Table S3). Of which, DAMs were also significantly enriched in “pantothenate and CoA biosynthesis”, “stilbenoid, diarylheptanoid and gingerol biosynthesis”, “valine, leucine and isoleucine biosynthesis”. GO enrichment analysis revealed that the top 20 most significant GO categories were "regulation of transcription, DNA-templated", "protein phosphorylation", "carbohydrate metabolic process", "defense response", "transmembrane transport", "membrane", "transmembrane transporter activity", "copper ion binding", "iron ion binding", "ADP binding", "sequence-specific DNA binding", "ATPase activity", "monooxygenase activity", "protein kinase activity", "microtubule binding", "carbohydrate binding", "protein serine/threonine kinase activity", "DNA binding", "DNA-binding transcription factor activity", "oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular" (Fig. 2, C).
3.3. Identification of genes involved in disease resistance
Most of the proteins encoded by the characterized RGAs, including nucleotide-binding site (NBS)-containing proteins, receptor-like protein kinases (RLKs), and receptor-like proteins (RLPs), contain conserved domains, such as NBS, leucine-rich repeat (LRR), and Toll/interleukin-1 receptor (TIR) [29]. Using a genome-wide scanning pipeline [25], a total of 1,290 RGAs were identified in the A. spinulosa genome (Table S4). 62.2% (803) genes of the RGAs belonged to the RLK category, and there were 110 NBS-related RGAs, of which 25 were of the TIR type. 839 RGAs were found to be expressed in leaf tissue (Fig. S3). Of which, there were 606 RGAs overlapped with DEGs, including 426 RLK-encoding RGAs (Table 2). Ten of the most-differentially genes were as follows: disease resistance protein L6 (L6), receptor-like protein kinase At1g49730 (At1g49730), disease resistance protein RUN1 (RUN1), leucine-rich repeat receptor-like serine/threonine-protein kinase BAM1 (BAM1), leucine-rich repeat receptor-like protein kinase At5g63930 (At5g63930), salt tolerance receptor-like cytoplasmic kinase 1 (OsI_16820), G-type lectin S-receptor-like serine/threonine-protein kinase SD2-5 (SD25), wall-associated receptor kinase 2 (WAK2), disease resistance protein RPS2 (RPS2) and L-type lectin-domain containing receptor kinase VIII.1 (LECRK81). Notably, of the differentially expressed 606 RGAs, 551 genes could be further supported by known resistance genes reference from the latest PRGdb [30]. In KEGG enrichment analysis, these genes enriched in “Plant-pathogen interaction” and “MAPK signaling pathway - plant” pathway. In the GO enrichment analysis, these genes were assigned to 16 GO terms, such as "protein kinase activity," "ATP binding," "protein phosphorylation," "defense response," "protein binding," and more (Table S5). Gene network plays an important role in various organisms and systems, which effectively help us reveal the essential rules of a large number of biological processes and reactions in organisms. Using these 551 genes’ protein sequences as queries, 250 interactions were identified by STRING database (Table S6). The first four key genes that calculated by sytoHubba with MCC algorithm were BAM1, inactive leucine-rich repeat receptor-like protein kinase At5g48380 (BIR1), receptor-like protein kinase BRI1-like 3 (BRL3), and leucine-rich repeat receptor-like serine/threonine/tyrosine-protein kinase SOBIR1 (SOBIR1) (Fig. 3, A). The maximum-likelihood phylogenetic tree showed that the closest relative genes of BAM1 were LRR receptor-like serine/threonine-protein kinase FLS2 (FLS2) and receptor-like kinase TMK (TMK) (Fig. 3, B). PPI network also revealed that BAM1 had interaction with FLS2 and TMK (Table S6). We inferred these genes may play essential role in the molecular mechanism of insect resistance in Alsophila.
3.4. Correlation between DEGs and DAMs associated with R genes
The expression levels of 14,097 DEGs were significantly correlated with the abundance of 111 DAMs. Further analysis showed that 585 significantly differentially expressed RGAs were significantly positively correlated with 111 DAMs, while 574 significantly differentially expressed RGAs were significantly negatively correlated with 111 DAMS (Table S7, S8). Subsequently, interaction networks were constructed for significantly positively correlated RGAs between DAMs, and significantly negatively RGAs between DAMs, respectively. According to the results of cytoHubba, in the significantly positively correlated network, the top 10 core genes are probably inactive leucine-rich repeat receptor-like protein kinase At5g48380 (BIR1), receptor-like protein kinase BRI1-like 3 (BRL3), BAM1, phytosulfokine receptor 1 (PSKR1), leucine-rich repeat protein kinase family protein (AT1G27190), leucine-rich receptor-like protein kinase family protein (At5g46330), SOBIR1, lysm-containing receptor-like kinase 1 (LYK1,also named as CERK1), leucine-rich repeat transmembrane protein kinase family protein (AT1G68400), and Leucine-rich repeat protein kinase family protein (AT1G67510). The first 10 core metabolites are N-Benzylmethylene isomethylamine, 1-O-p-Coumaroylquinic acid, Neochlorogenic acid (5-O-Caffeoylquinic acid), Ferulic acid-4-O-glucoside, Scopoletin-7-O-glucuronide, 1-O-Feruloylquinic acid, 3-O-Feruloylquinic acid, Luteolin-4'-O-glucoside, Kaempferol-7-O-glucoside, and Kaempferol-4'-O-glucoside. In the significantly negatively correlated network, the top 10 genes in the core are BIR1, BRL3, BAM1, At5g46330, AT1G27190, SOBIR1, Leucine-rich receptor-like protein kinase family protein (PSY1R), STRUBBELIG-receptor family 8 (SRF8), LRR receptor-like serine/threonine-protein kinase HSL2 (HSL2), and PSKR1. The first 10 core metabolites are Demethyl coniferin, Luteolin-8-C-glucoside (Orientin), Aromadendrin-7-O-glucoside, 4-O-(6'-O-Glucosylcaffeoyl)-3,4-dihydroxybenzoic acid, LysoPC 20:5, Procyanidin B2, Kaempferol-3-O-glucoside-7-O-rhamnoside, Kaempferol-3-O-neohesperidoside, Quercetin-7-O-rutinoside, and Cinnamtannin B1 (Fig. 4A, Fig. 4B).
3.5. Verification of RNA-Seq gene expression
Real-time quantitative PCR (qRT-PCR) was used to validate the results and expression profiles of the genes identified via Illumina sequencing analysis. We selected 10 DEGs, all of which were RGAs, including wall-associated receptor kinase-like 1, leucine-rich repeat receptor-like serine/threonine-protein kinase BAM2, FLS2, and others (Table S9, S10). All 10 RGAs displayed the same expression pattern, with the pearson correlation coefficient of fold change between the PCR experiment and RNAseq being 0.78 (Fig. S4). These results verify the reliability and accuracy of our transcriptome analysis.