Target gene insertion site analysis
We transformed the insect-resistance genes CRY1C and CRY2A into rice cultivar Minghui63 (MH63), forming three transgenic lines MH63(1C), MH63(2A), and MH63(1C+2A). We then created three transgenic restore lines in the BC4F5 generation, CH891(1C), CH891(2A), and CH891(1C+2A), by backcrossing with the hybrid traditional rice variety CH891. Thus, we performed two independent transgenic events using the same vectors harboring different CRY1C and CRY2A genes to elucidate the random insertion effects of targeted genomic regions (Fig. 1). The CRY1C flanking sequence in CH891(1C) and CH891(1C+2A) was 1276 bp, with 394 bp derived from the T-DNA region and the remaining 832 bp located on rice chromosome 11 in a stretch of the noncoding sequence region (by BLAST alignment). The CRY2A flanking sequence of CH891(2A) and CH891(1C+2A) was 948 bp, of which 356 bp was derived from the T-DNA region, and the remaining 543 bp of the flanking sequence was located on rice chromosome 12 by BLAST alignment. T-DNA was inserted in the 5′-noncoding region of a gene encoding an unknown functional protein (Fig. 2a).
Intended effects analysis
qRT-PCR showed that CRY1C and CRY2A genes were significantly enriched in the three transgenic lines than recurrent parents, and the concentrations of CRY1C and CRY2A genes in the three transgenic lines were significantly higher at the heading stage than at the tillering stage (Fig. 2b-d, Table S2). The content of Cry1C and Cry2A protein in stems of transgenic rice lines at the tillering and heading stages was measured by ELISA. ELISA showed that Cry1C and Cry2A proteins were significantly enriched in the three transgenic lines. A rising trend of the Cry1C protein concentration was found between vegetative growth and reproductive growth (Fig. 2e-g, Table S3). The three transgenic lines CH891(1C), CH891(2A), and CH891(1C+2A) of the rice variety BC5F3, with two transgenic events (CRY1C and CRY2A), exerted stronger resistance to insects than CH891 at the tillering and heading stages (Fig. 2h, j). The rate of deadheart stems was significantly less in the three transgenic lines than in CH891 at the tillering stage, and the whitehead panicles of the three transgenic lines were significantly lower than those of CH891 at the heading stage; thus, the insect-resistance genes CRY1C and CRY2A were responsible for the death of C. suppressalis larvae. Finally, we found that the polymeric strain CH891(1C+2A) had the best expected effect and insect resistance (Fig. 2i, k, Table S4).
Genetic background analysis
To identify the genetic background response rate, we conducted a genetic background response rate analysis using three BC4F5 populations. Genotyping was performed using 512 molecular markers, and the linkage map of SSR fingerprint markers was created (Fig. 3, Table S5). There were 22 polymorphic markers between CH891(1C) and its parents, and the percentage of polymorphic markers was 4.29%. There were 15 polymorphic markers between CH891(2A) and its parents, and the percentage of polymorphic markers was 2.93%. were 19 polymorphic markers between CH891(1C+2A) and its parents, and the percentage of polymorphic markers was 3.71%. The low rate of polymorphic markers indicates that the three transgenic recovered lines had a genetic background similar to their recurrent parents. The response rate of the actual genetic background of CH891(1C), CH891(2A), and CH891(1C+2A) selected was 97.85%, 98.54%, and 98.14%, respectively (Table S6). The actual genetic background response rate was higher than the theoretical genetic background response rate.
Differentially expressed proteome analysis
We analyzed the proteome of the three transgenic lines using LC–MS. Protein were extracted from isolated leaf tissue at heading for LC–MS analysis, where insect resistance is the most significant. In total, 6650 proteins were detected and identified, and 6324 of the proteins were quantified. Correlation cluster heatmap showed positive repeatability of protein data during the overall analysis of the three transgenic lines and recurrent parents (Fig. 4a). Based on our PCA results, the three transgenic lines of two transgenic events showed similar clusters of differentially abundant proteins, whereas recurrent parents were similar to the three transgenic lines. The PCA of CH891(1C) was slightly different from that of the other three lines, which might be because of the relatively low recovery rate of the genetic background (Fig. 4b). The relative standard deviation of protein quantitative values between repeated samples was relatively small, which was consistent with the correlation analysis and PCA results (Fig. 4c). Proteome heatmaps showed distinct patterns of protein abundance in four lines. All quantified proteins were compared between transgenic lines and recurrent parents, and the threshold for differential expression was padj<0.05 and |log2 fold-change|>0. Four clusters corresponding to the four leaf tissue samples were identified for the proteome (Fig. 4d). Overall, the number of differentially expressed proteins (DEPs) between the three transgenic lines and their recurrent parents were within the normal range of gene expression changes (Fig. 4e). Among DEPs, proteins were differentially expressed between CH891(1C) and CH891, of which 239 were upregulated and 119 were downregulated in CH891(1C) (Fig. 4f). Proteins were differentially expressed between CH891(2A) and CH891, of which 168 were upregulated and 131 were downregulated in CH891(2A) (Fig. 4g). We also found that 198 and 198 DEPs were downregulated and upregulated between CH891(1C+2A) and CH891, respectively (Fig. 4h). This study focused on the concentration of different genes in the upper left and upper right corners of the volcano plot, and functional classification and enrichment analyses were performed for these genes.
Cluster analysis of identified proteins
We next performed Clusters of Orthologous Groups (COGs) pathway enrichment analysis of DEPs. We mainly focused on COGs enriched by significant DEPs in the three groups. The vast majority of proteins were positively correlated with transcripts, but some showed partial negative correlation. The COG pathways were mainly enriched in carbohydrate transport and metabolism, coenzyme transport and metabolism, post-translational modification, and signal transduction mechanisms. We also considered some unknown pathways (Fig. 5a, Table S7). Some genes enriched in the COG pathways were also significantly enriched in the KEGG pathways. The KEGG enrichment pathways mainly included glycine, serine, and threonine metabolism and cyanoamino acid metabolism (Fig. 5b, Table S8). Among the KEGG pathways significantly enriched in upregulation and downregulation of DEPs were the glycine, serine, and threonine metabolism and cyanoamino acid metabolism pathways, and both showed enrichment of the three transgenic lines and recurrent parents (Fig. 5c). DEPs in the glycine, serine, and threonine metabolism pathway were significantly upregulated in CH891(1C), CH891(2A), and CH891, and significantly downregulated in CH891(1C+2A) and CH891 (Fig. 5d). Most DEPs in the glycine, serine, and threonine metabolism pathway were significantly upregulated in three clusters (Fig. 5e). We identified several enzymes among the DEPs associated with the conversion reactions of glycine, serine, and threonine. Most of these enzymes were downregulated. Information on these key enzymes include enzyme symbol, EC number, and fold change. They included zinc finger matrin-type protein (ZMAT, 2.06, 1.59, 1.32), ubiquitin family protein (UBE, 4.12, 2.33, 4.18), EF-hand domain-containing protein (EFHC, 1.53, 1.56, 1.95), mitogen-activated protein kinase (MAPK, EC:2.7.11.25, 3.28, 2.40, 3.73), rhomboid-like protein (RHBD, 1.76, 2.11, 1.63), glutathione synthetase (GSS, EC:6.3.2.3, 2.45, 0.71, 2.75), C-factor (predicted) (CF, 3.69, 2.98, 1.47), and UDP-glucose 6-dehydrogenase (UG6D, EC:1.1.1.22, 3.61, 0.96, 4.99). Also upregulated were enzymes of the cyanoamino acid metabolism pathway, such as UBX domain-containing protein (PUX, 0.64, 0.75, 0.91), protease Do-like 5, chloroplast precursor (PDI, 0.64, 0.66, 0.76), plant intracellular Ras-group-related LRR protein (PIRL, 0.62, 1.01, 0.60), zeta-carotene desaturase (ZDS, 0.63, 0.85, 0.87), tyrosine-protein phosphatase (PTP, EC:3.1.3.48, 0.49, 0.58, 0.60), serine hydroxymethyltransferase (SHMT, EC:2.1.2.1, 0.54, 0.59, 0.65), nicotinate phosphoribosyltransferase (NAPRT, EC2.4.2.11, 0.51, 0.69, 0.42), and hydroxy acid dehydrogenase (PGM, EC:5.4.2.2, 0.51, 0.99, 0.74). These DEPs were significantly enriched in some basic biologically conserved COG and KEGG pathways, and belonged to constitutively expressed proteins (Table 1). Under the two insertion methods of CRY1C and CRY2A genes, the upregulation/downregulation trends of these DEPs were the same between the three transgenic lines and recurrent parents, and the changes at the proteome level were universal and representative, and could be used for subsequent evaluation of unexpected effects.
PRM validation
For proteome data validation, we selected genes encoding fifteen DEPs (MAPK, UBE, EFHC, SHMT, RHBD, GSS, CF, UG6D, PUX, PDI, PIRL, ZDS, PTP, NAPRT, and PGM) from the proteome data screened by cluster analysis. We performed PRM analysis on peptides representing these fifteen DEPs that were successfully quantified in the proteomic work (Table 5). Compared with the recurrent parents, ZMAT, UBE, EFHC, MAPK, RHBD, GSS, CF, and UG6D were significantly upregulated, and PUX, PDI, PIRL, ZDS, PTP, SHMT, NAPRT, and PGM were significantly lower in the three transgenic lines. On the other hand, except for ZDS, no significant difference was observed in DEPs among the three transgenic lines. The PRM results also correlated well with the proteomics data (Fig. 6, Table 2). Therefore, the proteome data were reasonable and reliable.
Grain yield and quality performance
CH891(1C), CH891(2A), and CH891(1C+2A) restore lines were planted in the experimental plots to evaluate agronomic traits (Table S9). Yield traits included panicles per plant, grains per panicle, 1000-grain weight, and yield per plant. Compared with the recurrent line CH891, CH891(1C), CH891(2A), and CH891(1C+2A) showed no significant difference (Fig. 7a-f). However, significant differences were found in panicle length and seed set rate between CH891(1C) and CH891. With the exception of gel consistency, significant differences were observed in other quality traits. Significant differences were found in brown rice rate, head rice rate, and chalkiness degree between CH891(1C) and CH891, while significant differences were observed in grain length to width ratio and amylose content between CH891(2A) and CH891. Significant differences were observed in amylose content between CH891(1C+2A) and CH891 (Fig. 7g-l). Previous experiments showed that no significant difference was observed in DEPs among protein groups. Therefore, the difference in grain yield and quality traits among different varieties was not caused by the change in protein composition affected by the insertion of target genes. However, the difference might be determined by the genetic background of the remaining donors in the backcross. In addition, we compared the relationship between donor chromosome fragments and agronomic traits. CH891(1C) had a polymorphism at the RM1026 locus on chromosome 9, and the qPL9 gene related to panicle length was reported near the band locus. CH891(1C) had a polymorphism at the RM538 locus on chromosome 5, and the PTB1 gene related to seed set rate was reported near the band locus. CH891(1C) had a polymorphism at the RM17804 locus on chromosome 5, and the Chalk5 gene related to head rice rate and chalkiness degree was reported near the band locus. CH891(2A) had polymorphisms at RM289 and RM1364 loci on chromosomes 5 and 7, respectively, and the GL5, GW5, GL7 genes related to grain length to width ratio was reported near the band locus. CH891(2A) and CH891(1C+2A) had a polymorphism at the RM190 locus on chromosome 6, and the Wx gene related to amylose content was reported near the band locus (Table 3).