To identify the molecular events occurring in different tissues of GM and non-GM plants, 28 RNA-seq libraries were constructed using RNA from ten pooled RNA samples. After Illumina sequencing and the removal of adaptors and bad-quality reads, approximately 4,216,226 to 14,932,616 reads were obtained for the 28 libraries (Table 1). Clean reads were then mapped to the soybean reference genome, with the mapped ratio ranging from 43.13% to 89.27%. Among the mapped reads, the frequency of unigenes ranged from 41.21% to 87.95% (Table 1).
Table 1
Summary of sequence information for the 28 DEG libraries.
Sample
name
|
Total reads
|
Total
mapped
|
Multiply mapped
|
Uniquely
mapped
|
Non-spliced reads
|
Spliced
reads
|
ZH13_G
|
4216226
|
2671012 (63.35%)
|
95558
(2.27%)
|
2575454
(61.08%)
|
2024284 (48.01%)
|
551170 (13.07%)
|
ZH13_C
|
9858652
|
8797768 (89.24%)
|
180882 (1.83%)
|
8616886
(87.4%)
|
5747294 (58.3%)
|
2869592 (29.11%)
|
ZH13_H
|
7294396
|
5569502 (76.35%)
|
133888 (1.84%)
|
5435614
(74.52%)
|
4398715 (60.3%)
|
1036899 (14.22%)
|
ZH13_R
|
9377594
|
7105662 (75.77%)
|
180228 (1.92%)
|
6925434
(73.85%)
|
5715708 (60.95%)
|
1209726 (12.9%)
|
A3525_G
|
6410296
|
2765020 (43.13%)
|
123130 (1.92%)
|
2641890
(41.21%)
|
2079551 (32.44%)
|
562339
(8.77%)
|
A3525_C
|
12225988
|
10422694 (85.25%)
|
168208 (1.38%)
|
10254486
(83.87%)
|
7224718 (59.09%)
|
3029768 (24.78%)
|
A3525_H
|
5415494
|
3291948 (60.79%)
|
101724 (1.88%)
|
3190224
(58.91%)
|
2506364 (46.28%)
|
683860 (12.63%)
|
A3525_R
|
7299716
|
4275516 (58.57%)
|
109630
(1.5%)
|
4165886
(57.07%)
|
3222708 (44.15%)
|
943178 (12.92%)
|
JACK_G
|
5257778
|
3299682 (62.76%)
|
129240 (2.46%)
|
3170442
(60.3%)
|
2467896 (46.94%)
|
702546 (13.36%)
|
JACK_C
|
10150190
|
9061060 (89.27%)
|
134416 (1.32%)
|
8926644
(87.95%)
|
6397850 (63.03%)
|
2528794 (24.91%)
|
JACK_H
|
6325954
|
3977316 (62.87%)
|
161790 (2.56%)
|
3815526
(60.32%)
|
2864239 (45.28%)
|
951287 (15.04%)
|
JACK_R
|
6545170
|
5176872 (79.09%)
|
139338 (2.13%)
|
5037534
(76.97%)
|
3852475 (58.86%)
|
1185059 (18.11%)
|
MON87705_G
|
5801422
|
3121920 (53.81%)
|
77746
(1.34%)
|
3044174
(52.47%)
|
2477649 (42.71%)
|
566525 (9.77%)
|
MON87705_C
|
10040618
|
8706092 (86.71%)
|
132424 (1.32%)
|
8573668
(85.39%)
|
6366618 (63.41%)
|
2207050 (21.98%)
|
MON87705_H
|
7964642
|
5135186 (64.47%)
|
114408 (1.44%)
|
5020778
(63.04%)
|
4078999 (51.21%)
|
941779 (11.82%)
|
MON87705_R
|
6089780
|
3419908 (56.16%)
|
84256
(1.38%)
|
3335652
(54.77%)
|
2778434 (45.62%)
|
557218
(9.15%)
|
MON87708_G
|
6247388
|
3615618 (57.87%)
|
108156 (1.73%)
|
3507462
(56.14%)
|
2646103 (42.36%)
|
861359 (13.79%)
|
MON87708_C
|
14932616
|
13280240 (88.93%)
|
187898 (1.26%)
|
13092342
(87.68%)
|
9160767 (61.35%)
|
3931575 (26.33%)
|
MON87708_H
|
6905856
|
4829608 (69.93%)
|
120672 (1.75%)
|
4708936
(68.19%)
|
3438106 (49.79%)
|
1270830 (18.4%)
|
MON87708_R
|
5449430
|
4080382 (74.88%)
|
96646
(1.77%)
|
3983736
(73.1%)
|
3029095 (55.59%)
|
954641 (17.52%)
|
M0188_G
|
5518118
|
2498370 (45.28%)
|
119158 (2.16%)
|
2379212
(43.12%)
|
1911268 (34.64%)
|
467944
(8.48%)
|
M0188_C
|
10115880
|
8611796 (85.13%)
|
134268 (1.33%)
|
8477528
(83.8%)
|
5871812 (58.05%)
|
2605716 (25.76%)
|
M0188_H
|
6016450
|
3436352 (57.12%)
|
129058 (2.15%)
|
3307294
(54.97%)
|
2589384 (43.04%)
|
717910 (11.93%)
|
M0188_R
|
6720942
|
3955694 (58.86%)
|
129572 (1.93%)
|
3826122
(56.93%)
|
2956415 (43.99%)
|
869707 (12.94%)
|
FG72_G
|
13193514
|
11067818 (83.89%)
|
243210 (1.84%)
|
10824608
(82.04%)
|
7460539 (56.55%)
|
3364069 (25.5%)
|
FG72_C
|
14086268
|
11984834 (85.08%)
|
165238 (1.17%)
|
11819596
(83.91%)
|
8591902 (60.99%)
|
3227694 (22.91%)
|
FG72_H
|
9768542
|
7866696 (80.53%)
|
184778 (1.89%)
|
7681918
(78.64%)
|
5334051 (54.6%)
|
2347867 (24.03%)
|
FG72_R
|
13992384
|
11333216
(81%)
|
200028 (1.43%)
|
11133188
(79.57%)
|
7652831 (54.69%)
|
3480357 (24.87%)
|
A PCA was performed on all 28 transcriptomic datasets to obtain a global view of gene expression across the seven soybean lines. The first two principal components (PCs) explained 46.07% (PC1) and 25.46% (PC2) of the total variance (Fig. 1). The two PCs could not separate the GM lines from their non-GM donor parents. For example, MON87705, MON87708 and A3525 were clustered together. The natural soybean lines showed significant genetic distance; for instance, Zhonghuang13 was positioned far from the other two natural lines, JACK, and A3525. Notably, the seven datasets from the cotyledon tissues clustered discretely from the other three tissues.
Analysis of differentially expressed genes in the non-GM lines
Gene-expression data were obtained using the gene-expression formula. Genes that were significantly differentially expressed between different parental lines were identified according to normalized gene-expression levels. Some differentially expressed unigenes were expressed at higher levels in certain lines, whereas others were expressed at lower levels. In total, 7,491 DEGs were detected in the two-group comparisons (non-GM lines/non-GM lines) (Fig. 2A). When using ZH13 was control, a total of 4,384 DEGs were identified between A3525 and ZH13, including 1,888 upregulated genes and 2,496 downregulated genes (Supplementary Table S1), and 6,458 DEGs were identified between JACK and ZH13, including 2,298 upregulated genes and 4,160 downregulated genes (Supplementary Table S2). The combined analysis highlighted 3,351 DEGs that were commonly differentially expressed among the three materials, whereas the remaining 3,107 and 1,033 genes were specifically differentially expressed in the specific material (Fig. 2B).
Analysis of differentially expressed genes in the non-GM and GM lines
The number of DEGs among the four group comparisons (non-GM lines/GM lines) ranged from 1,836 to 4,551, which represented 15.56% to 38.57% of the total number of genes (Fig. 2A). These comparisons revealed 4,551 DEGs between M0188 and A3525, including 1,888 upregulated genes and 2,496 downregulated genes (Supplementary Table S3); 2,518 DEGs were identified between M87705 and A3525, including 1,421 upregulated genes and 1,097 downregulated genes (Supplementary Table S4); 1,836 DEGs were identified between M87708 and A3525, including 1,212 upregulated genes and 624 downregulated genes (Supplementary Table S5); and 2,894 DEGs were present between FG72 and JACK, including 1,697 upregulated genes and 1,197 downregulated genes (Supplementary Table S6). The combined analysis revealed that 587 DEGs were commonly differentially expressed among the four tissues, and that 382, 1,016 and 2,570 genes were specifically differentially expressed in specific comparisons (Fig. 2C).
The numbers of DEGs in each of the two comparisons were then investigated. The total number of DEGs among the three non-GM lines was 7,491 and the total number of DEGs between the GM lines and their donor parents was 6,836. The DEGs were shared by different varieties. The differences in numbers of DEGs among the natural varieties were even larger than those between the GM lines and their donor parents, which is consistent with reports from several previous studies [17, 18].
Gene Ontology (GO) analysis of the identified DEGs
To classify the functions of the potential differentially expressed genes, GO analysis was performed. The majority of the DEGs in the comparison among non-GM lines were assigned to the category biological processes (96/199, 48.24%), followed by molecular functions (70/199, 35.18%) and cellular components (33/199, 16.58%). Within the category of biological processes, “cellular metabolic process” (916, 12.26%) and “macromolecule biosynthetic process” (667, 8.93%) were prominently represented, indicating that important metabolic activities occurred in the analyzed tissues (Fig. 3A, Supplementary Table S7). For the cellular component category, “cellular component” (342, 16.89%) and “intracellular” (177, 8.74%) were the two major classes. The other remaining categories of intracellular part, organelle, intracellular organelle, and cytoplasm, accounted for 27.95% of the DEGs. Within the categories of molecular functions, “nucleic-acid binding” (354, 12.96%) and “RNA binding” (212, 12.96%) were the largest and second-largest classes, respectively.
To compare non-GM lines with GM lines, the majority of the DEGs were assigned to the category biological processes (79/166, 47.59%), followed by molecular functions (52/166, 31.33%) and cellular components (35/166, 21.08%). Within the biological processes category, “biosynthetic process” (1,465, 14.26%) and “cellular biosynthetic process” (1,351, 13.15%) were prominently represented, indicating that important biosynthetic processes occur in the analyzed tissues (Fig. 3B, Supplementary Table S7). For the cellular component category, “intracellular” (1,262, 14.41%) and “intracellular part” (1,173, 13.4%) were the two major sub-categories. The other remaining categories: organelle, intracellular organelle, macromolecular complex, and cytoplasm, accounted for 23.04% of the DEGs. The molecular functions category included “structural molecule activity” (318, 9.39%) and “nucleic-acid binding” (316, 9.33%) as the first and second-largest sub-categories, respectively. The categories of “structural constituent of ribosome”, “RNA binding”, “oxidoreductase activity”, “helicase activity” and “endonuclease activity” accounted for approximately 26.26% of the DEGs.
The GO annotations of the two types of comparisons were subsequently compared. The number of GO-term categories (199) for the non-GM lines exceeded the number for the comparison between non-GM and GM lines (166) (Supplementary Table S7). A total of 81 categories appeared in both types of comparisons. The DEGs among the non-GM lines were functionally more varied than the DEGs between the GM lines and their donor parents. These common categories included different biological functions, such as “cellular biosynthetic process”, “nuclear transport”, and “lipid transport”, which are essential for the maintenance of cellular function. Among the three categories of GO terms in non-GM line comparisons, 85 specific GO items were present, such as “regulation of DNA replication”, “regulation of DNA metabolic process”, “DNA replication origin binding”, and “sequence-specific DNA binding”. This enrichment of DNA-related terms suggests that differences in gene-expression resulting from genetic engineering are primarily associated with changes in DNA-level regulation.
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of DEGs
The KEGG pathway database is a knowledge base for the systematic analysis of gene functions in terms of networks of genes and molecules in cells and their variants specific to particular organisms. To analyze further the DEGs between GM and non-GM plants, all the DEGs were analyzed with respect to the KEGG pathway database.
Among the 7,491 DEGs in the non-GM line comparisons, 126 (28.64%) that significantly matched to sequences in the database were assigned to eight KEGG pathways. Among these eight pathways, protein processing in endoplasmic reticulum contained the greatest number of DEGs, followed by the pathways ribosome and ribosome biogenesis in eukaryotes (Fig. 4, Supplementary Table S8). This pattern highlights that active metabolic processes occur in these tissues. Among the 6,836 DEGs between the non-GM and GM lines, the pathway protein processing in the endoplasmic reticulum contained the greatest number of DEGs, followed by ribosome and spliceosome pathways. Comparisons of the KEGG pathways revealed seven significant pathways, including photosynthesis, glutathione metabolism, arachidonic acid metabolism, ribosome biogenesis in eukaryotes, ribosome, RNA transport and protein processing in endoplasmic reticulum (Supplementary Table S8). Most of the DEGs between the two comparisons belonged to the same classes and did not differ greatly.