Summary of gene co-expression network analysis
A total of 26 normal second-trimester AF cfRNA chip results (15 males and 11 females) were selected for WGCNA. The mean gestational ages of the male and female fetuses were 15 and 19 weeks, respectively (Table S1). Raw data from each microarray were pre-processed by background correction, normalization, and batch correction (Fig 1A). All 54675 probe sets on the chip were analyzed by WGCNA. For block-wise network construction, a computationally inexpensive and relatively crude clustering method was used to pre-cluster the probe sets into three blocks, and a full network analysis was then performed in each block separately. A total of 22 distinct probe set modules were generated (Fig 1B-D). Each module was assigned a color that also served as its name, so that each color represented a group of co-expressed genes. Module sizes ranged from 20 to 1198 probe sets (Table S6). Probe sets without an obvious co-expression relationship were labeled grey. After the probe sets were converted to gene symbols, the co-expressed genes in the 22 modules were obtained and summarized in Table S5.
Most of the co-expressed modules showed no correlation with gestational age
The module eigengene (ME) is defined as the first principal component of a given module. For all samples used in WGCNA, the MEs of the 22 modules were calculated and compared with the expression values of the probe sets in each module. As shown in Fig S1, the MEs followed the same trend as the probe set expression values and could therefore represent the expression of the probe sets in the corresponding modules.
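The definition of the ME as a first principal component can be illustrated with a minimal sketch. It is not the WGCNA implementation: it uses only the Python standard library, obtains the first principal component by power iteration, and assumes that each probe set is standardized across samples and that the sign is oriented along the average expression. The function name `module_eigengene` and the toy data are hypothetical.

```python
import math


def module_eigengene(expr, n_iter=100):
    """Return the first principal component (one score per sample) of a module.

    expr: list of probe-set rows; each row holds one value per sample.
    Assumes no probe set is constant across samples (non-zero variance).
    """
    n_samples = len(expr[0])
    # Standardize every probe set across samples (mean 0, unit variance).
    z = []
    for row in expr:
        mu = sum(row) / n_samples
        sd = math.sqrt(sum((x - mu) ** 2 for x in row) / (n_samples - 1))
        z.append([(x - mu) / sd for x in row])

    # Power iteration on Z^T Z. Start from the first standardized probe set,
    # because a constant start vector is orthogonal to every centered row.
    v = z[0][:]
    for _ in range(n_iter):
        loadings = [sum(a * b for a, b in zip(row, v)) for row in z]  # Z v
        v = [sum(l * row[j] for l, row in zip(loadings, z))
             for j in range(n_samples)]  # Z^T (Z v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    # Orient the sign so that the eigengene follows the average expression.
    avg = [sum(row[j] for row in z) / len(z) for j in range(n_samples)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


# Hypothetical module of three probe sets measured in five samples.
toy_module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.1, 1.9, 3.2, 3.9, 5.1],
]
eigengene = module_eigengene(toy_module)
```

Because the three toy probe sets rise together across samples, the resulting eigengene also rises from the first to the last sample, which is the sense in which an ME summarizes the expression of its module.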
The correlation between MEs and gestational age was evaluated by Spearman correlation analysis. As shown in Table S6, no module except the green and turquoise modules showed a significant correlation with gestational age. Modules were then grouped by hierarchical clustering. The green and turquoise modules, both significantly correlated with gestational age, were classified into different clusters. Modules with similar correlation coefficients were divided into different clusters (Fig 2A). MEs of the major modules (turquoise, blue, brown, yellow, green, and red) were smoothed by locally weighted regression and are shown in Fig 2B. The MEs of the green and turquoise modules trended downward with increasing gestational age, whereas no obvious trend was found for the MEs of the other modules. Thus, expression of genes in the green and turquoise modules decreased with increasing gestational age in the second trimester.
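As an illustration of the Spearman analysis, the sketch below computes the rank correlation between an eigengene and gestational age using only the Python standard library. The data are hypothetical, and the significance test used for Table S6 is not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical eigengene that falls steadily with gestational age (weeks).
weeks = [15, 16, 17, 18, 19, 20]
toy_me = [0.5, 0.3, 0.2, 0.0, -0.4, -0.6]
rho = spearman(weeks, toy_me)
```

A strictly decreasing eigengene gives a coefficient of -1, the extreme case of the downward trend described for the green and turquoise modules.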
Highly expressed probe sets showed significant clustering tendencies
Probe sets with higher expression values are more likely to be useful as markers of fetal organ development. To characterize the expression of clustered probe sets, the mean expression value and coefficient of variation of every probe set were calculated across the 26 normal samples used in WGCNA. A scatter plot of mean expression value versus coefficient of variation was drawn to show how the clustered probe sets were distributed among all probe sets, with clustered probe sets colored by module (Fig 3A). A larger proportion of probe sets with high expression values were clustered than of probe sets with low expression values.
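The two quantities plotted in Fig 3A can be computed per probe set as in the following sketch. It assumes the coefficient of variation is the sample standard deviation divided by the mean; the probe set names and values are hypothetical.

```python
import statistics


def mean_and_cv(values):
    """Return (mean, coefficient of variation) of one probe set across samples."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    cv = sd / mean if mean != 0 else float("nan")
    return mean, cv


# Hypothetical expression values of two probe sets in three samples.
expression = {
    "probe_A": [2.0, 4.0, 6.0],
    "probe_B": [10.0, 10.0, 10.0],
}
summary = {probe: mean_and_cv(vals) for probe, vals in expression.items()}
```

Each entry of `summary` gives one point of the scatter plot, with the mean on one axis and the coefficient of variation on the other.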
Relationship between modules and tissues via dominant tissues and modules
To establish the relationship between fetal tissues and modules, the distribution of tissue-specific genes across modules was analyzed. Tissue-specific genes were obtained under different cut-off values. After the tissue-specific genes were converted to probe sets, the numbers of tissue-specific genes (probe sets) under each cut-off value (5, 10, 15, 20, 25, and 30) were counted (Fig 3B). The number of tissue-specific genes (probe sets) decreased as the cut-off value increased. The distribution of tissue-specific probe sets across all 22 modules was analyzed and summarized in Table S6. As shown in Fig 3C, tissue-specific probe sets were mainly distributed in the turquoise, blue, brown, yellow, green, and red modules, which were designated major modules.
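The effect of the cut-off value can be sketched as follows. The sketch assumes that each gene carries a single tissue-specificity score and that a gene counts as tissue-specific when its score is at least the cut-off; the exact definition of the score is not restated in this section, and the gene names and scores are hypothetical.

```python
CUTOFFS = (5, 10, 15, 20, 25, 30)


def count_specific(scores, cutoffs=CUTOFFS):
    """Count genes whose specificity score reaches each cut-off value."""
    return {c: sum(1 for s in scores.values() if s >= c) for c in cutoffs}


# Hypothetical specificity scores for five genes.
toy_scores = {"g1": 4, "g2": 7, "g3": 12, "g4": 31, "g5": 22}
counts = count_specific(toy_scores)
```

Under this assumption the counts can only stay the same or fall as the cut-off rises, which matches the decrease reported in Fig 3B.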
Considering the difference in gene expression between fetal and adult tissues, the cut-off value was set to 5 to retain more tissue-specific genes. The numbers of tissue-specific probe sets from different tissues in the major modules at this cut-off are shown in Fig 3D (for detailed data, see Table S7). Tissue-specific genes from skeleton, liver, and testis accounted for the largest numbers of tissue-specific genes in the turquoise module, and these three tissues were defined as the dominant tissues of the turquoise module. Dominant tissues were determined in the same way for the blue module (placenta, skeletal muscle, and testis), brown module (testis, cerebral cortex, and skeletal muscle), yellow module (testis, cerebral cortex, and cerebellum), green module (small intestine, liver, and colon), and red module (esophagus, tongue, and tonsil). Tissue-specific probe sets derived from liver were mostly distributed in the green and turquoise modules, which were therefore defined as the dominant modules for liver. Dominant modules for cerebral cortex included the yellow and blue modules. Dominant tissues differed between modules, and dominant modules differed between tissues.
The blue, brown, and yellow modules contained the largest numbers of neural-specific genes (cerebral cortex, cerebellum, basal ganglia, etc.). The green module contained more digestive system-specific genes.
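The selection of dominant tissues can be sketched as a simple ranking. The sketch assumes that the dominant tissues of a module are the three tissues contributing the most tissue-specific probe sets to it; the input pairs are hypothetical and do not reproduce Table S7.

```python
from collections import Counter, defaultdict


def dominant_tissues(assignments, top_n=3):
    """Return the top_n tissues per module, ranked by probe set count.

    assignments: iterable of (module, tissue) pairs, one per
    tissue-specific probe set.
    """
    per_module = defaultdict(Counter)
    for module, tissue in assignments:
        per_module[module][tissue] += 1
    return {
        module: [tissue for tissue, _ in counter.most_common(top_n)]
        for module, counter in per_module.items()
    }


# Hypothetical tissue-specific probe sets assigned to one module.
toy_assignments = (
    [("green", "liver")] * 3
    + [("green", "colon")] * 2
    + [("green", "small intestine")] * 4
    + [("green", "testis")]
)
dominant = dominant_tissues(toy_assignments)
```

Swapping the two elements of each pair gives the dominant modules of a tissue by the same ranking.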
Differentially expressed genes in DS, ES, and TS
Samples used for the analysis of differentially expressed genes (DEGs) are listed in Table S2 (DS), Table S3 (ES), and Table S4 (TS). For the analysis of differentially expressed probe sets (DEPs) in each group, the expression values of abnormal fetuses were compared with those of normal fetuses using a linear regression model. A total of 1049, 1507, and 1448 DEPs were detected in the DS, ES, and TS groups, respectively. Volcano plots for the DS, ES, and TS groups are shown in Fig 4A-C, and the overlap of DEPs among the three groups is shown in a Venn diagram (Fig 4D). There were 41 DEPs common to the DS and ES groups, 38 common to the DS and TS groups, and 43 common to the ES and TS groups. No DEP was common to all three groups.
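The overlaps summarized in the Venn diagram are set intersections, as the following sketch shows. The probe set identifiers are hypothetical and were chosen so that, as in the reported result, every pair of groups shares some DEPs while no DEP is shared by all three.

```python
from itertools import combinations


def pairwise_and_common(groups):
    """Return the pairwise intersections and the intersection of all groups."""
    pairwise = {
        (a, b): groups[a] & groups[b] for a, b in combinations(groups, 2)
    }
    common = set.intersection(*groups.values())
    return pairwise, common


# Hypothetical DEP sets for the three groups.
toy_deps = {
    "DS": {"p1", "p2", "p3"},
    "ES": {"p2", "p4"},
    "TS": {"p3", "p4", "p5"},
}
pairwise, common = pairwise_and_common(toy_deps)
```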
Disease-specific modules via GO analysis
The numbers of DEPs of DS, ES, and TS in the major modules were counted (Fig 5A). The blue module contained the largest number of DEPs in the TS and DS groups, whereas the brown module contained the largest number of DEPs in the ES group. To analyze the function of the clustered genes, DEPs were converted to gene symbols. The modular distribution of DEGs in the three groups is summarized in Table S8.
Module-specific DEGs, defined as the DEGs of each group (DS, ES, and TS) within a given module, were extracted from Table S8. Functional enrichment analysis was performed on the module-specific DEGs in the major modules (turquoise, blue, brown, yellow, green, and red) (Table S10). The numbers of enriched GO terms are shown in Fig 5B. The yellow, blue, and red modules were specific to the DS, ES, and TS groups, respectively. DEGs of DS in the yellow module included SORBS1 and FFAR4. DEGs of ES in the blue module included AKNAD1, SOX9, ZNF395, PID1, PRPF38A, FAM220A, ATP1B1, NME7, CD9, EPC1, and MAML2. DEGs of TS in the red module included S100A8 and IVL. The ES, DS, and TS groups ranked first, second, and third, respectively, in total number of enriched GO terms. In the green and turquoise modules, similar numbers of GO terms were enriched in the DS and ES groups, but not in the TS group.
To obtain an overall view of all enriched GO terms, a GO graph was built from the relationships between GO terms. GO maps of all enriched GO terms are listed in Table S9. Based on the GO graph, interrelated GO terms were grouped into the same subset. As shown in Table S10, each subset contained a certain number of enriched GO terms from different modules and abnormal fetus groups (DS, ES, and TS). A total of 184 subsets were established, and the number of GO terms per subset ranged from 1 to 332. The nine largest subsets are summarized in Table 1. Functions of these subsets included basic physiological processes, absorption and transport of nutrients, response to external stimuli, multi-organ (kidney, lung, and heart) development, protein synthesis process, protein catabolic process and proteolysis, thermoregulation, signal transduction of TGF, and bone development.
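One way to group interrelated GO terms is to take the connected components of the graph restricted to the enriched terms. The sketch below does this with a union-find structure. It rests on the assumption that a subset corresponds to a connected component, which this section does not state explicitly, and the term labels and relationships are hypothetical.

```python
def connected_subsets(terms, edges):
    """Group terms into connected components, largest subset first.

    terms: iterable of term labels.
    edges: iterable of (term_a, term_b) relationships between terms.
    """
    terms = list(terms)
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the root, compressing the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in edges:
        if a in parent and b in parent:  # ignore terms that are not enriched
            parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical enriched terms and relationships between them.
toy_terms = ["T1", "T2", "T3", "T4", "T5"]
toy_edges = [("T1", "T2"), ("T2", "T3")]
subsets = connected_subsets(toy_terms, toy_edges)
```

Terms with no relationship to any other enriched term form subsets of size one, consistent with subset sizes that start at 1.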
Table 1
Subsets of enriched GO terms

| ID | No | Fig 6 | Function |
|---|---|---|---|
| 1 | 332 | A | Basic physiological processes |
| 34 | 100 | B | Absorption and transport of nutrients |
| 20 | 79 | C | Response to external stimuli |
| 48 | 60 | D | Multi-organ (kidney, lung, and heart) development |
| 11 | 48 | E | Protein synthesis process |
| 17 | 38 | F | Protein catabolic process and proteolysis |
| 38 | 31 | G | Thermoregulation |
| 47 | 15 | H | Signal transduction of TGF |
| 106 | 12 | I | Bone development |

ID: subset ID, as given in the subset_id column of Table S9.
No: number of enriched GO terms in the subset, as given in the subset_number column of Table S9.
Fig 6: panel of Fig 6 showing the modular distribution of the enriched GO term subset in abnormal fetuses.
Function: function summarized from the biological GO terms in the corresponding subset.
To analyze the relationship between subset function and disease across modules, the numbers of enriched GO terms in the nine largest subsets (Table 1) are shown by module and fetal disease in Fig 6. In addition to the yellow, blue, and red modules being specific to the DS, ES, and TS groups, respectively, further module specificity was observed within certain subsets. The green, turquoise, and brown modules were specific to the ES group in the absorption and transport of nutrients subset. The green module was specific to the DS group in the multi-organ (kidney, lung, and heart) development, protein synthesis process, thermoregulation, and signal transduction of TGF subsets.