3.1 microarray data characteristics
Three independent microarray information were included in this study, involving three independent clinical trials, which were taken from geo database, including 277 samples of gse37768, gse76925 and gse130928 (including 151 patients with chronic obstructive pulmonary disease and 126 healthy subjects). All provided clinical information such as age, lung function, smoking and body mass index.
3.2 Eliminate batch effect through cross platform standardization
The combat method is used to process data sets from different platforms and different batches to eliminate the batch effect between different data sets. A total of 15326 genes were detected in three different geo data sets included in this study.Before eliminating the batch effect, the two main components (PCS) of non standardized expression values were clustered (Fig. 1a).In order to eliminate the batch effect, the scatter diagram is drawn after the principal component standardization of the data. The results show that the batch effect of different platforms has been eliminated (Fig. 1b).
3.3 Consensus clustering of COPD cases
The gene expression matrix after eliminating the batch effect and the sample information of chronic obstructive pulmonary disease were analyzed by cluster analysis. The results showed that 151 patients with chronic obstructive pulmonary disease were divided into different subgroups.According to the consistency cluster analysis of gene expression profile, the results were divided into three subgroups. The number of cases in subgroups I, II and III in the three subgroups were 55, 49 and 47 respectively, and the gene expression patterns were different among the three subgroups (Fig. 2a).In contrast, the results based on consistency matrix analysis showed that the gene expression patterns in each subgroup were highly similar. In general, the higher the consistency score, the more stable the grouping is.In this study, when the groups are divided into 2 and 3 groups, the clustering scores of each subgroup are greater than 0.8, indicating that the results of these groups are more stable. Therefore, 151 patients with COPD were divided into 3 groups (Fig. 2b).By describing the clinical characteristics of the three subgroups and analyzing the data of 151 cases of chronic obstructive pulmonary disease, it was found that the age of subgroup I was younger (Fig. 3a), and the age of subgroup II and III was older. Smoking time in subgroups II and III was significantly longer than that in subgroup I (Fig. 3b), indicating that subjects in subgroups II and III were heavier in subgroup I, while subgroup I developed earlier. However, no differences were found in age and smoking time in subgroups II and III.
Table 1 The number of differentially expressed genes by case-control and case‐case comparisons and weighted gene co‐expression analysis modules in each subgroup
Transciptome classification
|
#of upregulated genes by case-control comparison
|
#of downregulated genes by case-control comparison
|
#of subgroup-specific upregulated genes by case-control comparison
|
Modules
|
Ⅰ
|
74
|
5220
|
1076
|
Brown
|
Ⅱ
|
418
|
1257
|
4271
|
Blue and Yellow
|
Ⅲ
|
562
|
28
|
2339
|
Turquoise
|
3.4 Identification of gene coexpression modules of subgroups
In order to reveal the gene differences between COPD subgroups, WGCNA analysis was carried out under the expression level of specifically up-regulated genes in each subgroup.Through pairwise differential expression analysis between each two subgroups, it was determined that 1076, 4271 and 2339 genes were specifically up-regulated in subgroups I, II and III respectively (the corrected threshold was p < 0.05, and the absolute difference of the mean value was > 0.2).Differential expression analysis was performed by comparing the gene expression profiles of each subgroup with those of the normal control group. The results of GSEA enrichment analysis showed that there was also significant difference between the specific up-regulated genes in the subtypes and the normal samples (Fig. 4a-c, FDR < 0.05).It is worth noting that subgroup I has the least subgroup specific up-regulated genes compared with other subgroups, and the number of differentially expressed genes is relatively small by comparing the cases of each subgroup with the normal control group.
Based on the expression levels of 7686 specifically up-regulated genes in the subgroup, a gene heat map expression network was constructed, which identified five WGCNA modules (Fig. 4D、Table 1).By analyzing the relationship between WGCNA module and corresponding subgroups and KEGG enrichment, it was found that PPAR signaling pathway, regulation of actin cytoskeleton and platelet activation pathway were significantly enriched only in brown module (Fig. 5a), and brown module was significantly enriched in sub I, indicating that subjects in sub group I showed airway remodeling characteristics; Glycerol phospholipid metabolism, peroxisome and neuroactive ligand receptor interaction are only significantly enriched in the blue module, trap interaction in vesicle transport and bacterial invasion of epithelial cells are only significantly enriched in the Yellow module, vascular smooth muscle contraction is only significantly enriched in the blue and yellow modules, and yellow and blue modules are significantly enriched in subgroup II, The results showed that the subjects in subgroup II showed the characteristics of metabolic activity; JAK-STAT signal pathway, PI3K Akt signal pathway and HIF-1 signal pathway are significantly enriched in the green module, and the green module is significantly enriched in subgroup III, indicating that the subjects in subgroup III show inflammatory characteristics.However, the gray module did not find significant enrichment in the corresponding subgroups.These findings suggest that each subgroup has its specific functional modules or pathways that can regulate the occurrence or progression of COPD.
3.5 Clinical features and WGCNA module discussion
Through the correlation module, correlation coefficient and corresponding p value of clinical features and WGCNA, the relationship between the characteristic genes of each module and BMI index or FEV1 is analyzed and calculated. It is found that the characteristic genes are represented by the characteristic vector of gene expression matrix.Brown and gray modules were not associated with BMI index or FEV1 (Fig. 5b), indicating that the airway remodeling process in subgroup I was not significantly correlated with BMI index or FEV1. In contrast, blue and yellow modules were negatively correlated with BMI index or FEV1, and the difference was statistically significant, indicating that patients with low BMI index and high FEV1 were metabolically active in subgroup II; The green module was positively correlated with BMI index or FEV1, and the difference was statistically significant, indicating that patients in subgroup III showed a continuously enhanced inflammatory response with the increase of BMI index and the decrease of FEV1.Our results further suggest that the WGCNA module is associated with some clinical features, such as BMI index or FEV1.