Our study collected and merged two ICM expression profiling datasets (GSE1869, GSE5406) and two AF expression profiling datasets (GSE41177, GSE14975). All datasets were standardized and normalized using the "limma" package, and batch effects were removed from both the ICM and AF combined datasets using the "sva" package before subsequent analyses (Figure 1A, B).
3.1 Co-expression modules in ICM and AF
In the ICM combined dataset (GSE1869 and GSE5406), with 0.8 as the correlation coefficient threshold, the soft-thresholding power was set to seven (Figure 1A). WGCNA identified 7 co-expression modules. The Red module was the most significantly associated with ICM (P = 0.004) and included 1307 genes.
In the AF combined dataset (GSE41177 and GSE14975), with 0.8 as the correlation coefficient threshold, the soft-thresholding power was set to five (Figure 1A). WGCNA identified 7 co-expression modules. The Blue module was the most significantly associated with AF (P = 4 × 10⁻⁴) and included 3154 genes (Figure 2).
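The power selection described above can be sketched as follows. This is a minimal illustration in Python, not the WGCNA R code used in the study. It assumes an unsigned network, in which adjacency is the absolute correlation raised to the chosen power, and it picks the lowest candidate power whose scale-free fit exceeds the 0.8 cutoff. The function names and the fit values in the usage lines are hypothetical placeholders, not results from these datasets.

```python
def soft_adjacency(correlation, power):
    """Unsigned soft-threshold adjacency: |r| raised to the chosen power.

    Assumption: an unsigned network; the text does not state the network type.
    """
    return abs(correlation) ** power


def pick_soft_power(fit_by_power, cutoff=0.8):
    """Return the lowest power whose scale-free fit exceeds the cutoff.

    fit_by_power maps each candidate power to its scale-free topology fit.
    Returns None when no candidate passes the cutoff.
    """
    for power in sorted(fit_by_power):
        if fit_by_power[power] > cutoff:
            return power
    return None


# Made-up fit values for illustration only (not taken from the study).
example_fits = {1: 0.12, 3: 0.55, 5: 0.78, 7: 0.84, 9: 0.90}
chosen_power = pick_soft_power(example_fits)
print(chosen_power)
print(soft_adjacency(-0.5, 2))
```

Raising correlations to a higher power suppresses weak correlations more strongly than strong ones, which is why the lowest power that passes the cutoff is usually preferred.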
3.2 Interaction of genes
The trait-associated genes screened by WGCNA in the ICM and AF combined datasets were intersected using a Venn diagram, which yielded 188 shared comorbidity genes (Figure 3A). A protein-protein interaction (PPI) network analysis was then performed on these genes and visualized in Cytoscape (Figure 3B).
3.3 Enrichment analysis of comorbidity genes
GO (Figure 4A, BP; Figure 4B, CC; Figure 4C, MF) and KEGG (Figure 4D) enrichment analyses were performed on the 188 comorbidity genes. The genes were mostly involved in regulation of histone H3-K9 methylation (BP), vacuolar membrane (CC), single-stranded DNA binding (MF), and steroid biosynthesis (KEGG).
Subnetworks were then extracted from the overlapping genes using the "cytoHubba" plugin in Cytoscape (version 3.7.2). Four topological analysis methods were applied: EPC (Edge Percolated Component), DMNC (Density of Maximum Neighborhood Component), MCC (Maximal Clique Centrality), and MNC (Maximum Neighborhood Component). Each method ranks nodes by their attributes in the network, and the top 10 genes from each method were retained (Figure 5A-D). The four top-10 lists were then intersected using a Venn diagram (Figure 5E), yielding 7 hub genes.
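To make the ranking-and-overlap step concrete, the following Python sketch scores nodes with one of the four methods, MNC, taken here as the size of the largest connected component among a node's neighbours, and then intersects the top-ranked sets from several rankings. It is an illustration under stated assumptions, not the cytoHubba implementation: the toy graph, the function names, and the use of node degree as a second ranking are all hypothetical stand-ins.

```python
from collections import deque


def mnc_score(graph, node):
    """Size of the largest connected component in the subgraph induced
    by the neighbours of `node` (the node itself is excluded)."""
    neighbours = set(graph[node]) - {node}
    seen = set()
    best = 0
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search restricted to the neighbourhood
            current = queue.popleft()
            size += 1
            for nxt in graph[current]:
                if nxt in neighbours and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best


def top_k(scores, k):
    """Names of the k highest-scoring nodes; ties are broken by name."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:k]}


def consensus(top_sets):
    """Nodes present in every top-k set (the Venn overlap)."""
    return set.intersection(*top_sets)


# Hypothetical toy network as an undirected adjacency mapping.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
mnc = {n: mnc_score(toy_graph, n) for n in toy_graph}
degree = {n: len(toy_graph[n]) for n in toy_graph}  # stand-in second ranking
hubs = consensus([top_k(mnc, 3), top_k(degree, 3)])
print(sorted(hubs))
```

In the analysis described above, the same intersection would be taken over four top-10 sets rather than two top-3 sets.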
3.4 Lasso regression analysis
The expression matrices of the 7 hub genes were extracted from both the ICM and AF combined datasets. Lasso regression on the ICM combined dataset retained 5 hub genes (Figure 6A, B), and Lasso regression on the AF combined dataset retained 4 hub genes (Figure 6C, D). Intersecting the two sets with a Venn diagram yielded 3 final hub genes (CHD1, MSH2, NIPBL) (Figure 6E).
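The way Lasso drops genes can be illustrated with a small coordinate-descent sketch in Python. It is a simplified stand-in rather than the procedure used in the study: it assumes a linear response with centred inputs, whereas a diagnostic model would more likely use a logistic response with a cross-validated penalty. The toy data, the penalty value, and the function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return 0 if it falls inside."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n) * ||y - X b||^2 + penalty * ||b||_1.

    Assumes the columns of X and the response y are already centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, with beta starting at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            updated = soft_threshold(rho, penalty) / col_scale[j]
            change = updated - beta[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * change
                beta[j] = updated
    return beta


# Hypothetical toy data: the response depends only on the first feature.
X_toy = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y_toy = [-2.0, 2.0, -2.0, 2.0]
coefs = lasso_coordinate_descent(X_toy, y_toy, penalty=0.5)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print(coefs, selected)
```

Features whose coefficient is shrunk exactly to zero are discarded, which is how a 7-gene input can be reduced to a smaller retained set.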
3.5 Lollipop chart of correlation between single genes and immune cells and immune processes
Enrichment scores for 29 immune cells and immune processes were calculated separately in the ICM and AF combined datasets using the ssGSEA method. Correlations between these enrichment scores and the expression levels of the three hub genes were then computed and displayed as lollipop charts (Figure 7).
3.6 Validation
ROC curves were drawn and AUC values were calculated for the hub genes in the ICM combined dataset. The AUC values of the three hub genes were 0.846 (CHD1), 0.722 (MSH2), and 0.885 (NIPBL) (Figure 8A-C), indicating that these genes have good diagnostic value for ICM.
In the AF combined dataset, ROC curves were likewise drawn for the three hub genes. The AUC values were 0.83 (CHD1), 0.869 (MSH2), and 0.882 (NIPBL) (Figure 8D-F), indicating that these genes also have good diagnostic value for AF.
The discrimination ability of these genes was further validated in other datasets. ROC curves were drawn and AUC values were calculated for the three hub genes in the ICM datasets GSE116250 and GSE9128 (Figure 8G, H) and in the AF dataset GSE116250 (Figure 8I). These results indicated that the three hub genes have good discrimination ability in both ICM and AF.
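For readers unfamiliar with how a single-gene AUC is obtained, the Python sketch below computes it as the probability that a randomly chosen case has a higher expression value than a randomly chosen control, counting ties as one half. This pairwise form is equivalent to the area under the ROC curve. The expression values and labels are hypothetical, and the function name is not taken from any package used in the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    `labels` uses 1 for cases and 0 for controls; ties count as half.
    """
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for one gene in four samples.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc_from_scores(toy_scores, toy_labels))
```

A gene that is lower in cases than in controls gives a value below 0.5 with this convention; its discrimination is then read as one minus that value.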
3.7 We predicted and constructed a PPI map of the genes whose proteins interact with the three hub genes, performed functional enrichment analysis on these genes, and displayed the top five genes (Figure 8A). The potential transcription factors of the three genes, jointly predicted by six databases, were shown as a histogram, with the top ten transcription factors displayed according to mean rank (Figure 9B).