WGCNA and the significant module identification
The expression matrix containing 5000 genes were used to reconstruct the gene coexpression network using WGCNA method. Pearson correlation matrix among genes were converted into a strengthened adjacency matrix by power β=5 based on scale-free topology criterion with R2=0.9 (Fig. 1a). Topological overlap measure (TOM) of each gene pair was calculated, and 16 gene coexpression modules were identified by average linkage hierarchical clustering according to TOM-based dissimilarity (1-TOM) (Fig. 1b). There were large differences in the number of genes between the modules, and the smallest midnightblue module contained 59 transcripts, while the largest turquoise module contained 914 transcripts. The purple module contained 227 transcripts, and the magenta module contained 254 transcripts (Table S1).
The eigengene adjacency heatmap depicting the interaction of the identified modules was showed in Fig. 1c, while the network heatmap plot of all the genes was showed in Fig. 1d. As shown in Fig. 1e, the correlation analysis between Module eigengene (ME) and sex showed that the purple module was the most significantly representative module (cor=-0.73 and P=5e-07), and the magenta module was the third significantly representative module (cor=0.5 and P=0.002). Gene significance (GS) analysis across modules (P=2.2e-259) showed that the purple module and magenta module had the highest and second highest GS, respectively (Fig. 1f). The module membership (MM) in the purple module possessed the most significant correlation across all modules (cor=0.71 and P<4e-36, Fig. 2a), and the MM in the magenta module possessed the second significant correlation across all modules (cor=0.63 and P<1.7e-29, Fig. 2b). Thus, the purple module and magenta module were selected for further analyses. The expression heatmap of all genes in the purple and magenta module were respectively showed in Fig. 2c and 2d.
GO and KEGG analysis and key genes identification
GO and KEGG enrichment analysis were performed on all genes in the purple module and magenta module using the Database for Annotation, Visualization and Integrated Discovery (DAVID) database. In the purple module, GO enrichment results showed that 9 biological processes (BPs), 7 cellular components (CCs) and 4 molecular functions (MFs) were significantly enriched (P<0.05). KEGG enrichment analysis showed that 6 KEGG pathways were significantly enriched (P<0.05). Top 1 BP, CC and MF terms and Top 4 KEGG pathways were showed in Table 1. KEGG pathway analysis showed that the genes in the purple module were significant enriched in Regulation of lipolysis in adipocytes, Aldosterone synthesis and secretion, Ras signaling pathway and MAPK signaling pathway, and so on. GO analysis showed that the most significant enriched CC term was Extracellular space. The most significant enriched BP term was Muscle organ morphogenesis. The most representative MF term was Heparin binding. In addition, the functional analysis for the magenta module was performed, GO and KEGG enrichment results showed that 23 BPs, 3 CCs, 11 MFs and 12 KEGG pathways were significantly enriched (P<0.05). Top 2 BPs, Top 1 CC and MF terms and top 4 KEGG pathways were showed in Table 2. In KEGG analysis, genes in the magenta module mainly enriched in pathways related to immunity disease, including Influenza A, Hepatitis C, Measles and Herpes simplex infection, and so on. The most significant enriched CC term was External side of plasma membrane, and the significant enriched BP terms were related to immunity process, including Defense response to virus and Immune response. The most representative MF term was Double-stranded RNA binding.
In the purple module, the selection criteria of key genes were that genes participated in two terms, and at least involved in one KEGG pathway. Besides, all the genes involved in the pathway of regulation of lipolysis in adipocytes were selected as key genes. So, 10 key genes, LIPE, NPR1, AQP7, NPY1R, PLA2G16, LOC100522721, FGF10, FGFR2, CACNA1G and IGF1 were identified and showed in Fig. 3a. In the magenta module, the selection criterion of key genes was that genes involved in at least three terms. 12 genes, FAS, TNFSF10, DDX58, MX1, IFIT1, ADAR, OAS2, OAS1, CXCL9, CXCL10, STAT1 and IRF9 were selected as key genes (Fig. 3b).
PPI network construction and hub genes identification
All genes in the purple and magenta modules were used to perform PPI network analysis. At combined score > 0.4, a PPI network including 116 nodes and 169 edges was constructed for the purple module (called the first PPI network), and was showed in Fig. 4a. For the magenta module, a PPI network containing 153 nodes and 814 edges was established and showed in Fig. 4b (called the second PPI network). Top genes with higher scores were identified by centrality analyses in the two modules using the centiScape. The result of centrality analysis for genes with top 10 centrality scores in the first PPI network and top 20 centrality scores in the second PPI network were respectively showed in Table 3 and Table 4. In the purple module, top 10 genes were COL1A2, LOC100738989, MMP2, COL1A1, SPARC, POSTN, SERPINH1, IGF1, FMOD and FKBP10, separately. In the magenta module, top 20 genes were USP18, MX1, STAT1, ISG15, IRG6, MX2, IFIT3, IFIT2, IFIT1, DDX58, OAS1, PARP9, DDX60, IFI44L, LFI35, IFI44, PARP14, IRF9, XAF1 and OAS2, separately.
To further excavate the hub gene in the two modules, the cytoHubba was used to screen out hub genes in the whole PPI network. According the Maximal Clique Centrality (MCC) score, top 10 genes were selected as hub genes in the first PPI network. A sub-network including 10 nodes and 36 edges in the first PPI network was identified and showed in Fig. 4c. These hub genes were COL1A2, MMP2, COL1A1, SPARC, POSTN, SERPINH1, FMOD, ENSSSCG00000017581, ENSSSCG00000029074 and LOC100738989. According to the MCC score, top 20 genes were selected as hub genes in the second PPI network. A sub-network containing 20 nodes and 187 edges in the second PPI network was identified and showed in Fig. 4d. These hub genes were USP18, MX1, STAT1, ISG15, IRG6, ENSSSCG00000027918, MX2, IFIT3, IFIT2, IFIT1, DDX58, OAS1, PARP9, DDX60, IFI35, IFI44, IFI44L, IRF9, XAF1 and OAS2.
Ultimately, 8 hub genes, COL1A2, LOC100738989, MMP2, COL1A1, SPARC, POSTN, SERPINH1 and FMOD in the purple module were identified simultaneously using centiScape and cytoHubba. Function enrichment analysis showed that the 8 genes mainly involved in some pathways including AGE-RAGE signaling pathway in diabetic complications, Relaxin signaling pathway and Proteoglycans in cancer (Fig. 5d). The significant enriched BP terms were Extracellular matrix (ECM) organization and Extracellular structure organization, and so on (Fig. 5a). The significant enriched MF terms were Collagen binding and Extracellular matrix binding, and so on (Fig. 5b). The significant enriched CC terms were Extracellular matrix and Collagen-containing extracellular matrix, and so on (Fig. 5c).
In the magenta module, 19 hub genes, USP18, MX1, STAT1, ISG15, IRG6, MX2, IFIT3, IFIT2, IFIT1, DDX58, OAS1, PARP9, DDX60, IFI35, IFI44, IFI44L, IRF9, XAF1 and OAS2 were identified simultaneously using centiScape and cytoHubba. The functional enrichment analysis was performed for these hub genes. KEGG pathways had Hepatitis C, Coronavirus disease-COVID-19, Measles, Influenza A, Epstein-Barr virus infection, Herpes simplex virus 1 infection, NOD-like receptors signal pathway, RIG-I-like receptors signal pathway and Human papillomavirus infection and so on (Figure 6d). The significant enriched BP terms were Defense response to virus, Response to virus, Immune effector process, Defense response to other organism and Response to type I interferon (Fig. 6a).The enriched MF terms were Double-stranded RNA binding and GTP binding, and so on (Figure 6b). The enriched CC terms were Dendritic spine and Neuron spine, and so on (Fig. 6c).
Differential expression analysis
By the functional analysis and PPI network analysis, 18 genes and 24 genes were selected as key genes in the purple and magenta module, respectively. In the purple module, differential expression analysis showed that 17 genes except for NPY1R (P=0.0502) had significant differences in the purple module between two sex groups (P<0.05 or 0.01, Fig. 7a and 7b). Among 17 genes with significant differences in expression, FMOD, SPARC, POSTN, SERPINH1 and LOC100522721 were significantly different (0.01< P<0.05), and other 12 genes were remarkable significant different (P<0.01). Expression level of NPR1, LIPE, AQP7, PLA2G16 and LOC100522721 in female pigs were significantly higher than that in male pigs, while that of COL1A2, MMP2, COL1A1, SPARC, POSTN, SERPINH1, FMOD, LOC100738989, IGF1, FGF10, FGFR2 and CACNA1G in females were significantly lower than that in males. In the magenta module, 19 genes except for USP18, MX1, IRG6, DDX60 and ADAR including DDX58 (P<0.05), OAS2 (P<0.05), IRF9 (P<0.05), STAT1 (P<0.05), IFIT3 (P<0.05), IFI44L (P<0.05), IFI44 (P<0.05), IFI35 (P<0.05), ISG15 (P<0.05), MX2 (P<0.05), PARP9 (P<0.05), CXCL9 (P<0.05), CXCL10 (P<0.05), XAF1(P<0.05), IFIT1 (P<0.01), IFIT2 (P<0.01), OAS1 (P<0.01), FAS (P<0.01) and TNFSF10 (P<0.01) were significantly differentially expressed between two sex groups, and all the genes were more highly expressed in female pigs (Fig. 7c and 7d).