DEGs identification
6480 DEGs with an adjusted P-value<0.01 were found initially. 761 genes with a minimum 2.0-fold change with FDR<0.05 were identified as definitive DEGs finally. The distribution of the gene expression profile was presented in Fig. 1A. The definitive DEGs were also visualized by a clustered heatmap (Fig. 1B).
Functional enrichment analyses of definitive DEGs
GO and KEGG analyses were performed for the definitive DEGs, and the results were presented in Fig. 2. Biological Process (BP) was mainly enriched by GO analysis of definitive DEGs. The most significant five were cellular process involved in reproduction in multicellular organism, meiotic nuclear division, meiotic cell cycle, meiotic cell cycle process, and meiosis I (Fig. 2A). Five GO terms of Cellular Component (CC) were enriched including synaptonemal complex, synaptonemal structure, condensed nuclear chromosome, lateral element, and condensed chromosome (Fig. 2B) while no GO term of Molecular Function (MF) was enriched. The KEGG analysis showed that only Non-alcoholic fatty liver disease was enriched (Fig. 2C). All the detailed results of GO and KEGG analyses were listed in Supplementary Table 1.
Co-expression network construction
After data preprocessing, a weighted gene co-expression network was produced utilizing 10230 genes of GSE25518. All 23 samples were well clustered into two groups without outliers as presented in the hierarchical clustering dendrogram (Fig. 3A). Then a scale-free network was generated after determining the soft-threshold (power of β=10) (Fig. 3B). 15 co-expression modules were exported after hierarchical clustering (Fig. 3C). There were 3508 genes in the turquoise module, 1523 in the blue module, 1308 in the brown module, 1139 in the yellow module, 660 in the green module, 522 in the red module, 364 in the black module, 315 in the pink module, 286 in the magenta module, 166 in the purple module, 95 in the greenyellow module, 92 in the tan module, 66 in the salmon module, and 60 in the cyan module. The remaining 126 genes in the grey module were not clustered into any significant module actually. Then, module eigengene (ME) was calculated for each module. Dendrogram clusters and heatmaps of ME were utilized to assess the co-expression similarity for each module (Fig. 3D). Finally, the correlation between each module and the cryptorchidism trait was calculated (Fig. 3E). Noteworthily, the turquoise module held the strongest relevancy with the cryptorchidism trait (R2=-0.96, P=9e-13).
Screening of the key module
Firstly, we performed GSEA of the whole microarray based on the hallmark_spermatogenesis gene set. This set contains genes up-regulated during the production of male gametes (sperm). It was significantly enriched, suggesting that biological processes involved in spermatogenesis might be impaired in patients with cryptorchidism (Fig. 4A). Then the overlap of genes in each module with the hallmark_spermatogenesis gene set was explored. These overlapped genes were examined again for intersection with DEGs with P-value<0.01 and definitive DEGs with at least a 2.0-fold change (Fig. 4B). The blue and turquoise modules seemed to contain most of the spermatogenesis-related genes. Among 38 spermatogenesis-related genes in the blue module, 12 were DEGs with P-value<0.01 while 6 were definitive DEGs. Among 20 spermatogenesis-related genes in the turquoise module, 18 were DEGs with P-value<0.01, and 4 were definitive DEGs. Combined with the correlation of each module and the cryptorchidism trait, we chose the turquoise module as the key module for further analysis. Finally, GO and KEGG analyses on the turquoise module were conducted (Fig. 4C). The enriched GO terms involved in BP included regulation of ion transmembrane transport, antimicrobial humoral response, antimicrobial humoral immune response mediated by antimicrobial peptide, and antibacterial humoral response. Three GO terms involved in CC were enriched: collagen-containing extracellular matrix, apical plasma membrane, and 9+0 non-motile cilium. The enriched GO terms involved in MF included peptidase regulator activity, peptidase inhibitor activity, endopeptidase regulator activity, endopeptidase inhibitor activity, and serine−type endopeptidase inhibitor activity. No KEGG pathway was significantly enriched. The GO analysis results were all provided in Supplementary Table 2.
Candidate hub genes
1277 hub genes with MM>0.8 and GS>0.8 were obtained from the turquoise module for subsequent analysis (Fig. 5A). Then, we generated a PPI network of the definitive DEGs (Fig. 5B), from which the top 30 hub genes were identified by the Cytoscape plugin “cytoHubba”. The Venn diagram was drawn to capture the common hub genes obtained by the above two approaches (Fig. 5C). Finally, 12 candidate hub genes (CHRM4, CCR4, LMNB1, MTNR1B, TAC3, TRHR, RAD54L, CCNA1, SMC1B, GPRC6A, SYCE2, TAS2R19) were identified. We also mapped the PPI network and predicted the interaction genes of these hub genes utilizing the GeneMANIA database (Fig. 5D).
Validation of candidate hub genes
GSE16191 was used to validate the expression differences of candidate hub genes. They were all differentially expressed with P<0.001 (Fig. 6A). Only LMNB1 was upregulated in the cryptorchidism group which the remaining genes were all downregulated. Additionally, we evaluated the candidate hub genes’ diagnostic value in cryptorchidism group and normal group by ROC curve. It suggested that these hub genes showed excellent diagnostic value as a biomarker between the cryptorchidism group and normal group (AUC=1) (Fig. 6B).
Correlation of the hub genes with spermatogenesis-related genes
Firstly, correlation coefficient between each hub gene was calculated, and good correlations were found (Fig. 7A). Then PPI network between hub genes and genes in the hallmark_spermatogenesis gene set was mapped (Fig. 7B). All the hub genes and 106 genes from the hallmark_spermatogenesis gene set were involved in the network. Since 33 genes in the hallmark_spermatogenesis gene set were expressed differentially with P<0.01, we draw a correlation overview map of hub genes and these genes (Fig. 7C). Pairs of genes suggested good correlations. Additionally, as two hub genes (CHRM4, CCNA1) were contained in the hallmark_spermatogenesis gene set while three genes of the gene set were not contained in the GSE25518 dataset, we calculated the correlation coefficient between hub genes and the rest 130 genes of the hallmark_spermatogenesis gene set finally. We divided the absolute value of correlation coefficient into four categories (|r|>0.8, 0.8>|r|>0.5, 0.5>|r|>0.3, 0.3>|r|). The distribution of correlation of hub genes with the spermatogenesis-related genes was presented in Fig. 7D.
Function of hub genes in cryptorchidism-related infertility
The single gene GSEA based on hallmark gene sets was utilized to analyze the function of hub genes in cryptorchidism-related infertility. Hallmark_spermatogenesis gene set was significantly enriched in five genes (RAD54L, CCNA1, SMC1B, SYCE2, and TAC3) indicating spermatogenesis-related pathways were impaired when the genes were downregulated (Fig. 8A-E). As LMNB1 seemed to have an opposite regulatory function to the other hub genes which were consistent with the expression dissonance pattern of the hub genes, we only presented the top five enriched pathways correlated with the above six genes positively or negatively (Fig. 8A-F). And we counted and visualized the occurrence frequency of the top five enriched pathways correlated with the hub genes positively or negatively (Fig. 8G-H). Hallmark_oxidative_phosphorylation, hallmark_MYC_targets_V1, hallmark_cholesterol_homeostasis, and hallmark_KRAS_signaline_DN were significantly correlated with each hub gene while LMNB1 showed opposite correlations when compared with others (Fig. 8G-H). Hallmark_reactive_oxygen_species_pathway and hallmark_DNA_repair were positively correlated with LMNB1 while they were negatively correlated with other 10 hub genes (RAD54L, CCNA1, SMC1B, SYCE2, TAC3, TRHR, CHRM4, MTNR1B, TAS2R19, CCR4) and 6 hub genes (SYCE2, TRHR, CHRM4, GPRC6A, TAS2R19, CCR4) respectively. Additionally, hallmark_unfold_protein_response and hallmark_protein_secretion were negatively correlated with GPRC6A and another 5 genes (RAD54L, CCNA1, SMC1B, TAC3, MTNR1B) respectively while hallmark_mitotic_spindle and hallmark_G2M_checkpoint were positively correlated with CCNA1. All the single gene GSEA results of each hub gene were provided in Supplementary Table 3.
Literature review on the relationship between hub genes and cryptorchidism-related infertility
To seek the association of hub genes with cryptorchidism-related infertility, we retrieved the published literature for possible proof. Though no hub gene was found exactly involved in cryptorchidism-related infertility, ten genes (CCR4, LMNB1, LMNB1, MTNR1B, TAC3, TRHR, TRHR, RAD54L, CCNA1, SMC1B, GPRC6A, SYCE2) played a key role in infertility or spermatogenesis[34-50], indicating that these hub genes might be involved in cryptorchidism-related infertility. Additionally, there was no direct evidence of the function of CHRM4 and TAS2R19 in infertility. All the results of literature proofs were presented in Table 1.