Systematic search
After the systematic search, the datasets of the four different human microarray studies were selected for further analyses. After correcting the batch effect, we combined the four GEO data sets of GSE27276, GSE2378, GSE9944 (GPL8300) and GSE9944 (GPL571) into the expression profiles of 110 samples (control group: 67 cases; POAG group: 43 cases) (Fig. 1A-B). The difference analysis was performed by limma package. The screening conditions of difference genes were P value < 0.05 and |logFC| > 0.585. Finally, 49 difference genes were screened, including 30 up-regulated genes and 19 down-regulated genes (Fig. 1C).
Pathway analysis
We further analyzed the pathway of these 49 candidate genes through Metascape database. The results showed that these candidate genes were mainly enriched in structural molecular activity, epidermis development, extractive matrix, oxidoreductase activity, aminoglycan metabolic process, aging and other pathways (Fig. 2A). Moreover, we analyzed the protein-protein interaction (PPI) network of genes in different gene sets by Cytoscape software (Fig. 2B).
Key genes
We analyzed the above 49 differential genes by Random Forest and SVM respectively to screen the key genes (Fig. 3A-3B). According to the comprehensive scores of the two machine learning methods, we obtained the top 5 genes as the key gene sets, which are KRT14, HBB, ACOX2, HEPH and KRT13 (Fig. 3C). The expression of five key genes in the POAG group and normal group is shown in the Fig. 4.
Immune cell infiltration
Microenvironment is mainly composed of immune cells, extracellular matrix, a variety of growth factors, inflammatory factors and special physical and chemical characteristics, which significantly affects the sensitivity of disease diagnosis and clinical treatment. By analyzing the relationship between key genes and immune infiltration in POAG data set, we further explore the potential molecular mechanism of key genes affecting the progress of POAG. The results show that the proportion of immune cells in each patient and the correlation between immune cells are shown in the Fig. 5A-5B Compared with normal group, the T cells regulatory (Tregs) level of samples in POAG group was significantly higher (Fig. 5C).
We further explored the relationship between key genes and immune cells. The five key genes were highly correlated with immune cells. KRT14 was positively correlated with plasma cells, neutrophils, and negatively correlated with T cells regulatory (Tregs), mast cells resetting. HBB was positively correlated with NK cells activated and monocytes, and negatively correlated with mast cells resting and dendritic cells resting. ACOX2 was positively correlated with T cells CD4 memory resting and monocytes, and negatively correlated with T cells cellular helper and T cells CD4 naïve. HEPH was positively correlated with T cells CD4 memory resetting and T cells regulatory (Tregs), and negatively correlated with T cells CD4 naive and T cells follicular helper. KRT13 is positively correlated with T cells follicular helper and plasma cells, and negatively correlated with T cells regulatory (Tregs) and mast cells resting (Fig. 6A). We further obtained the correlation between these key genes and different immune factors from TISIDB database, including immune modulators, chemokines and cell receptors (Fig. 6B-E). These analyses confirmed that these key genes are closely related to the level of immune cell infiltration and play an important role in the immune microenvironment.
Key genes related pathway
We used these five key genes in the gene set of this analysis to further explore the transcriptional regulatory network involved in key genes. Relevant transcription factors were predicted through Cistrome DB online database, including 55 transcription factors predicted by KRT14, 92 transcription factors predicted by HBB, 71 transcription factors predicted by ACOX2, 106 transcription factors predicted by HEPH and 57 transcription factors predicted by KRT13, Finally, a comprehensive transcriptional regulatory network of POAG key genes was constructed by visualization through Cytoscape (Fig. 7).
We studied the specific signal pathways enriched by five key genes to explore the potential molecular mechanism of key genes affecting the progress of POAG. We selected the significantly enriched pathways which showed in Fig. 8 and Fig. 9. The pathways enriched with KRT14 by GO analysis include cell substrate junction assembly, cell junction assembly and other pathways. The pathways enriched by KEGG include ladder, cancel and so on Butanoate metabolism and other channels. The pathways enriched with HBB by GO analysis included brown fat cell differentiation, corporate cytoskeleton organization. The pathways enriched by KEGG include Angel processing and presentation, focal adhesion. The pathways enriched with ACOX2 by GO analysis include spindle localization and transitional initiation. The pathways enriched by KEGG include promote metabolism and pyruvate metabolism. The pathways enriched with HEPH by GO analysis include numeric expression repair DNA recognition, lamellipodium organization. The pathways enriched by KEGG include glycerophospholipid metabolism, beta alanine metabolism. The pathways enriched with KRT13 by GO analysis include autophagosome organization, column cuboidal epithelial cell differentiation. The pathways enriched by KEGG include circuit cycle, TCA cycle, cytokine receptor interaction (Fig. 9).
Gene regulatory network analysis of key genes in POAG
We predicted and analyzed the five key genes through miRWalk database and ENCORI database to obtain their possible miRNA and lncRNA. Firstly, the mRNA-miRNA relationship pairs related to these five key mRNAs were extracted from the miRWalk database. We only retained 35 mRNA-miRNA relationship pairs with TargetScan of 1 or miRDB of 1 (including 4 mRNAs and 13 miRNAs). Then we predicted the interacting lncRNA according to these miRNAs, in which 1112 pairs of interactions (including 2 miRNA and 823 lncRNA) were predicted. Finally, we constructed the ceRNA network through Cytoscape (V3.7) (Fig. 10).
POAG biomarkers
We discussed the prediction efficiency of key genes through the ROC curve verified by diagnostic efficiency. The results showed that the area under the AUC curve of KRT14 is 0.8254; The area under the AUC curve of HBB is 0.7404; The area under the AUC curve of ACOX2 is 0.7782; The area under AUC curve of HEPH is 0.8008; The area under the AUC curve of KRT13 is 0.816. These results show that these five key genes have good prediction efficiency for POAG and may better predict the occurrence and development of diseases (Fig. 11A -11E).
Drug targeting prediction in POAG
We divided the differential mRNA into up-regulated and down-regulated groups, and predicted the drug targeting of the differential genes through the Conectivity Map database. The results showed that the expression profiles of drug disturbances such as avrainvillamide-analysis-3, cytochalasin-D, NPI-2358, oxymethylone and vinorelbine were negatively correlated with the expression profiles of disease disturbances (Fig. 12). It indicated that these drugs may reduce or even reverse the POAG disease state.