3.1 Identification of DRGs
We identified 58 differentially expressed genes, including 13 with upregulation and 35 with downregulation. For more information, refer to Fig. 2.
3.2 Assessment of the microenvironment in NPC.
GO enrichment analysis focused on peptidase activity control, human antimicrobial response, negative regulation of peptidase activity, human immune response, enriched envelope, dihydrogenase activity NAD-retinol, peptidase regulatory activity, and retinoic acid metabolic process. The signal pathways considered by KEGG when analyzing the results include keratinocyte differentiation, the IL−17 signaling pathway, element cascades and coagulation, tyrosine metabolism, ovarian steroidogenesis, arachidonic acid metabolism, glycolysis/gluconeogenesis, retinol metabolism, drug metabolism - cytochrome P450, and xenobiotics metabolism by cytochrome P450. Bar and bubble charts were drawn during the enrichment analysis using the ggplot2 program with a q-value filter of 0.05 and P-value filter of 0.05. The results demonstrate consistency with diseases of the female reproductive system, myocardial infection, pancreatic adenocarcinoma, and coronavirus infection. The GSEA study findings indicate high activity in the B cell receptor signaling pathway, cytochrome P450, xenobiotic metabolism by cytochrome P450, retinol metabolism, and steroid hormone biosynthesis in normal nasopharyngeal tissues. In NPC tissues, elevated activity is observed in the cell cycle, DNA replication, interaction with ECM receptors, as well as pathways related to cancer and small-cell lung cancer (see Fig. 3).
3.3 Construction of PPI networks
To examine interactions among robustly expressed DEGs, we utilized a text database to create a PPI network comprising 58 DEGs. Loading PPI data into Cytoscape, with a confidence level exceeding 0.15, leads to the integration of all 25 nodes and 96 edges in the PPI network (refer to Fig. 4).
3.4 Screening of biometric genes by machine learning
We utilized the LASSO and SVM-RFE models to identify gene biomarkers associated with NPC among DEGs. Nine genes were identified through LASSO regression (see Fig. 5A), while the SVM-RFE algorithm identified 37 genes (see Fig. 5B). Seven overlapping genes were identified, namely, The Non-Gastric H+/K + ATPase (ATP12A), RAD51 Associated Protein 1 (RAD51AP1), Insulin Associated Protein 1 (INSM1), Sustained Endoxide Syntase 2 (PTGS2), Serum Amyloid A1 (SAA1), Chemokine Light C-X-C motif chemokine light 11 (CXCL11), and Laminin subunit beta 1 (LAMB1) (see Fig. 5C), The included genes were applied to Nomogram for visualization (see Fig. 5D).
We further demonstrate the effectiveness of the validation dataset by verifying the expression levels of various genes. Based on the box plot results, ATP12A expression levels were significantly lower in the NPC group GSE64634 than in the control group (p < 0.05). In contrast, LAMB1 and RAD51AP1 expression levels were markedly higher (p > 0.05). Additionally, SAA1 expression levels were lower, and CXCL11, INSM1, and PTGS2 expression levels were higher, though not statistically significant (refer to Fig. 6).
3.5 The value of gene biomarkers in nasopharyngeal carcinoma
To validate the diagnostic utility of genes, we utilized the Receiver Operating Characteristic (ROC) curve in both the training and validation datasets. Figure 7 shows the specific Area Under the Curve (AUC) and 95% confidence intervals for distinct diagnostic genes: CXCL11 (AUC = 0.909), INSM1 (AUC = 0.886), PTGS2 (AUC = 0.929), SAA1 (AUC = 0.857), ATP12A (AUC = 0.953), LAMB1 (AUC = 0.959), and RAD51AP1 (AUC = 0.978). Furthermore, we employed the Receiver Operating Characteristic (ROC) curve in the validation dataset to confirm the diagnostic value and effectiveness of the genes. In Fig. 8, the AUC and 95% confidence intervals depict the specificity of the following diagnostic genes: CXCL11 (AUC = 0.771), INSM1 (AUC = 0.708), PTGS2 (AUC = 0.771), SAA1 (AUC = 0.792), ATP12A (AUC = 0.917), LAMB1 (AUC = 0.854), and RAD51AP1 (AUC = 0.875). The results of these genes in the validation dataset demonstrate a strong predictive capacity, which is generally satisfactory.
3.6 Analysis of Immune Cell Infiltration
Apply the CIBERSORT algorithm to assess immune infiltration in nasopharyngeal cancer. Figure 9a presents the bar graph illustrating the immune cell composition of the illness group and the control group, while Fig. 9b displays variations among the same immune cells using a violin graph. In the nasopharyngeal carcinoma group, there was a reduction in the activation of naive B cells, B cell memory, CD4 T cell memory, resting memory, and follicular aid of T cells (p = 0.001, p = 0.001, p = 0.001, and p = 0.0036). Conversely, resting NK cells, monocytes, M0 and M1 Macrophages, CD8 T cells, memory-activated CD4 T cells, and gamma delta T cells showed no significant change. Our findings indicate a positive relationship between neutrophils and active mast cells (r = 0.46), and a favorable relationship between activated monocellular and NK cells (r = 0.57). T cell regulation (Tregs) is adversely connected with CD4 T cell activated memory (r=−0.46), while NK cell redefinition is negatively correlated with gamma delta T cells (r=−0.45). Activated dendritic cells showed a substantial negative connection (r = 0.26) with B cells. The correlation between NPC and immune cell infiltration is shown in Fig. 9C.
3.6 Correlation analysis between identified genes and immune cell infiltration
A correlation study of 22 different types of immune cells may reveal how the identified genes contribute to the development of nasopharyngeal cancer by influencing immune cell infiltration. ATP12A (R = 0.31, p = 0.0098), SAA1 (R = 0.45, p = 9e−05), and LAMB1 (R=−0.46, p = 5.8e−05) showed a positive association with B cell memory. RAD51AP1 (R=−0.57, p = 3.5e−07), CXCL11 (R=−0.31, p = 0.01), and PTGS2 (R=−0.28, p = 0.018) were also positively correlated with B cell memory. ATP12A (R = 0.5, p = 1.4e−05) and SAA1 (R = 0.32, p = 0.007) exhibit positive connections with native B cells. On the contrary, LAMB1 (R=−0.41, p = 4.3e−04), RAD51AP1 (R=−0.35, p = 3.1e−03), CXCL11 (R=−0.34, p = 0.0046), INSM1 (R=−0.43, p = 2e−04), and PTGS2 (R=−0.35, p = 0.0035) show negative associations with native B cells. Eosinophils show a positive correlation with RAD51AP1 (R = 0.35, p = 0.0028). LAMB1 (R = 0.24, p = 0.044) exhibits a positive correlation with eosinophils. Additionally, LAMB1 (R = 0.26, p = 0.034), RAD51AP1 (R = 0.36, p = 0.0027), and PYGS2 (R = 0.30, p = 0.014) demonstrate positive correlations with M0 Macrophages. In contrast, ATP12A (R=−0.31, p = 0.01) shows a negative correlation with M0 Macrophages. Under resting conditions, RAD51AP1 (R=−0.28, p = 0.02) has a negative correlation with mast cells. Meanwhile, LAMB1 (R = 0.28, p = 0.019) and RAD51AP1 (R = 0.34, p = 0.0037) show positive correlations with neutrophils. LAMB1 (R = 0.28, p = 0.018), RAD51AP1 (R = 0.42, p = 3.7e−04), and CXCL11 (R = 0.49, p = 3e−05) display positive correlations, while LAMB1 (R=−0.35, p = 0.0029), RAD51AP1 (R=−0.32, p = 0.0078), CXCL11 (R=−0.39, p = 0.00095), and INSM1 (R=−0.24, p = 0.047) show negative correlations with CD4 T cell memory rest. LAMB1 (R=−0.24, p = 0.046) and RAD51AP1 (R=−0.29, p = 0.0015) are negatively correlated. Refer to Table 2 and Fig. 10 for more details.
Table 2
Correlation analysis of immune infiltration among 7 characteristic genes
|
ATP12A
|
LAMB1
|
RAD51AP1
|
CXCL11
|
INSM1
|
PTGS2
|
SAA1
|
|
R
|
P
|
R
|
P
|
R
|
P
|
R
|
P
|
R
|
P
|
R
|
P
|
R
|
P
|
B cells memory
|
0.31
|
0.0098
|
-0.46
|
5.8e-05
|
-0.57
|
3.5e-07
|
-0.31
|
0.01
|
NA
|
NA
|
−0.28
|
0.018
|
0.45
|
9e − 05
|
B cells native
|
0.5
|
1.4e-05
|
-0.41
|
0.00043
|
-0.35
|
0.0031
|
-0.34
|
0.0046
|
-0.43
|
2e − 04
|
−0.35
|
0.0035
|
0.32
|
0.007
|
Eosinophils
|
NA
|
NA
|
NA
|
NA
|
0.35
|
0.0028
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
Dendritic cell actived
|
NA
|
NA
|
0.24
|
0.044
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
Macrophages M0
|
-0.24
|
0.044
|
0.26
|
0.034
|
0.36
|
0.0027
|
NA
|
NA
|
NA
|
NA
|
0.30
|
0.014
|
NA
|
NA
|
Macrophages M1
|
-0.31
|
0.01
|
0.24
|
0.05
|
NA
|
NA
|
0.66
|
2.2e − 16
|
0.28
|
0.02
|
0.45
|
0.00011
|
NA
|
NA
|
NK cells resting
|
0.33
|
0.0059
|
NA
|
NA
|
NA
|
NA
|
−0.36
|
0.0027
|
− 0.32
|
0.0077
|
− 0.33
|
0.0062
|
NA
|
NA
|
Mast cells activated
|
NA
|
NA
|
0.28
|
0.018
|
0.33
|
0.0052
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
Mast cells resting
|
NA
|
NA
|
NA
|
NA
|
-0.28
|
0.02
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
Neutrophils
|
NA
|
NA
|
0.28
|
0.019
|
0.34
|
0.0037
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
T cells CD4 memory-activated
|
-0.46
|
5.7e-05
|
0.28
|
0.018
|
0.42
|
0.00037
|
0.49
|
3e − 05
|
0.26
|
0.029
|
NA
|
NA
|
-0.39
|
0.0011
|
T cells CD4 memory resting
|
0.41
|
5e-04
|
-0.35
|
0.0029
|
-0.32
|
0.0078
|
-0.39
|
0.00095
|
−0.24
|
0.047
|
NA
|
NA
|
0.45
|
1e − 04
|
T cells CD4 native
|
NA
|
NA
|
-0.24
|
0.046
|
-0.29
|
0.0015
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
T cells CD8
|
NA
|
NA
|
0.27
|
0.023
|
0.31
|
0.01
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
-0.42
|
0.00037
|
T cells delta
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
0.42,
|
0.00034
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
Monocytes
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
-0.32
|
0.0073
|
NA
|
NA
|
NA
|
NA
|
T cells follicular helper
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
NA
|
− 0.31
|
0.0098
|
NA
|
NA
|
NA: no available; R: Pearson”s correlation coefficient; P: p value; |