Identification and screening of differential expression genes
We analyzed the differential expression profile of a total of 1,382 homo sapiens (human) CCRGs in 594 LUAD samples (normal count: 59; tumor count: 535) and 551 LUSC samples (normal count: 49; tumor count: 502) by the Wilcoxon signed-rank test respectively. Finally, we identified 290 CCRGs (downregulated CCRGs: 126; upregulated CCRGs: 164), and 447 CCRGs (downregulated CCRGs: 226; upregulated CCRGs: 221) differentially expressed in LUAD and LUSC samples respectively. Results of expression analyses are illustrated as heatmaps, volcano plots, and boxplots (Fig. 1).
Functional enrichment analysis of differential expression genes
Firstly, we analyzed the association of these differentially expressed CCRGs with the GO terms of the biological process (BP) and cellular component (CC) categories. For LUAD samples, the top five enriched BP terms were 'regulation of hemopoiesis,’ 'neutrophil degranulation,’ 'neutrophil activation involved in immune response,’ 'neutrophil-mediated immunity,' and 'neutrophil activation (Fig. 2A-D).' The top two enriched CC terms were 'chromatin' and 'secretory granule membrane (Fig. 2A-D).' For LUSC samples, the top five enriched BP terms were 'neutrophil activation involved in immune response,’ 'neutrophil activation,’ 'neutrophil degranulation,’ 'neutrophil-mediated immunity,' and 'negative regulation of immune system process,' which was similar to the results of the former, while more inclined towards the immunomodulation (Fig. 2E-H). For the results of CC, the top two enriched terms were precisely the same as the former (Fig. 2E-H). Then, results of KEGG analysis indicated that altered CCRGs were mainly involved in the systemic lupus erythematosus and osteoclast differentiation of LUAD samples, meanwhile in the systemic lupus erythematosus, apoptosis, and B cell receptor signaling pathway of LUSC samples (Fig. 3).
Finally, we further investigated the differential distribution of signal pathway enrichment of differentially expressed CCRGs set between the high risk-score and low risk-score groups using the GSVA method. In LUAD samples, compared with the low-risk group, CCRGs of the high-risk group mainly enriched in the G2M_CHECKPOINT, COAGULATION, MYC_TARGETS_V2, and MITOTIC_SPINDLE. Meanwhile, CCRGs of the low-risk group mainly enriched in the MYOGENESIS and NOTCH_SIGNALING. However, the results of the LUSC samples are quite different. CCRGs of the high-risk group mainly enriched in the UV_RESPONSE_DN, KRAS_SIGNALING_UP, TNFA_SIGNALING_VIA_NFKB, INFLAMMATORY_RESPONSE, and INTERFERON_GAMMA_RESPONSE, while CCRGs were mainly involved in the E2F_TARGETS, UNFOLDED_PROTEIN_RESPONSE, MYC_TARGETS_V1, G2M_CHECKPOINT, and DNA_REPAIR in the low-risk group (Fig. 4). It indicated the differential expression and regulation of CCRGs in LUAD and LUSC. This difference may lead to different biological processes related to the circadian rhythm in cancer cells.
Construction of the prognostic risk-score model in the training set
To explore the prognostic signatures of these differentially expressed CCRGs, we preliminarily performed a univariate analysis of the standardized expression of the 290 CCRGs of LUAD and 447 CCRGs of LUSC in the training sets to identify the prognostic CCRGs respectively. The results showed that the expression of 31 CCRGs and 70 CCRGs each significantly correlated with the OS (p < 0.05) of LUAD patients and LUSC, respectively (Fig. 5A-B). Further, we performed Cox proportional hazard regression to screen the ultimate CCRGs of risk-score models. Subsequently, results indicated that the expression of 10 CCRGs correlated with the OS of LUAD and LUSC patients, respectively (Table 1). According to this, we constructed a risk-score model for predicting the prognosis of LUAD and LUSC patients using the calculation formula mentioned in the method part, respectively. Finally, CDA, POU2AF1, TUBB6, SPAG8, NT5E, ARRB1, DDIT4, HAL, PHLDB2, and AGMAT as risk genes in the risk-score model of LUAD and ALOX5AP, RALGAPA2, TIGD3, PNPLA6, ALPL, TREM1, VSIG4, CD300C, HIST1H2BH, and WNT10A as risk genes in the LUSC risk-score model (Fig. 5C-F). Survival analysis revealed that there was a significant difference between the high-risk group and the low-risk group in OS, and patients in the high-risk group significantly correlated with an inferior prognosis (LUAD: p < 0.0001, HR: 2.117, 95% CI: 1.546 to 2.900; LUSC: p < 0.0001, HR: 2.066, 95% CI: 1.552 to 2.751) (Fig. 6A-B). Finally, we also ranked the risk scores of LUAD and LUSC patients for OS and explored the distribution features (Fig. 6C-D). The dot plots revealed the status of each patient in the training sets (Fig. 6E-F). The heat maps showed the differential expression of the feature CCRGs in the high-risk and low-risk groups (Fig. 6G-H). As results show, there was an upregulation of HAL, PHLDB2, AGMAT, DDIT4, CDA, NT5E, and TUBB6 as high-risk genes and downregulation of POU2AF1, SPAG8, and ARRB1 as protective genes of LUAD patients in the high-risk score group comparing the low-risk score group. Moreover, samples with high-risk scores of LUSC patients suggested upregulation of TREM1, WNT10A, CD300C, ALPL, ALOX5AP, VSIG4, RALGAPA2 and PNPLA6 as high-risk genes and downregulation of TIGD3 and HIST1H2BH as protective genes.
Table 1
Descriptions of CCRGs of the risk-score model in Cox proportional hazard regression analysis.
Type | Gene ID | Full name or Description | Risk coefficient | HR | HR.95L | HR.95H | p-value |
LUAD | CDA | Cytidine Deaminase | -0.089059134 | 0.914791 | 0.822109 | 1.017923 | 0.102253 |
| POU2AF1 | POU Class 2 Associating Factor 1 | -0.295832916 | 0.743912 | 0.631686 | 0.876075 | 0.000392 |
| TUBB6 | Tubulin Beta 6 Class V | 0.253501154 | 1.288529 | 1.031295 | 1.609925 | 0.025669 |
| SPAG8 | Sperm Associated Antigen 8 | -0.215285427 | 0.806311 | 0.602534 | 1.079006 | 0.14751 |
| NT5E | 5'-Nucleotidase Ecto | 0.165120683 | 1.179535 | 1.040611 | 1.337006 | 0.009806 |
| ARRB1 | Arrestin Beta 1 | -0.229636783 | 0.794822 | 0.638692 | 0.989119 | 0.039588 |
| DDIT4 | DNA Damage Inducible Transcript 4 | 0.128789666 | 1.137451 | 0.974568 | 1.327557 | 0.102412 |
| HAL | Histidine Ammonia-Lyase | 0.254600207 | 1.289946 | 1.101163 | 1.511093 | 0.001613 |
| PHLDB2 | Pleckstrin Homology Like Domain Family B Member 2 | 0.232732475 | 1.262044 | 1.006844 | 1.581927 | 0.043472 |
| AGMAT | Agmatinase | 0.276379132 | 1.318348 | 1.00177 | 1.73497 | 0.048543 |
LUSC | ALOX5AP | Arachidonate 5-Lipoxygenase Activating Protein | -0.204079367 | 0.815398 | 0.638768 | 1.040868 | 0.10134 |
| RALGAPA2 | Ral GTPase Activating Protein Catalytic Subunit Alpha 2 | 0.209187368 | 1.232676 | 0.963034 | 1.577816 | 0.096734 |
| TIGD3 | Tigger Transposable Element Derived 3 | -0.338879739 | 0.712568 | 0.445497 | 1.139747 | 0.157327 |
| PNPLA6 | Patatin Like Phospholipase Domain Containing 6 | 0.262619794 | 1.300332 | 0.94122 | 1.79646 | 0.11125 |
| ALPL | Alkaline Phosphatase, Biomineralization Associated | 0.088447658 | 1.092477 | 0.977686 | 1.220746 | 0.118395 |
| TREM1 | Triggering Receptor Expressed on Myeloid Cells 1 | 0.205116552 | 1.227668 | 1.028907 | 1.464826 | 0.022834 |
| VSIG4 | V-Set and Immunoglobulin Domain Containing 4 | 0.21911239 | 1.244971 | 1.000762 | 1.548772 | 0.049206 |
| CD300C | CD300 Antigen-Like Family Member C | -0.355141335 | 0.701074 | 0.440876 | 1.114837 | 0.133452 |
| HIST1H2BH | histone cluster 1, H2bh | -0.161357251 | 0.850988 | 0.749454 | 0.966278 | 0.012805 |
| WNT10A | Wnt Family Member 10A | 0.127271005 | 1.135725 | 1.008189 | 1.279394 | 0.036246 |
Validation of the prognostic risk-score model in the testing set
We next validated the stability and accuracy of the prognostic risk-score model in the testing sets, which included LUAD and LUSC cohorts from the GEO database. The OS was selected as the key indicator to compared the groups and samples were divided into low and high risk-score groups based on the calculated risk score. The formula is as mentioned before. For the testing set of LUAD type, 519 and 544 samples were separated into low and high risk-score groups, respectively. Survival analysis showed that there was a significant difference between the high and low risk-score groups (p < 0.0001, HR: 1.493, 95% CI: 1.248 to 1.787) (Fig. 7A). Similarly, 88 samples of the low-risk group and 89 samples of the high-risk group were included in the survival analysis of the LUSC testing set. Results also suggested a significant difference between the high and low risk-score groups (p = 0.0486, HR: 1.453, 95% CI: 1.002 to 2.105) (Fig. 7B). To summarize, our results confirmed that these two risk-score models based on CCRGs signatures were all stable and accurate in predicting the prognosis of patients.
Clinical characteristics correlation analysis
In this section, we further explored the stability and reliability of the risk score as a clinical indicator. Seven variables, including age, gender, T stage, N stage, M stage, pathologic stage, and risk-score, were analyzed using the univariate analysis and Cox proportional hazard regression in the training sets of LUAD and LUSC respectively (LUAD: p < 0.001, HR: 1.229, 95% CI: 1.157 to 1.305; LUSC: p < 0.001, HR: 2.609, 95% CI: 2.073 to 3.284) (Fig. 8A-D). Our analysis showed that risk-score was found to be an independent prognostic indicator both in LUAD patients and LUSC patients. Then, we constructed ROC curves for different variables to evaluate the risk-score as classifiers, and the AUC was calculated and considered as the basis for evaluation (LUAD: AUC 0.788; LUSC: AUC 0.738) (Fig. 8E-F). Our results indicated that the risk-score had superior accuracy and predictability comparing other clinical characteristics both in LUAD and LUSC samples.
Immune correlation and infiltration analysis
Firstly, correlation analysis between immune cell infiltration levels and CCRGs of the risk-score model was performed using the TIMER. We initially assessed the correlations of characteristic CCRGs expression with immune infiltration levels in LUAD and LUSC samples using the TIMER one by one. We found that almost all CCRGs expression in the risk-score model has significant correlations with infiltrating levels of B cells, CD8 + T cells, CD4 + T cells, macrophages, neutrophils, and dendritic cells both in LUAD and LUSC (Supplementary Fig. 1–2). Further, we explored the immune infiltration level of the diverse immune infiltrating cells between the high-risk group and the low-risk group by the CIBERSORTx. We first calculated the abundances of 22 immune cell types in each sample using standardized CCGRs expression profiles in LUAD and LUSC, respectively. Then we filtered each sample according to the p-value to eliminate the bias caused by inaccurate estimation and grouped the samples based on the risk-score. Finally, 23 samples of the low-risk group and ten samples of the high-risk group were selected in LUAD. Meanwhile, of LUSC samples, 19 samples of the low-risk group, and 26 samples of the high-risk group were included for analysis. Our results suggested that immune infiltration levels in the high-risk group of naive B cells, plasma cells, naive CD4 + T cells, CD8 + T cells, dendritic cells, M2 macrophages, and mast cells decreased significantly in the LUAD compared with the low-risk group (Fig. 9A-B). Furthermore, there were significantly increased levels of infiltrates of activated memory CD4 + T cells, T follicular helper cells, regulatory T cells (Tregs), natural killer (NK) cells, monocytes, neutrophils, and M0/1 macrophage in the high-risk group. Moreover, for LUAC samples, immune infiltration levels in the high-risk group of naive B cells, resting memory CD4 + T cells, naive CD4 + T cells, CD8 + T cells, T cells follicular helper, NK cells, dendritic cells, and M0/1 macrophage decreased significantly comparing the low-risk group (Fig. 9C-D). Meanwhile, there were significantly increased levels of infiltrates of plasma cells, activated memory CD4 + T cells, Tregs, mast cells, monocytes, M2 macrophages, and neutrophils in the high-risk group. In summary, we found that there were similar commonalities in the level of immune cell infiltrations between the high-risk group and the low-risk group in LUAD and LUSC. For instance, our results showed that immune cells, which play a vital role in tumor immunity, were significantly decreased in the high-risk group, such as CD8 + T cells, dendritic cells, B cells, and naive CD4 + T cells. Meanwhile, negative regulation and inflammation-related cells, such as Tregs, mast cells, monocytes, and neutrophils, were significantly increased. These results suggested that compared with the low-risk group, anti-tumor immune responses were inhibited while the inflammatory process and hyperresponsiveness may be significantly enhanced in the high-risk group. However, there were still some differences and changes in the level of infiltration of some immune cells between the LUAD and LUSC, which might illustrate the difference between LUAD and LUSC at the level of circadian-rhythm-related immune infiltration.
Genetic alteration analysis
We investigated the genetic alteration of these CCRGs in the risk-score model to further understand their contributions to carcinogenesis by the cbioportal. We found that these risk-associated CCRGs were relatively conservative, and the mutation rates of these genes were all lower than 3% (most the percent of which were around 1%-2%) both in LUAD and LUSC (Fig. 10). The low frequent genetic alterations suggested the high stability and conservatism of these CCRGs and the crucial roles of these genes in the genetic, epigenetic, and development of lung cancer.
Related drugs and related pathways
We analyzed related drugs and signaling pathways related to these CCRGs of the risk-score model in tumor cell lines. All relevant information was retrieved by searching the GeneCards and supplemented by retrieving relevant literature in PubMed. We found that although some anti-tumor drugs targeting these CCRGs have been applied in clinical practice or were in clinical trials, there were still many blank spots in relevant research, especially for LUSC patients (Fig. 11). As the results showed, most of the existing drugs played a corresponding role in cancer through classic signaling pathways, such as the EGFR signaling pathway, NF-kB signaling pathway, and PI3K-Akt signaling pathway.