A hypoxia-related gene pairs signature for predicting overall survival in lung adenocarcinoma

Background: Increasing evidence has depicted the clinical importance of the correlation between hypoxia and immune status in lung adenocarcinoma environment (LUAD). However, the reliable prognostic signatures based on the interaction of hypoxia and immune status are still limited. Therefore, we strived to construct a hypoxia-immune-related gene pairs signature for risk assessment and stratication for patients with LUAD. Methods: Gene expression proles and clinical data of patients with lung adenocarcinoma were acquired from two public data sets, used as training and validation cohorts respectively. Different bioinformatics and statistical methods were applied to construct a robust hypoxia-related gene pairs (HRGPs) signatures for predicting overall survival in lung adenocarcinoma. Furthermore, we explored the correlation between HRGPs signature and inltrating immune cells using the CIBERSORT algorithm in LUAD samples. Results: Among 146 hypoxia-related genes, a 13 HRGPs signature was built that was signicantly associated with OS in the training cohort (P<0.01). In the validation cohort, the HRGPs signature stratied patients into high- and low-risk groups with a signicant OS difference (P=0.04). After adjusting the other clinical factors, the developed HRGPs signature remained independent in multivariate analysis. Plasma cells and NK cells resting were signicantly correlated with the OS of patients with LUAD. Functional analysis showed that the high-risk group was enriched in terms of DNA replication, lymphocyte apoptotic process, and cell cycle-related pathways. Conclusion: The HRGPs signature represents a promising risk model for patient’s stratication in LUAD. It might provide new insights into clinical decision-making regarding individualized treatment.

Background LUAD is the most common subtype of non-small cell lung cancer (NSCLC) [1]. Currently, the standard treatment for individual LUAD patients is based on a combination of surgery, postoperative chemotherapy and radiotherapy [2]. Despite signi cant progress in treatment, the prognosis for LUAD remains dismal [3]. Conventional survival classi ers for LUAD patients, such as differentiation grade and tumor-node-metastasis (TNM) staging, only considers anatomical and clinicopathological factors [4].
However, LUAD is a heterogeneous malignancy, and its prognosis can be signi cantly different even for patients with similar clinical features, suggesting that current classi ers are insu cient [5]. Therefore, development of novel markers with higher predictive value is urgently needed.
Hypoxia, or lack of oxygen, is a characteristic feature of most solid tumor [6]. Increasing evidence has found that the hypoxic environment in tumors directly stimulated the malignant properties of cancers, including advanced aggressiveness and resistance to therapies [7][8][9][10]. Meanwhile, the in ltration of immune cells in the tumor is also a crucial component that can affect the tumor invasion and metastasis [11]. Interestingly, there was a direct or indirect interaction of immune status and hypoxia in the lung cancer microenvironment, although their underlying mechanisms remain unclear [9,12].
Researchers have identi ed several molecular subtypes of LUAD based on gene expression data, which were used as new markers for survival prediction [13][14][15]. However, due to the large-scale gene expression data, none of them have been applied to daily clinical practice. Recently, a new method based on relative ranking of gene expression levels to eliminate the shortcomings of data scaling and normalization has been reported, and has produced robust results in various studies [16,17].
In this study, we explore the correlation between hypoxia and immune in LUAD, and identi ed an individualized prognostic signature for LUAD by incorporating HRGPs, aimed to improve the prognostication of LUAD patients.

Results
Construction of the prognostic HRGPs signature A total of 146 HRGs were measured on all platforms and ful lled the criteria of the training cohort (MAD > 0.5). From these 146 HRGs, 1645 HRGPs were constructed. After removing HRGPs with relatively small variation (MAD=0), 677 HRGPs were kept as candidates. Then we developed the HRGPs index using Lasso Cox regression on the training cohort and selected 13 HRGPs in the nal risk model (Figure1 A-B).
This HRGPs signature consisted of 24 unique HRGs (Table 2). Next, we used the HRGPs index to calculate the risk score for each patient in the training cohort. The patients were strati ed into high-or low-risk groups based on the optimal cut-off point which was set at 0.101 using time-dependent ROC curve analysis ( Figure 1C). The survival curves suggested that the high-risk group presented a signi cantly poor OS than the low-risk group in the training cohort (P <0.001, Figure 2A). We further evaluated the prognostic power of HRGPs signature based on other clinical factors in the TCGA-LAUD dataset using the univariate and multivariate Cox proportional hazards regression analysis. The results of the univariate analysis showed that the clinical stage and risk model presented a prognostic effect on the OS of patients with LUAD ( Figure 3A). After adjusting for other clinical variables, the risk model remained an independent prognostic factor in multivariate analysis (HR=3.3178, 95% CI 2.351-4.296, P < 0.001, Figure 3B).

Validation of the HRGPs signature for prognostic prediction
To validate the constant prognostic value of HRGPs signature, we applied the same formula to other independent populations. GSE31210 (n=226) served as an external validation cohort and were divided into high-and low-risk groups based on the optimal cut-off value (HRGPs index= 0.101). when compared the survival difference between these two groups, patients in the high-risk group had a poorer OS than those in the low-risk group (P =0.04, Figure 2B). The risk model also remained independent from other clinical variables in univariate and multivariate Cox analyses ( Figure 3C-D). Besides, The HRGPs signature presented a higher C-index when compared with other clinically applicable biomarkers, including RNA-binding proteins signature, immune-related signature, and autophagy-related signature ( Figure 4) [13,18,19].

Correlation of the HRGPs signature and immune cells in ltration
Increasing evidence has been shown that tumor-in ltrating immune cells had an impact on tumors' prognosis [20,21]. We applied the CIBERSORT algorithm to calculated the abundance of 22 types of immune cells for each sample within different risk groups. Figure 5A depicted that several immune cells, such as plasma cells, T-cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), NK cells resting, NK cells activated, Macrophages M0, dendritic cells resting, mast cells resting, mast cells activated, and neutrophils, were signi cantly enriched between different risk groups (P <0.05). Next, we performed the survival curves based on these immune cells. The results showed the plasma cells and NK cells resting were signi cantly correlated with the OS of patients with LUAD (P <0.05, Figure 5B-C).

Functional analysis of the HRGPs signature
To characterize the biological process determined by the HRGPs signature, we performed the GO analysis and GSEA. The results of the GO analysis showed that the HRGPs signature was mainly enriched in the terms of corni cation, glycolmetabolism, and chromosome-related process ( Figure 6). GSEA demonstrated that DNA replication, lymphocyte apoptotic process, and cell cycle-related pathways were signi cantly altered in high-risk groups (FDR < 0.05, Figure 7).

Discussion
As a hallmark of solid tumors, hypoxia is an imbalance between increased oxygen demand and oxygen supply inadequate, which is associated with high proliferative rates [22]. Previously, numerous studies reported that hypoxic phenotypes in solid tumors caused various biological processes, including malignant metastasis, angiogenesis, metabolic alteration, and inducing resistance to treatment [23][24][25]. These ndings support that hypoxia has prognostic value and can be a therapeutic target for tumors. In recent years, some new prognostic biomarkers for LUAD have already appeared, which were used to select patients who are at high-risk and who may bene t from the personalized therapy. However, their accuracy of survival estimation remains uncertain. Therefore, we strived to identify a robust prognostic model for LUAD patients by detecting the HRGs systematically and comprehensively.
Considering the technical bias caused by sequencing platforms and biological heterogeneity of LAUD, previous risk models need to standardize gene expression data, which is di cult for clinical practice. Instead, we used a new model to perform the comparing of gene expression without addressing the technical bias of different platforms [26]. This method has been considered to have reliable results in many studies, including cancer molecular classi cation [27,28]. The developed prognostic model is achieved through pairing comparison of gene expression levels and relative ranking, rather than data pretreatments, such as normalization and scaling.
In this study, we found that both the immune status and hypoxia of the tumor microenvironment were associated with the survival of LUAD patients. Moreover, the adverse effects of hypoxia status were signi cant correlated with prognosis, even after adjusting clinicopathologic risk factors. Finally, an HRGPs signature consisted of 24 unique HRGs was constructed as a prognosis classi er, with high predictive value in risk strati cation both in the training cohort and validation cohort. Most of the genes involved in HRGPs are related to oxygen metabolism. Within these 24 HRGs, elevated expression of NDRG1 contributed to chemoresistance and cancer growth in NSCLC [29]. EFNA1 (ephrinA1) is a member of the ENF family, and it has been studied that over-expression of EFNA1 induced hypoxic environments and pro-angiogenic function in various cancer cells [30]. Decreased expression of FBP1 was associated with cell epithelial-mesenchymal transition, invasion, and metastasis, which present poor prognosis in prostate cancer [31]. Plasma cells had been considered to be related to better survival in diverse cancer [32]. We found a signi cantly elevated in ltrating of plasma cells in the low-risk group, which was associated to better OS. Besides, the function of NK cells resting in LUAD prognosis remains unknown. In our study, the abundance of NK cells resting was high in the high-risk group and correlated with a poorer prognosis for patients with LUAD. Moreover, we also found that glycometabolism, chromosome-related process, and cell cycle-related pathways were related to HRGPs signature. These pathways were wellknown risk factors that affect tumor progression [33][34][35]. These results suggest the crucial role of HRGPs signature in the development of LUAD patients.
Several limitations exist in this study. First, the prognostic signature is based on the gene expression pro les produced by RNA-seq or microarray platforms, which is di cult to popularize in daily clinical applications due to its high price, long conversion cycle, and requirement of bioinformatics expertise.
Second, considering that the microenvironment characteristics might be distinct in different tumor regions, such as tumor core and invasive margin. Samples used for analyses were all collected from the core of the tumor, and it is impossible to evaluate the immune and hypoxia status in different tumor regions. Therefore, ndings in this study are waiting for further validation by multicenter, well-designed, prospective studies.

Conclusion
In conclusion, we systematically explored the prognostic value of the HRGPs, which could assess the risk strati cation and modify the immune status for LUAD patients. These biological functions of the hypoxia-related genes provide a deep insight into the role of our signature in the progression of LUAD. This signature might serve as a prognostic classi er for clinical decision-making regarding individualized treatment and follow-up schedule.

Data collection and processing
In total, two independent public datasets were used to do this comprehensive analysis. The RNA-seq data of 494 LUAD patients were downloaded from The Cancer Genome Atlas (TCGA) data portal (https: //portal.gdc.cancer.gov/) and used as the training cohort. GSE31210 was the other RNA-seq data of 226 LUAD patients obtained from the Gene Expression Omnibus (GEO) and served as a validation cohort [36].
For TCGA-LUAD datasets, the fragments per kilobase of exon per million fragments mapped (FPKM) value was used to measure all the mRNA expression. For GSE31210 datasets, the original expression levels were normalized using the robust multi-array average (RMA) method from the "affy" R package [37]. The corresponding clinicopathological information of included cohorts was listed in Table 1.

Development and validation of HRGPs signature for LUAD
The cases from TCGA-LUAD datasets were used to develop the prognostic model. The hallmark gene sets of hypoxias comprised of 200 genes were collected from the molecular signature (MSigDB version 6.0) of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.kegg.jp/; ID:04066), including 15 genes involved in "Increase oxygen delivery" and 11 genes related to "Reduce oxygen consumption". We measured the hypoxia-related genes (HRGs) on all the cohorts included in this study and with relatively high variation (determined by median absolute deviation > 0.5). Next, each HRGP was computed by pairwise comparison of the gene expression level for every sample. The output is 1 if the rst gene is larger than the later one and 0 for the reverse order. We removed the HRGPs with a low variation (median absolute deviation, MAD =0) and kept those remaining as initial candidate HRGPs signature for prognosis prediction. Finally, from the initial candidate HRGPs, an index was calculated using the Lasso Cox proportional hazards regression with 10-fold cross-validation (by "glmnet" R package) [38]. As a result, the selected gene pairs were used to construct the nal prognostic model. Based on the optimal cut-off value of the HRGPs index determined by the time-dependent receiver operating characteristic (ROC) curve (using "survivalROC" R package) at three years for overall survival (OS), the patients were then strati ed into high-and low-risk groups.
The prognostic capability of the HRGPs signature was further validated in GSE31210 datasets using a log-rank test. Besides, we also assessed the independent predictive value of HRGPs signature from other clinical factors by univariate and multivariate Cox analysis in each cohort.

Correlation of HRGPs with the tumor-in ltration immune cells
Immune cells are the main components of the tumor microenvironment and correlated to the prognosis of malignancy. We strived to estimates the relative abundance of various immune cells in different risk groups using the CIBERSORT algorithm, which was a machine learning method based on support vector regression in gene expression data of each TCGA-LUAD sample [39]. The gene expression data of TCGA were uploaded to the online analytical platform CIBERSORT portal website(http://cibersort.stanford.edu/), using the default signature matrix at 1,000 permutations. CIBERSORT quanti ed the proportions of 22 kinds of in ltration immune cells, including B cells, T cells, neutrophils, and macrophages, amongst others.

Gene functional enrichment analysis
To elucidate the molecular function of the HRGPs signature, we conducted the Gene Ontology (GO) annotations using the "clusterPro ler" (version 3.12.0) R package [40]. Gene set enrichment analysis (GSEA) was performed using the Bioconductor package "fgsea" with 10,000 permutations [41]. The whole RNA-seq of all TCGA-LUAD samples was used for GSEA based on the C5 collection in the molecular signature database. The differential gene sets from high-and low-risk groups were compared, and those with FDR q < 0.05 and NOM p < 0.05 were de ned as signi cant.

Statistical analysis
All the statistical tests were performed using R (version 3.6.0). The differences among groups were compared using the student's t-tests. Survival analysis was conducted using the Kaplan-Meier method and analyzed using the log-rank test from the "survival" package. The Cox proportional-hazards regression model was used to perform the univariate and multivariate analyses. The concordance (C)index was applied to test the predictive value of signature using the "survcomp" package [42]. The primary endpoint was OS and the p-value of two-sided less than 0.05 was considered as a statistically signi cant difference.  Figure 1 Identi cation of HRGPs signature in the training cohort. (A) The LASSO coe cient pro les of prognostic HRGPs signature with the optimal λ value of minimum criteria; (B) The LASSO Cox regression proportional model was used to construct the robust signature; The partial likelihood deviance is plotted with the log(λ) and error bars representing standard error (SE); The dotted vertical lines are depicted at the optimal value by minimum criteria and 1-SE criteria; (C) Time-dependent ROC curves for HRGPs index at three years; The optimal score of 0.101 was identi ed as cut-off value for dividing patients into high-and low-risk groups.

Figure 2
The Kaplan-Meir curves of OS between different groups in the training cohort (A) and the validation cohort (B).  Gene set enrichment analysis showed the upregulated pathways in the high-risk group.