Identification of the signature genes and network of Reactive Oxygen Species (ROS) related genes and DNA repair genes in Lung Adenocarcinoma (LUAD)


 Reactive Oxygen Species (ROS) are present in high amount in patients with tumors, and these ROS can kill and destroy tumor cells. Thus, tumor cells upregulate ROS-related genes to protect themselves and reduce their destruction. Cancer cells already damaged by ROS can be repaired by expressing DNA repair genes consequently promoting their proliferation. In this work, lung adenocarcinoma (LUAD) transcriptome data in the TCGA database was analyzed and samples were clustered into 5 ROS-related categories and 6 DNA repair categories. Survival analysis revealed a significant difference in patient survival between the two classification methods. In addition, the samples corresponding to the two categories overlap, thus, the gene expression profile of the same sample with different categories and survival prognosis was further explored, and the connection between ROS-related genes and DNA repair genes was investigated. The interactive sample recombination classification was used, revealing that the patient's prognosis was worse when the ROS-related genes and DNA repair genes were expressed at the same time. The further research on the potential regulatory network of the two categories of genes and the correlation analysis revealed that ROS-related genes and DNA repair genes have a mutual regulatory relationship. The ROS-related genes NQO1, TXNRD1, and PRDX4 could establish links with other DNA repair genes through the DNA repair gene NEIL3, thereby increasing the growth of tumor cells and balancing the level of ROS, leading to tumor cell death and constant damage to the tumor cell repair system, thus prolonging patient survival. Thus, targeting ROS-related genes and DNA repair genes might be a promising strategy in the treatment of LUAD. Finally, a survival prognostic model of ROS-related genes and DNA repair genes was established (TERT, PRKDC, PTTG1, SMUG1, TXNRD1, CAT, H2AFX and PFKP), the risk score might be used as an independent prognostic factor in LUAD patients.


Introduction
Reactive oxygen species (ROS) are small oxygen-derived active small molecules, including O2·-, ·OH, RO2·, and RO· [1]. ROS can be produced by exogenous or endogenous sources, and when they are in excess compared with the amount of antioxidants in the body, the system is out of balance, and the antioxidants are not able anymore to completely remove or reduce ROS. On the one hand, their accumulation damages biological macromolecules, including DNA, leading to different type of tumors.
On the other hand, the increase of the level of intracellular ROS can allow the selective killing of tumor cells [2]. A high ROS amount is detected in most cancer patients [3]. The expression of ROS-related proteins increases in many types of cancer, and they are involved in cell growth, proliferation, differentiation, protein synthesis, glucose metabolism, cell survival and in ammation [4]. Oxidative stress and non-small cell lung cancer (NSCLC) have a mutually promoting and dependent relationship. Indeed, the presence of oxidative stress greatly increases gene damage, and the damage to the mitochondrial DNA of alveolar cells can cause energy supply barriers, promote tumor blood vessel formation, and inhibit tumor immune microenvironment. These multiple effects promote the occurrence of NSCLC. In addition, the abnormal expression of speci c transcription factors and downstream cell signaling pathways caused by and related to oxidative stress allow a rapid development and metastasis of NSCLC.
Furthermore, NSCLC cells maintain the oxidative stress response at the appropriate level for their proliferation and survival by regulating their antioxidant levels and ROS levels [5][6].
The internal and external environmental factors including ROS can cause DNA damage. If the damage is not repaired in time and correctly, it causes the instability of the genome, threatening the survival of cells. In order to maintain the stability of the structure and function of DNA in a complex genomic environment, a timely and reasonable response to damaged signals should be provided. Under the condition of DNA damage, coordinated regulation of damage repair mechanisms and dynamic chromatin changes are required for the maintenance of genetic and epigenetic information. Thus, cells should correct the damages before the replication process in order to maintain the integrity of the genetic material. Therefore, the DNA repair system plays a vital role in maintaining the normal physiological functions of cells [7]. At present, more than 100 repair enzymes are known that participate in the DNA repair process. The DNA repair system in the cell mainly includes ve pathways: direct damage reversal repair, base excision repair, nucleotide excision repair, recombination repair, and mismatch repair [8]. If the repair function is defective, or when a key protein in a speci c DNA damage repair pathway is mutated, DNA damage may lead to two results: one is cell death; the other is gene mutation, or malignant transformation into tumor cells. It is worth noting that although defects in DNA repair function can cause tumors, the DNA repair function of cancer cells is not reduced; on the contrary, it is signi cantly increased, and can fully repair the DNA damage caused by chemotherapeutic drugs. This is also one of the reasons why most anti-cancer drugs are not effective [9]. Therefore, in this work the combined action of ROS genes with DNA repair genes on the prognosis of patients diagnosed with lung adenocarcinoma (LUAD) was explored. Since this is a cancer type with a high incidence and high mortality rate, our aim was to nd a potential correlation between ROS genes and DNA repair genes, to evaluate whether the inhibition of the repair of damaged tumor cells could increase tumor cell death and ameliorate the prognosis of patients. In this way, a potential combined therapeutic therapy can be also considered.

Data Source and Pre-processing
The RNA-Seq based transcriptome pro les (FPKM; Fragments Per Kilobase of transcript per Million mapped reads) and corresponding clinical data of LUAD patients were downloaded from the Cancer Genome Atlas (TCGA) portal using the gdc-client software downloading tool. Additionally, the gene expression pro les in LUAD patients (GSE68465, sequenced using Affymetrix, HG-U133A Plus 2.0 Array, up to November 2020) were also obtained from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). All analyses were performed using the R software (R Foundation for Statistical Computing, Vienna, Austria, 3.4.1 Version).

ROS and DNA repair gene acquisition and sorting
The ROS-related genes and DNA repair genes were downloaded from the Molecular Signatures Database (MSigDB) for use with the Gene Set Enrichment Analysis (GSEA) database. The intersection of these genes with the genes from TCGA was used to obtain the nal ROS-related genes and DNA repair genes.
The TCGA samples with incomplete clinical data and survival time less than 30 days were not taken into consideration and consequently removed.
Consistent clustering and screening of ROS-related genes and DNA repair related genes The ConsensusClusterPlus package of R was used to cluster ROS-related genes and DNA repair genes separately, and the survival analysis was performed to compare the prognostic differences of different categories. Genes showing signi cant differences in their expression in tumor samples and normal samples were obtained, the screening conditions were set at p<0.05 and |LogFC|>1, and nally the expression of differential genes in different categories were analyzed according to ROS genes and DNA repair genes.
Sample reclassi cation and differential gene expression analysis in different prognostic categories The categories and prognosis of some samples of the two clustering methods were different. The samples obtained from the two clusters are reclassi ed in an interactive manner and called ROS_Cn_DNA_Repair_Cm (Table 1). Then, differential genes in different categories according to ROS genes and DNA repair genes in the new category were compared.
Regulatory network and correlation analysis among target genes ROS-related genes and DNA repair genes signi cantly different in the new categories were obtained where the samples obtained from the two clusters are reclassi ed in an interactive manner and called ROS_Cn_DNA_Repair_Cm. A regulatory network was constructed using the STRING database, the correlation coe cient between the two set of genes at the same time was calculated, and then the relationship between ROS-related genes and DNA repair genes was obtained.
LASSO regression analysis for the construction of the prognostic gene model Univariate Cox proportional hazards regression analysis was performed to screen target ROS-related genes and DNA repair genes signi cantly associated with overall survival (OS) in the TCGA LUAD dataset. Then, LASSO Cox regression analysis of the identi ed OS-related genes was performed using the R-glmnet package. Multivariable Cox proportional hazards regression analysis was performed to establish the prognostic model of the target genes. The LUAD samples were divided into high risk and low risk by the median risk score; the Kaplan-Meier curve was constructed, and the log-rank test was conducted to compare the survival differences between the two groups. The ROC curve was used to evaluate the accuracy of the model. GSE68465 data were used as the validation set to further con rm the model.

Data processing results
The ROS-related gene set as the hallmark of ROS-related pathway containing 49 genes, and the DNA repair gene set Kauffmann DNA repair genes [1] containing 230 DNA repair genes were downloaded from the MSigDB and used with the GSEA. The intersection of these genes with the genes from TCGA resulted in a total of 45 ROS-related genes and 194 DNA repair genes The TCGA samples with incomplete clinical data and survival time less than 30 days were not taken into consideration and removed, and the data of 465 samples were collected for further analysis.
Consistent clustering and screening of ROS-related genes and DNA repair genes The consistent clustering of TCGA_ROS data divided the 465 samples into 5 categories. The survival analysis of the 5 categories revealed a signi cant difference in survival, with the category C3 having the worst prognosis, while the C5 having the best prognosis. The difference analysis resulted in a total of 14 ROS-related genes (11 up-regulated and 3 down-regulated genes). Then, the expression of differential genes in the 5 categories was compared, and 10 genes were signi cantly different in C1-C5 ( Figure 1). Similar to the above procedure, the consistent clustering of TCGA_DNA repair gene data divided the 465 samples into 6 categories, and survival analysis of these 6 categories revealed that C3 had the worst prognosis, while C2 had the best prognosis. Forty-nine DNA-related differential genes (48 up-regulated genes and 1 down-regulated gene) were obtained, the differences of genes in the 6 categories were compared, and the results revealed that 25 genes were signi cantly different in C1-C6 ( Figure 2). Subsequently, ROS-related genes and DNA repair genes were visualized in the ROS classi cation and DNA repair genes and ROS-related genes were visualized in the DNA classi cation in order to observe the overall expression of genes in the two classi cations. Certain differences in the expression of ROS-related genes and DNA repair genes existed, corresponding to different clustering methods. The most intuitive reaction was that ROS_C3 had the most different prognosis, and the ROS-related genes and DNA repair genes contained in it were highly expressed. The differences in the expression of the two categories of genes in other categories were not the same, which might be related to the mutual regulation of the two categories of genes ( Figure 3).

Differences in survival and gene expression in the reclassi cation samples
The samples obtained from the two clusters were interactively divided into 9 categories, as shown in the following table. The survival analysis revealed that the survival prognosis of the patients whose samples that originally belonged to the ROS category was signi cantly different after regrouping. The comparison of the expression of the genes between the different new classi cations that originally belonged to the ROS category revealed that the higher the expression of up-regulated ROS-related genes and DNA repair genes, the worse the prognosis, while the down-regulated genes (CYR2, PFKP, CAT) were positively correlated with a longer survival. (Figure4-5, Table 2) Regulatory network and correlation analysis among target genes The enrichment of differential ROS-related genes and DNA repair genes in the ROS_Cn_DNA_Repair_Cm category was visualized by the Venn diagram, and the intersection between the differential genes of the ROS category and DNA repair category was performed to obtain a total of 29 target genes ( Figure 6). These 29 differentially enriched genes were imported into STRING to construct a gene regulation network and calculate the correlation coe cient among genes. The results showed that the DNA repair genes had a strong internal regulatory relationship. DNA repair genes and ROS-related genes could be linked through NEIL3-TXNRD1, and the Pearson correlation coe cient between the two was 0.60. In addition, the CYR2 gene showed a negative correlation with other ROS-related genes and DNA repair genes, while NQO1, PRDX4, and IPCEF1 showed a weak negative correlation with other genes (Figure 7-9).
Prognostic model and genes associated with prognosis A total of 49 DNA repair genes and 14 ROS-related genes from the TCGA LUAD data were analyzed by Univariate Cox regression. Twenty-eight genes were associated with a prognosis and were entered into the LASSO regression analysis (Figure 10), and a total of 8 genes (TERT, PTTG1, SMUG1, PRKDC, H2AFX, PFKP, TXNRD1, and CAT) were identi ed to build the model. The prognostic value of the risk scores was assessed, which were estimated with the formula: risk score = ∑ Xβ* coef β, where coef β was the coe cient and Xβ was the gene relative expression (risk score = TERT*0.102+PTTG1*0.012+SMUG1*0.123+PRKDC* 0.005+ H2AFX*0.002+ PFKP*0.003+TXNRD1*0.0006+CAT*-0.003). As regard the TCGA LUAD data, the risk score in both univariate and multivariate analysis was signi cantly related to OS (HR = 4.494, 95% CI = 2.563-7.880, p < 0.001; HR =4.155, 95% CI = 2.258-6.645, p < 0.001, respectively) ( Figure 12 a-b). The patients with lowrisk scores showed a signi cantly better prognosis than those with a high-risk score (Figure 13 a-b) both in TCGA and GEO LUAD data, as demonstrated by the Kaplan-Meier cumulative curve. The AUC of the risk score was 0.731, which implied that the Cox model could predict the prognosis quite well (Figure 12 c).

Discussion
ROS is produced in many cellular compartments including mitochondria, which are the major source of ROS (mROS) [11]. Superoxide anion (•O2-), hydrogen peroxide (H2O2) and hydroxyl radical (•OH) belong to a group of highly reactive and heterogeneous molecules derived from oxygen (O2) and are the main forms of ROS in biological systems. [12] Many factors in the tumor microenvironment, including the presence of ROS, promote the progress of solid tumors. The increase of ROS level, the imbalance of redox homeostasis and the enhancement of antioxidant capacity are some of the many signs in cancer cells. Therefore, the understanding and elucidating the role of ROS in the tumor microenvironment is essential for developing new methods to combat this disease [13]. Various tumors, including LUAD, possess high levels of ROS with abnormal metabolism and constitutive carcinogenic signals. ROS are the main effectors of DNA damage associated with cancer and is accompanied by tumor suppression [14][15]. Therefore, tumor cells adapt to the oxidative DNA damage to prevent cell destruction by regulating cell necrosis through the modi cation in the expression of some genes, thereby inducing the aberrant expression of signaling networks that cause tumorigenesis and metastasis [16]. 8-hydroxyguanine is the strongest product of oxidative stress in cells, and is mostly closely related to the occurrence and development of tumors. The DNA repair gene can hydrolyze 8-hydroxyguanine in the base pool to avoid base mismatch and replacement. Once the 8-hydroxyguanine in tumor cells is hydrolyzed by the DNA repair gene, it promotes tumor cell growth. Certain protective effects lead to a malignant phenotype, poor cancer prognosis, or resistance to treatment. [17][18] In some cases, tumors up-regulate the mutagenic repair pathways to survive. Therefore, cancer cells generally rely more on repair pathways than normal cells. In addition, cancer cells often have dysfunctional redox homeostasis, and therefore once again, they rely heavily on mechanisms that repair oxidative DNA damage and inhibit enzymes that modify compounds, which can then be incorporated into genomic DNA in their unmodi ed form. Processes such as replication and oxidative stress provide a background for ongoing DNA damage in cancer cells and can provide a potential therapeutic window for compounds that exacerbate these processes. Such compounds can accomplish by further emphasizing replication, weakening the ability of cancer cells to handle high levels of replication or oxidative stress, or potentially inhibiting DNA repair and related processes [19][20][21].
Therefore, in this work, the synergistic tumorigenic effect of ROS-related genes and DNA repair genes was evaluated, and the regulatory relationship between the two groups of genes was further explored. It is important to consider whether it is better to use ROS to kill cancer cells or to inhibit the DNA repair in cancer cells to improve patient prognosis.
The expression of ROS-related genes and DNA repair genes was used to cluster TCGA tumor samples uniformly. ROS-related genes divided tumors into classes, and DNA repair genes divided tumor samples into classes. Signi cant differences in survival between the internal classi cations were obtained by the two clustering methods, and the differentially expressed genes were further screened. Our analysis found that the samples that originally belonged to the ROS classi cation partial overlapped in the classi cation of DNA repair genes. After reclassifying the samples according to the two classi cations, the prognosis of patients changed when the expression of ROS-related genes and DNA repair genes in the samples changed. Thus, our hypothesis was that ROS-related genes and DNA repair genes might have a mutual regulatory relationship, which in turn affected the occurrence and development of tumors. A total of 29 differential genes were nally identi ed and included 5 ROS-related genes and 24 DNA repair genes. STRING analysis of the regulatory relationship found that 3 ROS-related genes (NQO1, TXNRD1, and PRDX4) can be repaired by the DNA repair gene NEIL3 and other DNA repair genes.
A large amount of evidence showed that NQO1 has a "Janus" effect in cancer biology, playing a role in suppressing cancer and promoting tumors [22]. NQO1 is constitutively expressed at a relatively low level in various normal tissues. Under oxidative stress, NF-E2 p45-related factor 2 (Nrf2)/Kelch-like ECH-related protein 1 (Keap1) signaling pathway can cooperate to transcribe a series of defense genes and provide cells with multiple layers of protection against carcinogenesis. These measures include the immediate elimination of ROS [23]. The expression of NQO1 is considered as a practical and economical way to control cancer. NQO1 is abnormally up-regulated in solid tumors, and high levels of NQO1 are associated with poor patient prognosis. It is known that cancer cells have a signi cant increase in ROS production compared to normal cells. In this case, high levels of NQO1 in cancer can help cancer cells to cope with the increased ROS just like normal cells, thus, tumor growth and metastasis is not only not compromised, but promoted [24]. Our results showed that NQO1 was correlated with the expression of the DNA repair gene NEIL3 (Pearson correlation coe cient), suggesting its role as a tumor control gene The cytoplasmic selenoprotein thioredoxin reductase 1 (TXNRD1) has several different effects related to cancer including the protection of normal cells to evolve into cancer cells or the protection against the promotion of cancer progression. TXNRD1 has a unique connection with Nrf2 signaling and ribonucleotide reductase-dependent deoxyribonucleotide production and it supports a variety of antioxidant systems against oxidative stress. Thus, it is essential that metabolic pathways regulated by TrxR1 are affected in cancer [25]. Our regulatory network suggested that TXNRD1 had a signi cant correlation with the DNA repair gene NEIL3, thus, it might be considered as a potential targeted gene in a combination therapy affecting ROS-related genes and DNA repair genes.
Peroxiredoxin 4 is a typical peroxidase 2-Cys antioxidant in the endoplasmic reticulum, which protect cells against oxidative stress by detoxifying hydrogen peroxide, thus promoting cell survival [26]. The role of PRDX4 in cancer received considerable attention. The expression of PRDX4 in NSCLC-derived endothelial cells is higher than that in normal cells [27]. Sul redoxin is an antioxidant protein induced by H2O2 that acts as a catalyst for reducing the peroxidized PRDXs to reduce their peroxidase activity.
Sul redoxin is more inclined to combine with PRDX4 than other PRDXs. The up-regulation or downregulation of the sul redoxin-PRDX4 axis can affect the mitogen-activated protein kinase pathway, cAMP response element binding protein and activator protein-1/matrix metalloproteinase axis pathway [28]. Furthermore, another study revealed that the expression of PRDX4 is closely related to the disease-free survival time and short recurrence time of patients with early-stage lung squamous cell carcinoma undergoing early radical surgery. [29] Endonuclease VIII-like 3 (NEIL3) is a DNA glycosylase protein that is involved in oxidative and interstrand crosslink DNA damage repair. [30] NEIL3 is highly expressed in various human cancer cells and is associated with metastatic cancer, indicating that it may be necessary to maintain cancer cell growth or malignant progression. [31,32] NEIL3 overexpression is positively correlated with homologous recombination and mismatch repair gene expression. High NEIL3 expression may promote cancer phenotype by increasing genomic instability and/or interfering with other DNA repair [30]. Our analysis found that NEIL3 played a pivotal role in the connection between DNA repair genes and ROS-related genes. Therefore, the mutual regulation of ROS-related genes and DNA repair genes centered on NEIL3 might become an important topic for further studies.
A prognostic model based on all differentially expressed ROS-related genes and DNA repair genes was constructed and combined with the clinical data of the samples, and nally 9 genes were selected to calculate the risk score. The results revealed that the prognosis of patients in the high-and low-risk groups was signi cantly different, and the GEO data veri ed this result. The multivariate analysis suggested that the risk score could be used as an independent prognostic factor to evaluate patient prognosis. The above mentioned model genes included 3 ROS-related genes and 6 DNA repair genes, and TXNRD1 gene played an important role in the regulatory network of the two groups of genes, as revealed by previous studies.
Thus, this study might highlight the signi cance of ROS-related genes and DNA repair genes in LUAD, and the combined target of ROS and DNA repair genes might be a promising strategy in the treatment of LUAD, although further studies should be performed to validate these ndings.

Declarations
Authors' contributions YQ has designed the research; FHM and ZY analyzed data and wrote the paper, WXP retrieved and collected data, YWJ were responsible for drawing, WC and LB revised the manuscript. All authors read and approved the nal manuscript.   Figure 11 Kaplan-Meier analysis of OS for LUAD patients using TCGA and GEO database.

Figure 12
Construction of ROS and DNA-repair-related genes model for patients with LUAD.