Comprehensive Analysis of DNA Damage Repair Genes and Identification of RAD54L as an Immunological and Prognostic Biomarker in Clear Cell Renal Cell Carcinoma and Other Cancers


 Background: DNA damage repair (DDR) plays a pivotal role in the tumorigenesis and progression of multiple cancers, including clear cell renal cell carcinoma (ccRCC), and immunotherapy is galvanizing research on ccRCC, while the interactions between DDR and tumor microenvironment (TME) in ccRCC still remain elusive. Methods: The expression, mutation and clinical data were downloaded from public datasets in TCGA, GTEx and human protein atlas (HPA) database. Consensus clustering analysis was used to cluster subtypes based upon the expression of DDR genes. Cox regression analysis was adopted to conduct survival analysis, and qRT-PCR method was used to verify the expression of RAD54L in ccRCC samples. GSEA was employed to explore potential enriched KEGG pathways. CIBERSORT, ssGSEA and xCell algorithms were used to evaluate the tumor microenvironment (TME).Results: Two subtypes were identified according to the expression of DDR genes in ccRCC. Subtype1 was correlated with increased proportion of higher-grade tumors, worse prognosis and lower PD-L1 expression compared to subtype2. Distinct TME was also noted between two subtypes. GSEA revealed that the TGF-β signaling pathway was significantly enriched in subtype1. RAD54L was subsequently determined as a potential immune-related DDR gene, which was positively associated with PD-L1 expression and its elevated expression predicted unfavorable prognosis in ccRCC. qRT-PCR also verified the overexpression of RAD54L in ccRCC samples. Additionally, a systematic analysis involving 33 cancer types demonstrated that RAD54L was remarkably upregulated and its overexpression tightly linked to worse prognosis in multiple cancers. Moreover, both xCell and ssGSEA algorithm showed the strong associations between RAD54L expression and immune infiltration in more than 30 cancers. Conclusions: DDR is implicated in regulating TME of ccRCC, and RAD54L is a potential immunological and prognostic biomarker in multiple types of cancer, including ccRCC.


Background
As a life-threatening disease, cancer is a major obstacle to improving global life expectancy, causing approximately 10 million deaths worldwide in 2020 [1]. Meanwhile, there are an estimated 431,300 new cases and 179,400 deaths attributed to kidney cancer each year globally [1]. Although advances have taken place in the diagnosis and treatment over the past years, clear cell renal cell carcinoma (ccRCC), which accounts for nearly 80% of kidney cancer, continues to reduce the quality of life and threaten the lives of patients all over the world [2]. Recently, immunotherapy has become a crucial treatment option for patients with advanced tumors, especially for ccRCC, where it has shown unprecedented therapeutic bene ts [3,4]. However, there are still considerable number of patients who cannot acquire long term clinical bene ts because of insensitivity to immune checkpoint blockade (ICB) therapy. With the expansion of public databases such as TCGA and the development of a range of algorithms for immune in ltration assessment [5][6][7], it is possible to discovery new immunotherapeutic targets by screening for potential immune-related genes and performing pan-cancer analysis of these genes to evaluate the correlations between their expression and tumor microenvironment (TME) and clinical outcomes [8][9][10][11].
DNA damage repair (DDR) pathway is one of the major pathways in tumorigenesis [12], and the widespread applications of chemotherapy and radiotherapy demonstrated that the tumor DDR de ciency has long been a therapeutic target in oncology [13,14]. In recent years, patients with variations in DDR genes (e.g., MSH2, POLD2 and POLE) have been noted to obtain durable bene t from ICB therapy, suggesting that DDR de ciency might contributed to ICB response [15]. As a tumor with high genomic instability[16], ccRCC is not likely to occur and develop without the alterations in DDR pathway [11]. Many literatures reported that alterations in the expression of DDR genes in uence the proliferation and drug resistance of ccRCC cells [17][18][19]. Nonetheless, there is currently a lack of comprehensive understanding of DDR genes in ccRCC, particularly the interaction between DDR genes and TME.
Here, we identi ed two ccRCC subtypes according to the expression levels of DDR genes by performing consensus clustering analysis. Discrepant prognosis, differential PD-L1 expression and TME were observed between two subtypes. Subsequently, RAD54L was determined as a potential immune-related DDR gene. The potential functions of RAD54L were explored by using GSEA method. Moreover, we systematically explored the expression and prognostic values of RAD54L in pan-cancer. xCell and ssGSEA algorithm were combined to analysis the associations of RAD54L expression with TME in more than 30 cancers. Our results provided novel insights into the interactions between DDR genes and TME in ccRCC, and revealed that overexpression of RAD54L tightly linked with unfavorable prognosis and immune in ltration in multiple cancers.

Data processing and analysis
The pan-cancer RNA-seq and clinical data were downloaded from the TCGA(https://portal.gdc.cancer.gov/) and GTEx database (https://gtexportal.org/home/) in January 2020, including 10201 samples of 33 cancer types (including 530 ccRCC tissues and 72 normal tissues). "Limma" package was used to obtain the differentially expressed genes (DEGs), "Adjusted P < 0.05 and Log 2 |FC|>1" were de ned as the thresholds for the screening of DEGs. R software "maftools" package was applied to analyze genetic mutation data.
Consensus clustering analysis and gene set enrichment analysis (GSEA) R software package "Consensus Cluster Plus" was used for consistency analysis, the consensus matrix from k=2 to k=6 were generated to select optimal clustering, and the total samples were drawn 100 times to obtain optimal clustering subtypes. The GSEA was conducted to investigate the KEGG signaling pathways between two subtypes or two RAD54L expression groups via using "Cluster Pro ler" package.
To obtain reliable meaningful pathways, the tests were calculated 1000 times, adj. p values and false discovery rate (FDR) less than 0.05 were recognized as a signi cantly meaningful pathway.
qRT-PCR analysis of human clinical samples A total of 21 paired ccRCC and normal tissues were acquired from patients that underwent radical nephrectomy (in 2013) at the rst a liated hospital of Zhejiang University. Total RNA was extracted by using TRIzol reagent (Takara), then reversely transcribed into cDNA for RT-qPCR. the SYBR Premix Ex Taq (Takara) and ABI 7500 fast real-time PCR System (Applied Biosystems) were adopted to detect RAD54L expression. GAPDH was used for normalizing mRNA expression of RAD54L. Additional information of primers and ccRCC patients were provided in Supplementary Table 1.

Immunohistochemistry (IHC) Staining
To assess the relative protein expression of RAD54L in multiple cancers and corresponding normal tissues, IHC staining images of 9 types of cancer and corresponding normal tissues, including BRCA, CESC, COAD, KIRC, LIHC, LUAD, LUSC, PAAD and STAD, were downloaded and analyzed from the human protein atlas (HPA) (http://www.proteinatlas.org/) database. All samples were treated with same anti-RAD54L antibody (ID: HPA028954).

Analysis of RAD54L expression with clinical outcomes
With the help of "survminer" and "survival" package of R software, cox regression analysis was employed to explore the associations of RAD54L expression with clinical outcomes of multiple cancer types, including overall survival (OS) and disease-free survival (DFS). The forest plots were generated by using the "forestplot" package, and the nomogram was developed by adopting the "rms" package.

TME analysis
To obtain a convincing evaluation of immune in ltrating correlation analysis, CIBERSORT algorithm [6] was utilized to deconvolute the proportions of tumor-in ltrating immune cells between two ccRCC subtypes. To better demonstrate the correlations of RAD54L expression with TME in ccRCC, three RAD54L positively co-expressed genes were determined via utilizing "stat" package, the ssGSEA algorithm of "GSVA" package [20] was rstly employed to evaluate the relationships of RAD54L expression and its positively co-expressed genes with diverse tumor in ltrating immune cells. The xCell algorithm [7] was ulteriorly applied to investigate the links between RAD54L expression and TME (including Th2 cell) across more than 30 types of cancer.
The expression values of immune escape marker genes (e.g., TIM3, TIGIT, PD-1, LAG3, CTLA4 and PD-L1) were extracted and used to assess the associations between their expression and RAD54L expression in pan-cancer.

Statistical analysis
Wilcox test was utilized to implement the intergroup comparisons and visualized with "ggplot2" package.
Spearman correlation analyses were implemented to assess the associations between RAD54L expression and co-expressed genes. p < 0.05 denoted statistical signi cance. Statistics were processed by the GraphPad Prism 8.0 and R software 4.0.3.

Results
Differences in clinical features and prognosis of two ccRCC subtypes based on the expression of DNA damage repair (DDR) genes.
According to previous studies [21,22], we obtained 231 DDR genes (Supplementary Table 2). Consensus clustering analysis was adopted to identify subtypes based on the expression of these selected 231 DDR genes in TCGA ccRCC cohort. Consequently, compared with consensus clustering matrix for k=3, 4, 5 and 6 (Supplementary Figure 1), the k=2 was determined as the most appropriate clustering from k = 2 to 6 ( Figure 1a-1c). Thus, a total of 530 ccRCC patients were split into two subtypes, i.e., Subtype1 (S1, n=165) and Subtype2 (S2, n=365). To explore the heterogeneity between two subtypes, we further analyzed the clinical characteristics and prognosis of patients of these two subtypes. As displayed in Figure 1d Discrepant tumor microenvironment (TME) of two ccRCC subtypes Given the marked heterogeneity between two subtypes, immune-related analyses were also performed to investigate the TME of two subtypes. Firstly, in comparation with the subtype1 and normal group, a remarkably elevated PD-L1 expression levels were noted in subtype2 and ccRCC group, respectively (P<0.001, Figure 2a-b). Subsequently, by using CIBERSORT algorithm, discrepant enrichment of tumorin ltrating lymphocytes (TIL) was observed in two subtypes (Figure 2c), notably, subtype1 exhibited higher enrichment of Tregs, CD8+ T cell and T follicular helper cell (Figure 2d), whereas subtype2 tightly linked with the in ltration of CD4+ memory resting cell, activated Mast cell and Monocyte (Figure 2d). GSEA was subsequently conducted to investigate the regulatory mechanisms that lead to this difference in TME of two subtypes. Consequently, there were 3 KEGG pathways signi cantly enriched in subtype1 (Figure 3e-3g), including the TGF-β signaling pathway (NES=1.806; Figure 3e), Focal adhesion (NES=1.735; Figure 3f) and JAK-STAT signaling pathway (NES=1.588; Figure 3g). These signaling pathways might implicated in the discrepancy in TME of two subtypes.
Identi cation of RAD54L as the potential immune-related DDR gene in ccRCC As previously described, distinct PD-L1 expression and TME were noted between two subtypes de ned according to the expression of DDR genes. These ndings implied the potential roles of DDR genes in regulating TME of ccRCC. Further, with the help of R software "stat" package, most DDR genes (191/231, Supplementary Table 3) were found to be positively associated with the expression of PD-L1 (Spearman Cor>0, P<0.05, Figure 3a). then by analyzing the RNA-seq and clinical information from the TCGA database, 1437 genes (including 6 DDR genes) were found to be remarkably upregulated in ccRCC compared to normal samples (adj. P<0.05, log 2 FC>1, Figure 3b), and 34 DDR genes were shown to be detrimental to overall survival (OS) in ccRCC patients (P<0.05, Figure 3c). Subsequently, an intersection of DDR genes positively linked to PD-L1 expression, highly expressed in ccRCC and correlated with worse OS were used to determine potential immune-related DDR genes among the 231 DDR genes. As displayed in

Associations of RAD54L expression with prognosis in ccRCC
Further, the prognostic values of elevated RAD54L expression were analyzed, where, highly expressed RAD54L group showed unfavorable OS (P=0.0027) and DSS (P=0.00017) in comparation with the lowly group (Figure 4a-b). Cox analyses were undertaken to analysis the prognostic values of RAD54L expression and other clinical variates, univariate analysis revealed that RAD54L expression and 6 clinical variates including age, TNM stage, pathological stage and histologic grade were tightly linked to the OS of ccRCC patients (P<0.05, Figure 4c), and multivariate analysis, displayed as forest plot in Figure 4d, demonstrated that RAD54L expression, age, and M stage can independently predict OS for ccRCC patients (P<0.05). Moreover, with the help of R "rms" and "survival" packages, a nomogram model was developed via combining RAD54L expression with other variates to help predict 1-y, 3-y and 5-y OS of ccRCC patients (Figure 4e).
Associations of RAD54L and its co-expressed genes with TME in ccRCC GSEA was then undertaken to explore the potential KEGG pathways of upregulated RAD54L in ccRCC, and the results revealed that 5 signaling pathways were remarkably enriched in highly expressed RAD54L samples, including T cell receptor signaling pathway, cell cycle, JAK-STAT signaling pathway, antigen processing presentation, and cell adhesion molecules pathway (adj. p and FDR <0.05, Supplementary  Figure 5a). Further, through performing spearman correlation analysis, EZH2, KIF18B and EME1 were noted to be the 3 strongest correlated genes (Figure 5b), besides, these 3 genes also remarkably overexpressed in ccRCC (Supplementary Figure 2a) and their elevated expression tightly linked to worse OS and DSS of ccRCC patients (Supplementary Figure 2b-c). Subsequently, ssGESA algorithm was adopted to initially assess the relationships between these 4 genes and TIL in ccRCC, consequently, RAD54L and its 3 co-expressed genes tightly linked to the enrichment of T helper cells, Th2 cells and Tregs, while negatively associated with the in ltrating levels of NK cells, Th17 cells and Mast cells ( Figure  5c-5f). These ndings revealed that RAD54L and its co-expressed genes might be implicated in regulating TME of ccRCC.
Correlations of RAD54L with TME in pan-cancer Furthermore, the xCell algorithm was adopted to evaluate the links between RAD54L expression and the TME in pan-cancer. As depicted as a heat map in Figure 6a, a correlation analysis involving 33 cancer types revealed signi cant associations of RAD54L expression with the in ltrating levels of Th2 cells in 32 cancer types, in addition, RAD54L also showed positive correlations with the excessive production of immune-escape markers (Figure 6b). Moreover, the ssGSEA algorithm was also carried out to further evaluate the links between the RAD54L expression and Th2 cells in pan-cancer, as shown in a series of scatter plots, striking correlations were noted between the enrichment of Th2 cells and RAD54L expression in 30 cancer types (Supplementary Figure 3). These ndings revealed that RAD54L might be implicated in regulating the enrichment of Th2 cells and modulating the TME of cancers.

The expression and prognostic values of RAD54L in pancancer
To comprehensively evaluate the transcriptional and translational expression levels of RAD54L in pancancer, the RNA-seq from TCGA and GTEx database were combined with IHC staining results of HPA database to illustrate the expression levels of RAD54L in multiple cancer types. As displayed in Figure 7a, compared to corresponding normal tissues, the expression of RAD54L were increased in 24 cancer types. Specially, Breast invasive carcinoma (BRCA) (Figure 7b), Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) (Figure 7c

Discussion
Cancer genomes bear endogenous DNA damage repair (DDR), which has been one of the primary targets in ghting against tumors in recent years [14,23]. Emerging evidence suggests that DDR defects is capable of activating anti-tumor immunity [24]. However, there is no comprehensive research on the crosstalk between DDR and TME in ccRCC. Here, we identi ed two ccRCC subtypes with discrepant prognosis, differential PD-L1 expression and TME based on the expression of DDR genes. Among these, RAD54L was determined as a potential immune-related DDR gene. Furthermore, a pan-cancer analysis showed that RAD54L is signi cantly upregulated in various cancer types and its elevated expression associated with unfavorable prognosis and immune in ltration in multiple cancer types.
In recent years, accumulating studies have attempted to construct cancer subtypes to identify different prognostic and treatment-sensitive groups in order to guide clinical practice [25][26][27]. For instance, Gay et al. classi ed small cell lung cancer (SCLC) into four subtypes based on tumor expression data, SCLC-I subtype was noted to bene t most from the combination of immunotherapy to chemotherapy, while other subtypes also have different therapeutic sensitivities, including BCL-2, Aurora kinase or inhibitors of PARP [27]. Here, we classi ed ccRCC into two subtypes according to the DDR genes expression. Subtype1 was correlated to increased proportion of high-grade tumors, unfavorable prognosis and lower PD-L1 expression compared to subtype2, which indicated that subtype2 may obtain more bene ts from immunotherapy, even if there has been metastasis. Further analysis revealed that subtype1 exhibited higher enrichment of Tregs, CD8 + T cell and T follicular helper cell, whereas subtype2 was tightly linked with the in ltration of activated Mast cell and Monocyte. Intriguingly, unlike other cancers, a high degree of CD8 + T cell in ltration is thought to be associated with poor prognosis in ccRCC patients [28][29][30]. Meanwhile, a previous study classi ed ccRCC into four subtypes based upon multi-omics data, as a consequence, CD8 + in amed subtype was associated with worse prognosis and high tumor heterogeneity, while CD8-in amed subtype was correlated to innate immunity which exhibited an augmented level of monocyte and macrophage in ltration [29]. Recent studies have shown that DNA damage can activate cGAS-STING pathway [31] or other pathways [32] to exert anti-tumor immune effects. In our study, we applied GSEA method to explore the potential KEGG pathways between two distinct subtypes. Our results demonstrated that the TGF-β signaling pathway, focal adhesion, and JAK-STAT signaling pathway were signi cantly enriched in subtype1. TGF-β signaling pathway is a classic pathway that modulates carcinogenesis and TME [33], as well as an extremely important target for immunotherapy [34]. Moreover, numerous studies revealed tight connections between DDR and TGF-β signaling pathway in cancer [35][36][37]. These ndings implied that DDR might in uence the TME of ccRCC by activating the TGF-β signaling pathway.
RAD54L was then determined as a potential immune-related DDR gene, and its overexpression was positively associated with PD-L1 expression and predicted unfavorable prognosis in ccRCC. RAD54L was reported to be involved in DNA repair and homologous recombination process [38,39], and the oncogenic roles of RAD54L have also been explored in several malignancies [40,41]. However, the potential roles of RAD54L in ccRCC remain unclear. Herein, we discovered the upregulation of RAD54L in cccRCC by analyzing the RNA-seq data from TCGA and veri ed the increased expression in human ccRCC tissues via qRT-PCR method. Also, elevated RAD54L expression independently predicted poor OS for ccRCC patients. Furthermore, given the prevalence of DDR involvement in cancer progression, we comprehensively evaluate the mRNA and protein expression levels of RAD54L in pan-cancer through analyzing multi-omics data from TCGA, GTEx and HPA database, and we found that RAD54L was signi cantly increased in various cancers in comparation with their normal counterparts. Moreover, in addition to ccRCC, elevated RAD54L expression was also closely linked with unfavorable prognosis in multiple cancer types. These discoveries suggested that RAD54L is a potential prognostic biomarker for a wide range of cancers.
The GSEA results demonstrated that RAD54L was associated with the T cell receptor signaling pathway and cell cycle pathway, indicating the potential role of RAD54L in regulating TME. To obtain a convincing assessment of immune in ltration correlation analysis, we applied multiple methods and algorithms to elucidate the associations between RAD54L and TME. First, through spearman correlation analysis, EZH2, KIF18B and EME1 were identi ed as the three genes most positively associated with RAD54L expression, and ssGSEA algorithm revealed strong correlations of RAD54L and the three most coexpressed genes with immune in ltration. Interestingly, accumulating evidence have demonstrated that EZH2 is a potential therapeutic target associated with immunotherapy resistance [42][43][44], which exhibited the strongest expression correlation with RAD54L in ccRCC. Additionally, given the generalizability of immune response in tumors, xCell and ssGSEA algorithm was combined to explored the correlations of RAD54L expression with the immune cell across more than 30 cancer types. Both algorithms unveiled the inextricably connections between RAD54L and Th2 CD4 + T cells in more than 30 cancers. Th2 cells are well known to promote cancer progression by suppressing immunity [45,46]. Gonzalez et al. reported that human breast cancer cells are capable of constructing a Th2 in ammed TME to escape immunity by producing thymic stromal lymphopoietin (TSLP) [47]. Mahata et al. demonstrated that the tumors can evade immunity by inducing steroid biosynthesis in Th2 cells [48]. These ndings illustrated that RAD54L might serves as a potential immunological and prognostic biomarker in ccRCC and others by taking part in the in ltrating levels of Th2 CD4 + T cells. Yet, there are several worth-mentioning limitations in this study. First, the clustering analysis and the assessment of TME were performed by analyzing the data from the TCGA as the clinical in the present cohort was insu cient. Furthermore, the exact mechanism by which DDR affects TME and the interactions between RAD54L and TME needs further validation through a series of large clinical studies and basic research.

Conclusions
In summary, this study determined two independent subtypes based on the expression of DDR genes in ccRCC. Distinct prognosis, differential PD-L1 expression and TME were discovered between two subtypes. TGF-β signaling pathway might be involved in the process of DDR affecting TME. RAD54L was identi ed as a potential immune-related DDR gene. Further comprehensive analysis revealed that RAD54L was signi cantly upregulated and associated with unfavorable prognosis and immune in ltration in multiple cancer types. These ndings indicated that the DDR is implicated in regulating TME of ccRCC, and RAD54L serves as a potential immunological and prognostic biomarker in multiple cancers, including ccRCC.

Availability of data and materials
All the datasets were open access and derived from the TCGA database (https://portal.gdc.cancer.gov/), GTEx database (https://gtexportal.org/home/) and HPA (http://www.proteinatlas.org/) database. further inquiries can be directed to the corresponding author.

Ethics approval and consent to participate
The study was approved by the Ethics Committee of the First A liated Hospital, School of Medicine, Zhejiang University. Patients provided their written informed consent to participate in this study.

Consent for publication
Written informed consent was obtained from all participants. Clinical features and prognosis of two ccRCC subtypes. a Consensus clustering matrix for k = 2. b cumulative distribution function (CDF) curves for k=2-6. c CDF Delta area curve of k=2-6. d Distribution of S1-S2 across ccRCC gender, TNM stages and tumor grades. the value is -log10 (p value), p value was analyzed by chi-square test, *p < 0.05. e-g Kaplan-Meier Curve of e Overall survival, f Progression Free survival, and g Disease Speci c survival of S1-S2 in ccRCC. S1 and S2 represents Subtype1 and Subtype2, respectively. expression levels in a ccRCC/Normal and b S1/S2 tissues. c Heatmap visualized the distribution of 22 tumor-in ltrating immune cell in S1/S2. d Relative enrichment of immune cells in S1/S2 tissues. *p < 0.05; **p < 0.01, ***p < 0.001. e TGF-β signaling pathway. f Focal adhesion. g JAK-STAT signaling pathway. NES, Normalized enrichment score; FDR, False discovery rate.   Relative expression levels of RAD54L in pan-cancer. a pan-cancer analysis of RAD54L expression in TCGA and GTEx database. b-j RAD54L mRNA expression in tumor and normal tissues (left) and immunohistochemistry stain images in normal (middle) and corresponding cancer (right) tissues. Relative transcriptional and translational expression levels of RAD54L in b BRCA, c CESC, d COAD, e KIRC, f LIHC, g LUAD, h LUSC, i PAAD, j STAD and their corresponding normal tissues. All tissues were treated with the same anti-RAD54L antibody (antibody ID: HPA028954). *p < 0.05; **p < 0.01, ***p < 0.001.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.