Identification of a novel methylation gene sites signature to predict prognosis in kidney renal papillary cell carcinoma by comprehensive profiling

Most kidney renal papillary cell carcinoma (KIRP) patients are diagnosed at the advanced stage and have a poor prognosis. DNA methylation plays an important role in prognosis prediction of cancers. Identification of novel DNA methylation site biomarkers could be beneficial for the prognosis of KIRP patients. In this study, we aimed to find an efficient methylated sites model for predicting survival in KIRP.


Background
Renal carcinoma is a heterogeneous tumor, of which epithelial renal cell carcinoma (RCC) accounts for the majority of cases. There are over ten recognized histological subtypes of RCC but malignancies of chromophobe, kidney renal papillary cell carcinoma (KIRP), and the kidney renal clear cell carcinoma (KIRC) are the three most common subtypes. Among these subtypes, KIRP has the second highest morbidity rate (10%-15% of RCC cases), while KIRC has the highest incidence (75%-80% in RCC [1][2][3][4][5]). About 30% of RCC patients present with distant metastasis at diagnosis and have a poor prognosis. In clinical studies, patients with KIRC and malignancies of chromophobe often have ideal results. However, patients who were diagnosed with KIRP frequently experience significantly worse clinical outcomes [6][7][8]. Because most of the RCC presented are the KIRC subtype, researchers have a relatively good understanding of KIRC pathogenesis. The KIRP subtype has been less well studied than KIRC, so what we know is very limited [9][10]. Many therapies for advanced RCC are based on blocking known KIRC pathways using mTOR inhibitors and tyrosine kinase inhibitors to regulate the HIF1α and VEGF pathways, however, there are very few treatments for KIRP.
KIRP is a renal parenchyma malignant tumor, including two different subtypes (type 1 and type 2), which is often observed as papillary or tubulopapillary architecture [11]. Just like the KIRC, VEGF inhibitors and mTOR inhibitors have been developed based on the understanding of specific molecular sites [12]. Although researchers are beginning to develop therapeutic targets for KIRP, such as foretinib and capozantanib, these drugs are specific for type 1 KIRP but not the more aggressive type 2 KIRP [13][14]. Therefore, we must identify new and powerful molecular markers for prediction and treatment sites which can help in the development of new targeted drugs specific for KIRP.
Although cancer occurrence and development mainly depend on the alteration of tumorassociated genes, epigenetic changes such as the DNA methylation of tumor-related genes play an important role in the molecular barrier against tumor development [15][16]. DNA methylation is often considered a mechanism of gene silencing and it functions directly in many cellular processes such as embryonic development, transcription, genomic imprinting, and X-chromosome inactivation [17][18][19]. DNA methylation signatures have already been used in the early diagnosis and prognosis of cancers. For example, in breast cancer, the poor prognosis in patients may be correlated with CDH1 promoter methylation [20]. In addition, the DNA methylation of the promoter regions of P16, CDH13, APC, and RUSSF1A in stage I patients with non-small-cell lung cancer may be associated with early recurrence [21,22].
There are also a large number of methylation biomarkers that have been proposed for the prediction of RCC [23], including the single methylation biomarkers for prognosis such as CRHBP [24], RCVRN [25], AR [7], CDO1 [26], BMP-2 [27], KEAP1 [5], and DAB2IP [28]. While promising, tumorigenesis is a complex process that requires the involvement of multiple genes, thus many of these biomarkers are imperfect [29]. Considering that the occurrence and development of KIRP is a complex process that requires the joint regulation of multiple omics, it is necessary to establish a molecular marker model with high sensitivity and strong predictive ability to elucidate the prognosis of KIRP.

4
In this study, we aimed to find potential survival-related DNA methylation sites signatures in KIRP, which may pave the way for the development of novel prognostic markers and therapeutic targets for KIRP.  Table 1. Other DNA methylation data were retrieved from the Gene Expression Omnibus (GEO) database (GSE126441, n=14), which were used to validate the methylated level of signature genes. To improve the data accuracy, DNA methalytion sites on sex chromosomes were excluded. Genes with missing expression values in >30% of samples or patients were removed, and the remaining missing values were entered by the knearest neighbour method. Genes whose RPKM expression values were 0 in all samples were excluded [30]. The technical route to select the DNA methylated sites signature is shown in Fig. 1.

Identification of DMGs and DEGs associated with KIRP
To identify the differentially methylated genes (DMGs), we adopted the Benjamini-Hochberg false-discovery rate (FDR) method to adjust the P value for each gene. The DMGs were identified by a fold change >2, p-value <0.05, FDR<0.05, and Beta value >0.1.
After identifying multiple DMGs and DEGs from these datasets, we screened nine hub genes, which are genes both differentially expressed and enriched in differential methylation, between the DMGs and DEGs.
Constructing a prognostic DNA methylation signature in the training dataset Statistics were employed to build a model based on reports of a better method to construct a signature module [31][32]. Gene methylation often occurs at specific loci, and the methylation of the gene is composed of multiple methylation sites, so in order to make the detection more accurate, we looked for methylation sites of hub genes, and then identified the sites associated with survival. Methylation sites were validated in the GEO dataset. After identifying the methylation sites of hub genes, we randomly divided all the KIRP methylation samples into two groups, the test group (87 cases) and training group (174 cases). The two groups were separate and uncrossed. Univariate Cox proportional hazards regression analysis was used to identif the association between survival time/status and each methylation site in the training data set [33]. In order to screen the most powerful and accurate DNA methylation sites to predict KIRP prognosis, multivariate Cox regression analysis was then used to build a model to assess the prognosis risk according to the following expression: (see Formula 1 in the Supplementary Files) where N represents the number of prognostic methylation gene sites, is the methylation value of the gene sites, and is a single factor Cox regression coefficient. When the coefficient of the < 0, we defined it as a favorable prognosis site, while the sites with the coefficient of > 0 were considered as a poor prognosis site. Risk Score (RS) is the multinode weighted sum of risk scores.

Statistical analysis
The selected methylated sites were used to construct a risk model. KIRP patients were dichotomized into either high risk or low risk groups in the training dataset, the median risk score was used as a cutoff value. Kaplan-Meier survival analysis and ROC analysis were used to validate the effective prognostic ability of the methylation gene site signatures. We then validated the prognostic ability of the DNA methylation signature in the test dataset. Furthermore, multivariable Cox regression analysis was carried out to identify whether the DNA methylation signature was an independent factor in survival prediction; we considered P<0.05 indicate a statistically significant difference. All analyses were performed with the R statistical program (version 3.5.1).

Generating the nomogram
We generated a nomogram by using the "RMS" package of R software. The nomogram concordance index (C-index) of all patients was obtained by multivariate Cox regression analysis. The higher the C-index, the more accurate the prediction. The nomogram was used to calculate the total score of each patient. Overall scores were then used to predict 1-year, 3-year, and 5-year survival rates [34].
Functional annotation of the selected DNA methylation signature genes In order to further study the function of survival-related DNA methylation signature genes, we used gene ontology (GO) analysis (http://www.geneontology.org) to investigate the roles of all the selected genes, and KEGG pathway analysis (http://www.genome.jp/kegg/) to determine the significant pathways. Fisher's exact test and chi-square tests were used to select significant GO and pathway categories, with the threshold of significance of P 0.05.

7
Identification of DMGs and DEGs associated with KIRP After preprocesing the methylation dataset, there were 261 cancerous tissues remaining.
We identified the DEGs and DMGs in the two datasets. To identify the KIRP-related DMGs, we performed comparisons between 276 cancerous tissues and 45 adjacent tissues from KIRP patients. A total of 88 DMGs (Table S1) were identified in the methylation dataset (p<0.05; Δβ>0.1), among these there were 48 hypomethylated genes and 40 hypermethylated genes. For the DEGs identified, we compared the 289 cases and 32 controls in the gene expression dataset. In total, we obtained 5109 DEGs (Table S2) of which 3076 were upregulated and 2033 were downregulated. After identification of the DMGs and DEGs, there were nine overlapping genes with hypomethylated-high-expression and hypermethylated-low-expression, which contained seven hypomethylated-highexpression and two hypermethylated-low-expression genes as described in Fig. S1; the nine hub genes are shown in Table S3.
Identifying the four DNA methylation sites signature in the training group The 261 patients were randomly divided into two groups (training group, n=174; test group, n=87) to identify and test the prognostic methylated gene sites found in the KIRP patients. In total, we identified 47 methylation sites (Table S4) in the nine hub genes.
Then, univariate Cox proportional hazards regression analysis was conducted in the training group to identify the methylation sites significantly associated with overall survival time from the 47 sites. Eight methylated sites were significantly related to the survival of KIRP patients (P < 0.05, Fig. 2a, Table S5). In order to select the greatest prediction power signature, we conducted multivariable Cox regression analysis and built a model with the four methylated gene sites set (cg04448376, cg24387542, cg08548498, and cg14621323, Fig. 2b) to assess the prognosis risk. The risk score (RS , Table S6)  Identification of the survival power of the DNA methylation signature Each patient got a risk score from the selected methylated signature, and the median risk score was used as the cut off to divide the training group patients into either the low-risk group (n = 87) or high-risk group (n =87). Kaplan-Meier survival analysis showed that the overall survival (OS) rate of the low-risk group was significantly higher than that of the high-risk group (OS rate: 97.7% vs. 75.9%, log-rank test P<0.001; Fig. 3a). To validate the prediction power of the DNA methylation signature, we validated it in the test group using the same prognostic risk score model, and found significant differences between the highrisk and low-risk groups (Fig. 3b). In the test dataset, the high-risk group had a significantly lower OS rate than the low-risk group (OS rate: 69.8% vs. 90.9%, log-rank test P=0.032).
The four methylation site signature has great survival predictive power To test the predictive ability of the DNA methylation signature model, we conducted a separate ROC analysis, which showed high predictive ability of the four methylation signature sites in the training group (AUC Signature =0.791, Fig. 3c); this further shows that the signature in our study is a new, highly accurate prognostic prediction marker. Similar results were found in the test group (AUC Signature =0.704, Fig. 3d).

Nomogram of combined methylated sites signature and clinical variables predict patients' OS
Our multivariate Cox regression model demonstrated that the predictive power of the signature risk score was independent of clinical characters (High-risk group vs. Low-risk group, HR = 1.40, 95% CI: 1.01-2.00, P=0.045, Fig. 4a). According to the above analysis 9 results, we developed a methylated gene sites nomogram, which combined the clinicalrelated factors (stage, age, and sex) and methylated gene sites signature. In the training group, the calibration chart of the five-year operating system is well predicted (Fig. 4b).

Validated methylation sites in independent GEO cohorts
To confirm the four-methylation site pattern in different populations, we evaluated the samples (4 patients vs. 10 normal) in GSE126441 (Table S7). It showed that cg04448376 and cg14621323 have been down regulated and cg08548498 and cg24387542 have been up regulated in patients vs. normal (Fig. 5a,b,c,d), which is the same methylation pattern In our study, we identified 47 methylated gene sites from nine differentially methylated genes (DMGs), with opposite differential expression patterns. We used various statistical approaches to identify a four methylated sites signature from the 47 methylated sites. The signature that we selected can separate the KIRP patients into high-risk and low-risk groups with significantly different survival times in the training and test datasets, indicating it has a powerful prediction ability. The independence of the selected DNA methylation genes signature in predicting OS in the entire dataset was identified using multivariable Cox regression analysis, which confirmed that the risk score of DNA methylation sites signature was independent of OS.
The ROC curve showed that the AUC is 0.791 in the training group and 0.742 in the test group. Considering that larger AUC usually indicates better prediction power, this result further demonstrated that the DNA methylation signature in our study is a high accuracy novel prognostic marker and has important clinical value. In addition, our DNA methylation sites signature did not depend on other clinical features. Moreover, we validated the methylation gene sites signature and methylation sites in the TCGA and GEO cohorts and demonstrated their ability to predict overall survival of KIRP patients. We also established a methylation gene sites nomogram including methylation gene sites signature and clinical-related risk factors (e.g. stage and age) to predict OS. So, our study results help in understanding the development KIRP and for developing tailored therapy, and ultimately may contribute to an increase in survival rates of KIRP patients.
In addition, we analyzed the function of the selected DNA methylation genes. The four methylation sites cg04448376, cg24387542, cg08548498, and cg14621323, were located in UTY, LGALS9B, SLPI, and PFN3, respectively. SLPI is a gene encoding secretory leukocyte protease inhibitor, a 11.7 KDa serine protease inhibitor, and is one of the members of the whey acidic protein four-disulfide core family [37,38]. SLPI can reduce the activities of trypsin, neutrophil elastase, chymotrypsin and cathepsin G [39]. Therefore, SLPI may be a potential tumor marker to predict the prognosis [39]. Previous studies have shown that SLPI is related to tumor metastasis. In some high-risk, aggressive or metastatic tumors, such as pancreatic, uterine cervix, papillary thyroid, and ovarian cancers, SLPI was often found to be highly expressed [40][41]. However, in bladder tumors, nasopharyngeal carcinoma and some breast cancers, SLPI has a low expression. SLPI has high expression in gastric cancer cells with serosa invasion, and SLPI overexpression in gastric cancer cell lines can improve the cell migration and invasion rate [38]. These observations are entirely consistent with the previous observation that the expression of SLPI in tumors is often associated with poor prognosis. As far as we know, the present study is the first report of SLPI methylation sites as a prognostic biomarker in human KIRP.
SLPI is hypomethylated and overexpressed in this report just like the previous study demonstrated, so we believe that SPLI is a favorable biomarkers for prognosis.
UTY is located on the Y chromosome and can encode a demethylase and was reported to be an epigenetic-related gene. UTY is essential in the development of teratoma through regulation of epigenetic changes [42][43]. In urothelial bladder cancer (UBC), 22.8% (8/35) of patients were found to have a reduced UTY copy number and cell proliferation was found to increase in a UTY knockout. UTY also plays an important role in some regulatory pathways, such as the NF-B and p53 pathways [44,45]. In this study, we showed that methylation of UTY was significantly associated with KIRP survival, and that UTY can act as a survival-related methylation biomarker for KIRP.
For LGALS9B and PFN3, there is very little known about their regulatory mechanisms. The LGALS9B gene was initially thought to represent a pseudogene of galectin 9, however, the 12 association between it and tumors is unclear. This gene is one of two similar loci on chromosome 17p similar to galectin 9 and is now thought to be a protein-encoding gene.
We have found that its functions are primarily associated with protein binding. Thus, we suspect that it is similar in function to galectin 9. Galectin-9 was reported to be related to metastasis and immunosuppression. In terms of its role in cancer, galectin-9 can affect different aspects in the process of tumor growth and aggressiveness, such as metastasis and immunomodulation [46][47]. In breast carcinoma, liver cancer and cervical tumors, LGALS9 expression affects disease prognosis [48][49][50][51] LGALS9B, SLPI, and PFN3 have important roles in KIRP.
The limitations of this study need to be recognized. First, the samples of our study are entirely retrospective and inherent biases may influence the results. Hence, we may have lost signatures that are potentially correlated with KIRP survival. Secondly, we have not further searched the mechanism of action of these DNA methylation genes in KIRP. Finally, although we identified the selected DNA methylation sites as a powerful prediction signature, applying it in a clinical setting will require more research. Despite these drawbacks, the significant and consistent correlation between our four methylated site signature and overall survival in two independent datasets indicated that it is a potentially powerful prognostic marker for KIRP.

Conclusions
In summary, we identified a novel methylated sites signature to predict prognosis in KIRP.
We confirmed this signature could serve as potential robust and specific biomarker in the 13 prognosis prediction and tailored therapy for KIRP patients.

Funding
Not applicable.

Availability of data and materials
All data used and analyzed during this study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
The contents of this article are data mining from the The Cancer Genome Atlas database (TCGA) database. The TCGA database is open and shared and doesn't require informed patient consent.

Consent for publication
Not Applicable. and apoptosis of human melanoma cell lines and its clinical significance.  Additional File Legends    Validated methylation sites of signature in independent GEO cohorts.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.