Integrated analysis reveals the prognostic significance of pyrimidine metabolic rate limiting enzymes in patients with lung adenocarcinoma

Background: Reprogramming of metabolism is illustrated in many types of cancer and is associated with the clinical outcomes of cancer patients. However, the prognostic significance of pyrimidine metabolism signaling pathway in patients with lung adenocarcinoma (LUAD) is unclear. Methods: Using the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) datasets, we determined the prognostic significance of pyrimidine metabolic rate limiting enzymes in patients with lung cancer. The enrichment of pyrimidine metabolism signaling pathway was analyzed by gene set enrichment analysis (GSEA). The expression profiles of pyrimidine metabolic rate limiting enzymes were identified from Affymetrix gene expression data and RNA-seq data. The Kaplan-Meier Plotter was used to identify the prognostic significance of the pyrimidine metabolic rate limiting enzymes. Spearman’s correlation and multivariate cox regression were used to determine the correlation efficiency of pyrimidine metabolic rate limiting enzymes. Results: The pyrimidine metabolism signaling pathway is significantly enriched in LUAD tissues, but not in lung squamous cell carcinoma (LUSC) tissues. Compared with normal lung tissues, the pyrimidine metabolic rate limiting enzymes are highly expressed in lung tumor tissues. Furthermore, the high expression levels of pyrimidine metabolic rate limiting enzymes are associated with unfavorable prognosis in patients with LUAD. Moreover, we demonstrate that the hypo-DNA methylation, DNA amplification and TP53 mutation are contributing to the high expression levels of pyrimidine metabolic rate limiting enzymes in cancer cells. Finally, we reveal the significant connections between the pyrimidine metabolic rate limiting enzymes. Comprehensively, the pyrimidine metabolic rate limiting enzymes are highly expressed in patients with bladder cancer, breast cancer, colon cancer, liver cancer and stomach cancer. And the high expression levels of pyrimidine metabolic rate limiting enzymes are associated with unfavorable prognosis in patients with liver cancer. Conclusions: Pyrimidine metabolism signaling pathway is associated with the development of lung adenocarcinoma. Pyrimidine metabolic rate limiting enzymes CAD, DTYMK, RRM1, RRM2, TK1, TYMS, UCK2, NR5C2 and


Background
Lung cancer is one of most commonly diagnosed cancer and the leading cause of cancer related mortality [1][2][3]. Although some improvements of treatment of lung cancer have been achieved in the past few decades, the 5-year survival rate of patients with lung cancer is still low [4,5]. Lung cancer is a heterogeneous disease, including small cell lung cancer and non-small cell lung cancer (NSCLC) [6]. NSCLC accounts for the 85% of lung cancer cases and could be further divided into 3 major pathologic subtypes: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and large cell carcinoma [7]. Each subtype of NSCLC demonstrates different molecular profiles and different drug response [8,9]. Although, gene alterations [10,11], mRNA expression signature [12,13], microRNA profiles [14,15], long non-coding RNAs [16,17], immune signature [18] and tumor microenvironment [19] are used for the prognosis of patients with NSCLC, more candidate biomarkers are needed to be identified.
Reprogramming of cell metabolism is a hallmark of cancer [20]. Cancer cells increase glucose uptake and utilize aerobic glycolysis to facilitate the uncontrolled cell proliferation [21]. Glycolysis-related gene signature is associated with the overall survival of lung adenocarcinoma patients [22]. Besides the mis-regulation of glucose metabolism, the normal pyrimidine metabolism is also disrupted during the development of cancer [23]. The disruption of the pyrimidine metabolism is reflected by the malfunctions of the pyrimidine metabolic rate limiting enzymes. The high expression levels of pyrimidine metabolic rate limiting enzymes CAD, CTPS, CTPS2, DHODH, DTYMK, NT5C2, NT5C3, RRM1, RRM2, TK1, TK2, TYMS, UCK2 and UCKL1 are illustrated in poorly differentiated liver cancer patients and correlated poor clinical outcomes [24]. Inhibition of pyrimidine synthesis by targeting pyrimidine metabolic rate limiting enzymes DHODH and CAD could accentuate the molecular therapy response in patients with glioblastoma [25]. Also inhibition of pyrimidine synthesis sensitizes triple negative breast cancer cells to chemotherapy agents [26]. However, the prognostic significance of the pyrimidine metabolism singling pathway in patients with lung adenocarcinoma is unclear.
In order to reveal the prognostic significance of pyrimidine metabolic rate limiting enzymes in patients with lung adenocarcinoma, we used large cohorts of lung cancer patients derived from Gene 4 Expression Omnibus (GEO) datasets and The Cancer Genome Atlas (TCGA) datasets. Overall, the analysis of GEO and TCGA datasets in the present study allowed an improved understanding of the functions of the pyrimidine metabolic rate limiting enzymes. The results also indicated the potential biomarkers of the pyrimidine metabolic rate limiting enzymes for further clinical studies.

Data collection
The TCGA LUAD and LUSC gene expression, DNA mutation and DNA methylation, along with the clinical datasets were downloaded from the TCGA hub (https://tcga.xenahubs.net). The LUAD and LUSC gene expression data was generated from RNA-seq and the DNA methylation data was The gene expression series matrix of normal and cancerous lung tissues was downloaded from the GEO website (www.ncbi.nlm.nih.gov/geo) and included GSE7670, GSE10072, GSE18842, GSE19188, GSE27262, GSE30219, GSE31210, GSE31908, GSE33532 and GSE75324 datasets. The DNA methylation data of patients with lung adenocarcinoma was downloaded from the GEO datasets with GEO number GSE32867 and GSE62948. All the GEO expression datasets were based on Affymetrix Human Genome microarray. The detailed description of the collected data used in this study was illustrated in Fig. 1A.

GEO data processing
The GEO expression datasets were processed using R software (version 3.5.0, https://www.rproject.org/). The matrix file of each dataset was annotated with corresponding platform. When multiple probes corresponded to the same gene symbol, the expression values were averaged using 'plyr' package (version 1.8.5) in R software. Plyr package includes multiple tools for splitting, applying 5 and combining data and could be downloaded from bioconductor (https://cran.rproject.org/web/packages/plyr/index.html). The different gene expression between normal and lung cancer samples was determined using paired Student's t test. The different DNA methylation intensity between normal and lung cancer samples was also determined using paired Student's t test.

Gene set enrichment analysis (GSEA)
The metabolic singling pathways enriched in lung cancer gene expression profiling were determined using GSEA software (version 2.0) [27]. The GSEA software and the signaling pathways gene sets were downloaded from the GSEA web site (www.broad.mit.edu/gsea/index.html). Genes were ranked by the signal-to-noise ratio, and statistical significance was determined by 1,000 gene set permutations. The results of significance should meet the criteria of nominal P value less than 0.05.
The clustering distance was determined by the 'correlation' method. Other parameters were provided in the usage of the 'pheatmap'.

Survival analysis using GEO datasets
The Kaplan-Meier Plotter (https://kmplot.com/analysis/) [28,29] was used to identify the association between the expression levels of the pyrimidine metabolic rate limiting enzymes and overall survival in patients with lung cancer derived from GEO datasets. The Kaplan-Meier Plotter is an online survival analysis tool to rapidly assess the prognostic effects of genes using GEO microarray data. The patients were divided by the auto select best cutoff using the expression of the pyrimidine metabolic rate limiting enzymes. P values were determined using Log-rank test.

Survival analysis using TCGA datasets
R statistics software 'survival' package (version 3.1-8) was used to identify the clinical influence of pyrimidine metabolic rate limiting enzymes on overall survival in patients derived from TCGA LUAD, LUSC, LIHC, BRCA and STAD datasets. The 'survival' package and the basic usage were downloaded 6 from bioconductor (https://cran.r-project.org/web/packages/survival/index.html). The patients were divided into two clusters based on the mean expression levels of the pyrimidine metabolic rate limiting enzymes. Kaplan-Meier estimator was applied to determine the clinical outcomes in patients with high expression levels and low expression levels of pyrimidine metabolic rate limiting enzymes. P values were determined using Log-rank test.

Oncoprints of the pyrimidine metabolic rate limiting enzymes
The genomic alterations of pyrimidine metabolic rate limiting enzymes in LUAD patients were downloaded from cbioportal (version 3.2.0) based on the TCGA datasets (http://www.cbioportal.org/index.do).

Correlation plots of the pyrimidine metabolic rate limiting enzymes
Correlation plots of the pyrimidine metabolic rate limiting enzymes were created using the 'corrplot' package (version 0.84) in R. The 'corrplot' package and the basic usage were downloaded from bioconductor (cran.r-project.org/web/packages/corrplot/index.html). The Spearman's correlation test was used to demonstrate the correlation efficiency.

Multivariate cox regression
Multivariate cox regression was analyzed by 'coxph' method in R software 'survival' package (version 3.1-8). The 'survival' package and the basic usage were downloaded from bioconductor (https://cran.r-project.org/web/packages/survival/index.html). Log-rank test was used to calculate the P values.

Statistical analysis
The box plots were generated from prims5.0. Statistical analysis was performed using the paired Student's t test. P value less than 0.05 was chosen to be significantly different.

Results
Pyrimidine metabolism signaling pathway is highly enriched in lung tumor samples across different datasets.
In order to reveal the metabolism related transcriptional profiling of lung cancer, we analyzed lung 7 cancer patients with expression data from previously published GEO datasets. Totally, 1290 samples were collected from ten previously published datasets based on Affymetrix gene microarray platforms, including 336 normal lung samples and 954 lung tumor samples. Most of the lung cancer patients belonged to lung adenocarcinoma subtype. A detailed description of the collected data used in this study was illustrated in Fig. 1A.
We then identified the enriched metabolic signaling pathways in patients with lung cancer using the GSEA assay. Among all the enriched metabolic signaling pathways, pyrimidine metabolism signaling pathway was significantly enriched in seven out of ten datasets, including, GSE10072, GSE18842, GSE19188, GSE27262, GSE30219, GSE31210 and GSE75324 datasets, representing the most frequently enriched metabolic signaling pathway (Fig. 1B). Only in GSE7670, GSE31908 and GSE33532 three datasets, the pyrimidine metabolism signaling pathway was not significantly correlated with the transcriptional profiling of lung cancer (Fig. 1B).
Expression levels of pyrimidine metabolic rate limiting enzymes are associated with the tumor overall survival in patients with lung cancer: analysis from GEO datasets.
Since the pyrimidine metabolic rate limiting enzymes were activated in lung tumor tissues, the present study next assessed the prognostic effects of pyrimidine metabolic rate limiting enzymes in lung cancer patients. The Kaplan-Meier Plotter is an online survival analysis tool to rapidly assess the prognostic effects of genes using the integrated GEO microarray data derived from 1926 lung cancer patients [28,29]. Using Kaplan-Meier Plotter, the present study showed that, high expression levels of the pyrimidine metabolic rate limiting enzymes CAD, CTPS, DHODH, DTYMK, RRM1, RRM2, TK1, TYMS and UCK2 were unfavorable prognostic markers in patients with lung cancer (Fig. 3). Lung cancer patients with higher expression levels of CAD, CTPS, DHODH, DTYMK, RRM1, RRM2, TK1, TYMS and UCK2 had worse clinical outcomes than patients with low expression levels of those genes (Fig. 3).
However, consistent with the decreased expression levels of NR5C2 and TK2 in lung cancer tissues, patients with higher expression levels of NR5C2 and TK2 had better prognosis than patients with low expression levels of those genes (Fig. 3).
Expression levels of pyrimidine metabolic rate limiting enzymes are associated with the tumor overall survival in patients with lung adenocarcinoma: analysis from TCGA LUAD dataset.
Furthermore, using TCGA LUAD dataset, we confirmed the prognostic effects of pyrimidine metabolic rate limiting enzymes in patients with lung adenocarcinoma. Similarly, the Kaplan-Meier survival analysis revealed that the pyrimidine metabolic rate limiting enzymes DTYMK, NT5C3, RRM1, RRM2, TK1, TYMS and UCK2 were all associated with worse prognosis in the patients with lung cancer using TCGA LUAD dataset (Fig. 4). Patients with high expression levels of DTYMK, NT5C3, RRM1, RRM2, TK1, TYMS and UCK2 were with low overall survival. However, we found that the CAD, CTPS, DHODH, NR5C2 and TK2 had no prognostic effect in LUAD dataset (Fig. 4).
Increased expression levels of the pyrimidine metabolic rate limiting enzymes in lung cancer cells are induced by DNA hypo-methylation.
Next, we tried to determine the mechanisms that induced the activation of the pyrimidine metabolic rate limiting enzymes in lung cancer. The high expression levels of oncogenes are usually mediated by hypo-DNA methylation, DNA amplification and gene mutation [30]. Using the DNA methyltion data deposited in GSE32867 and GSE62948 datasets, we analyzed the DNA methylaion intensity of the pyrimidine metabolic rate limiting enzymes in normal lung tissues and lung cancer tissues. The detailed description of the GSE32867 and GSE62948 datasets was illustrated in Fig. 1A.
Compared with the lung normal tissues, the pyrimidine metabolic rate limiting enzymes CAD, RRM2 and TK1 were with hypo-DNA methylation patterns in lung cancer tissues derived from GSE32867 dataset (Fig. 5A). Similar results were obtained in GSE62948 dataset that the DNA methylation intensity of CAD, RRM2 and TK1 was lower in lung cancer tissues, compared with normal lung tissues ( Fig. 5B). Also, in TCGA LUAD dataset, pyrimidine metabolic rate limiting enzymes RRM2, TK1, CAD, UCK2, TYMS and CTPS genes exhibited hypo-DNA methylation in LUAD tumor tissues (Fig. 5C). These observations suggested that DNA methylation was partially contributing to the activation of pyrimidine metabolic rate limiting enzymes in the tumor cells.
Increased expression levels of the pyrimidine metabolic rate limiting enzymes in lung cancer cells are induced by DNA amplification and TP53 mutation.
Another factor determining the activation of pyrimidine metabolic rate limiting enzymes in lung cancer cells was genomic aberration, particularly DNA amplification. It was revealed that 6% lung cancer patients were with UCK2 gene amplification and 5% lung cancer patients were with UCKL1 gene amplification (Fig. 6A). Also TK1 gene amplification was occurred in 2.2% lung cancer patients (Fig. 6A). However, other pyrimidine metabolic rate limiting enzymes were without DNA amplification in lung cancer tissues (Fig. 6A).
Loss of TP53 functions induces uncontrolled pyrimidine synthesis [34]. The present study assessed whether TP53 regulated the expression levels of the pyrimidine metabolic rate limiting enzymes. It was revealed that, pyrimidine metabolic rate limiting enzymes CAD, CTPS, DTYMK, RRM1, RRM2, TYMS, UCK2 and TK1 were all highly expressed in patients with TP53 mutant lung cancer patients, compared with lung cancer patients with wild type TP53 in GSE72094 dataset (Fig. 6B). Interestingly, TK2 which was down regulated in lung tumor tissues was highly expressed in lung cancer patients with wild type TP53 (Fig. 6B).
Those results were further validated in the TCGA LUAD dataset. The expression levels of pyrimidine metabolic rate limiting enzymes CAD, CTPS, DTYMK, RRM1, RRM2, TYMS, UCK2 and TK1 were particularly higher in patients with TP53 mutant lung cancer patients (Fig. 6C). And the expression levels of TK2 were lower in TP53 mutant lung cancer patients (Fig. 6C). Overall, our results suggested that hypo-DNA methylation, DNA amplification and TP53 mutation were combined contributing to the high expression levels of pyrimidine metabolic rate limiting enzymes in lung cancer cells.
Correlation between pyrimidine metabolic rate limiting enzymes is identified in patients with lung cancer.
Next, we tried to determine the connections between those pyrimidine metabolic rate limiting enzymes. Spearman correlation demonstrated high correlation of those genes. Particularly, RRM2 was highly associated with the expression of TK1, RRM1, TYMS and DTYMK genes in GSE30219 lung cancer expression dataset (Fig. 7A). However, NT5C2 and TK2 were negatively correlated with other pyrimidine metabolic rate limiting enzymes (Fig. 7A). Similar results were obtained from TCGA LUAD expression dataset. RRM2 was positively correlated with other pyrimidine metabolic rate limiting enzymes, while, TK2 was negatively correlated with other pyrimidine metabolic rate limiting enzymes (Fig. 7A). Furthermore, we used multivariate cox regression analysis to determine the connections between the pyrimidine metabolic rate limiting enzymes. It was revealed that RRM2 was an independent prognostic marker in patients with lung cancer in GSE30219 dataset (Fig. 7B). In LUAD dataset, all pyrimidine metabolic rate limiting enzymes were interconnected with each other and those genes were not independent prognostic markers (Fig. 7B).
Combined prognostic effects of pyrimidine metabolic rate limiting enzymes are identified in patients with lung cancer.
In both GSE30219 and TCGA LUAD datasets, the pyrimidine metabolic rate limiting enzymes were interconnected with each other and were not independent prognostic markers. So, we tested the combined prognostic effects of pyrimidine metabolic rate limiting enzymes in patients with lung cancer. Lung cancer patients were divided into two clusters based on the unsupervised clustering of the expression levels of pyrimidine metabolic rate limiting enzymes in GSE30219 dataset (Fig. 8A).
However, TK2 and NT5C2 were not significantly over-expressed in tumor tissues (Fig. 9). These results indicated the universal importance of pyrimidine metabolic rate limiting enzymes in the development of cancer.
However, using GSEA assay, we found that pyrimidine metabolism signaling pathway was only significantly enriched in BRCA and THCA tumors (Fig. S2). Although, pyrimidine metabolic rate limiting enzymes were over-expressed in BLCA, COAD, ESCA, LIHC and STAD tissues, the pyrimidine metabolism signaling pathway was not significantly enriched (Fig. S2).
The association between the expression levels of pyrimidine metabolic rate limiting enzymes and the tumor overall survival in patients with liver cancer, breast cancer or stomach cancer: analysis from BRCA, STAD and LIHC datasets.
Like LUAD, pyrimidine metabolic rate limiting enzymes were highly expressed in patients with BRCA, LIHC and STAD. However, in TCGA BRCA dataset, pyrimidine metabolic rate limiting enzymes CAD, CTPS, DHODH, DTYMK, NT5C3, RRM1, RRM2, TK1, TYMS, UCK2, NT5C2 and TK2 demonstrated no prognostic effect in patients with breast cancer (Fig. S3). Similarly, expression levels of pyrimidine metabolic rate limiting enzymes CAD, CTPS, DHODH, DTYMK, NT5C3, RRM1, RRM2, TYMS, UCK2, NT5C2 and TK2 had no clinical relevance in patients with stomach cancer in TCGA STAD dataset (Fig.   S4). Only, TK1 was associated with better clinical outcomes in patients with stomach cancer (Fig. S4) On the contrary, like LUAD patients, high expression levels of pyrimidine metabolic rate limiting enzymes CAD, DTYMK, NT5C3, RRM1, RRM2, TK1, TYMS and UCK2 were all associated with worse clinical outcomes in patients with liver cancer in TCGA LIHC dataset (Fig. 10). Moreover, consistent with the decreased expression levels of TK2 in liver cancer tissues, patients with higher expression levels of TK2 had better prognosis than patients with low expression levels of TK2 (Fig. 10)

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and/or analyzed during the current study are available in TCGA (tcga.xenahubs.net) and GEO (www.ncbi.nlm.nih.gov/geo) repositories.

Competing interests
The authors declare that they have no competing interests.

Funding
The present study was supported by grants from the Fujian Provincial Maternity and the Children's Hospital (grant nos. YCXB 18-10 and YCXM 19-04).

Authors' contributions
HW designed the study and wrote the manuscript. HW, XW and LX performed the data analysis. HC and JZ designed the study and supervised the work.