The current work used a two-stage study design. An initial discovery phase identified the sources of molecular variance (or latent factors) across tumours of ccRCC patients using the unsupervised Multi-Omics Factor Analysis (MOFA) approach to integrate DNA methylome, transcriptome, and somatic mutation profile data from whole-genome sequencing (WGS) data (see methods; Supplementary Fig. 1–2). We used LASSO regression (see methods, Supplementary Table 1) to select key features to establish signatures for ccRCC latent factors and attributed these signatures to independent cohorts of ccRCC patients in a validation phase. Associations between latent factors and molecular and epidemiological features were then tested within the discovery and validation phases. The characteristics of each cohort are described in Supplementary Table 2 and workflow of the study in Supplementary Fig. 1.
Description of molecular components in ccRCC tumours
Collectively, 31%, 41%, and 6% of the variance in DNA methylome, transcriptome, and somatic mutation profile data, respectively, were explained by the first six latent factors estimated by MOFA (Fig. 1). While pan-cancer analyses reported the global DNA hypomethylation of tumours in comparison with their histological normal material (Witte et al., 2014), the CpG sites associated with latent factors tended to be hypermethylated and annotated to functional regions of the genome (CpG islands, shores, and shelves within regulatory and coding regions) (Fig. 2A, Supplementary Tables 3–6, Source data file). There was no consistent evidence for associations between latent factors and global DNA methylation, measured by the mean methylation levels of Alu and LINE1 transposable elements (Supplementary Table 7), implying that DNA methylation changes in specific functional regions of the genome contributed to most of the inter-patient variation in the DNA methylome. Important inter-patient variation in the gene expression levels (transcriptome) across ccRCC tumours were captured by the latent factors (Fig. 1). Pathway analysis showed that the gene expression levels with the highest loadings in each latent factor were enriched for cancer-related pathways, such as those involved on immune system, metabolism, cell cycle, cell plasticity and signalling, chromatin remodelling, and tissue development (Fig. 2B, Supplementary Table 8). Latent factors were also related to a range of DNA mutational signatures, including those implicated in endogenous (i.e., SBS1, SBS13) and exogenous (e.g., SBS4, DBS2) processes, as well as signatures where the aetiology remains unclear (i.e., SBS40a,b,c, ID5). Latent factors were also related to the presence of ccRCC somatic cancer driver mutations in genes that act as epigenetic regulators (PBRM1, SETD2, BAP1, and KTM2C) (Fig. 2C, Source data file).
In the following sections, we will report the key findings of latent factors 1 to 6 using MOFA (discovery) and their respective signatures (validation) to illustrate sources of ccRCC inter-patients’ variability potentially attributed to endogenous and exogenous exposures (more details in MOFA subsection in methods).
The major ccRCC molecular component is linked to cellular mitotic age.
The major component of ccRCC inter-patients’ variability (latent factor 1) was typified by DNA hypermethylation of CpG islands annotated to functional regions of the genome (Fig. 2A, Supplementary Table 4). These profiles were similar to those noted in cellular ageing processes (Marttila et al., 2015; Bell et al., 2019) and latent factor 1 was associated with chronological age (Table 1), prompting us to explore this ccRCC component in the context of biological ageing. Biological age is a complex process that reflects an individual’s physiological state over time; varying aspects can be measured by epigenetic clocks (Rutledge et al., 2022). Latent factor 1 was correlated with a range of epigenetic clocks, showing the highest correlation with the age-adjusted mitotic-like clock epiTOC2 (Teschendorff, 2020) (Supplementary Fig. 3). EpiTOC2 is based on the CpG sites of Polycomb target genes that are unmethylated at birth but become progressively methylated as cells replicate, a process called cellular mitotic age (Yan et al., 2016, Teschendorff, 2020). Consistent with this, pathway analysis of gene expression levels correlated with latent factor 1 suggested an enrichment for Polycomb (EZH2) target genes (Supplementary Table 8). Linear regression models estimated that this epigenetic clock explained around 80% of the variance in latent factor 1 across ccRCC tumours (Fig. 3A). Latent factor 1 was also associated with other types of mitotic clocks, including clock-like DNA mutational signatures (SBS1/ID1) (Fig. 2C) and telomere attrition (Supplementary Table 9), reinforcing the hypothesis that these metrics act as proxies for the number of cell mitosis (Alexandrov et al, 2015; Yang et al., 2016, Alexandrov et al, 2020; Teschendorff, 2020). Similarly, phenotypic consequences consistent with higher mitotic counts, such as higher copy number alteration fraction of the genome and homologous recombination DNA repair deficiency, were associated with this ccRCC component (Supplementary Table 9). It was also associated with the presence of somatic cancer driver mutations in PBRM1 and SETD2, chromatin remodeling genes involved in cell senescence (Lee et al., 2016) and proliferation (Dominguez et al., 2016; Cai et al., 2019) (Fig. 2A, Supplementary Table 9). Latent factor 1 was strongly associated with tumour stage and grade (Table 1, Fig. 3B). Patients in the top quintile of latent factor 1 were estimated to be 23 times more likely to be late-stage tumours (III and IV) and 9 times more likely to be high-grade tumours (grade 3–4), compared to those in the bottom quintile (Supplementary Table 9). Nevertheless, the levels of latent factor 1 considerably varied within ccRCC tumours within tumour stage and grades (Fig. 3B) and multivariate analysis suggested that the associations between latent factor 1 appeared not to be driven by tumour stage and grade alone (Supplementary Table 9). Consistent with the notion that cellular mitotic age is accelerated in tumour tissues, the patient’s tumor material had marked higher values of latent factor 1 (tumours: mean of 0.43 ± 0.95; normal tissue: mean of -0.87 ± 0.20, p = 1.6x10− 45) than paired normal kidney tissue after adjustments for chronological age (Fig. 3C). Taken together, the main source of ccRCC inter-patients’ variability in the tumour DNA methylome appears related to cellular mitotic age, which may be influenced by proliferative effects of somatic mutations in PBRM1 and SETD2. The top 5 epigenetically regulated genes with the highest loadings in latent factor 1 encode zinc-finger transcription factors with pleotropic role in cancer (Jen and Wang, 2016), such as ZNF471 previously linked to ageing process (Marttila et al., 2015), and SPON1, essential to nephron formation in mice (Vidal et al., 2020) (Supplementary Table 3). Such genes may provide additional clues to explain how the deregulation of genes important to kidney homeostasis might contribute to the biological ageing process in ccRCC tumours.
The latent factor 1 signature was inferred across tumour samples from the TCGA pan-cancer cohorts (N = 8,040) (see methods, Supplementary Table 1). The signature for latent factor 1 explained on average 15% of variance in tumor DNA methylation in TCGA patient cohorts, ranging from 31% of variance explained in lymphoid neoplasm diffuse large B-cell lymphoma (TCGA-DLBC) to 5% in Uveal Melanoma (TCGA-UVM) (Supplementary Table 10). Independent to ccRCC tumours (TCGA-KIRC), the latent factor 1 signature was strongly associated with epiTOC2, dosage of the clock-like mutational signature SBS1, WGS-telomere length ratio, and copy number alterations in a joint model of TCGA tumours combined (Table 2). Intriguingly, these effects also appeared particularly prominent in other histological subtypes of kidney cancer (TCGA-KICH/chromophobe, TCGA-KIRP/papillary) as well as in other tumour types (e.g., adrenocortical carcinoma/TCGA-ACC, mesothelioma/TCGA-MESO, pancreatic/TCGA-PAAD) (Supplementary Table 10).
Molecular components related to the ccRCC tumour microenvironment.
To further explore how tumour microenvironment (TME) contributes to ccRCC inter-patients’ variability across multi-omics layers, we imposed 65 gene expression TME signatures derived from previously published single-cell RNA sequencing (scRNA) data of ccRCC tumours (Li et al., 2022) into bulk ccRCC tumour transcriptomes (see methods). The scRNA-derived signatures were correlated within patient’s tumours (Supplementary Fig. 4) and 27 representative signatures could be derived from the 65 TME signatures (see methods). Figure 4A describes the replicated associations between TME signatures and latent factors in the validation series, suggested from the discovery series (Supplementary Fig. 5A). Latent factors 2–6 were associated with TME signatures related to kidney epithelial, immune cell infiltrates, inflammation, epithelial-mesenchymal transition process (EMT) and cell proliferation processes (Fig. 4B), consistent with the TME making an important contribution to the inter-patient’s variability in ccRCC (Li et al., 2022).
Latent factor 2 captured shared inter-patients’ variability across omics layers (Fig. 1), associating with DNA methylation changes (annotated to CpG island and open sea, Fig. 2A), immune system-related pathways (Fig. 2B, Supplementary Table 8), and BAP1 cancer driver mutations (Fig. 2C). Higher latent factor 2 levels were observed in male patient’s tumours compared with female patient’s tumours (Table 1). This ccRCC component was associated with the presence of TME signatures related to EMT process, proliferating cells (cycling endothelial cells and cell cycle kidney meta-programs) and myeloid cells (particularly fibronectin-positive tumour associated macrophages/FN1_TAM) (Fig. 4A, Supplementary Fig. 5A), and late-stage, high-grade tumours (Table 1). Of the differentially methylated regions associated with latent factor 2, the expression of EMT related genes (WT1, IL20RB, KRT19) (Kalluri & Weinberg, 2010; Wang et al., 2020; Guo et al., 2022; Wu et al., 2023) appeared epigenetically upregulated (Supplementary Table 5). The CpG sites annotated to IL20RB (interleukin 20 receptor subunit beta) displayed the highest functional impact on its transcript levels (Supplementary Table 5), with higher levels of latent factor 2 associating with DNA hypomethylation (1st exon and 5-UTR) of IL20RB and respective upregulation of its RNA levels (Fig. 4B). The functional regulation of IL20RB expression by DNA methylation was pronounced in the presence of BAP1 cancer driver mutations regardless of tumor stage in ccRCC (Supplementary Fig. 5B). Multivariate analyses suggested that IL20RB expression levels predicted TME features (i.e., cell cycle, EMT, and FN1_TAM) in addition to the effects of somatic cancer driver mutations in BAP1 gene (Fig. 4C). Together, these results indicate that latent factor 2 is representing a complex relationship between the DNA hypomethylation in specific CpG sites and increased expression of EMT and cell proliferation related genes (i.e., IL20RB, KRT19, WT1), BAP1 cancer driver mutations, and ccRCC tumour microenvironment components (i.e., FN1_TAM) that could contribute to disease progression and unfavorable prognosis of the ccRCC patients, particularly in men.
Latent factor 6, epigenetic regulation of GSTP1, and genotoxicity
An additional source of inter-patient’s variability was an intriguing ccRCC component (latent factor 6) linked to DNA hypermethylation and gene expression changes (Fig. 1). This component was associated with tobacco smoking (Table 1), the environmental exposure robustly associated with ccRCC risk by observational studies (Hsieh et al., 2017). The DNA methylation patterns in CpG sites in proximity to CpG islands and gene bodies noted with latent factor 6 (Fig. 2A) were similar to those observed in tobacco smokers in blood (Guida et al., 2015; Joehanes et al., 2016; Plusquin et al., 2017; Svoboda et al., 2021); it was also associated with the dosage of tobacco-related DNA mutational (SBS4 and DBS2) (Fig. 2C) (Alexandrov et al., 2020) and methylation signatures (Fig. 5A) (Chamberlain et al., 2022). Of the differential methylated regions associated with latent factor 6, the CpG sites annotated to GSTP1 (glutathione S-transferase pi) displayed the highest functional impact on its transcript levels (Supplementary Table 6). Latent factor 6 was related to hypermethylation and decreased expression of GSTP1 (Fig. 5A), estimated to jointly explain around 36% of the variance in latent factor 6. GSTP1 is a key enzyme involved in phase II detoxification of xenobiotics by glutathione conjugation and it has been implicated in the metabolism and clearance of a variety of genotoxic compounds (e.g., the carcinogens in tobacco smoke, cisplatin, mercury as well as endogenous free radicals) (Miller et al., 2003; Simic et al., 2009; Sawers et al., 2014; Shin et al., 2017). Epigenetic silencing of GSTP1 is postulated to increase cellular sensitivity to genotoxic compounds (Su et al., 2007; Rønneberg et al., 2008, Cui et al., 2020). Consistent with this, when excluding the outlier effect of high mutation burden tumours from Romania, latent factor 6 was associated with total tumor mutation burden (Fig. 5A) among tobacco smokers, particularly in ever smokers (Supplementary Table 11). It is noteworthy that latent factor 6 was also present in never smokers and associated with DNA mutational signatures ID5 and SBS40b (Fig. 2C) commonly observed in ccRCC, suggesting that these DNA mutational signatures may be related to genotoxic compounds, as raised elsewhere (Senkin et al., 2023). ccRCC tumours with higher loadings of latent factor 6 tended to have lower levels of the gene expression signatures related to interferon gamma (INFG) response, implying a decrease in cellular response to INFG (Fig. 2B, Fig. 4A, Supplementary Table 8). INFG acts as a gatekeeper of ccRCC progression by restraining the clonal expansion of ccRCC cells (Perelli et al., 2023). The suppression of INFG response associated with latent factor 6 might also have contributed to formation of a permissive environment for the expansion of these tumour cells and disease progression. Latent factor 6 was also associated with the presence of cancer driver mutations (e.g., KMT2C) (Fig. 2C) and ccRCC tumours of female patients had higher loadings of this ccRCC component (Table 1), although how these relate to molecular processes captured by latent factor 6 remains to be elucidated. Interestingly, latent factor 6 was significantly higher in a subset of normal kidney tissues in comparison with the matched ccRCC tumours (mean of 0.95 ± 0.26 vs. mean of -0.47 ± 0.89, p = 6.1x10− 56) (Fig. 5B). While ccRCC tumours tended to have lower RNA levels of GSTP1 than matched normal kidney tissues, a discrete distribution of DNA methylation levels of GSTP1 in tumour samples were observed, with one subset of tumours displaying hypermethylation and another one hypomethylation of GSTP1 CpG sites (Supplementary Fig. 6). Together, these findings suggest that there may be an impaired metabolism of potentially exogenous compounds, such as tobacco smoke exposure, in ccRCC tumours, leading cells to accumulate more somatic DNA mutations once exposed to such genotoxic compounds.
We then inferred latent factor 6 across tumour samples from the TCGA pan-cancer cohorts (see methods, Supplementary Table 12). The associations between latent factor 6, DNA methylation and RNA levels of GSTP1, tobacco methylation signature, and total mutation burden were also observed in papillary cell kidney cancer (Supplementary Table 12). However, this relationship varied substantially in other tumour types (Supplementary Table 12), which might reflect the differences in aetiology by tumour type.
The prognostic significance of ccRCC components
Finally, we investigated the prognostic value of the latent factors identified in ccRCC tumours in the discovery and validation sets. Latent factors were associated with ccRCC patients’ survival (Fig. 6), with worse survival of ccRCC patients noted for higher values of latent factors 1 (Validationmodel1: HR: 1.63, 95%CI = 1.35–1.98, p = 4.1x10− 7), latent factor 2 (Validationmodel1: HR: 1.46, 95%CI = 1.16–1.84, p = 0.001), and latent factor 5 (Validationmodel1: HR: 1.34, 95%CI = 1.11–1.63, p = 0.003) at baseline model (model1: sex and age at diagnosis). Despite the respective 14% and 17% attenuation in the effect of latent factors 2 (Validationmodel2: HR: 1.33, 95%CI = 1.06–1.67, p = 0.014) and 5 (Validationmodel2: HR: 1.23, 95%CI = 1.00-1.53, p = 0.055) on the overall survival of ccRCC patients after additional adjustments for tumour stage and grade, their association estimates remained consistent (Fig. 6). Furthermore, these latent factors were also associated with tumour stage and grade (Table 1). When considering other tumour types in the TCGA cohorts, we also noted a similar striking prognostic value for latent factor 1 in patients with papillary kidney cancer (TCGA-KIRP: HR: 2.63, 95%CI = 1.75–3.69, p = 6.8x10− 17) and other cancer types (e.g., TCGA-LGG, TCGA-ACC) (Supplementary Table 10). Our findings are in line with the previously described role of BAP1 somatic mutations, EMT, and tumour immune cells (factor 2), as well as cell proliferation (factor 5) on ccRCC patients’ outcomes (Hakimi et al., 2013; Rickets et al., 2018; Motzer et al., 2020).