Whole genome sequencing refines stratification and therapy of patients with clear cell renal cell carcinoma

Clear cell renal cell carcinoma (ccRCC) is the most common form of kidney cancer, but a comprehensive description of its genomic landscape is lacking. We report the whole genome sequencing of 778 ccRCC patients enrolled in the 100,000 Genomes Project, providing the most detailed somatic mutational landscape to date. We identify new driver genes which, as well as emphasising the major role of epigenetic regulation in ccRCC, highlight additional biological pathways, extending opportunities for drug repurposing. Genomic characterisation identified patients with divergent clinical outcomes: a higher number of structural copy number alterations was associated with poorer prognosis, whereas VHL mutations were independently associated with a better prognosis. The twin observations that higher T-cell infiltration is associated with better outcome and that genetically predicted immune evasion is not common support the rationale for immunotherapy. These findings should inform personalised surveillance and treatment strategies for ccRCC patients.


INTRODUCTION
Renal cell carcinoma (RCC) is an increasing global health problem with 431,000 new diagnoses each year, set to increase to 666,000 by 2040 1. Around 75% of RCCs are clear cell RCC (ccRCC) tumours. These cancers have a variable clinical course: while 75-80% of patients present with apparently localised disease and are offered curative-intent treatment, 30% will subsequently relapse 2. There is therefore a pressing need for more accurate risk stratification to guide clinical decisions on therapy and surveillance.
While therapeutic advances in the treatment of metastatic ccRCC have been made with the advent of antiangiogenic targeted therapies and immune checkpoint inhibitors (iCPi), only a fraction of patients experience durable clinical benefit. Mixed outcomes of adjuvant PD1 therapies demonstrate that clinical biomarkers fail to reconcile the variable disease course following surgery [3][4][5].
The need to understand ccRCC biology to inform development of novel therapies and better predict patient outcomes has been a major motivation for sequencing studies. While these projects have identified recurrent gene mutations and chromosomal rearrangements, analyses have primarily been based on whole-exome sequencing or panel testing of cancer-associated genes; hence the catalogue of drivers remains incomplete. Correspondingly, studies of the relationship between clinical parameters and genomic alterations have been limited [6][7][8][9]. To advance our understanding of ccRCC we analysed whole genome sequencing (WGS) data from 778 ccRCC patients recruited to the UK Genomics England (Gel) 100,000 Genomes Project (100kGP) 10.

The Gel cohort
The analysed cohort (100kGP, release v14) comprised tumour-normal (T/N) sample pairs from 778 patients (mean age 63 years, range 25-88 years) with primary ccRCC recruited to 100kGP through 13 Genomic Medicine Centres across England (Fig. 1). Comprehensive clinico-pathology information on the patients is provided in Supplementary Table 1. We restricted our WGS analysis to samples with high-quality data from PCR-free, flash-frozen fresh tumour samples (Supplementary Methods). For 29 of the patients, WGS data from multi-regional sampling of tumours were available (2-4 samples per tumour, 94 samples in total). In addition to using variant calls from the 100kGP analysis pipeline we: (i) removed alignment bias introduced by ISAAC soft clipping of semi-aligned reads 11; (ii) called tumour copy number using Battenberg 12; (iii) called structural variants (SVs) from a consensus of Manta 13, LUMPY 14, and DELLY 15; (iv) removed insertion-deletions (indels) within 10 base pairs (bp) of a common germline indel. Complete details on sample curation, somatic variant calling, and annotation of mutations are provided in the Supplementary Methods.
Restricting our analysis to WGS data on one sample per patient (Supplementary Methods), we identified 4,267,943 single nucleotide variants (SNVs), 699,100 indels, and 19,756 chromosomal rearrangements or structural variants (Supplementary Table 2). While the median tumour mutational burden was 1.88/Mb, three tumours displayed a hypermutated phenotype, i.e., an excessively high SNV/indel mutation burden (maximal SNV/Mb = 33.65, maximal indel/Mb = 21.77). Twenty-two of the patients (2.8%) were carriers of pathogenic germline variants in one of the well-established RCC susceptibility genes and 10 (1.2%) were carriers of a variant in another cancer susceptibility gene (Supplementary Table 3).
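Step (iv) of the variant post-processing can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `filter_indels`, the `(chrom, pos)` tuple representation, and the choice to treat a distance of exactly 10 bp as "within" the window are all assumptions made for illustration.

```python
from bisect import bisect_left


def filter_indels(somatic, germline, window=10):
    """Drop somatic indels lying within `window` bp of a common germline indel.

    somatic  : list of (chrom, pos) tuples for somatic indel calls
    germline : list of (chrom, pos) tuples for common germline indels
    Both inputs are hypothetical simplifications of VCF records.
    """
    # Index germline positions per chromosome, sorted for binary search.
    by_chrom = {}
    for chrom, pos in germline:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    kept = []
    for chrom, pos in somatic:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, pos)
        # The nearest germline indels sit at index i - 1 (left) and i (right).
        near = any(
            abs(positions[j] - pos) <= window  # assumption: boundary is inclusive
            for j in (i - 1, i)
            if 0 <= j < len(positions)
        )
        if not near:
            kept.append((chrom, pos))
    return kept


# Toy data: only the first call lies close to a germline indel.
somatic_calls = [("chr1", 100), ("chr1", 150), ("chr2", 100)]
germline_calls = [("chr1", 108), ("chr1", 500)]
filtered = filter_indels(somatic_calls, germline_calls)
```

Sorting the germline positions once per chromosome keeps each lookup logarithmic, which matters when filtering hundreds of thousands of indel calls.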
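As a worked illustration of how a burden such as 1.88/Mb is expressed, the sketch below divides a mutation count by the callable genome size in megabases. The function name, the hypothetical count of 5,264 mutations and the assumed 2.8 Gb callable genome are illustrative only; the study's exact denominator is described in its Supplementary Methods, not here.

```python
def mutations_per_mb(n_mutations, callable_bases):
    """Mutational burden expressed as mutations per megabase."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)


# Hypothetical tumour: 5,264 somatic mutations over an assumed
# 2.8 Gb callable genome.
example_tmb = mutations_per_mb(5_264, 2_800_000_000)
```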
To identify non-coding drivers in gene promoters, untranslated and non-canonical splice regions, we used OncodriveFML 27, ActiveDriverWGS 28, and negative binomial regression adjusting for trinucleotide mutational context (Supplementary Methods). The only mutated region displaying consistent evidence of positive selection was the TERT promoter region. This association was primarily driven by the canonical mutations 5:1295113G > A and 5:1295135G > A, both of which were reported as recurrent in TRACERx 29 and have been documented to be early drivers for bladder cancer 30,31 (Supplementary Table 6).
Systematic analyses of cancer genomes provide an opportunity to estimate the number of patients eligible for a targeted therapy and to identify opportunities for drug repurposing. We assessed the clinical actionability of driver gene mutations by referencing the OncoKB Knowledge Base 32 (version 3.11), and found that 93 unique alterations were targetable (OncoKB Level 1-4), all of at least Level 4 (compelling biological evidence supporting the biomarker being predictive of drug response). We also examined the COSMIC Mutation Actionability in Precision Oncology 33 database, highlighting an additional 717 unique alterations which are potentially targetable (Supplementary Fig. 3, Supplementary Table 7).
Pan-chromosome, 13.2% of tumours showed evidence of chromothripsis. Between chromosomes 3p and 5q, chromothripsis was only detected at low frequency (2.6%) and the rate of unbalanced translocations was 4.0%, in contrast to some 29 but not all previous reports 34. Whole genome duplication (WGD) was seen in 16.6% of the tumours, a finding almost identical to the 15% reported by TRACERx 35 (Supplementary Table 2).
We identified 37 hotspots of recurrent simple SVs (FDR < 0.05) by piecewise constant fitting, adjusting for local genomic features known to influence rearrangement density (chromatin accessibility, repeated elements, GC content, replication timing, gene density and expression). Fragile sites are prone to rearrangement (possibly due to replication error) and tend to co-occur with large, late-replicating genes. SVs occurring at fragile sites are hence likely to be the consequence of mechanistic rather than selective factors. After excluding 10 SV hotspots mapping to potential fragile sites, we identified 27 SV hotspots (Fig. 4, Supplementary Table 10). We identified a total of 66 breakpoints within 5p15.33, spanning TERT. These included a deletion breakpoint and an unclassified event 2 kb downstream of the TERT promoter, and tandem duplications overlapping TERT (n = 5). In tumours from the 34 patients with TERT 5'UTR mutations, there were no overlapping unclassified or tandem-duplication events, nor any SV deletion or unclassified promoter breakpoints, an observation consistent with earlier findings 29.
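The FDR < 0.05 threshold applied to candidate hotspots can be illustrated with a short sketch of false discovery rate control. The text does not state which FDR procedure was used, so the Benjamini-Hochberg method shown here is an assumption, and the p-values are invented toy numbers rather than results from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values for four hypothetical genomic bins.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
# Bins passing the threshold used in the text.
hotspot_bins = [i for i, q in enumerate(toy_q) if q < 0.05]
```

In a real analysis the p-values would come from the piecewise constant model of rearrangement density, one per candidate segment.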

Mutational signatures
To gain insight into mutational processes in ccRCC, we extracted single-base substitution (SBS), double-base substitution (DBS) and indel (ID) signatures de novo and related those to known COSMIC signatures (v3.2) using SigProfilerExtractor 41,42. In the majority of cancers, single base substitutions could be assigned to signatures SBS5/SBS40 and SBS1 (nomenclature as per COSMIC), resulting from clock-like mutagenic processes (Fig. 5, Supplementary Fig. 4, Supplementary Table 11). Other signatures recovered with a known specific underlying aetiology include those associated with oxidative damage (SBS18), defective base excision repair (SBS30), APOBEC (SBS2, SBS13), tobacco smoking (SBS4, DBS2, ID3) and aristolochic acid (SBS22). While the incidence of renal cancer has been linked to aristolochic acid exposure in residents of Danube river countries 43, 88% of patients with SBS22 tumour activity in the Gel cohort were self-reported to be white British. SBS31 and SBS35 have been attributed to platinum chemotherapy. We recovered SBS35 in four cases, none of which were reported to have a past history of platinum chemotherapy. In contrast, the tumours from five patients who had a past history of a non-RCC cancer and had received carboplatin or oxaliplatin did not display SBS31 or SBS35. To complement SigProfilerExtractor we searched for mutational signatures associated with defective mismatch repair (dMMR) and defective homologous recombination (dHR) using mSINGs 44 and HRdetect 45. Three cases displayed evidence of dMMR, of which two harboured MLH1 somatic mutations accompanied by LOH, but none carried germline pathogenic MMR variants. Two of the cases had mutations assigned to signatures associated with dMMR (SBS20, SBS26). No case showed evidence of dHR. Comparing mutational signature activity between clonal and subclonal mutations, we found no significant enrichment or depletion of any SBS signature.

Ordering of mutational events
Using PhylogicNDT 46 in conjunction with MutationTimeR 47, we reconstructed the chronological ordering of focal CNAs and driver mutations. Across all tumours, gain of 5q was consistently an early alteration. As expected, mutations in VHL, PBRM1, SETD2 and BAP1 were predicted to be early events, generally occurring before corresponding CNAs. In contrast, mutations in KMT2C, ARID1A and HIF1A were late events (Fig. 6). Estimating the chronological timing of CNAs under varying mutational rates and tumour initiation (Supplementary Methods) implies that WGD occurred on average 9.2 years before tumour sampling and gain of 5q 35.5 years before sampling (Fig. 6). Moreover, the estimated lead times of 5q gain and WGD were both correlated with age at presentation (adjusting for h grade and stage, P = 9.4 × 10⁻¹³ and 0.02, respectively).

Immune evasion
Using pVAC-Seq 48, we predicted 24,893 class I neoantigens across the 778 tumours (1-327 per tumour, median 26), resulting from: 66.5% missense mutations, 32.0% frameshift variants, 1.3% inframe deletions and 0.25% inframe insertions (Fig. 7). As expected, TMB was positively associated with tumour neo-antigen count (TNC) (OR = 1.21, 95% CI: 1.18-1.23; Supplementary Table 12). Examining evidence of immune evasion, we considered loss of heterozygosity (LOH) or mutation of HLA class I genes (HLA-A, HLA-B, HLA-C) and immune escape genes (Supplementary Methods, Supplementary Table 13). Using LOHHLA, we detected LOH of HLA in only 5.9% of tumours. It has been reported that LOH of HLA class I genes and 9p21 loss tend to co-occur 49, suggesting a potential mechanism for immune escape. However, after adjusting for stage and grade (correlated with both LOH of HLA and 9p21 loss, Supplementary Table 14) the correlation was not significant (OR = 1.32, 95% CI: 0.58-2.99; Supplementary Table 15). Similarly, nonsynonymous mutations of HLA genes were rare (0.5%). An inactivating mutation in at least one of the 22 antigen presenting genes 50,51 (APG) was seen in only 3.1% (24/778) of tumours. None of the APGs displayed a propensity for mutation. Collectively, on the basis of alteration of these escape pathways, only 9.0% (70/778) of tumours were predicted to exhibit some form of genetically driven immune evasion.
After excluding patients with missing follow-up information we examined the relationship between genomic features and overall survival (OS) in 605 patients (Supplementary Tables 1, 15, 16, Supplementary Figs. 5 and 6). Strong predictors of OS were age, grade and stage (Log-rank P , P and P respectively). After adjusting for covariates using Cox regression, increased OS was associated with VHL (Hazard Ratio (HR) = 0.60, 95% CI: 0.36-0.98) and PBRM1 (HR = 0.64, 95% CI: 0.42-0.97) mutation status (Fig. 8). Given the co-occurrence of VHL and PBRM1 mutations (84%, 325/388 of PBRM1-positive tumours were also VHL mutated), we adjusted for VHL status, after which PBRM1 mutational status did not show an independent relationship with OS (HR = 0.68, 95% CI: 0.44-1.03). Aside from VHL, mutation of none of the other driver genes showed an independent association with OS, although we acknowledge that we had limited statistical power to demonstrate a relationship with less frequently mutated genes. After adjusting for VHL status, higher SV count was, however, associated with worse OS (HR = 1.01, 95% CI: 1.00-1.10). We also observed that four specific copy number gains were associated with better OS. While we found no association between OS and either neoantigen burden or immune escape, a higher TCRA T-cell fraction was associated with a better OS (HR = 0.65, 95% CI: 0.43-0.99; Fig. 8). We did not find evidence to support a relationship between OS and intratumor heterogeneity or wGII, both of which have previously been purported to influence prognosis 23,54 (Supplementary Table 16).
We examined the relationship between molecular features and progression free survival (PFS) in 167 of the patients, ascertained as being at intermediate-high risk of tumour recurrence on the basis of their Leibovich score 55 (Supplementary Tables 1, 15 and 17, Supplementary Figs. 7 and 8). While VHL status was not associated with better PFS, we observed that KDM5C mutation was independently associated with worse outcome (HR = 1.98, 95% CI: 1.00-3.91; Supplementary Table 16) and a higher incidence of necrosis (OR = 4.81, 95% CI: 1.20-19.11, Supplementary Table 18). Thirty-seven of the 167 patients had received iCPi therapy as a first or second line treatment and in 21 of these there was documented evidence of clinical benefit.
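The combined immune evasion estimate counts each tumour once even if it carries more than one escape alteration, which is why the individual percentages do not simply add up to 9.0%. A minimal sketch of that tally follows; the function names and tumour identifiers are hypothetical, and only the reported counts 24/778 and 70/778 come from the text.

```python
def percent(count, total, digits=1):
    """Percentage of `total` represented by `count`, rounded for reporting."""
    return round(100 * count / total, digits)


def evasion_summary(hla_loh, hla_mutated, apg_mutated, cohort_size):
    """Count tumours with at least one escape alteration (set union)."""
    evading = set(hla_loh) | set(hla_mutated) | set(apg_mutated)
    return len(evading), percent(len(evading), cohort_size)


# Toy cohort of 40 tumours; T02 and T03 each carry two alterations
# but are counted only once in the union.
toy_count, toy_pct = evasion_summary(
    hla_loh={"T01", "T02", "T03"},
    hla_mutated={"T03"},
    apg_mutated={"T02", "T04"},
    cohort_size=40,
)

# Reported counts from the text reproduce the stated percentages.
apg_pct = percent(24, 778)
evasion_pct = percent(70, 778)
```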
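For readers less familiar with how a hazard ratio and its confidence interval relate to the underlying Cox coefficient, the sketch below back-calculates an approximate standard error and Wald P value from a reported interval, using the VHL estimate (HR = 0.60, 95% CI: 0.36-0.98) as input. It assumes a Wald-type interval on the log-hazard scale; the study's exact P values are not given in this section, and rounding of the published figures means the result is only approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wald_from_ci(hr, lower, upper):
    """Approximate log-HR, standard error, z and two-sided P from a 95% CI."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


def hr_with_ci(beta, se):
    """Hazard ratio and Wald 95% CI from a Cox coefficient and its SE."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


# Reported VHL association with overall survival.
beta, se, z, p = wald_from_ci(0.60, 0.36, 0.98)
hr, lo, hi = hr_with_ci(beta, se)
```

An upper confidence limit just below 1 corresponds to a P value just below 0.05, which is why intervals such as 0.36-0.98 indicate borderline rather than overwhelming evidence.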

DISCUSSION
This study, to our knowledge, represents the largest WGS analysis of primary ccRCC reported to date, providing a more comprehensive description of the genomic landscape of ccRCC. We acknowledge that there are limitations to our analysis, specifically our reliance on short-read sequencing and the lack of transcriptomic information. Nevertheless, as well as confirming established driver genes, we identify new drivers further highlighting oncogenic metabolism and epigenetic reprogramming as being central to ccRCC biology. Additionally, we validate pTERT mutations as drivers, thereby further substantiating telomerase dysfunction in the development of ccRCC. Mutational signature analysis provides a mechanistic basis for known lifestyle and exposure risk factors as well as potentially indirectly suggesting additional ones. While we did not identify any new mutational signatures, our analysis provides further support for tobacco smoking being a risk factor for ccRCC 56.
The large size of our study, coupled with the standardised management protocols for ccRCC patients within the UK National Health Service, has enabled us to investigate the correlation between molecular features and patient prognosis. The clinical course for many ccRCC patients with apparently same-stage disease can be highly variable. Upfront identification of patients who are likely to relapse early offers the prospect of intervening pre-emptively to maintain remission. Furthermore, since metastatic ccRCCs are chemotherapy and radiotherapy resistant, identifying tumour sub-groups with targetable molecular dependencies has the potential to inform biologically driven therapies. The relationship between mutations in the major clonal driver genes and patient survival has been the subject of a number of previous studies, but findings have been inconsistent 6,7,[57][58][59][60][61][62][63] (Supplementary Table 20). While some studies 7,9 have reported BAP1 mutations being associated with a worse clinical outcome, other studies 58,62 have failed to demonstrate any relationship. As previously documented 9,64, and herein, BAP1 mutations are strongly associated with increased grade, and after adjustment we failed to show support for an independent relationship. In our study we do, however, show that VHL mutation status was independently associated with an improved OS, consistent with a recent study 61. VHL mutations are early events in ccRCC development, whereas other mutated genes are acquired later and might therefore be assumed to play more of a role in disease progression. Hence it is unclear why VHL-positive ccRCC tumours might have a more favourable outcome than VHL-wildtype ccRCC. Distinct evolutionary subtypes of ccRCC have, however, been proposed that appear biologically and clinically distinct, with subtypes being defined by VHL-wildtype, VHL-monodrivers, and those with multiple clonal drivers 8. After adjusting for VHL status we did not find support for an independent association between other driver mutations and survivorship. Amongst the strongest relationships we identified was that between increased copy number and increased survivorship, which was independent of tumour grade, presumably reflecting tumour heterogeneity. We did not find support for the purported relationship between intra-tumor heterogeneity and prognosis 23; however, our analysis did not benefit from multi-region sampling.
Although current drug treatment paradigms for ccRCC exploit targeted therapies, they are primarily not directed against any specific genomic feature. To investigate the prospect of targeting specific driver mutations we queried OncoKB 32, which is regularly curated by an expert panel and therefore generally considered to reflect the current state of knowledge. Since other investigators have reported a higher targetable variant detection rate by applying multiple tools to annotate variants, we also made use of the COSMIC Mutation Actionability in Precision Oncology resource 33. The majority of the alterations we describe as being actionable are based on clinical evidence from other cancers or biological plausibility. As per previous reports, the majority of the targetable alterations we identified are within PI3K/mTOR pathway genes. Randomised clinical trials showing clinical benefit of the mTOR inhibitors temsirolimus and everolimus in RCC have already led to their regulatory agency approval. Other targets have not been specifically studied in the context of ccRCC, hence results cannot be interpreted as definitive proof of response prediction. Examples of drugs that might be repurposed for treating ccRCC include: temsirolimus, which is undergoing trials as a treatment for FBXW7-positive solid tumours 65,66; nilotinib for ABL1 mutations 67; niraparib for BAP1 mutations 68; tazemetostat hydrobromide for SMARCA4 mutated cancers 69,70; olaparib with pembrolizumab for ARID2-positive melanoma 71; and alpelisib for PIK3CA in ER-positive metastatic breast cancer 72,73. An important caveat to our analysis is that the genetic profiles we derived are of a single region, which has potentially limited our ability to detect clinically important sub-clonal targetable alterations.
In many other cancers a high mutational and neoantigen burden has been linked to better overall survival and responsiveness to checkpoint inhibitors, presumably reflecting native immune responsiveness 74. In our study, there was no association between neoantigen burden and OS. In contrast, there was a strong relationship between increased T-cell infiltration and better prognosis. Whilst this might seem counterintuitive, the finding may be explained by the poor accuracy (6%) of current HLA-affinity based neoantigen prediction algorithms 75. Accepting these limitations, the twin observations that higher T-cell infiltration is associated with better outcome and that genetically predicted immune evasion is uncommon support the rationale for immunotherapy.
There is interest in the prospect of population screening for RCC, given the rising incidence of the disease, the high proportion of asymptomatic individuals at diagnosis and the associated high mortality rate. Our analysis supports previous work suggesting that ccRCC driver mutations often precede diagnosis by many years, if not decades 29, information relevant to the design of any screening programme.
Although some cancers have reaped demonstrable benefits from the current genomic revolution, the same benefits have not yet been observed in RCC, and further efforts should be directed at identifying the precise role of genomic tumour profiling in the clinical setting.

DATA AVAILABILITY
The data supporting the findings of this study are available within the Genomics England Research Environment, a secure cloud workspace. Details on how to access data for this publication can be found at https://re-docs.genomicsengland.co.uk/pan_cancer_pub/. Additional processed aggregated data supporting the findings presented in this manuscript can be found in the Supplementary Tables. To access genomic and clinical data within this Research Environment, researchers must first apply to become a member of either the Genomics England Research Network (https://www.genomicsengland.co.uk/research/academic) or the Discovery Forum (industry partners https://www.genomicsengland.co.uk/research/research-environment). The process for joining the network is described at https://www.genomicsengland.co.uk/research/academic/join-gecip and consists of the following steps:
1. Your institution will need to sign a participation agreement available at https://files.genomicsengland.co.uk/documents/Genomics-England-GeCIP-Participation-Agreement-v2.0.pdf and email the signed version to gecip-help@genomicsengland.co.uk.
Figure 1 Overview of the Gel cohort of ccRCC patients. (a) The location of the 13 Genomic Medicine Centres (GMCs) across England from which patients were recruited; (b) The breakdown of the cohort by tumour grade and stage. Figure created using BioRender.

Frequency of nonsynonymous mutations in driver genes. The colour scheme indicates whether the mutational frequency of a driver gene is reported as being above (blue) or below (red) 1% in other ccRCC cohorts.

Figure 3 Biological pathways in ccRCC. (a) The SWI/SNF pathway; (b) The MAPK signalling pathway; (c) The TP53 pathway; (d) The RAS/ERK and hypoxia pathway; (e) The VHL/HIF1A pathway. Driver genes identified are shown in blue, non-ccRCC driver genes in green and other pathway genes in grey. The number in the bottom left is the nonsynonymous mutational frequency and the number in the bottom right the copy number alteration (CNA) frequency. RTK, Receptor Tyrosine Kinase. Figure created using BioRender.

Figure 4 Copy

Figure 7 Immune
2. Once you have confirmed your institution is registered and have found a domain of interest, you can apply through the online form at https://www.genomicsengland.co.uk/research/academic/join-gecip. Once your Research Portal account is created you will be able to log in and track your application.
3. Your application will be reviewed within 10 working days.
4. Your institution will validate your affiliation.
5. You will complete online Information Governance training and will be granted access to the Research Environment within 2 days of passing the online training.

Data that have been made available to registered users include: alignments in BAM or CRAM format, annotated variant calls in VCF format, signatures assignment, tumour mutation burden, sequencing quality metrics, summary of findings that is shared with Genomic Lab Hubs, and secondary clinical data as described in this paper. Further details of the types of data available (for example, mortality, hospital episode statistics and treatment data) can be found at https://re-docs.genomicsengland.co.uk/data_overview/. Germline variants can be explored in the Interactive Variant Analysis Browser (see description at https://re-docs.genomicsengland.co.uk/iva_variant/). The cancer patient cohort and longitudinal clinical information on treatment and mortality can be explored with Participant Explorer (see description at https://re-docs.genomicsengland.co.uk/pxa/).

Oyeyemi Akala 2, Janet Brown 3, Guy Faust 2, Kate Fife 4, Victoria Foy 5, Styliani Germanou 1, Megan Giles 6, Charlotte Grieco 7, Simon Grummet 8, Ankit Jain 9, Anuradha Kanwar 2, Andrew Protheroe 10, Iwan Raza 10, Ahmed Rehan 3, Sarah Rudman 11, Joseph Santiapillai 12, Naveed Sarwar 13, Pavetha Seeva 10, Amy Strong 4, Maria Toki 11, Maxine Tran 12, Rippie Tutika 8, Tom Waddell 5, Matthew Wheater 6

S.T. has received speaking fees from Roche, AstraZeneca, Novartis and Ipsen. S.T. has the following patents filed: Indel mutations as a therapeutic target and predictive biomarker PCTGB2018/051892 and PCTGB2018/051893 and Clear Cell Renal Cell Carcinoma Biomarkers P113326GB. None of the other authors have a conflict of interest.