Analysis of variants from germline, somatic and haematology assays
Between October 2013 and May 2019, we performed next-generation sequencing assays on samples from a cohort of hospital (n=32,670) and external (n=15,365) patients covering a broad range of tumour streams. This yielded 24,168,398 variants, of which 23,255 were clinically reported, from 95,954 patient samples from 48,036 patients tested with a heterogeneous set of cancer assays (see Figure 1). The assays were targeted cancer gene panels whose genomic capture regions ranged from highly targeted panels of four genes to comprehensive cancer panels of up to 701 genes. Ten different panels using hybrid capture or amplicon technologies were employed (see Table 1). These comprised hereditary cancer germline panels, somatic panels for solid cancers and haematology panels for blood cancers. A detailed breakdown by assay is provided in Table S1.
Of the 23,255 clinically reported variants, 17,240 (74.1%) were identified in subsequent assays and reused in reports. The remaining 6,015 (25.9%) were observed in only a single patient sample.
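The split between reused and single-sample variants can be computed by counting the distinct samples in which each variant was reported. The sketch below is a minimal illustration only. It assumes each clinical report is represented as a (sample_id, variant_key) pair. The function name `recurrence_split` and the toy keys are hypothetical and are not taken from PathOS.

```python
from collections import Counter


def recurrence_split(reports):
    """Split reported-variant instances into recurrent and single-sample groups.

    `reports` is an iterable of (sample_id, variant_key) pairs, one per clinically
    reported variant instance; exact duplicate pairs are counted once.
    A variant is treated as recurrent if it was reported in more than one sample.
    Returns (recurrent_instances, single_sample_instances).
    """
    unique_reports = set(reports)
    # number of distinct samples in which each variant key was reported
    samples_per_variant = Counter(variant for _, variant in unique_reports)
    recurrent = sum(1 for _, v in unique_reports if samples_per_variant[v] > 1)
    return recurrent, len(unique_reports) - recurrent


# Hypothetical toy data, not study data
reports = [
    ("S1", "varA"), ("S2", "varA"), ("S3", "varA"),
    ("S2", "varB"),
    ("S4", "varC"),
]
recurrent, single = recurrence_split(reports)
print(recurrent, single, f"{100 * recurrent / (recurrent + single):.1f}%")  # 3 2 60.0%
```

Under this counting, a variant seen in only one sample contributes exactly one report instance. The single-sample total is therefore also a count of unique non-recurrent variants.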
Curation workload growth
Figure 2 shows the total number of variants curated over the study period, with a marked increase following the introduction of hybrid capture assays in 2017. The solid line shows all curated variants (reported, benign and variants of unknown significance (VUS)). The pale lines show reported variants (69.1% of the total).
The number of new variants requiring curation per sample per month increased from 3.38 to 3.73 between January 2017 and May 2019 (see Figure 3). Over this period, curations for somatic hybrid capture assays rose substantially from 0.90 to 2.55 samples per month, until these assays accounted for 68% of the monthly curation burden. The average number of variants per month was also more variable for somatic hybrid capture assays, as shown by their wider 95% confidence intervals (see Figure 4).
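This section does not state how the monthly 95% confidence intervals were derived. The sketch below shows one possible approach. It groups samples by month and applies a normal approximation (mean ± 1.96 × standard error). The input format and the name `monthly_new_variant_rate` are assumptions for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def monthly_new_variant_rate(samples, z=1.96):
    """Mean new variants per sample for each month, with a normal-approximation CI.

    `samples` is an iterable of (month, n_new_variants) pairs, where month is a
    'YYYY-MM' string and n_new_variants counts variants first curated for that
    sample. Returns {month: (mean, lower, upper)} in month order.
    """
    by_month = defaultdict(list)
    for month, n_new in samples:
        by_month[month].append(n_new)
    result = {}
    for month in sorted(by_month):
        counts = by_month[month]
        m = mean(counts)
        # the standard error is undefined for one sample; use a zero-width interval
        se = stdev(counts) / math.sqrt(len(counts)) if len(counts) > 1 else 0.0
        result[month] = (m, m - z * se, m + z * se)
    return result


# Hypothetical toy data, not study data
toy = [("2017-01", 2), ("2017-01", 4), ("2017-02", 3), ("2017-02", 3), ("2017-02", 6)]
for month, (m, lower, upper) in monthly_new_variant_rate(toy).items():
    print(month, round(m, 2), round(lower, 2), round(upper, 2))
```

A month in which per-sample counts vary more has a larger standard error and therefore a wider interval. This is the pattern described above for somatic hybrid capture assays.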
Low overlap between in-house and public databases
We compared our reported variants with several widely used public genomic knowledgebases. Of the 8,214 unique clinically reported variants in our in-house database, 28.6% (n=2,356) were not present in any of the key public cancer variant resources: COSMIC¹³ (11,453,569 coding mutations), ClinVar¹⁴ (789,593 variants), VICC²⁹ (incorporating CIViC¹⁶; 2,528 variants) and the GA4GH Beacon Network¹⁷ (see Figure 5). COSMIC matched the most in-house (PathOS) variants, 4,049 (49.2%), followed by ClinVar with 2,888 (35.1%). Only 581 (7.1%) matched VICC variants. Resources on the Beacon Network matched 2,127 (25.9%) variants. The low VICC overlap is expected because VICC focuses on variants with a clear therapeutic option, whereas our clinically reported variants also include prognostic and diagnostic variants. Further, the PathOS variants absent from VICC are enriched for tumour suppressor genes (TSGs), as these variants are often loss-of-function variants (see Figure S2 and Figure S3).
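The following is a minimal sketch of the overlap calculation. It assumes variants from every source have already been normalised to a common key: chromosome, position, reference allele and alternate allele on one genome build. The matching rules actually used against COSMIC, ClinVar, VICC and the Beacon Network are not described here. Exact key matching, the function name and the toy keys are therefore all assumptions.

```python
def knowledgebase_overlap(in_house, resources):
    """Count in-house variants found in each external resource, and those found in none.

    `in_house` is a set of normalised variant keys, e.g. (chrom, pos, ref, alt)
    tuples on a single genome build; `resources` maps a resource name to the set
    of keys it contains. Exact key equality is used as the match criterion.
    Returns (matches_per_resource, unmatched_keys).
    """
    matches = {name: len(in_house & keys) for name, keys in resources.items()}
    # union of every external key set; in-house keys outside it are "in-house only"
    seen_externally = set().union(*resources.values())
    return matches, in_house - seen_externally


# Hypothetical toy keys, not study data
in_house = {("1", 100, "A", "T"), ("2", 200, "C", "G"), ("3", 300, "G", "A"), ("4", 400, "T", "C")}
resources = {
    "COSMIC": {("1", 100, "A", "T"), ("2", 200, "C", "G")},
    "ClinVar": {("2", 200, "C", "G")},
    "VICC": set(),
}
matches, unmatched = knowledgebase_overlap(in_house, resources)
for name, n in matches.items():
    print(f"{name}: {n} ({100 * n / len(in_house):.1f}%)")
print(f"in-house only: {len(unmatched)} ({100 * len(unmatched) / len(in_house):.1f}%)")
```

A variant present in several resources is counted once toward "matched". The per-resource percentages therefore need not sum to the complement of the in-house-only fraction. Real-world matching would also need indel left-normalisation and, where builds differ, liftover before comparing keys.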
We then examined the variants not found in external knowledgebases (n=2,356) to characterise them further. The majority (87.6%; n=2,041) were non-recurrent, that is, reported in only a single patient (see Figure 6). Somatic assays contributed 65.5% (n=1,543) of these variants, haematology assays 24.8% (n=585) and germline assays 9.7% (n=228). Variants without external knowledgebase data were curated de novo and stored in our internal database. However, most did not recur in other cancer patients over the study period, so these stored curations provided little benefit to future patients.
Of the in-house only variants, 43.2% (n=1,017) were missense variants from somatic assays classified as VUS (see Figure S4). By gene type, many of these missense VUS were in oncogenes (n=239), tumour suppressor genes (n=290) or genes not listed in the Cancer Gene Census (n=381) (see Figure S5).
A gene-level analysis of the in-house only curated variants reflects the mix of genes in our custom targeted gene panels (see Figure 7). Key genes associated with haematological cancers contribute substantial numbers of in-house only variants. In particular, the tumour suppressor TET2 is implicated in haematological malignancies¹⁸. We reported 134 unique TET2 variants, none of which were found in external databases. Other genes frequently mutated in haematological malignancies included ASXL1¹⁹, RUNX1²⁰ and WT1²¹. This may reflect the large number of haematology assays within PathOS and the under-representation of haematological genes in the public resources we compared against.
Commercial systems may increase misclassification risk
A subset of novel in-house only curated somatic and germline variants (n=307) was submitted to a commercial tertiary analysis platform (CTAP) for annotation and pathogenicity assessment. The CTAP applied ACMG classifications to both germline and somatic variants. The ACMG framework is not designed for categorising somatic variants. Nevertheless, we compared the CTAP classifications with our in-house classifications mapped to ACMG categories.
Under the ACMG classifications, the subset spanned four pathogenicity classes (‘benign’ n=2, ‘VUS’ n=249, ‘likely pathogenic’ n=18 and ‘pathogenic’ n=38). Although 81.1% (n=249) of variants were concordant for pathogenicity, 18.9% (n=58) were discordant (see Table 2). Discordant classifications included 29 variants classified as ‘VUS’ by CTAP but ‘pathogenic’ by PathOS, and 17 classified as ‘VUS’ by CTAP but ‘likely pathogenic’ by PathOS (see Table S2). Of the 29 VUS/pathogenic discordances, 17 were non-synonymous, 11 were nonsense and one was within a splice site. By variant type, 15 were substitutions and 14 were insertions.
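Concordance of this kind is a cross-tabulation of paired labels. The sketch below is a minimal illustration, assuming both sets of calls have already been mapped to the same four ACMG labels. The function name and toy pairs are hypothetical and do not reproduce Table 2. The final line simply re-derives the reported percentages from the stated counts.

```python
from collections import Counter

ACMG_CLASSES = ("benign", "VUS", "likely pathogenic", "pathogenic")


def concordance(pairs):
    """Cross-tabulate paired classifications and count agreements.

    `pairs` is a list of (pathos_class, ctap_class) tuples, both already mapped
    to ACMG labels. Returns (table, n_concordant, n_discordant), where `table`
    counts each (pathos_class, ctap_class) combination.
    """
    for pathos, ctap in pairs:
        if pathos not in ACMG_CLASSES or ctap not in ACMG_CLASSES:
            raise ValueError(f"unexpected label pair: {pathos!r}, {ctap!r}")
    table = Counter(pairs)
    n_concordant = sum(n for (a, b), n in table.items() if a == b)
    return table, n_concordant, len(pairs) - n_concordant


# Hypothetical toy pairs, not the study's Table 2
pairs = [
    ("pathogenic", "VUS"),
    ("pathogenic", "pathogenic"),
    ("VUS", "VUS"),
    ("VUS", "VUS"),
    ("likely pathogenic", "VUS"),
]
table, agree, disagree = concordance(pairs)
print(agree, disagree, table[("pathogenic", "VUS")])  # 3 2 1

# The reported percentages follow directly from the counts
print(round(100 * 249 / 307, 1), round(100 * 58 / 307, 1))  # 81.1 18.9
```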
One example is chr1:g.45799193dup (HGVSc: NM_001128425.1:c.240dup; HGVSp: NP_001121897.1:p.(V81Cfs*12)). We classified this variant as pathogenic because the frameshift introduces a premature stop codon, causing loss of function of the tumour suppressor MUTYH²², whereas CTAP annotated it as VUS. Another example is chr16:g.23641608T>A (HGVSc: NM_024675.3:c.1867A>T; HGVSp: NP_078951.2:p.(Lys623*)). We predicted that this variant truncates the PALB2 protein by approximately 46%, resulting in loss of significant functional domains. Ovarian, breast and other cancers with loss of homologous recombination (HR) proteins, including PALB2, have been reported to be clinically sensitive to PARP inhibitors and platinum agents²³,²⁴,²⁵. CTAP classified this variant as VUS, which could cause potential therapeutic options for the patient to be missed.
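Both examples rest on the same reasoning: a predicted null consequence in a gene whose disease mechanism is loss of function. The sketch below encodes a deliberately simplified version of that rule, loosely in the spirit of the ACMG PVS1 criterion. It is not the PathOS or CTAP logic. It ignores nonsense-mediated decay, last-exon position and transcript choice. The function name and tumour suppressor set are illustrative assumptions. Consequences are given as Sequence Ontology terms, with multiple terms joined by '&' as in Ensembl VEP output.

```python
# Sequence Ontology consequence terms treated here as predicted null alleles
NULL_CONSEQUENCES = frozenset({
    "frameshift_variant",
    "stop_gained",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
})


def predicted_lof_in_tsg(gene, consequence, tumour_suppressors):
    """Flag a predicted null variant in a tumour suppressor gene.

    `consequence` is one SO term or several joined by '&' (VEP style).
    This simplified check does not model nonsense-mediated decay, last-exon
    truncations or alternative transcripts, so a True result is a prompt for
    curator review rather than a classification.
    """
    terms = set(consequence.split("&"))
    return gene in tumour_suppressors and not terms.isdisjoint(NULL_CONSEQUENCES)


# Hypothetical gene set for illustration
tsgs = {"MUTYH", "PALB2", "TET2"}
print(predicted_lof_in_tsg("MUTYH", "frameshift_variant", tsgs))  # True  (c.240dup)
print(predicted_lof_in_tsg("PALB2", "stop_gained", tsgs))         # True  (p.Lys623*)
print(predicted_lof_in_tsg("PALB2", "missense_variant", tsgs))    # False
```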
Comparison of gene distributions by tumour stream
Across the 10,965 patients tested with somatic assays, 3,939 variants were curated according to the clinical context reported with each patient sample. In the ten clinical contexts with the most variants, VUS classifications predominate (see Figure S6).
To examine gene-level concordance between databases in specific clinical contexts, we compared the top 20 genes in melanoma, colorectal and haematological malignancies in our in-house knowledgebase (PathOS) with COSMIC and ICGC, matching on primary tumour site (see Figure S7). Per-gene patient counts were positively correlated for the melanoma (ICGC: Pearson’s r=0.80, p<0.01; COSMIC: r=0.81, p<0.01) and colorectal (ICGC: r=0.74, p<0.01; COSMIC: r=0.81, p<0.01) cohorts (see Table S3). In contrast, the haematology stream showed a markedly different gene distribution. It had no significant association with ICGC and a weaker correlation with COSMIC (r=0.63, p<0.01). This may reflect the custom gene panels of the PMCC haematology assays and the differing ranges of blood cancers included in ICGC and COSMIC.
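The following is a minimal sketch of the gene-level comparison, assuming per-gene patient counts are available for each source within one tumour stream. We assume the top 20 genes are taken from the in-house counts and that genes absent from the external source count as zero. The function names are hypothetical. The coefficient is computed from its definition; if SciPy is available, `scipy.stats.pearsonr` also returns the accompanying p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def top_gene_correlation(in_house_counts, external_counts, top_n=20):
    """Correlate per-gene patient counts over the in-house top-N genes.

    Both arguments map gene symbol -> number of patients with a variant in that
    gene for one tumour stream. Genes missing from `external_counts` count as 0.
    """
    top = sorted(in_house_counts, key=in_house_counts.get, reverse=True)[:top_n]
    x = [in_house_counts[g] for g in top]
    y = [external_counts.get(g, 0) for g in top]
    return pearson_r(x, y)


# Hypothetical toy counts, not study data
pathos = {"BRAF": 120, "NRAS": 60, "TP53": 30, "KIT": 10}
external = {"BRAF": 900, "NRAS": 500, "TP53": 200, "KIT": 150}
print(round(top_gene_correlation(pathos, external), 3))
```

Restricting the comparison to the in-house top genes means the result depends on panel content. A stream whose panel omits genes common in the external resource can therefore correlate poorly even when shared genes agree.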