Clinical characteristics of samples
In total, 45,048 samples across 17 cancer types, including CUP, were included in this study. The sample type distribution in the GENIE cohorts was 24,567 primary and 15,484 metastatic tumours. Hotspot-region mutations and copy number variations for these samples were available from GENIE and cBioPortal. According to the information provided by GENIE, we divided the samples into 17 broader cancer types, including CUP (Fig. 1A). The cancer categories containing the most samples were non-small cell lung cancer (9,085; 15.3%), breast invasive ductal carcinoma (8,712; 14.7%), colorectal cancer (5,961; 10.0%), glioma (3,214; 5.4%), melanoma (2,492; 4.2%) and prostate cancer (2,214; 3.7%). The cohort contained 1,709 CUP samples (2.9%), comprising 1,222 metastatic (71.5%), 288 primary (16.9%), 182 not applicable or heme (10.6%) and 17 unspecified (1.0%) samples (Fig. 1B). Among CUP patients, 50.5% were female and 49.5% were male (Fig. 1C).
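The CUP breakdown above is simple arithmetic that can be re-derived from the four reported counts. The following is a minimal sketch; the dictionary keys are labels chosen here for illustration, not field names from GENIE.

```python
# Counts of CUP samples by sample type, as reported in the text.
cup_counts = {
    "metastatic": 1222,
    "primary": 288,
    "not applicable or heme": 182,
    "unspecified": 17,
}

# Total number of CUP samples.
total_cup = sum(cup_counts.values())

# Share of each category, as a percentage rounded to one decimal place.
cup_percent = {
    category: round(100 * n / total_cup, 1)
    for category, n in cup_counts.items()
}

for category, pct in cup_percent.items():
    print(f"{category}: {cup_counts[category]} ({pct}%)")
```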
Significantly mutated genes (SMG) in CUP samples
We analyzed hotspot-region mutations at the gene level in CUP samples from GENIE using a previously developed method 8,9. In total, 52 SMGs were identified (Fig. 2A, Supplementary Table 1). Among these, the mutation rates of TP53, KRAS, ARID1A, SMARCA4 and KMT2D were significantly higher than those of the other SMGs (Fig. 2B, Supplementary Table 1). Pathway enrichment analysis showed that the SMGs are involved in a wide range of cellular processes (Fig. 2C, Supplementary Table 2), including transcription factors/regulators, receptor tyrosine kinase signalling, cell cycle, IGF pathway-protein kinase B signalling, phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin signalling, PDGF, FGF, EGF, TGF-beta and Notch signalling, and integrin signalling. The identification of the MAPK, PI(3)K and Wnt/β-catenin signalling pathways is consistent with classical cancer studies. Notably, almost all samples had at least one non-synonymous mutation in at least one SMG. The number of point mutations varied across these genes, from the highest (512 mutations for TP53 across 727 cases) to the lowest (15 mutations for GLI3 across 15 cases) (Fig. 2B, Supplementary Table 1). This suggests that the numbers of both cancer-related genes (52 identified in this study) and cooperating driver mutations required during oncogenesis are few, although large-scale structural rearrangements were not included in this analysis. Interestingly, in line with the study by Zehir et al. 2017 8, which highlighted TERT promoter mutations across a few primary tumours, we observed similar TERT promoter mutations among CUP samples (n=91) (Fig. 2D).
Although the clinical relevance of mutations in the TERT promoter remains incompletely understood, our results reaffirm the high prevalence of these alterations in patients with advanced solid tumours and suggest an association with disease progression and poor outcome. Additionally, the presence of similar TERT promoter mutations in CUP and NSCLC samples suggests that these mutations may serve as a diagnostic marker for identifying the primary tumour in CUP patients.
Mutual exclusivity and co-occurrence among SMGs
Pair-wise exclusivity and co-occurrence analysis of 1,035 pairs among the 52 SMGs found 198 mutually exclusive (P < 0.001) and 837 co-occurring (P < 0.001) pairs among CUP samples (Fig. 3 and Supplementary Table 3). Pairs with significant exclusivity included KRAS and FAT1, KRAS and NOTCH3, KRAS and NF1, KRAS and DMD, and CDKN2A and RB1. Pairs with significant co-occurrence included KRAS and APC, TP53 and APC, KRAS and CDKN2A, KRAS and STK11, KRAS and KEAP1, and SMARCA4 and KEAP1, highlighting the importance of the KRAS oncogene in CUP tumours.
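The text does not state which statistical test produced these P-values. As an illustration only, pair-wise exclusivity and co-occurrence are often assessed with a two-sided Fisher's exact test on the 2x2 table of mutated versus wild-type samples for each gene pair. The sketch below assumes that approach; the function names, the input layout and the toy gene sets are hypothetical and are not data from this study.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def pairwise_interactions(mutated, n_samples, alpha=0.001):
    """mutated: dict mapping gene -> set of sample indices carrying a mutation."""
    results = []
    for g1, g2 in combinations(sorted(mutated), 2):
        a = len(mutated[g1] & mutated[g2])  # both genes mutated
        b = len(mutated[g1] - mutated[g2])  # only g1 mutated
        c = len(mutated[g2] - mutated[g1])  # only g2 mutated
        d = n_samples - a - b - c           # neither gene mutated
        p = fisher_exact_two_sided(a, b, c, d)
        # Odds ratio above 1 points to co-occurrence, below 1 to exclusivity.
        if a * d > b * c:
            tendency = "co-occurrence"
        elif a * d < b * c:
            tendency = "mutual exclusivity"
        else:
            tendency = "neutral"
        results.append((g1, g2, p, tendency, p < alpha))
    return results


# Hypothetical toy input: 20 samples, three placeholder genes.
toy = {
    "GENE_A": set(range(10)),
    "GENE_B": set(range(10, 20)),
    "GENE_C": set(range(10)),
}
for g1, g2, p, tendency, significant in pairwise_interactions(toy, 20):
    print(g1, g2, f"{p:.2e}", tendency, significant)
```

With many gene pairs tested, a multiple-testing correction would normally be applied on top of the per-pair threshold; the sketch leaves that out for brevity.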
Copy number alteration among CUP samples
Analysis of copy number variation within CUP samples identified 624 frequently amplified or deleted regions. Significant amplification of MYC, FGF4 and FGF19 was observed in a small fraction of patients (Fig. 4A), while deletions of the cell cycle-related genes CDKN2B and CDKN2A were detected in only 10 and 20 percent of patients, respectively (Fig. 4A). We further analyzed copy number alterations of the CUP-SMGs within CUP samples (Fig. 4B) and across primary tumours of 14 cancer types registered in GENIE (Fig. 4C, Supplementary Table 4). Among CUP samples, deep deletions of TP53, RB1, CDKN2A and STK11 and amplifications of KRAS and PIK3CA were observed. In the pan-cancer analysis, the most frequent amplifications were KRAS and PIK3CA in breast cancer (66 and 114 cases) and non-small cell lung cancer (46 and 48 cases), TERT in non-small cell lung cancer (114 cases) and ATR in breast cancer (36 cases), while deletions of CDKN2A in glioma (676 cases) and of RB1 and TP53 in small cell lung cancer (15 cases) were observed across these 14 cancer types (Fig. 4C). Among the genes with significantly altered copy numbers between CUP and primary tumours, significant amplification of the TERT promoter was observed in both CUP and non-small cell lung cancer samples compared with glioma and breast primary tumours, suggesting that TERT copy number variation may play a diagnostic role in identifying the origin of CUP tumours (Fig. 4D).
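Per-gene amplification and deep-deletion frequencies of the kind reported above can be tallied from discrete copy-number calls. The sketch below assumes the cBioPortal-style coding (-2 deep deletion, -1 shallow deletion, 0 diploid, 1 gain, 2 amplification); the function name and the toy calls are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

# Discrete copy-number codes in the cBioPortal convention:
# -2 deep deletion, -1 shallow deletion, 0 diploid, 1 gain, 2 amplification.
AMPLIFICATION = 2
DEEP_DELETION = -2


def cna_frequencies(cna_calls):
    """cna_calls: dict mapping gene -> list of discrete calls, one per sample."""
    freqs = {}
    for gene, calls in cna_calls.items():
        counts = Counter(calls)
        n = len(calls)
        freqs[gene] = {
            "amplified": counts[AMPLIFICATION] / n,
            "deep_deleted": counts[DEEP_DELETION] / n,
        }
    return freqs


# Hypothetical toy calls for ten samples (not data from this study).
toy_calls = {
    "CDKN2A": [-2, -2, 0, 0, 0, -1, 0, 0, 0, 0],
    "MYC": [2, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}
toy_freqs = cna_frequencies(toy_calls)
for gene, f in toy_freqs.items():
    print(gene, f)
```

Only the extreme codes are counted here, so low-level gains and shallow deletions do not inflate the reported frequencies.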
Mutation frequency of CUP-SMGs across 17 known primary tumours
To identify similar and targetable mutation patterns in CUP, we compared the genomic alteration frequencies of the identified CUP-SMGs in primary tumours across 17 cancer types in GENIE (Fig. 5A). The majority of CUP-SMG mutations were enriched in non-small cell lung cancer (4,221 cases), colon cancer (4,011 cases) and breast cancer (3,376 cases) (Fig. 5A).
The most frequently mutated gene in this cohort was TP53 (44% of total samples) (Fig. 5B). Its mutations predominated in non-small cell lung cancer (46.36%, 2,517 cases), colon cancer (65.55%, 2,365 cases) and breast cancer (36.26%, 2,060 cases) (Fig. 5B). KRAS was the second most commonly mutated gene, occurring frequently (>10%) in most cancer types (pancreatic cancer: 74.6%; colon cancer: 44.24%; non-small cell lung cancer: 30.93%) except hepatobiliary carcinoma, cervical cancer, bladder cancer, thyroid cancer, melanoma, small-cell lung cancer, head and neck carcinoma, prostate cancer and breast cancer (Fig. 5B). PIK3CA mutations were frequent in breast cancer (36.7%) and cervical cancer (25.14%), being specifically enriched in luminal subtype tumours. Many cancer types carried mutations in chromatin remodelling genes. In particular, the histone-lysine N-methyltransferase genes KMT2D, KMT2C and KMT2B were mutated in bladder, lung and endometrial cancers, whereas KMT2A was mostly mutated in non-small cell lung cancer and colon cancer. Mutations in ARID1A were frequent in non-small cell lung cancer, colon cancer, bladder cancer and breast cancer, whereas mutations in KEAP1 and STK11 predominated in non-small cell lung cancer (8.62% and 11.75%, respectively) (Fig. 5B). KRAS mutations are typically mutually exclusive, with the recurrent activating mutations KRAS (Gly 12) and KRAS (Gly 13) common in colon cancer, non-small cell lung cancer and pancreatic cancer. We compared the most common KRAS hotspot mutations between CUP and other KRAS mutation-enriched cancer types (Fig. 5C). This comparison showed enrichment of G12D and G12R in pancreatic cancer, and of G12C, G12F and G13C in non-small cell lung cancer and CUP samples. These data highlight the similarity of KRAS hotspot mutations between CUP and NSCLC.
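The KRAS hotspot comparison above amounts to tallying protein changes within each cancer type and comparing the resulting profiles. A minimal sketch follows, assuming each KRAS-mutated sample is available as a (cancer type, protein change) pair; the function name and the toy records are hypothetical and are not data from this study.

```python
from collections import Counter, defaultdict


def hotspot_fractions(records):
    """records: iterable of (cancer_type, protein_change) for KRAS-mutated samples.

    Returns cancer_type -> {protein_change: fraction of that type's KRAS mutations}.
    """
    counts = defaultdict(Counter)
    for cancer_type, change in records:
        counts[cancer_type][change] += 1

    fractions = {}
    for cancer_type, counter in counts.items():
        total = sum(counter.values())
        fractions[cancer_type] = {chg: n / total for chg, n in counter.items()}
    return fractions


# Hypothetical toy records (not data from this study).
toy_records = [
    ("NSCLC", "G12C"), ("NSCLC", "G12C"), ("NSCLC", "G12V"),
    ("Pancreatic", "G12D"), ("Pancreatic", "G12R"),
    ("CUP", "G12C"), ("CUP", "G12D"),
]
toy_fractions = hotspot_fractions(toy_records)
for cancer_type, profile in toy_fractions.items():
    print(cancer_type, profile)
```

Normalising within each cancer type makes the profiles comparable even when the number of KRAS-mutated samples differs widely between types.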
Targetable mutations and drug candidates
To identify or predict possible therapeutics based on the genomic alterations identified in SMGs in CUP samples, we performed a gene-drug association analysis using the PanDrugs platform 17. Gene-drug associations are classified into two groups: “Drug targets”, in which drugs can directly target genes that contribute to the disease phenotype, and “Biomarkers”, in which genes represent a drug response-associated status while their protein products are not targetable 17. Of the 262 identified interactions, 8.7% (23/262) were classified as direct drug targets, while 91.3% (239/262) were classified as biomarkers (Fig. 5D). Interestingly, we found the FDA-approved drugs Crizotinib (GScore: 0.76, DScore: 0.95), Copanlisib (GScore: 0.76, DScore: 0.92), Dabrafenib, Sorafenib, Vemurafenib and Regorafenib to be the best candidates for targeting ALK/MET, PIK3CA and BRAF, respectively (Fig. 5D, Supplementary Table 5). Moreover, various off-label compounds and compounds under clinical investigation for targeting mutated KRAS were identified, although their GScores and DScores were not high (Supplementary Table 5). Everolimus (mTOR inhibitor), Bortezomib (26S proteasome inhibitor) and Pemetrexed (chemotherapy agent) had the highest GScore and DScore among the drug candidates in this group (Fig. 5D, Supplementary Table 5). Taken together, these data highlight the presence of at least one druggable variant and the potential of genomic alteration-guided targeted therapy in CUP patients.
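The selection of candidates described above can be pictured as filtering gene-drug associations by interaction class and then ranking them by score. The sketch below is an illustration under stated assumptions: the `Association` record, its field names and the score cut-offs are hypothetical and do not reflect the PanDrugs export format. Only the Crizotinib and Copanlisib scores are taken from the text; the third record is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Association:
    gene: str
    drug: str
    kind: str      # "target" or "biomarker" (hypothetical labels)
    gscore: float
    dscore: float


def best_candidates(assocs, min_gscore=0.6, min_dscore=0.7):
    """Keep direct drug targets above both (assumed) cut-offs, best first."""
    keep = [
        a for a in assocs
        if a.kind == "target" and a.gscore >= min_gscore and a.dscore >= min_dscore
    ]
    # Rank by DScore first, then by GScore, in descending order.
    return sorted(keep, key=lambda a: (a.dscore, a.gscore), reverse=True)


associations = [
    Association("PIK3CA", "Copanlisib", "target", 0.76, 0.92),
    Association("ALK/MET", "Crizotinib", "target", 0.76, 0.95),
    # Placeholder biomarker record with made-up scores, for illustration only.
    Association("GENE_X", "DRUG_X", "biomarker", 0.50, 0.40),
]
ranked = best_candidates(associations)
for a in ranked:
    print(a.drug, a.gene, a.gscore, a.dscore)
```

Keeping the cut-offs as parameters makes it easy to see how sensitive the candidate list is to the chosen thresholds.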