Spreading pattern of pan-cancer metastasis
To comprehensively profile the spreading pattern of metastatic cancers and enhance statistical power to identify underlying prognostic genomic biomarkers, we integrated the clinical and genomic data of 32,176 primary and metastatic cancer samples from four studies (Supplementary Fig. 1a): 10,946 samples from MSK-IMPACT (MSK), 18,004 samples from Foundation Medicine Inc. (FMI), 500 metastatic samples from the University of Michigan (MET500) and 3,336 primary samples of lung, breast, colon, and prostate cancers from The Cancer Genome Atlas (TCGA). Grouping tissue of origin (Supplementary Table 1) and sampling location (Supplementary Table 2) of each tumor sample into general anatomic organ sites and removing the minority for analysis convenience, we constructed a spreading diagram of pan-cancer metastasis (Fig. 1a), originating from 16 distinct primary sites and migrating toward 16 metastatic sites. Lymph nodes, serving as an intermediary station of distant metastasis, are found to be seeded by all the 16 types of cancers (Supplementary Fig. 1b). Besides lymphatic spread, this spreading pattern can in part be interpreted by canalicular connection, such as gallbladder and pancreas cancers spreading to liver (Supplementary Fig. 1c), and body compartments, such as lung and breast cancers to chest cavity (Supplementary Fig. 1d), and colorectal and ovarian cancers to abdomen (Supplementary Fig. 1e). Besides lymph nodes, soft tissue and body cavities as the locoregional metastatic sites, the top ranking distant metastatic organs are liver, bone, lung, and brain, all of which were intensively studied in organ-specific metastasis 15.
To investigate what types of cancers exhibit common spreading patterns, we constructed a primary cancer network (PCN, Fig. 1b) based on the similarity of fractional distribution at metastatic sites among the 16 primary cancers. Through network clustering analysis, we identified four clusters of primary cancers, within which the primary-cancer organs exhibiting similar preference of metastatic direction are from the same functional system. For example, the organs in digestive system consisting of esophagus, stomach, gallbladder, pancreas, and colon, are tightly clustered together, so are those in urinary system (kidney, bladder, and prostate) and gynecological system (ovary and uterus). Similarly, we constructed a metastatic site network (MSN) by clustering the 16 metastatic sites based on their similarities of metastatic cancer types they receive, and identified three major clusters (Fig. 1c): a brain-lung centric cluster at the upper part of the body, a liver centric cluster at the lower part of the body, and a bone-lymph-node centric cluster in between. This cluster separation is consistent with the pattern interpreted by the body compartments observed in Fig. 1a.
To further investigate what are the genetic factors mediating this complex spreading pattern and the underlying clinical implication, we developed MetaNet, a computational framework to predict when and where a tumor spread based on its genomic profile at the primary stage (Fig. 1d). In general, MetaNet consists of two models: Model 1 to predict when a primary tumor will metastasize via learning the genomic difference between primary and metastatic tumors, and Model 2 to predict where a primary tumor will colonize via capturing the genomic features among organ-specific metastases (Fig. 1d). Through scoring metastatic competence and organ-specificity of each tumor based on its genomic profile by the models, we further evaluate the prediction accuracy of the metastasis from the primary, and interrogate what are the associated genetic factors that contribute to the prediction. We finally validate our models through prognostic analysis using independent cohorts of primary tumors that are classified into different risk groups by MetaNet.
Identification and characterization of metastasis-featuring primary tumors
To identify genomic variants associated with metastasis, we compared the proportion of each mutation, copy-number alteration (CNA), chromosome-arm alteration, and the ten oncogenic pathway aberrations 16 in the 16 cancers in primary and metastatic stages (Supplementary Fig. 2a). In general, there are more variants significantly enriched in the metastasis than those in the primary (Fig. 2a and b), indicating that metastasis evolving from primary cancer is a selective process along which the tumor gains metastatic competence through additional variations. Notably, the most significantly enriched variants in the metastasis include ESR1 (estrogen receptor 1) mutation in metastatic breast cancer (FDR < 1e-6, z-test, Benjamini-Hochberg (BH) correction) and AR (androgen receptor) mutation and copy-number amplification in metastatic prostate cancer (FDR < 1e-6, z-test, BH correction). Previous studies reported that the ESR1 mutations were commonly observed in recurrent breast cancer with resistance to hormonal therapy 17,18. Similarly, AR variations have also proven to be the molecular mechanism of resistance to androgen-depletion therapy 19,20. In addition, we also observed an increased number of copy-number and chromosome-arm alterations in the metastatic tumor genomes (Fig. 2b), which is consistent with previous study reporting a highly instable genomic structure in metastatic tumors 21. Within those significant CNAs, we found that MYC amplification is the only variant across two different cancer types: metastatic prostate and pancreas cancers (Supplementary Fig. 2a), suggesting its common role in promoting cancer metastasis 22.
Given the observed differences in the genomic profile between primary and metastasis (Fig. 2a and b, and Supplementary Fig 2a), we established MetaNet Model 1, a machine-learning module based on xgboost 23, a gradient boosting tree model to stratify patients with primary tumor into different metastatic risk groups using the tumor genomic profiles (Fig. 1d). In particular, we first trained the model to identify metastatic tumors from primary tumors in four common cancer types (breast, lung, colon, and prostate) using clinical, histological and genomic features of the tumors. Learning the distinct features between primary and metastatic tumors, the model was then able to estimate the likelihood of one tumor being metastatic, termed Metascore. Using cross-validation, we showed that compared to the baseline models trained only by the clinical and histological features without the genomic data, the genomics-based model can accurately identify the metastatic tumors from the primary in the breast and prostate cancers (Area Under the Receiver Operating Characteristic curve, AUROC > 0.8), rather than in the lung and colon cancers (Fig. 2c and Supplementary Fig. 2b). This result demonstrated that the primary and metastatic breast and prostate tumors are genomically different, while in lung and colon cancer the genomes are alike, which is similar with our observation in the comparison of the genomic profiles (Fig. 2b). From an evolutionary perspective, it suggests that unlike lung and colon cancers, breast and prostate cancers may follow certain evolutionary modes in which only novel clones resistant to hormone treatments can thrive in the metastasis. In terms of clinical implication, disease-free survivals of breast and prostate cancer patients are generally longer than those with lung and colon cancers 24, during which the metastases of breast and prostate cancers have longer time to evolve and acquire more variants than those of lung and colon cancers under the assumption of constant mutation rate.
To further understand how the model predicts metastatic risk and what are the associated genomic variants used in the model, we used SHapley Additive exPlanations (SHAP) value 25 to untangle the tree-based model, providing gene-wise contribution to the metastatic risk of breast cancer (Fig. 2d), in which the model achieved the highest accuracy among the four common cancer types. Consistent with our genomic comparison between primary and metastasis (Supplementary Fig. 2a), the ESR1 mutation is found by the mean SHAP value as the most predictive feature of metastatic breast cancer, followed by FGFR4 mutation, SOX2 amplification, ERBB2 mutation, and FGFR1 mutation (Fig. 2d). ERBB2 mutation has been found to be associated with resistance to hormone therapy through a distinct mechanism from the ESR1 mutation 26. In contrast, the top predictive variants of low metastatic risk in breast cancer are DAXX, MCL, and GATA3 amplification. Interestingly, a previous study has uncovered that GATA3 plays a suppressive role in breast cancer metastasis by inducing microRNA-29b expression which targets a set of pro-metastatic regulators 27.
Even though the genomics-based model achieved AUROC of 0.82 and 0.8 in distinguishing metastatic versus primary breast cancers and prostate cancer, respectively (Supplementary Fig. 2b1-2), the misclassification rate is not ignorable. Examining the Metascore distributions in the true primary and metastatic sample categories, we showed that the misclassification rate is mainly contributed by a high false positive rate in breast cancer (Fig. 2e) and prostate cancer (Supplementary Fig. 2c), meaning that the model overrates a subset of primary tumors as metastatic ones. This overrated subset of primary tumors, even though labeled as primary, might carry metastasis-enriched features, which makes them genomically more similar to the metastatic tumors other than the primary tumors. We therefore deemed this subset of primary tumors as Metastasis-Featuring Primary (MFP) tumors (n = 382, Fig. 2f) and the other primary as Conventional Primary (CP) tumors (n = 1,255, Fig. 2f) based on a Metascore cutoff of 0.5 (Fig. 2e). To illustrate whether the MFP tumors in fact carry the metastasis-enriched features, we calculated the fraction of the top predictive variants (Fig. 2d) in the MFP, CP, and metastatic (M) breast cancers. Consistently, we found that the MFP tumors harbor more metastasis-enriched features, such as the ESR1 and ERBB2 mutations than the CP tumors (Fig. 2g). Conversely, the MFP tumors carry less primary-enriched features, such as MCL1 and GATA3 amplifications than the CP tumors (Fig. 2g), which indicates that more metastasis-enriched features and less primary-enriched features together shift the MFP tumors away from the conventional primary toward real metastasis on the genomic scale defined by our Metascore (Fig. 2f).
Transcriptomic characteristics and prognostic value of metastasis-featuring primary tumors
To explore the biological and clinical implications of the MFP tumors, we collected the genomic, transcriptomic, and clinical data of TCGA breast cancer cohort 28,29 consisting of 1,079 primary breast cancer samples. Feeding the identical features from the clinical, histological and genomic data of TCGA samples into the trained model, we estimated the metastatic risk of each TCGA sample by computing their Metascore. The top predictive genomic features identified in the training phase (Fig. 2d) contributed similar predictive power to the metastatic risk estimation of each TCGA sample (Fig. 3a), highlighting the robustness of these predictive features regardless of the variation caused by batch effect and other covariates. In particular, the seven TCGA primary breast cancer samples carrying ESR1 mutation are all deemed as MFP tumors, highlighting ESR1 mutation is a remarkable feature that can provide early warning signal of high metastatic risk in the patients with primary tumors, but their metastatic lesions are not obvert yet. Indeed, ESR1 mutation has been used to monitor the resistance of hormone treatment via liquid biopsy, the measurement of cell-free DNA in the blood of cancer patients 30.
To understand the functional consequence of primary tumors harboring metastatic features we subsequently studied gene expression data. As different receptor-defined subtypes of breast cancers exhibit distinct expression patterns, we split TCGA breast cancer samples into the four classical subtypes 28,29: luminal A, luminal B, HER2-enriched, and basal-like, and then compared the transcriptomic profiles between the MFP and the CP tumors within each subtype. Notably, we found that the upregulated genes in the MFP tumors are significantly enriched in the epithelial–mesenchymal transition (EMT) in both HER2-enriched and basal-like subtypes, while the downregulated genes are significantly enriched in the functions related to cell cycle proliferation, such as G2M checkpoint and E2F targets (FDR < 0.0001, Gene Set Enrichment Analysis (GSEA), Fig. 3b and c). This pattern was not found in the GSEA of the other two hormone-related subtypes and the breast cancer in general (Supplementary Fig. 3a-c). A previous study observed the same reverse pattern between EMT and cell proliferation through modulating CDH1 expression in MDA-MB-468, a triple-negative breast cancer cell line 31, which in part supports our observation.
To validate whether the MFP tumors have high risk of metastasis, we compared the clinical outcomes of the patients classified into the MFP and the CP groups. Filtering the samples with the survival data available in TCGA, we showed that the patients with MFP tumors have significantly shorter disease-free survival (DFS) than those with CP tumors (p value < 0.0001, log-rank test, Fig. 3d). Similarly, worse clinical outcomes with shorter DFS were found in the patients with the MFP prostate (Supplementary Fig. 3d) and the MFP lung cancers (Supplementary Fig. 3e), which collectively demonstrates that the MFP tumors are more progressive than the CP tumors. To validate that the Metascore, our genomic estimation of metastatic risk, is an independent predictor of disease progression, we compared the DFS between the patients with MFP tumors and those with CP tumors within each breast cancer subtype, and found consistently worse DFS in the MFP group within each of the four subtypes (Fig. 3e and Supplementary Fig. 3f). Moreover, we used multivariate Cox regression to collectively evaluate the predictive power of the breast cancer subtypes and the Metascore-defined MFP/CP stratification. Strikingly, the hazard ratio of MFP over CP is 3.9 (2.2~7.0, 95% confidence interval), which is significantly higher than the base value of 1, and is independent of the subtypes (Fig. 3f). Even though previous study has demonstrated significantly worse overall survival of basal-like breast cancer than those of hormone-positive subtypes 32, the subtypes, however, did not exhibit strong predictive power to the disease progression when standing with our genomics-based stratification (Fig. 3f). Taken together, we demonstrated that our genomic stratification of metastatic risk is significantly powerful and independent of conventional hormone-based subtyping in breast cancer.
Profile of metastatic organotropism
To explore the spreading preference of cancer metastasis, we curated two large-scale and independent metastatic cancer datasets from MSKCC (n = 3,636) and Foundation Medicine Inc. (n = 6,402), in order to investigate whether this organ-specific metastasis, namely metastatic organotropism, is a statistically robust phenotype. Comparing the fractional differences of metastatic cancers located in the 16 metastatic sites from the 16 tissue of origins between the two independent cohorts (Supplementary Fig. 4a), we discovered a remarkably significant correlation (Pearson Correlation Coefficient, PCC = 0.81, p < 0.0001, Fig. 4a) in a pan-cancer scale. Notably correlated metastatic cancer types include the liver metastasis of pancreatic cancer (140 out of 203 in MSK versus 263 out of 416 in FMI, p = 0.16, proportion test), the liver metastasis of breast cancer (213 out of 794 in MSK versus 386 out of 1388 in FMI, p = 0.62, proportion test), and the chest-cavity metastasis of lung cancer (123 out of 652 in MSK versus 353 out of 1689 in FMI, p = 0.27, proportion test), none of which is significantly different between the two independent collections. Collectively, given that 15 out of the 16 cancer types (9,849 out of 10,038 samples) exhibit significant correlation of metastatic site distributions between MSK and FMI cohorts (PCC > 0.5, p < 0.05, Supplementary Fig. 4a), we concluded from a big-data perspective that dissemination direction in majority of metastatic cancers is strongly organotropic in a statistically robust manner, which implies that organotropic metastasis is highly nonrandom and in part driven by certain potential factors including tissue of origin, vascular pattern, genetic background and congenial microenvironment 15,33,34.
To further visualize which organ is the predominant metastatic destination in each cancer type, we compared the fractions of cancer samples at the four common distant metastatic organs: bone, brain, liver and lung (Supplementary Fig. 4b), by projecting the normalized fractions into a tetrahedron space (Fig. 4b). Interestingly, we uncovered two cancer groups: one is liver-tropic and the other is lung-tropic. The liver-tropic group consists of five cancer types all from the digestive system (gallbladder, pancreas, stomach, colon and esophagus), which is in part due to vascular structure and anatomic proximity. The lung-tropic group consists of the cancer types from head and neck, thyroid, uterus, skin and kidney, most of which are located close to the lung. These two groups indeed explained that the cluster formation in the primary cancer network (Fig. 1b) is due to the predominant single-organ tropisms in liver and lung. In addition, we observed widely reported organotropisms, including bone metastasis of prostate cancer and brain metastasis of lung cancer 35,36. The other cancer types consisting of bladder, ovary and liver cancers, were not located close to any single corner, indicating their metastatic organotropisms are not dominated by one single organ (Supplementary Fig. 4b).
Given that metastatic organotropism is a stable biological phenomenon (Fig. 4a), we further investigate its underlying clinical value, i.e., whether the metastases at different organs impact patient survival. Using the overall survival (OS) data available in the MSK cohort, we compared the OS differences of the four common cancers spreading to the four common metastatic organs based on the metric of the area under the Kaplan-Meier plot (equivalently as mean survival) instead of median survival which cannot be computed in long-surviving cancers, such as prostate cancer. Generally, in all the four cancer types the patients with metastatic cancer have remarkably shorter survival than those with primary cancer (Fig. 4c and Supplementary Fig. 4c1-4). Particularly, the patients with metastatic prostate cancer at the liver have significantly worse survival than those at the bone (Fig. 4c), which implies that predicting potential metastatic sites can provide prognostic value for the prostate cancer patients. Unexpectedly, brain metastasis does not always indicate worse survival than the other metastases: in breast and colon cancers the brain metastases have worse survival than the liver metastases, while this observation is opposite in the metastatic lung cancer (Fig. 4c).
We next investigate whether genetic variation contributes to the metastatic organotropism. Given the abundance of copy-number and chromosome-arm alterations significantly enriched in metastatic cancers (Fig. 2b), we further investigate whether those alterations are uniformly distributed in all metastatic sites or specifically enriched in certain one. Using the fraction of genome altered (FGA) estimated in the MSK-IMPACT study 37, we found a dramatic increase of FGA in the brain metastases compared to the non-brain metastases and the primary tumors in 10 out of the 16 cancer types (Fig. 4d and Supplementary Fig. 4d), especially in the lung, breast, and colon cancers (Fig. 4d), all of which are the top-ranking origins in brain metastasis (Supplementary Fig. 4e) 38. Previous study has demonstrated that chromosomal instability, featured by high FGA, is a driver of metastasis through a cytosolic DNA response 21. Recent study discovered MYC amplification, in particular, is required in the brain metastasis of lung cancer using patient-derived xenograft mouse models 39, which enlightened us to further characterize each organ-specific metastasis from gene-wise perspective.
Genomic characterization of metastatic organotropism
To comprehensively identify variants associated with metastatic organotropism in our curated large-scale dataset, we selected the metastatic samples located at bone, brain, liver and lung, and screened for the variants whose fractions in the four metastatic sites have significant deviation from the average fraction in the metastases. Using false discovery rate control, we identified 89 organotropic variants and features in total with fractional bias in certain metastatic sites significantly deviating from the average (Chi-square test, FDR < 0.1, variant fraction in the metastases > 1%, Supplementary Fig. 5a and Supplementary Table 3), most of which are found in the metastatic cancers originating from the colon (n = 18), the breast (n = 15) and the lung (n = 13). Almost two thirds of the organotropic variants have fractional enrichments in the brain metastasis (59 out of 89), whereas only 6 variants are enriched in the lung metastasis and 4 in the liver metastasis (Supplementary Fig. 5a). Within those 59 brain-tropic variants, 17 are CNAs, suggesting those altered genes might be the key factors among the abundant CNAs enriched in the brain metastasis (Fig. 4d).
Among the 15 organotropic variants in breast cancer, 11 are most abundant in brain metastasis, 3 in liver metastasis and 2 in bone metastasis (Supplementary Table 3). To visualize the organ-specific enrichment of each variant, we calculated the odds ratio of the variant fraction in one metastatic site over that not in the site, termed organotropic odds ratio (OGTOR), and projected the normalized OGTORs into a tetrahedron with the four corners representing the four metastatic sites (Supplementary Fig. 5b). None of the variants are shown to locate close to the lung metastasis corner, and the most abundant variants and features in the lung metastasis, such as p53, Ras and Myc patways (67%, 59% and 56%), are found to be more abundant in the brain metastasis (80%, 77% and 73%). Projecting the OGTORs into a triangular space (Fig. 5a) instead and highlighting the variants with variant fraction larger than 5% and FDR less than 0.05, we clearly showed that the top liver-tropic variant of the breast cancer is ESR1 mutation, the bone-tropic variant is CDH1 mutation, and the brain-tropic variants include TP53 mutation, CDK12 and ERBB2 amplifications. Particularly, the ESR1 mutation dramatically shifts the distribution of metastatic breast cancer destinations with a sharp increase in liver (from 52% to 75%) accompanied with fractional decreases in brain and lung (p < 0.0001, Chi-square test, Fig. 5b). Further examining each ESR1 mutation in the breast cancer samples (n = 405, 129 from MSK, 262 from FMI and 14 from MET500), we found that the liver metastasis enrichment is in fact primarily contributed by four hotspot positions located at the ligand-binding domain (LBD): D538, Y537, L538 and E380 (n > 20, Fig. 5c), all of which are spatially close to each other and have been demonstrated to give rise to estrogen-independent activation of downstream signaling and promote cellular proliferation 40. The CDH1 mutation enriched in the bone metastasis, was identified as a featuring loss-of-function mutation in the invasive lobular carcinoma 29 that leads to dysregulation of cell-cell adhesion with a discohesive phenotype 41. The brain-tropic variants TP53 mutations and ERBB2 amplification are in fact enriched in triple-negative and HER2-enriched breast cancers, respectively 28. Previous epidemiological study showed that bone metastasis of breast cancer is common across all the subtypes except the basal-like one 42, whereas liver metastasis is enriched in hormone-positive subtypes 43 and brain metastasis is enriched in HER2-enriched 44 and triple-negative subtypes 45, all of which are highly consistent with our organotropic variation enrichment analysis.
Among the 13 organotropic variants in lung cancer, 10 are most abundant in brain metastasis and 3 in bone metastasis (Supplementary Table 3). No variant is significantly enriched in the liver metastasis, and we ruled out the lung metastasis due to its small sample numbers (n = 32) and missing annotation regarding regional relapse or distant metastasis from one site of the lung to the other. A significantly high mutation burden (larger than 20 mutations per megabase) was identified in the brain metastasis (p < 0.0001, Chi-square test, Supplementary Fig. 5c), featured by the enriched CREBBP and EPHA5 mutations (Fig. 5d). The STK11 mutation was significantly enriched in the bone metastasis (p = 0.0002, Chi-square test, Supplementary Fig. 5d), and the aberration of its involved PI3K pathway was found to significantly increase the fraction of brain metastasis (p = 0.0379, Chi-square test, Fig. 5e). Interestingly, the STK11 mutation, even though more enriched in the bone metastasis, are found to be the most abundant variant in the brain metastatic samples of lung cancer harboring the aberration of PI3K pathway (n = 146, Supplementary Fig. 5e). Previous study has demonstrated that STK11 is a tumor suppressor and its loss-of-function mutation is involved in the morphological change from adenocarcinoma to squamous cell carcinoma, which further promotes lung cancer metastasis 46. Through analyzing our previous pharmacogenomic dataset that screened 60 anti-cancer drugs in 462 patient-derived cell lines (PDCs) 47, we showed that five PI3K-pathway inhibitors rank within the top seven most efficacious drugs for the 23 PDCs of lung cancer brain metastasis (LUBM). Three of the five PI3K-pathway inhibitors (gedatolisib, everolimus and vistusertib) target MTOR 48, a downstream effector of the PI3K pathway. Further comparison of the drug efficacies in the 23 LUBM PDCs versus those in the other 439 PDCs showed that the three MTOR inhibitors exhibit significantly high specificity (FDR < 0.01, t-test, Fig. 5f and Supplementary Fig. 5f). Collectively, this result verified an enrichment of aberrantly activated PI3K pathway and its clinical actionability in the brain metastasis of lung cancer.
Among the 18 organotropic variants in colon cancer, 10 are most abundant in brain metastasis, 7 in bone metastasis and 1 in liver metastasis (Supplementary Table 3). No significant variants are found in the lung metastasis (Supplementary Fig. 6a). Besides the aberration of TGF-β pathway enriched in the liver metastasis, featured by SMAD4 mutation and deletion, the other significant variants are mainly amplifications located at chromosome 13q, together with Ras pathway activation featured by KRAS mutation (Fig. 5g). Among those brain-tropic amplifications, the most significant one is CDK8 amplification which gives rise to a fractional increase of the brain metastasis from 2% to 11% (p = 0.0007, Chi-square test, Fig. 5h). Even though CDK8 is amplified together with its chromosomal neighbors FLT1 and FLT3, we collected the transcriptomic data from TCGA 49 and showed that only the amplification of CDK8, rather than those of FLT1 and FLT3, are functional through elevation of the corresponding expression (Supplementary Fig. 6b). Previous study has demonstrated the oncogenic role of CDK8 amplification in colon cancer cell proliferation as a positive mediator of β-catenin-driven transformation in WNT pathway 50. Using paired samples of primary and brain metastasis colon cancer from the same patients in two recent studies 8,51, we demonstrated that the CDK8 amplification is not a newly emerged event in the brain metastasis but inherited from the primary tumors (12 out of 14, Supplementary Fig. 6c). Comparing the transcriptomic profile of CDK8-amplification primary colon cancer in TCGA versus the non-amplified cases, we showed that CDK8 amplification is in fact associated with the promotion of epithelial mesenchymal transition (GSEA, FDR < 0.0001, Supplementary Fig. 6d) and downregulation of cell proliferation (GSEA, FDR < 0.05, Supplementary Fig. 6e), implying a role of CDK8 amplification in distant metastasis of colon cancer. Furthermore, we used the clinical data of TCGA to show that the colon cancer patients with CDK8 amplification were diagnosed with more lymph node spread (p = 0.006, proportion test, Supplementary Fig. 6f) and have significantly shorter DFS (p = 0.003, log-rank test, Fig. 5i). Collectively, all of the evidences pinpointed that the colon cancers with CDK8 amplification are more progressive with strong potential in distant migration toward brain. Given that colon cancer metastasis follows sequential cascade from colon to liver, and lung 15, we inferred that brain metastasis of colon cancer also follows this cascade driven by the blood vasculature and keeps migrating from lung into heart and eventually into brain through neck artery. The CDK8-amplification fractions in the primary, liver, lung and brain metastases exhibit a significant increased trend (p = 0.003, trend test, Supplementary Fig. 6g), which suggests that CDK8 amplification is positively selected during the cascade.
Organotropic stratification of primary tumors
Even though we only identify one organotropic variant, the MSH6 mutation in lung metastasis of prostate cancer (FDR = 0.07, Supplementary Table 3), we observed from the survival analysis (Fig. 4c) that the patients with the primary prostate cancer, the bone metastasis and the liver metastasis suffer a dramatic decrease of the mean survival time as 33 months, 24 months and 13 months, respectively, given a 40-month follow-up (Fig. 4c and 6a). The pairwise comparisons between these three groups all yielded significant differences (p < 0.001, log-rank test, Fig. 6a), which is consistent with previous epidemiological statistics 14. Enlightened by this fact, we therefore developed MetaNet Model 2 (Fig. 1d), an organotropism-based prognostic system that stratifies patients into different risk groups depending on the propensity of metastatic destination. Using an ordinal regression model framework, we trained the MetaNet Prognosis module using the combined dataset of the MSK and FMI prostate cancer cohorts (Fig. 6b), and achieved an accuracy of 64.3% in the three-class prediction task. Next, we applied our MetaNet prognosis module to an independent cohort of primary prostate patients from TCGA 52, and stratified them into three risk groups (Fig. 6b): conventional primary (CP, n = 237), bone-metastasis-featuring primary (Bone-MFP, n = 174), and liver-metastasis-featuring primary (Liver-MFP, n = 83). Strikingly, the three risk groups have a similar decreasing trend in DFS, and pairwise comparison of the DFS between the three groups yields significant differences: p = 0.04 in CP vs. Bone-MFP, p = 9e-4 in Bone-MFP vs. Liver-MFP, and p = 6e-8 in CP vs. Liver-MFP (Fig. 6c). This suggests that the organotropism stratification can inform clinicians to perform organ-specific examination during the follow-ups of high-risk patients.
To display the mechanism of our genomics-based stratification model, we showed the corresponding fractions of the predictive variants in each stratified group (Fig. 6d). In general, the fractions of these predictive variants in each risk group exhibits an increased trend from CP to bone- and liver-MFP groups, featured by the aberration of cell-cycle, p53 and PI3K pathways, indicating a sequential process of malignancy that is concordant with the survival pattern (Fig. 6c). In particular, we noticed that the FGA over 5% is a highly distinguishable feature for CP (31%) versus bone- (88%) and liver-MFP (99%) patients, together with two featuring CNAs: CDKN1B deletion and AR amplification (Fig. 6d), which is consistent with previous study solely using FGA to predict patient survival of prostate cancer 53. The predictive and significantly more abundant variant of bone-MFP group than that in liver-MFP group is SPOP mutation (p = 0.0007, proportion test), which has been found to represent a distinct subtype of prostate cancer that is mutually exclusive to the common E26 transformation-specific (ETS) transcription family fusions 54,55. The CDK12 mutation, even though has low fractions in all the three groups (0% in CP, 2.8% in bone-MFP and 5.9% in liver-MFP), has also been demonstrated to increase genomic instability 56 and aggressiveness 57 in prostate cancer.
Current standard grading system of prostate cancer primarily relies on the Gleason score based on the morphological features of two lesions in histological images. We compared our genomics-based organotropic stratification with the Gleason grading in TCGA cohort and found a strictly increasing median score of our genomics stratification within each Gleason grade, indicating a high consistency between the two independent systems using genomics and histology, respectively (Fig. 6e). In particular, our genomics-based system stratifies more CP patients into the low Gleason-grade group of and more liver-MFP patients into the high Gleason-grade group. This suggests that integrating genomic profiles of metastatic prostate cancers to stratify metastatic risks of primary prostate cancer patients can provide additional dimension for more precise diagnosis and prognosis.