Genomic events shaping epithelial-to-mesenchymal trajectories in cancer

9 The epithelial to mesenchymal transition (EMT) is a key cellular process underlying cancer 10 progression, with multiple intermediate states whose molecular hallmarks remain poorly 11 characterized. To fill this gap, we explored EMT trajectories in 8,778 tumours of epithelial 12 origin and identified three macro-states with prognostic and therapeutic value, attributable to 13 epithelial, hybrid E/M (hEMT) and mesenchymal phenotypes. We show that the hEMT state is 14 remarkably stable and linked with increased aneuploidy, APOBEC mutagenesis and hypoxia. 15 Additionally, we provide an extensive catalogue of genomic events underlying distinct 16 evolutionary constraints on EMT transformation, including novel pan-cancer dependencies of 17 hEMT on driver genes PRRX1, BCOR and CNOT3 , as well as links between full 18 mesenchymal transformation and REG3A and SHISA4 mutations in lung and breast cancers, 19 respectively. This study sheds light on the aetiology of the lesser characterised hybrid E/M 20 state in cancer progression and the broader genomic hallmarks shaping the mesenchymal 21 transformation of primary tumours.


26
The epithelial to mesenchymal transition (EMT) is a cellular process in which polarized 27 epithelial cells undergo multiple molecular and biochemical changes and lose their identity in 28 order to acquire a mesenchymal phenotype 1 . EMT occurs during normal embryonic 29 development, tissue regeneration, wound healing, but also in the context of disease 1,2 . In 30 cancer, it promotes tumour progression with metastatic expansion 3 . Recent studies have 31 uncovered that EMT is not a binary switch but rather a continuum of phenotypes, whereby 32 multiple hybrid EMT states underly and drive the transition from fully epithelial to fully 33 mesenchymal transformation 4,5 .

34
Elucidating the evolutionary trajectories that cells take to progress through these states is key 35 to understanding metastatic spread and predicting cancer evolution.

36
The transcriptional changes accompanying EMT in cancer have been widely characterised  Table S2). Therefore, our method is able to detect an additional 10% 117 of cases presenting early evidence for the phenotypic transformation required for metastasis.

118
Indeed, multiple studies have demonstrated the activation of the EMT transcriptional 119 programme in the early stages of cancer 10,27 . Even the hEMT phenotype is hypothesised to be 120 sufficient for promoting metastatic dissemination 28 , although our analysis of the MetMap tumours. In line with this hypothesis, these tumours also showed the highest exhaustion levels

172
Beyond the broader hallmarks discussed above, we sought to identify specific genomic 173 changes creating a favourable environment for EMT transformation. We prioritised cancer 174 driver mutations, focal and arm-level copy number changes that may be linked with EMT, and 175 implemented a lasso-based machine learning framework to identify those drivers able to and FGFR2 mutations was markedly increased at the mesenchymal level, suggesting that a 186 clonal fixation of these events may be key for the establishment of a fully mesenchymal state           Table S9).

293
Interestingly, the progression-free interval after oxaliplatin treatment was shorter in patients 294 with hEMT tumours (Supplementary Figure S8c), suggesting hEMT might be linked with poor 295 responses to this chemotherapy drug. To further explore potential links between EMT and 296 therapy responses, we investigated whether EMT progression might confer different levels of 297 sensitivity to individual cancer drugs using cell line data from GDSC 22 . We found 22 298 compounds whose IC50 values were significantly correlated with the EMT score (Figure 6e).

299
The strongest associations were observed with Acetalax, a drug used in the treatment of triple  ability to capture them, and this will be best studied in single cell datasets.

331
Our study confirmed previously established molecular hallmarks of EMT, including increased 332 genomic instability and hypoxia in hEMT, and cytotoxicity/exhaustion in mesenchymal 333 tumours 17 , which could inform immunotherapy strategies. We also highlighted mutational 334 processes that have increased activity in specific EMT states. Beyond an expected 335 association with ageing-induced damage along the EMT progression axis, we also found that

401
To evaluate the "robustness" of the EMT states we applied the same procedure described

438
The results of ConsesusTME were used as input for a multinomial logistic regression model.

439
The function multinom() (from the nnet R package) was used to determine the probability of 440 each sample belonging to a macro-EMT state based on the cellular content of the sample.

441
Fibroblasts related gene-sets were manually curated as described in Supplementary Table   442 S3. Fibroblasts enrichments scores were calculated via ssGSEA using the GSVA 67 R

463
(1) Considering the results obtained from deconstructSigs, the signatures with average 464 contribution (across all samples) greater than 5% were taken forward in the analysis.

465
(2) We combined the signatures obtained in (1) and by SigProfiler to obtain a final list of 466 signatures for the given tissue. If SBS1 and SB5 were not present, we added these 467 signatures manually.

468
To identify EMT-associated mutational processes we used a similar approach to the one

487
To search for genomic events linked with the described EMT macro-states, we considered all 488 somatic mutations, focal and arm-level copy number events in driver genes from the COSMIC 489 database that were obtained in the previous steps. Two parallel methodological approaches 490 based on lasso and random forest were used to identify events that could be predictive of employed as predictors. A similar approach was employed for feature selection and model 500 building with random forest.

501
In the second step, ROC curves were generated on the test dataset (20% of the data). In 502 addition, the predictors obtained from the two pipelines were also used as input for random 503 forest (ranger implementation), gradient boosting (gbm) and Naïve Bayes models. In these 504 cases, the trainControl() function (from the caret R package) was used in a 5-fold cross-

529
First, we removed from the analysis all the samples where the metastatic site was 1) Blood 530 vessel, 2) Brain, 3) Spinal Cord or 4) Not available. Copy number events were classified as 531 broad or focal as described above for the MET500 data. Only regions where the absolute 532 copy number was greater than 1 and which were overlapping driver genes as annotated in 533 COSMIC were considered for downstream analysis. The frequency of each marker (e.g.,

534
TGFBR2_focal) was compared between the primary and metastatic tumors using a Chi-535 square test. The resulting p-values were adjusted using Benjamini-Hochberg multiple testing 536 correction. Finally, only the markers with p-value<0.05 and odds ratio >1 were selected. The 537 same approach was applied for point mutations.

544
Silent mutations were excluded.

546
To understand relevance of the hypothesised biomarkers to the metastatic dissemination of 547 various cancer cell lines, we downloaded the experimentally measured metastatic potential 548 levels for cancer cell lines from MetMap 19 . We compared metastatic potential between samples with and without a specific EMT marker event (mutations or copy number 550 alterations), pan-cancer and by tissue. Only the markers that were linked with the hEMT or 551 MES states and that showed a statistically significant difference (p < 0.05) in metastatic 552 potential between the two groups (with and without alteration) have been considered.   and therefore obtain a "reference" for the single cell EMT trajectories. Using the "derived" X, Y

603
To identify the genes positively selected in each EMT tissue-specific cluster, we ran dNdScv 73 604 in each cluster separately. To detect recurrent copy number alterations in lung 605 adenocarcinoma and breast cancer, we considered all genes with alterations in at least 10 606 patients and applied a Fisher's exact test to identify those that appeared enriched in a 607 particular EMT state. Only the genes with a q-value < 0.05 have been reported.

609
We downloaded the drug sensitivity data from the Genomics of Drug Sensitivity in Cancer

886
The corresponding EMT scores are displayed above. (j) Signature contributions from SBS1 887 and SBS13 compared between the three EMT states.

891
GISTIC to prioritise mutated genes and copy number events, respectively. These genomic