Comprehensive analysis of oncogenic fusions in mismatch repair deficient colorectal carcinomas by sequential DNA and RNA next generation sequencing

Background: Colorectal carcinoma (CRC) harboring oncogenic fusions has been reported to be highly enriched in mismatch repair deficient (dMMR) tumors with MLH1 hypermethylation (MLH1me+) and wild-type BRAF and RAS. In this study, dMMR CRCs were screened for oncogene fusions using sequential DNA and RNA next generation sequencing (NGS). Results: Comprehensive analysis of fusion variants, genetic profiles and clinicopathological features in fusion-positive dMMR CRCs was performed. Among 193 consecutive dMMR CRCs, 39 cases were identified as MLH1me+ BRAF/RAS wild-type. Eighteen fusion-positive cases were detected by DNA NGS, all of which were MLH1me+ and BRAF/RAS wild-type. RNA NGS was sequentially conducted in the remaining 21 MLH1me+ BRAF/RAS wild-type cases lacking oncogenic fusions by DNA NGS, and revealed four additional fusions, increasing the proportion of fusion-positive tumors from 46% (18/39) to 56% (22/39) in MLH1me+ BRAF/RAS wild-type dMMR cases. All 22 fusions were found to involve the RTK-RAS pathway. Most fusions affected targetable receptor tyrosine kinases, including NTRK1 (9/22, 41%), NTRK3 (5/22, 23%), ALK (3/22, 14%), RET (2/22, 9%) and MET (1/22, 5%), whilst only two fusions affected the mitogen-activated protein kinase cascade components BRAF and MAPK1, respectively. RNF43 was identified as the most frequently mutated gene, followed by APC, TGFBR2, ATM, BRCA2 and FBXW7. The vast majority (19/22, 86%) displayed alterations in key WNT pathway components, whereas none harbored additional mutations in the RTK-RAS pathway. In addition, fusion-positive tumors were typically diagnosed in older patients and were predominantly right-sided, and showed a significantly higher preponderance of hepatic flexure localization (P < 0.001) and poor differentiation (P = 0.019), compared to fusion-negative MLH1me+ CRCs. Conclusions: We showed that sequential DNA and RNA NGS was highly effective for fusion detection in dMMR CRCs, and proposed an optimized practical fusion screening strategy.
We further revealed that dMMR CRCs harboring oncogenic fusions were a genetically and clinicopathologically distinctive subgroup, and justified more precise molecular subtyping for personalized therapy.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12967-021-03108-6.


Background
Colorectal carcinoma (CRC) represents one of the most common malignancies worldwide, ranking third and fifth for cancer-related deaths in the United States and China, respectively [1]. There is increasing recognition that AJCC-TNM staging alone is insufficient for personalized therapy. The molecular heterogeneity of CRCs has been widely emphasized, and shown to be of critical prognostic and therapeutic significance.
Oncogenic fusions have long been recognized not only as diagnostic or prognostic markers, but also as potential therapeutic targets in different cancer types, including CRCs [2]. With the introduction of fusion-targeted therapy, efficient and accurate detection of druggable gene fusions is becoming increasingly important for clinical decision making. Fusion gene diagnosis was traditionally performed by fluorescence in situ hybridization (FISH) or quantitative real-time polymerase chain reaction (RT-PCR) assay. Despite their high sensitivity, these methods typically test for only one specific fusion gene, and provide very limited information on the fusion partners and breakpoints [3]. Targeted DNA-based next generation sequencing (NGS) has been shown to detect common oncogenic fusions effectively and with high confidence. However, some gene fusions of high clinical relevance may be missed because of insufficient coverage of large introns and blind spots within the targeted areas [4]. By comparison, RNA NGS can overcome many of these limitations by conducting genome-wide inspection of gene fusions with nucleotide-level resolution of genomic breakpoints, identifying both known and novel fusion genes, and delineating the fusion transcripts directly at the mRNA level [3,5]. Currently, RNA NGS has proved to be an indispensable test in routine diagnostics for sarcoma [6], and an important complement to DNA NGS for high-yield detection of targetable gene fusions in non-small cell lung cancers [7,8]. Nevertheless, reports regarding RNA NGS in fusion gene diagnosis of other cancers, including CRCs, are still limited.
Previously, oncogenic fusions were considered to be rare molecular events in CRCs, present in less than 1% of unselected patients [9]. Because of this extremely low prevalence, universal assessment for gene fusions using high-throughput methods in routine clinical practice could be expensive and time-consuming. A practical and efficient strategy to screen for such rare but clinically critical molecular alterations is therefore warranted. Notably, we and others have recently found that gene fusions are detected almost exclusively in, and are significantly enriched in, a specific molecular subtype of mismatch repair deficient (dMMR) CRCs, characterized by hypermethylated MLH1 (MLH1 me+) and wild-type BRAF/RAS [9][10][11]. We have also proposed a preliminary screening protocol using routine molecular pathological assays [10]. In the present study, we enlarged the sample size and incorporated RNA NGS as a complement to DNA NGS for fusion detection, aiming to improve our prior fusion screening strategy and to achieve a more comprehensive understanding of this rare CRC subtype.
In this study, DNA NGS was performed in a retrospective consecutive cohort of dMMR CRCs, whilst RNA NGS was sequentially conducted in MLH1 me+ BRAF/RAS wild-type dMMR CRCs lacking oncogenic fusions by DNA NGS. We found that additional RNA NGS could efficiently enhance fusion detection, and accordingly proposed an optimized strategy to screen for potentially targetable gene fusions in CRCs using combined DNA NGS and RNA NGS. A complete review of fusion genes and variants is presented. Molecular genetic features and clinicopathological features of dMMR CRCs with oncogenic fusions were also analyzed.

Patient selection
This retrospective study involved consecutive CRC cases (n = 2230) from July 2015 to June 2020 at Peking Union Medical College Hospital (PUMCH). All patients whose materials were included in the study underwent a partial colectomy for primary CRC. None of the patients was known to have received neoadjuvant therapy or tyrosine kinase inhibitor therapy prior to surgery. This study was approved upon ceding review by the PUMCH Institutional Review Board.

DNA and RNA extraction
DNA and RNA were isolated from formalin-fixed paraffin-embedded (FFPE) CRC specimens using Direct FFPE DNA Kit (Qiagen #A31133) and RNeasy FFPE Kit (Qiagen #73504), respectively, according to the manufacturer's protocols.
Keywords: Mismatch repair, Colorectal carcinoma, RNA next generation sequencing, Gene fusion

[12] and OncoKB [13] annotation. Mutational significance of tumor suppressor genes was determined according to protocols described in our previous study [14], and only "predicted deleterious" mutations were included in the analysis.

RNA NGS
NEBNext rRNA Depletion Kit (Human/Mouse/Rat) (NEB #Z1955E) was used to remove ribosomal RNA (rRNA). All RNA samples in which the percentage of RNA fragments > 200 nucleotides (DV200) was ≤ 50% skipped fragmentation and proceeded directly to library preparation. After rRNA depletion and fragmentation, cDNA synthesis and NGS library preparation were performed using the NEBNext® Ultra™ II Directional RNA Library Prep Kit (NEB #E7760L). The library was quantitated using Qubit 3.0 (Life Invitrogen, USA) and its quality was assessed with LabChip GX Touch (PerkinElmer, USA). Terminal adaptor sequences and low-quality data were removed using fastp (version 0.19.5) [15], and rRNA reads were removed by aligning the clean reads to an rRNA database (downloaded from NCBI) using bowtie2 (version 2.2.8) [16]. The remaining clean reads without known rRNA were aligned to the reference human genome (hg19) using STAR (version 020201) [17]. Fusions were detected by a customized version of Arriba 1.1.0 and annotated by the in-house software annoFilterArriba (version 1.0.0) with the NCBI release 104 database. All final candidate fusions were manually verified with the Integrative Genomics Viewer browser. A series of quality control metrics was computed using RNA-SeQC [18]. A threshold of ≥ 80 million mapped reads and ≥ 10 million junction reads per sample was set.
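The two numeric rules in this section (the DV200 cut-off that decides whether fragmentation is skipped, and the read-count thresholds set per sample) can be written as a small gate. The following is a minimal Python sketch, not the software used in this study; the class, function and field names are hypothetical, and only the numeric thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text; every name below is hypothetical.
DV200_SKIP_FRAGMENTATION_MAX = 50.0   # percent
MIN_MAPPED_READS = 80_000_000
MIN_JUNCTION_READS = 10_000_000


@dataclass
class RnaSample:
    dv200_percent: float   # percentage of RNA fragments > 200 nucleotides
    mapped_reads: int
    junction_reads: int


def needs_fragmentation(sample: RnaSample) -> bool:
    """RNA with DV200 <= 50% skips fragmentation; anything above is fragmented."""
    return sample.dv200_percent > DV200_SKIP_FRAGMENTATION_MAX


def passes_sequencing_qc(sample: RnaSample) -> bool:
    """Both read-count thresholds must be met for the sample to pass."""
    return (sample.mapped_reads >= MIN_MAPPED_READS
            and sample.junction_reads >= MIN_JUNCTION_READS)
```

Keeping the thresholds as named constants makes it straightforward to adapt the gate if a laboratory validates different cut-offs.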

MLH1 promoter hypermethylation analysis
MLH1 promoter hypermethylation analysis was performed using methylation-specific PCR, with the protocol as previously described [10,14].

Statistical methods
Continuous variables were presented as mean ± standard deviation, and categorical variables were expressed as percentages. Chi-square test, Fisher's exact test, or Mann-Whitney test was used when appropriate for comparison between dMMR CRCs with fusion and dMMR CRCs without fusion. Statistical processing was performed using SPSS version 24 (SPSS Inc., Chicago, IL, USA) and P < 0.05 (two-sided) was considered statistically significant.
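For readers who want to see what a two-sided Fisher's exact test computes on a 2x2 table (for example, fusion status against a categorical feature), the following is a minimal pure-Python sketch. It is not the SPSS procedure used in this study, and the function name is hypothetical. It sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is the usual two-sided definition.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_observed * (1 + 1e-9))
```

A result below 0.05 would be called significant under the two-sided threshold stated above. Any table passed to the function should come from the actual cohort counts; none are assumed here.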

Complete review of gene fusions detected by sequential DNA and RNA NGS
DNA NGS was conducted in all 193 dMMR tumors, and identified eighteen gene fusions (detailed in Additional file 2: Figure S1 and summarized in Fig. 1A). All gene fusions were found exclusively in tumors harboring MLH1 promoter hypermethylation and lacking concurrent BRAF or RAS driver mutations. These fusion-positive cases by DNA NGS represented 9% (18/193) of all dMMR tumors, 19% (18/91) of MLH1 me+ tumors, and 46% (18/39) of MLH1 me+ tumors with wild-type BRAF and RAS. NTRK1 fusions were the most frequent fusion events detected by DNA NGS, present in nine cases. All NTRK1 fusions were intrachromosomal rearrangements involving known NTRK1 partners. Six of these cases (6/9, 67%) harbored TPM3-NTRK1 fusions with three different fusion breakpoints: exon (e) 7 to e10 (3/9, 33%), e7 to e9 (2/9, 22%) and e5 to e11 (1/9, 11%). LMNA-NTRK1 fusions were found in two cases, with e9 to e12 and e10 to e10 fusion breakpoints, respectively. A PLEKHA6-NTRK1 fusion with an e22 to e10 fusion breakpoint was found in one case. NTRK3 gene fusions were identified in three cases, all of which were interchromosomal translocations with identical fusion breakpoints involving ETV6 e1-5 on chromosome 12 and NTRK3 e15-20 on chromosome 15. In-frame ALK gene rearrangements were found in three cases. Two of them were well-reported fusions connecting STRN e3 to ALK e20. The third showed a fusion between EML4 e1-2 and an atypical breakpoint at ALK e19. NCOA4-RET fusions involving NCOA4 e1-11 and RET e12-19 were observed in two cases. A CUL1-BRAF fusion was found in one case, with the BRAF breakpoint located in intron 8, preserving the portion encoding the BRAF kinase domain.

Additional RNA NGS was performed in 21 MLH1 me+ CRCs in which neither oncogenic gene fusions nor BRAF/RAS driver mutations were detected by DNA NGS. Gene fusions were identified by RNA NGS in four (4/21, 19%) cases (detailed in Additional file 3: Figure S2 and summarized in Fig. 1B).
Among them, two cases harbored EML4-NTRK3 fusions, which were formed through reciprocal translocation that joined e1-2 of EML4 with e14-19 of NTRK3. One case showed a MET gene rearrangement involving a novel partner gene, SNRNP70, with fusion breakpoints of SNRNP70 e8 to MET e15. In another case, a novel in-frame fusion involving YPEL1 and the extracellular signal-regulated kinase gene MAPK1 was detected. This YPEL1-MAPK1 chimeric transcript contained only part of the MAPK1 C-terminal kinase domain, connecting e1 of YPEL1 to e5 of MAPK1. The EML4-NTRK3 fusion was validated by RT-PCR and Sanger sequencing on FFPE samples of the two cases (Additional file 4: Figure S3).
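The fusion counts described in this section can be cross-checked with a short tally. The following Python sketch simply transcribes the case numbers reported in the text and recomputes the totals per kinase gene and per assay; the variable names are illustrative only.

```python
from collections import Counter

# (5' partner, kinase gene, detecting assay, number of cases), as reported in the text
FUSIONS = [
    ("TPM3", "NTRK1", "DNA", 6),
    ("LMNA", "NTRK1", "DNA", 2),
    ("PLEKHA6", "NTRK1", "DNA", 1),
    ("ETV6", "NTRK3", "DNA", 3),
    ("STRN", "ALK", "DNA", 2),
    ("EML4", "ALK", "DNA", 1),
    ("NCOA4", "RET", "DNA", 2),
    ("CUL1", "BRAF", "DNA", 1),
    ("EML4", "NTRK3", "RNA", 2),
    ("SNRNP70", "MET", "RNA", 1),
    ("YPEL1", "MAPK1", "RNA", 1),
]

by_gene = Counter()
by_assay = Counter()
for _partner, gene, assay, n_cases in FUSIONS:
    by_gene[gene] += n_cases
    by_assay[assay] += n_cases

total = sum(by_gene.values())
# Whole-number percentage of all fusion events contributed by each kinase gene
percent_by_gene = {gene: round(100 * n / total) for gene, n in by_gene.items()}
```

The tally reproduces the 18 fusions found by DNA NGS, the four added by RNA NGS, and the per-gene proportions of the 22 fusion events.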

Development of screening strategy for gene fusions in CRC using integrative DNA NGS and RNA NGS
Compared with DNA NGS alone, additional RNA NGS increased the proportion of detected fusion-positive tumors from 9% (18/193) to 11% (22/193) in dMMR cases, from 19% (18/91) to 24% (22/91) in MLH1 me+ dMMR cases, and from 46% (18/39) to 56% (22/39) in MLH1 me+ BRAF/RAS wild-type dMMR cases. Based on these and our previously published findings, we developed an improved strategy that combines DNA NGS and RNA NGS to screen for potentially targetable gene fusions in CRCs (Fig. 3). In the molecular workup for MLH1 me+ dMMR CRCs, when BRAF/KRAS/NRAS driver mutation testing is performed by DNA NGS, sequential RNA NGS is indicated when no gene fusions are found. Additionally, direct RNA NGS is suggested in BRAF/RAS wild-type cases when a PCR assay is performed instead of DNA NGS for BRAF/RAS genotyping. (Fig. 4).
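The decision logic of the screening strategy described above can be sketched as a small function. This is an illustrative Python reading of the text, not a validated clinical algorithm; the function name, argument names and returned strings are hypothetical, and tumors outside the MLH1 me+ BRAF/RAS wild-type subgroup are simply reported as outside the scope of the strategy.

```python
from typing import Optional


def next_fusion_test(mlh1_hypermethylated: bool,
                     braf_ras_wild_type: bool,
                     genotyping_assay: str,
                     fusion_found_by_dna_ngs: Optional[bool] = None) -> str:
    """Suggest the next fusion-testing step for a dMMR CRC (hypothetical helper).

    genotyping_assay is "DNA_NGS" or "PCR", i.e. how BRAF/RAS status was obtained.
    fusion_found_by_dna_ngs is only meaningful when DNA NGS was performed.
    """
    if genotyping_assay not in ("DNA_NGS", "PCR"):
        raise ValueError("genotyping_assay must be 'DNA_NGS' or 'PCR'")
    if not (mlh1_hypermethylated and braf_ras_wild_type):
        return "outside the scope of this screening strategy"
    if genotyping_assay == "DNA_NGS":
        if fusion_found_by_dna_ngs:
            return "fusion already identified by DNA NGS"
        # MLH1 me+, BRAF/RAS wild-type, no fusion on DNA NGS
        return "sequential RNA NGS"
    # BRAF/RAS wild-type by PCR: no DNA NGS fusion result is available
    return "direct RNA NGS"
```

For example, an MLH1 me+ tumor that is BRAF/RAS wild-type with no fusion on DNA NGS is routed to sequential RNA NGS, whereas the same tumor genotyped by PCR goes directly to RNA NGS.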

Discussion
Our previous study documented that oncogenic fusions were significantly enriched in dMMR CRCs harboring hypermethylated MLH1 and wild-type BRAF/RAS [10]. Herein, we conducted a further study using integrative DNA and RNA sequencing, aiming for a more accurate and comprehensive characterization of gene fusions in CRCs. We showed that RNA NGS was a valuable addition to DNA NGS for enhancing fusion detection (from 46% to 56% in MLH1 me+ BRAF/RAS wild-type dMMR CRCs), as well as for identifying novel or atypical fusion types. An optimized strategy incorporating RNA NGS to screen for oncogenic fusions in CRCs was thus proposed. Next, we presented a detailed analysis of the molecular genetic profile and clinicopathological features of fusion-positive dMMR CRCs. All fusions involved the RTK-RAS signaling pathway, predominantly RTKs, and were mutually exclusive with other RTK-RAS driver mutations. WNT pathway alterations were also frequently detected. Fusion-positive tumors were typically diagnosed in older patients, were predominantly right-sided, preferentially occurred at the hepatic flexure and showed histologically poorly differentiated components.

Considering its distinct advantages over other techniques in gene fusion detection, the latest National Comprehensive Cancer Network guideline for non-small cell lung cancer recommended RNA-based NGS in patients with no identifiable driver oncogenes detected by broad panel DNA NGS [19]. In the present study, we found that nearly 20% (n = 4) of MLH1 me+ dMMR tumors with neither oncogenic fusions nor BRAF/RAS driver mutations detected by DNA NGS were positive for gene fusions by RNA NGS. In all four of these cases, the genomic breakpoints were located in large introns or intronic repetitive elements, which are typically not sufficiently covered by large hybrid-capture-based DNA NGS panels.
In our cohort, fusion-positive tumors by integrative DNA and RNA NGS represented 11% of dMMR cases, 24% of MLH1 me+ dMMR cases, and 56% of MLH1 me+ dMMR cases with wild-type BRAF/RAS. These proportions were much higher than those reported in prior DNA-based large-scale clinical research using the MSK-IMPACT assay [9], suggesting that optimizing the fusion detection process by incorporating additional RNA NGS can achieve a considerably higher yield of gene fusions in CRCs. In addition, RNA NGS successfully identified two potentially actionable kinase fusions (SNRNP70-MET and YPEL1-MAPK1) which have not been reported in CRCs before. Therefore, we suggest the sequentially combined use of DNA NGS and RNA NGS as a highly effective strategy to uncover oncogenic gene fusions in MLH1 me+ CRCs, which have been suggested as markers of unfavorable prognosis and targets for personalized therapy [20]. In clinical settings where BRAF/RAS PCR is applied as an alternative to DNA NGS, direct RNA NGS is recommended in BRAF/RAS wild-type cases for maximized cost-efficiency.
RNA extracted from fresh-frozen (FF) tissue has been preferentially used for gene expression studies. However, the availability of FF tissue is very limited in clinical practice. FFPE specimens represent more accessible and exploitable sources for molecular studies. Although RNA isolated from FFPE samples often suffers degradation and chemical modification due to the fixation and archiving method, recent comparative studies have reported high correlation of gene expression profiles detected by RNA NGS between paired FFPE and FF samples [21,22]. Notably, artifacts introduced during library preparation and sequence alignment might hamper the reliable prediction of gene fusions by RNA NGS, leading to unaligned or out-of-frame transcripts. In clinical practice, sequential cross-validation using PCR or Sanger sequencing might be considered for novel fusions detected by RNA NGS, especially those with low-abundance transcripts and with multiple breakpoints within the same exon of the fusion partner [22].

Aberrant activation of the RTK-RAS signaling pathway has been well recognized as a key molecular event in CRC tumorigenesis. Previously, among MLH1 me+ dMMR CRCs, RTK-RAS activation was generally considered to be mediated by BRAF oncogenic mutation, occurring at the early stage of the serrated neoplasia pathway [23]. In this and our prior studies [14], we found that almost all gene fusions were detected in dMMR CRCs harboring hypermethylated MLH1, in which the fusion was the only RTK-RAS driver alteration. It is reasonable to suggest gene fusions as one major mechanism of RTK-RAS oncogenic activation in MLH1 me+ dMMR CRCs, second only to BRAF mutation. Most of the fusion-positive cases harbored RTK fusions susceptible to tyrosine kinase inhibition therapy.
Although rare, a minority of fusions involved MAP3K (BRAF) and MAPK (MAPK1), genes encoding key components of the downstream mitogen-activated protein kinase (MAPK) cascade, which is essential for intracellular RTK-RAS signal transduction. Due to the potential feedback activation of EGFR [24,25], combination therapy consisting of both EGFR and RAS/RAF inhibitors might be required in these cases [26][27][28].
Although dMMR is typically considered a favorable prognostic marker in CRC patients, oncogenic fusions have been shown to be associated with poorer clinical outcome [29,30]. The detected gene fusions primarily affected RTKs, and rendered those tumors amenable to FDA-approved targeted therapy that might reverse the otherwise poor prognosis. Therefore, efficient identification and detailed characterization of fusion variants is of key clinical significance. In our dMMR CRC cohort, TRK fusions, particularly NTRK1 fusions, were the most frequently detected fusion events. We observed that TPM3 was the most common fusion partner of NTRK1 in CRCs (6/9, 67%), which was consistent with previous reports [31,32]. LMNA-NTRK1 and PLEKHA6-NTRK1, two other NTRK1 fusion types documented in CRCs before [31], accounted for a smaller proportion of our cases. We did not detect NTRK1 fusions with SCYL3 or TPR, which have rarely been reported [32]. In previously published reports, NTRK3 fusions were found in only a few CRCs, accounting for two out of 21 fusion events in cases assessed by MSK-IMPACT testing [9], and one out of 16 NTRK fusion events in cases screened by pan-TRK IHC testing [32]. However, it has been suggested that substantial numbers of NTRK3 gene rearrangements occur in large introns (NTRK3 introns 13 and 14), and might be missed by DNA NGS alone [7]. Also, large-scale clinical studies have documented a lower sensitivity of the pan-TRK IHC assay for NTRK3 fusions compared with NTRK1/2 fusions [33,34]. In the present study, using sequentially combined DNA NGS and RNA NGS, we observed a much higher proportion of NTRK3 fusions among all detected fusion events (5/22). This finding further justified incorporating RNA NGS in clinical practice to identify fusion-positive tumors more efficiently, especially those harboring NTRK3 fusions.
Although several rare NTRK3 fusion types were previously identified in CRCs, including KANK1-NTRK3, COX5A-NTRK3 and VPS18-NTRK3 [11,32], here we observed that NTRK3 exclusively formed fusions with its main partner genes ETV6 or EML4. To our knowledge, two of the gene fusions affecting RTKs in our cohort were not well documented in CRCs previously. An EML4-ALK fusion was found to involve an atypical ALK breakpoint within exon 19, which encodes the transmembrane domain. ALK rearrangements at exon 19, instead of the usual site within intron 19 or exon 20, have only rarely been described, in malignant stromal sarcoma [35] and lung adenocarcinoma [36,37]. Except for a case demonstrating a partial response to targeted therapy [36], reports on the clinical implications of this breakpoint are very limited. A MET fusion with the novel partner gene SNRNP70, which encodes a key component of the spliceosome, was identified in one case.
Although MET gene copy number gain and protein overexpression have been shown to drive malignant progression of CRC [38], MET gene fusions have not been reported in CRCs before. Apart from RTKs, gene fusions involving the downstream MAPK cascade were also potentially actionable. Both fusions affecting the MAPK cascade detected in our cohort have rarely been reported. The CUL1(e7)-BRAF(e9) fusion was previously observed in a few cases of melanoma [39] and low-grade serous carcinoma (LGSC) [40], and only once in CRC [9]. Tumor cells harboring the CUL1-BRAF fusion have been found to show activation of the MAPK signaling pathway and sensitivity to MEK/RAF inhibition. Moreover, complete response

We observed that RNF43 was the most frequently mutated of all genes analyzed in this study. This result strengthened our previous finding that RNF43 inactivation was directly correlated with MLH1 hypermethylation, rather than BRAF mutation status [14]. Nearly 90% of the fusion-positive cases presented with WNT pathway alterations. Additionally, four of the 12 most recurrently mutated genes (RNF43, APC, FBXW7 and ARID1A) are involved in WNT signaling. It is reasonable to assume that synergistic cooperation of WNT pathway components might play an important role in tumorigenesis of fusion-positive CRCs. A very recent in vitro study revealed susceptibility to poly (ADP-ribose) polymerase (PARP) inhibitors in a subset of poor-prognosis CRCs with DNA homologous recombination repair (HRR) pathway deficiency [42]. Our data showed that one third of fusion-positive tumors harbored mutations in the crucial HRR genes ATM and BRCA2, providing a rationale for further clinical studies investigating PARP inhibitors as a potential therapeutic option for these tumors.
Based on a large sample size and detailed molecular subclassification, we further compared fusion-positive and fusion-negative tumors within MLH1 me+ CRCs. Fusion-positive tumors were found to exhibit characteristic clinicopathological features, including older age, preferential hepatic flexure localization and poor differentiation. Typically, dMMR tumors have been considered a relatively homogeneous molecular entity characterized by vulnerability to immunotherapy, which has recently been approved by the FDA as first-line treatment for metastatic dMMR CRCs. Our findings highlighted the subtle yet noticeable heterogeneity within dMMR CRCs, and justified more precise molecular subtyping for personalized diagnosis and therapy in CRCs. In addition, a recent study uncovered continuous variation of the tumor molecular profile along the large intestine, and called for more precise classification of CRCs by tumor location [43]. In this study, we not only confirmed that fusion-positive CRCs were primarily right-sided lesions, but also specified that more than half of them were localized at the hepatic flexure. In clinical practice, these results imply that CRC patients with the above-mentioned clinicopathological features might be prioritized for molecular assays for gene fusions, including RNA NGS.
In the present study, we found that fusion-positive tumors showed a significantly higher preponderance of hepatic flexure localization. Variations in the microbiome, clinicopathological features and molecular profiles have been reported to be associated with primary tumor localization along the large intestine. Several studies have documented the emerging role of gut microbiota in CRC formation and progression [43,44]. However, to our knowledge, the microbiome of the hepatic flexure has not been well characterized. The mechanism underlying the preferential localization of fusion-positive tumors at the hepatic flexure remains to be further explored.
In summary, our study presented a practical and highly effective screening procedure for genetic fusions through integrated DNA NGS and RNA NGS in a selected subset of dMMR CRCs harboring hypermethylated MLH1. With a detailed description of fusion variants, molecular profile and clinicopathologic features, we further characterized fusion-positive CRCs as a distinctive subtype with key clinical significance.