Patients and Treatment
Patients with advanced desmoid tumors, defined as patients with radiographic progression after previous treatment, were eligible for prospective phase II study. Key inclusion criteria were as follows: age ≥ 10 years, Eastern Cooperative Oncology Group performance status of 0 to 2, and adequate hematologic and renal function. Patients were treated with 400 mg of imatinib mesylate (Glima; Boryung Pharmaceutical Co., Ltd, Seoul, Korea) daily until progression or unacceptable toxicity. Toxic effects were graded according to the National Cancer Institute—Common Toxicity Criteria v. 4.03. Disease was assessed every 8 weeks for the initial 32 weeks and then every 16 weeks according to RECIST (response evaluation criteria in solid tumors) v1.1 . Briefly, patients who experienced grade 3/4 toxicity or intolerable grade 2 toxicity stopped treatment and then restarted at a reduced dose (300 mg/day or 200 mg/day). The study was approved by the institutional review board (IRB No. 2013-1417-001).
Retrospective cohort study included 24 additional patients with desmoid tumor and approved by the institutional review board (IRB No. 2020-2244-001). Surgically resected formalin-fixed paraffin-embedded (FFPE) tissue samples obtained prior to radiotherapy or chemotherapy were subjected to transcriptome sequencing (Fig. 1). Of those, 4 cases were treated with imatinib and remaining 20 were treatment naïve.
DNA Extraction and Quality Assessment
Whole genome sequencing was performed using pretreatment tumor excision samples as well as matched blood samples. Briefly, 4-mm-thick sections with a tumor content of ≥80% were obtained, and ≥2 µg of DNA was extracted using the Maxwell 16 FFPE Plus LEV DNA Purification Kit (Promega, Madison, WI, USA). For peripheral blood mononuclear cells (PBMCs), genomic DNA was extracted using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) as per the manufacturer’s instructions.
DNA Library Construction and Whole Genome Sequencing
Library preparation was performed using the TruSeq Nano DNA Library Preparation Kit (Illumina, San Diego, CA, USA) following the manufacturer’s instructions. Illumina utilizes a unique "bridged" amplification reaction on the surface of the flow cell. A flow cell containing libraries was prepared using the cBot Fluidics Station and was then loaded into the HiSeq X-10 sequencer (Illumina) for automated cycles of extension and imaging. Sequencing-by-Synthesis cycles were repeated to achieve a paired-end read length of 2 × 150 bp.
RNA Library Construction and Whole Transcriptome Sequencing
Total RNAs were extracted and purified from frozen tumor samples with ReliaPrep FFPE Total RNA Miniprep System (Promega, Inc.) according to the manufacturer's procedures. Amount of RNA and its quality were checked on an Agilent RNA 6000 Nano Kit (Agilent Technologies). For analysis of RNA-sequencing data, we prepared mRNA sequencing libraries as paired-end reads with a length of 100 bases using the SMARTer Stranded Total RNA-seq kit v2-Pico Input Mammalian according to the manufacturers’ protocols. Briefly, mRNA molecules were purified and fragmented from 2 µg of total RNA. The libraries were sequenced as paired end reads (2 x 150 bp) using the NovaSeq 6000 (Illumina).
Whole Genome Data Processing
To process whole genome sequencing data of desmoid tumors, we adopted the Genome Analysis Tool Kit (GATK) v3.7 best practice provided by the Broad Institute . Briefly, we mapped qualified paired-end reads to the human reference genome (hg19) with Burrows-Wheeler Aligner (BWA) 0.7.15 . Subsequently, we filtered PCR duplicates using Picard tools 2.8.2 to remove potential bias that occurred during sequencing processes. Then, we performed recommended procedures, such as local realignment and base quality recalibration to extract analysis-ready reads.
Somatic Variant Detection
MuTect2  of the GATK pipeline with default parameters was used to identify somatic single nucleotide variants (SNVs) and small insertions/deletions (indels). The processed whole-genome-sequencing data for tumor and matched normal samples (PBMCs) were used in BAM format as inputs for Mutect2 . Somatic variants were annotated using ANNOVAR . Some candidate variants were manually inspected using Integrative Genomics Viewer (IGV) . Population-level allele frequencies of candidates were obtained using Genome Aggregation Database (gnomAD) . For two samples with tumor data only, normal sample data for one of the nine other patients was used as matched normal control for variant calling. A sample that was sequenced in the same batch with a read depth of greater than 30 was used. The variants were further filtered using gnomAD to obtain putative somatic mutations in the tumor-only samples.
Scoring Gene-Wise Recurrence of Functional Variants
Our previously developed gene-wise recurrence model was used . Conventionally, mutations are considered recurrent if and only if they occur at the same genomic location across multiple samples. Mutations are considered oncogenic when their recurrence exceeds a certain threshold . However, this definition of recurrence is inappropriate for analyses of noncoding regions owing to their vast size. Thus, we consider mutations recurrent if they occur in functional regions of the same gene, even if they are not recurrent in a site-specific manner. In particular, we focused on mutational events in cis-regulatory regions of a mammalian gene dispersed across a long range in the genome . Genes were defined based on the GENCODE v.19 gene set mapped to GRCh37 .
To identify coding and noncoding mutations with significant functional consequences, deleterious effects of each SNV were predicted using two algorithms, Combined Annotation-Dependent Depletion (CADD)  and Deleterious Annotation of genetic variants using Neural Networks (DANN) . Both models were trained to distinguish benign variants from deleterious variants [27,28]. For multiple mutations in the same gene, the one with the highest score for deleteriousness was selected to represent the functional consequence.
Reference Whole-Genome and Transcriptome Datasets
To characterize the functional effects of NOTCH2 noncoding mutations, a large-scale pan-cancer dataset consisting of somatic variants from whole-genome sequencing data and transcriptome data for tumor and matched normal samples were used. VCF files for somatic variant calling and gene expression matrices containing FPKM-upper quantile values were obtained from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Project.
RNA-sequencing data processing and QC
We generated RNA-sequencing data of 31 desmoid tumor patients. We removed adapter sequences using Cutadpt , and aligned the trimmed reads using STAR  with hg19. Gene expression was quantified using RSEM . Quality control check at pre-alignment step was conducted using FASTQC and at post-alignment step using RSeQC . QC results were visualized with MultiQC . At post-alignment step, we noticed two patients with potential problems in read distribution, and infer experiment criteria. Thus, we excluded those samples from future analysis.
Bioinformatics and Statistical Analyses
The χ2 test was used to assess correlations between marker status and clinical significance. All correlation analysis was conducted using spearman correlation. To assess statistical significance between responders and non-responders of imatinib, we calculated Mann-Whitney U test. All tests were two-sided and P < 0.05 is considered significant. Cleveland, scatter and box plots were generated by using ggplot2 R package and matplotlib python package.
To conduct enrichment analysis, we adopted two approaches. First of all, we identified genes that are significantly correlated with imatinib sensitivity and used those genes as input for EnrichR . As an alternative step, we conducted Gene Set Enrichment Analysis (GSEA) between responders and non-responders using C2, C5, C6 and Hallmark MSigDB gene sets . C5 Notch category was defined as Notch-related terms present in C5 category.
Progression-free survival (PFS) was calculated from start date of imatinib to date of progression or death and progression free rate at 16 weeks (PFR 16) was defined as proportions of patients without progression at 16 weeks, analyzed using the Kaplan–Meier method [SPSS version 18.0 (IBM, Chicago, IL, USA)].