Genetic analysis of sinonasal undifferentiated carcinoma discovers recurrent SWI/SNF alterations and a novel PGAP3-SRPK1 fusion gene

Background Sinonasal Undifferentiated Carcinoma (SNUC) is a rare and aggressive skull base tumor with poor survival and limited treatment options. To date, targeted sequencing studies have identified IDH2 and SMARCB1 as potential driver alterations, but the molecular alterations found in SMARCB1 wild type tumors are unknown. Methods We evaluated survival outcomes in a cohort of 46 SNUC patients treated at an NCI designated cancer center and identify clinical and disease variables associated with survival on Kaplan-Meier and Cox multivariate survival analysis. We performed exome sequencing to characterize a series of SNUC tumors (n = 5) and cell line (MDA8788–6) to identify high confidence mutations, copy number alterations, microsatellite instability, and fusions. Knockdown studies using siRNA were utilized for validation of a novel PGAP3-SRPK1 gene fusion. Results Overall survival analysis revealed no significant difference in outcomes between patients treated with surgery +/− CRT and CRT alone. Tobacco use was the only significant predictor of survival. We also confirmed previously published findings on IDH and SMARC family mutations and identified novel recurrent aberrations in the JAK/STAT and PI3K pathways. We also validated a novel PGAP3-SRPK1 gene fusion in the SNUC cell line, and show that knockdown of the fusion is negatively associated with EGFR, E2F and MYC signaling. Conclusion Collectively, these data demonstrate recurrent alterations in the SWI/SNF family as well as IDH, JAK/STAT, and PI3K pathways and discover a novel fusion gene (PGAP3-SRPK1). These data aim to improve understanding of possible driver mutations and guide future therapeutic strategies for this disease. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-08370-x.


Background
Sinonasal Undifferentiated Carcinoma (SNUC) is a highly aggressive disease involving the anterior skull base, nasal cavity and paranasal sinuses. It is a rare tumor, with only a few hundred cases in the literature (1). Patients usually present at an advanced stage, and have poor outcomes (2,3), with two-year overall survival rates as low as 25% in some cohorts (1,(4)(5)(6)(7)(8)(9). Validated prognostic factors are limited to traditional clinical variables (overall stage, high grade, and poor differentiation), and no additional data on possible informative biomarkers is currently in clinical use (10). Current treatment modalities including surgery, radiation, and systemic chemotherapy alone or in combination with radiation (CRT) have poor outcomes and carry signi cant toxicity to patients (11)(12)(13). A recent study reveals improved survival with chemoselection paradigms, with ve-year disease free survival rates of 59% in the total cohort and rates as high as 81% in responders (14). However, despite these promising results, patients who did not show initial response to induction chemotherapy had a 0% and 39% ve-year DSS when treated with CRT and surgery +/-CRT respectively. These results indicate the urgent need for novel therapeutics particularly for this subset of patients with aggressive disease. Importantly, there have been no novel or targeted agents introduced for SNUC treatment since its initial identi cation, which is partially due to a limited investigation into the underlying genetics de ning SNUC pathogenesis.
To date, only a few case reports describing mutations associated with disease pathogenesis have been published. The most commonly reported mutations include IDH2 and SMARCB1 which have been identi ed in small case series via traditional sequencing approaches or targeted sequencing panels (15)(16)(17). There have been additional case reports of potentially actionable mutations in isolated SNUCs including ERBB2 and FGFR1 (18,19), but previous efforts have been limited in their scope of sequencing (4) and currently there have been no comprehensive whole exome or genome sequencing studies performed on SNUCs.
As such, this rare, devastating disease has limited treatment options currently available and characterizing genomic pro les of SNUCs may have signi cant bene t for the future development of rational therapeutic strategies. By understanding the genomic architecture behind this disease process, we may also begin to identify prognostic biomarkers that help identify the patients that fail current treatment paradigms. Here, we provide survival data from 46 patients treated at our tertiary referral center and report the rst whole exome sequences pro ling the mutational landscape of SNUCs.

Patient Population
A single-institution retrospective case series informed by a prospectively maintained database of patients with SNUC was performed. The study was approved by the University of Michigan Institutional Review Board (HUM00080561). Patients with a history of sinonasal undifferentiated carcinoma treated at the University of Michigan were included in the clinical dataset (n = 46). Inclusion criteria for genomic, copy number and transcriptome analysis is as follows: 1) Patients with sinonasal undifferentiated carcinoma as con rmed by our board-certi ed pathologist (J.B.M.); 2) Blocks maintained in the University of Michigan pathology archive; 3) Su cient DNA or RNA yield for next generation sequencing. Additionally, a prospective patient was consented to our University of Michigan IRB-approved MiOTOseq precision medicine program (HUM00085888) as described (20). In total, there were 5 patients who met inclusion criteria for analysis, and demographics are shown in Supplemental Table 1.
Survival Analysis Statistics: Survival was calculated using Kaplan-Meier analysis and outcomes were compared using Log-rank analysis. Multivariate cox regression analysis was performed using Backward Wald method with an inclusion of variables with p-values < 0.1. Statistical analysis was performed using SPSS v26 (IBM, Armonk, NY). Kaplan-Meier curves were created using Prism v8 (Graphpad, San Diego, CA).

Cell Line
The patient derived SNUC cell line, MDA8788-6, was generously provided by MD Anderson. Generation of this cell line was previously described by Takahashi et.al (18). Cells were cultured in a humidi ed incubator at 37 °C with 5% (vol/vol) CO2 in DMEM with 10% FBS, 1X Pen/Strep, 1X NEAA. Cells were genotyped to con rm the STR pro le of the cell line (Supplemental Table 2) as previously described (21).

DNA Isolation
DNA was isolated from formalin xed, para n-embedded (FFPE) samples following the manufacturer's protocol for AllPrep DNA/RNA FFPE kit (Qiagen, Hilden, Germany) as previously described (22,23). Tumor and adjacent normal regions were identi ed on H&E stained slides and aligned to tissue para n blocks.
An 18-gauge sterile needle was used to core 2-4 samples from each region. Depara nization was performed using the xylene/ethanol method with the only modi cation being that samples were digested using proteinase K at 56°C for 20-24 minutes. DNA isolation was then completed using the Allprep Isolation kit (Qiagen, Hilden, DE) following manufacturer protocol. Each sample was analyzed using a Nanodrop spectrophotometer for purity (260:230 and 260:280 ratios) and concentration was determined using 1uL of sample with the Qubit 2.0 Fluorometer and measured with a bioanalyzer as described (24). DNA extraction for MDA8788-6 cell line was performed using Wizard® Genomic DNA Puri cation Kit (Promega, Madison, WI).

DNA Sequencing
Genomic DNA from each tumor and adjacent normal specimen was submitted for sequencing to the University of Michigan's DNA sequencing core for exome sequencing using both the DNA TruSeq Exome Library Preparation kit (Illumina, Catalogue number FC-150-100x; SNUC2, SNUC5, SNUC8, SNUC10) and the Roche NimbleGen V3 capture kit (SNUC1). DNA from the MDA 8788-6 cell line was sequenced as described (25). Libraries were prepared according to the manufacturer's instructions. Libraries were then paired end sequenced to 125 nucleotides as part of pool with an average of 4 samples per lane on an Illumina HiSEQ4000 yielding an average depth of greater than 90x per sample. Exome Variant Calling Quality of the sequencing reads was assessed using FastQC v.0.11.5. Because the reads had adapter contamination as well as a high k-mer content at the start of the reads, trim galore v0.4.4 was used to remove adapters and trim reads. Reads were aligned to the hg19 reference genome using BWA v0.7.1.
Mapping was followed by marking duplicates using PicardTools v1.79. Base quality score recalibration was done using GATK v3.6 and this was the last step in preparing the reads for variant calling. Samtools v1.2 was used to create pileup les for each tumor-normal pair. Varscan v2.4.1 was used to call variants from these mpileup les using the somatic mode of the variant caller. Goldex Helix Varseq v1.4.6 was used to annotate variants. All variants in the introns and intergenic regions were ltered out. Variants with more than 5 reads supporting the alternate allele in the tumor samples were considered as true positives.

Copy Number Analysis
Aberration Detection in Tumour Exome (ADTEx) v.2.0 was used to make copy number estimation calls from the pre-processed tumor-normal BAM les which were also used for variant calling. A state from 0 to 4 was assigned by the software based on its estimated copy number. State 0 corresponds to a homozygous deletion, 1 corresponds to a heterozygous deletion. A normal copy number is denoted by state 2. States 3 and 4 represent a gain and ampli cation respectively.

Microsatellite Instability (MSI) Detection
MSIsensor was used to detect somatic MSI loci from the tumor-normal sample pairs as described previously (26). The software assigns a status to each sample pair based on an instability score calculated based on a threshold of more or less than 3.5% of called microsatellites having alterations. We present this score as well as the overall percentage of microsatellite alterations for each tumor-normal pair.

Sanger Sequencing
Excess DNA from above was used to validate mutation calls in novel genes. Primers were designed using MITprimer3 to amplify a small region surrounding the nominated single nucleotide variants (SNVs) as described in Supplemental Table 3. Polymerase chain reactions (PCRs) were optimized for each primer pair on cell line genomic DNA and then used to amplify the regions from tumor and adjacent normal DNA using Platinum Taq DNA High Fidelity polymerase (ThermoFisher, Waltham, MA). PCRs products were then visualized on an eGel (Invitrogen, Waltham, MA) and puri ed using the a PCR puri cation kit (Qiagen, Valenica, CA) as described (27) and submitted for Sanger sequencing at the University of Michigan's DNA sequencing core. Results were visualized using the LaserGene software suite.

Cell line RNA Sequencing and Analysis
Total RNA from the MDA8788-6 sample underwent standard QC and was submitted for RNA sequencing to the University's DNA sequencing core as previously described (28,29). Brie y, the Illumina Stranded RNAseq kit was used and libraries were sequenced on an Illumina HiSEQ4000 using 75nt paired end approach. Quality of the RNA sequencing reads was determined using FastQC v0.11.5 and we did not identify any quality issues. We then used a two-step alignment protocol of Star v2.5.3a to map the reads and genome index les were rst generated using the reference human genome and annotated transcriptome les. In the second step, we then used the index les to guide read mapping. Samtools v1.9 and Picard v2.4.1 were used to retain only uniquely mapped reads and FPKM was computed using Cu inks v2.2.1 with default parameters, with the exception of modifying "--max-bundle-frags" to 100,000,000. This modi cation was made to avoid raising of the HIDATA ag at loci that have more fragments than the pre-set threshold for every locus. Fusion Gene Analysis FusionCatcher (v1.00) is a software package designed to look for gene fusions, translocations and rearrangement events using paired end RNA-Seq data and was used to identify novel gene fusions in the MDA8788-6 cell line.

Linked Read Sequencing
High molecular weight DNA was isolated from the SNUC cell line by lysing 1.5 million cells overnight at 37° with lysis buffer (10 mM Tris-HCl, 400 mM NaCl, 2 mM EDTA), 10% SDS, and a proteinase K solution (1 mg/mL Proteinase K, 1% SDS, 2 mM EDTA). Following overnight lysis, DNA was salted out of the solution with 5 M NaCl for 1 hour at 4° and precipitated with ice cold ethanol for 5 hours at -20 °C. High molecular weight DNA was eluted in TE buffer; the quality and integrity of the DNA was assessed using the Tapestation Genomic DNA ScreenTape kit (Agilent). The DNA was submitted to the University's DNA sequencing core for 10x based linked read library generation and sequencing on an Illumina NovaSeq6000 with 300nt paired end run. Samples were de-multiplexed and FastQ les with matched index les were generated using Long Ranger Version 2.2.2. Data was visualized using Loupe software package, Version2.1.1 (2.4).

Fusion Gene Knockdown
All siRNA including ON-TARGETplus Non-targeting Control siRNA, ON-TARGETplus GAPDH Control siRNA, and a custom siRNA targeting the PGAP3-SRPK1 fusion site were purchased through Dharmacon (Lafayette, CO). Each siRNA was reconstituted at a concentration of 1 nmol/50 uL in 1X siRNA buffer (DHarmacon, Lafayette, CO). MDA8788-6 cells were plated at a concentration of 250,000 cell per well in 3 mL growth media. The following day all media was removed and cells were starved in 1 mL of serum DMEM for 3 hours. Each siRNA was prepared by adding 400uL of OPTI-MEM (Gibco, Waltham, Massachusetts) with 24uL of siRNA and left to equilibrate for 5 minutes. Separately, 24uL of oligofectamine (Invitrogen, Carlsbad, CA) was added to 96uL of OPTI-MEM. After ve minutes the two mixtures are added together and allowed to equilibrate at room temperature for 20 minutes. Cell were then treated with 250uL of siRNA mixture containing buffer only, Non-targeting siRNA, PGAP3-SRPK1 fusion siRNA, or GAPDH siRNA. After 3 hours 2.5 mL of growth medium was added to each well. The following day cells were harvested in 700uL of QIAzol Lysis Reagent (Qiagen, Valencia, CA) and proceeded directly to RNA extraction or stored at minus 80 °C for future extraction. RNA extraction was performed using RNeasy Mini Kit (Qiagen, Valenica, CA) per manufacturer's instructions. RNA sequencing of the fusion knockdown was also performed as above. Brie y, extracted total RNA from MDA8788-6 NT siRNA and PGAP3-SRPK1 fusion siRNA were submitted to the University's DNA sequencing core and processed as above (Illumina Stranded RNAseq kit was used and libraries were sequenced on an Illumina NovaSEQ6000 using 300 cycle paired end approach).

Survival Analysis
Forty-six patients were included in the survival analysis. The median age at diagnosis 53 years with a range from 19-87 years. Median follow up time was 28 months (range < 1 month -23 years). Two patients (4.3%) were treated with surgery alone, 21 patients (44%) with surgery in addition to adjuvant radiation, chemotherapy, or both (CRT) and 23 patients (49%) with CRT alone. Twenty-ve patients (53%) had persistent or recurrent disease after treatment. The median time to recurrence was 2.8 months. Of these 64% were locoregional recurrence or persistence while 32% were distant failures. Five-year overall survival was 42% (95% CI 27-56%) and 5-year disease speci c survival was 46% (95% CI 30-61%) as seen in Fig. 1a. There was no signi cant difference in 5-year DSS when stratifying by use of surgery +/-CRT and CRT alone (50% [95% CI 27-69%] vs. 49% [95% CI 25-68%], p = 0.85), Fig. 1b. Multivariate analysis was performed using backward Wald cox regression. Variables included in the model were Tstage, nodal disease, tobacco use and treatment type (surgery +-/ CRT vs. CRT alone). Only tobacco use was found to be signi cantly associated with decreased survival (HR 5.1 [95% CI 1.7-15], p = 0.004). We were able to isolate high quality DNA that met our quality control standards for sequencing from 5 retrospective SNUC and matched normal samples that were advanced for full exome sequencing. Within our 5 sample cohort, we had 2 patients that died within two years of diagnosis and 3 patients that survived for more than 5 years after diagnosis (Table 1). Using these tumor and adjacent normal DNA samples, we sequenced exomes to an average depth of 256,352,493 yielding an average of 218,566,135 uniquely mapped reads in each sample, for an average coverage of > 100X per tumor (Supplemental Table 5). Using this data, we rst generated copy number calls for broad genomic regions and assessed the global view of the copy number variation in Circos plots (Fig. 2a, Supplemental Tables 6A-E). These plots demonstrated that 3/5 tumors showed a much higher level of copy number variation as compared to the other samples with frequent high level ampli cations in chromosomes 4, 17, 19 and 22. Genes with copy gain or ampli cations in multiple tumors included: CD300C which plays a role in innate immunity and antigen presentation via MHC class I and is a negative regulator of CD4 and CD8 T cells (31), JAK3, E2F4 and GLI1, which have canonical and non-cannonical roles in tumor cell proliferation respectively (32,33), MDM2 which targets p53 for degradation (34), and MLL2 which may play a role in epithelialmesenchymal transition (EMT) (35). We also identi ed copy gains in ERBB2 in 3/5 samples (Fig. 2b) suggesting a potential role of ERBB2 function in a subset of SNUCs, which has been previously noted (18). Moreover, we also identi ed copy number losses in SMARC family genes (SMARCA1, SMARCA2, SMARCA5, SMARCB1, SMARCC1 and SMARCE1) in numerous samples consistent with previous reports of the recurrence of alterations to this gene family in SNUC (16). Focal ampli cations were identi ed in WEE1, FGFR3 and MAPK15 while focal copy losses were identi ed in FAT1 and SMARCA2 (Fig. 2c).
Analysis of SNV data revealed an average of 23.6 (range of 5-63) high con dence non-synonymous somatic alterations per tumor (Supplemental Fig. 1). Oncoplots for recurrently altered as well as targetable or cancer-relevant genes are shown in Fig. 3a and an extended list of the mutations with functional annotations and pathogenicity scores is shown in (Supplementary Tables 7-9). From this analysis, we identi ed a mutation in IDH2 (Arg172Gly), which is a gene previously associated with SNUC (15,17), as well as new mutation disrupting SMARCAL1 (Thr742Met; Fig. 3b,c) suggesting alternative pathways may be implicated in tumorigenesis in these samples. Further, we identi ed potentially targetable alterations including an ALK Gly872Ser (Fig. 3d). We also identi ed two different ABC transporter genes, ABCA10 (Glu507Ter) and ABCB7 (GLU342GLN), suggesting a general pathway for drug resistance that may explain why some tumors respond poorly to chemotherapy.
Finally, we characterized microsatellite instability in each of the tumors using the MANTIS software package, which assigns an instability score to each tumor/normal pair, and determine that all of the SNUC patient samples were microsatellite stable (Supplemental Table 10). Exome and RNAseq Analysis of SNUC Cell Line (MDA8788-6) Our colleagues recently derived the rst SNUC cell line, MDA8788-6, and we performed exome sequencing on this cell line model with subsequent SNV annotation using our previously established informatics pipelines for head and neck cell lines without available matched normal DNA. This sequencing yielded 225,772,526 reads, of which 99.7% uniquely mapped to the genome yielding an average coverage of > 100X (Supplemental Table 11). As no matched normal was available, joint calling was completed with our UM-SCC cell lines as described (25,28), and ltering to retain only heterozygous calls and remove any SNV previously reported in dbSNP, yielded 563 potential single nucleotide variants, of which 182 were categorized as missense, frameshift or stopgains (Supplemental Table 12). Similar ltering for insertion/deletion calls yielded 779 potential INDELs, of which 39 were categorized as frameshift INDELs (Supplemental Table 13). Among these alterations, we identi ed KMT2B S421P and NOTCH1 T2483 missense mutations, which may have an important role in pathogenesis of this tumor model given the previously established role of these genes in tumorigenesis.
Next, we performed paired end RNA sequencing to discover potential gene fusions in the cell line using the FusionCatcher algorithm. This analysis called 107 potential gene fusions in the cell line to a variety of known oncogenes and previously described fusion genes from other cancers (Supplemental Table 14). Importantly, we were able to validate the presence of a novel fusion between PGAP3 exon 2 and SRPK1 exon 13 (Fig. 4a). The resulting fusion gene creates an in-frame fusion gene predicted to encode a 244 amino acid, with predicted pI of 5.87 and molecular weight of 27.6 kDa protein. The resulting fusion protein is predicted to retain the SRPK1 protein kinase domain, which normally regulates constitutive and alternative pre-mRNA splicing machinery through phosphorylation of SR proteins (36).
To then test the hypothesis that large scale structural rearrangements drove fusion formation, we performed 10X linked read genome sequencing on high molecular weight DNA isolated from the MDA8788-6 cell line. We obtained 848,842,686 total reads and averaged 36.9x coverage, with 96.1% of the DNA molecules of > 20 kb in length and 58% greater than 100 kb (Supplemental Table 15). Long ranger pipeline analysis phased 99.2% of SNPs resulting in a maximum phase block of 47.9 MB (Fig. 4b).
The analysis identi ed 888 large structural variant calls in the genome. Contrary to our hypothesis, there were no structural rearrangements around PGAP3 or SRPK1 identi ed (Fig. 4c-e) suggesting that the fusion is formed at the RNA level, possibly by trans-splicing events. However, we did identify structural alterations involving ZNF546 and AXL (Chr19:40,490,000 and Chr19:41,760,000, 1.27 MB duplication, quality score 1000) as well as CREB3L2 and BRAF (Chr7:137,630,000 and Chr7:140,490,000, 2.86 MB duplication) suggesting a potential role for these oncogenes in pathogenesis of this SNUC model (Supplemental Fig. 2).
Finally, to test for potential functional roles of the PGAP3-SRPK1 fusion, we performed siRNA-mediated knockdown of the fusion transcript and submitted RNA for complete transcriptome sequencing. This data demonstrated that knockdown of PGAP3-SRPK1 fusion results in signi cantly decreased HSDL2, NAGK, and CCND1 RNA expression and a modest increase in LINC01006 and FOXO4 suggesting PGAP3-SRPK-1 may play a role in regulation of these genes (Fig. 5a). In fact, we identi ed 122 genes that were > 2-fold upregulated and 112 genes > 2-fold downregulated in the knockdown relative to control. Further, gene set enrichment analysis of the differentially expressed rank list found enrichment of KOBAYASHI_EGFR_SIGNALING_24HR_DN, HALLMARK_E2F_TARGETS, and HALLMARK_MYC_TARGETS_V1 pathways (Fig. 5b, Supplemental Table 16). Collectively, this RNA sequencing data analysis suggests that the gene fusion has a pivotal role in the hallmark signaling pathways.

Discussion
Survival analysis from our cohort is congruent with previous reports of low survival rates (1, 4-9) and shows little differences in survival when stratifying by treatment modality. In this study, we con rm the presence of previously noted alterations in ERBB2, IDH2, and SMARC family members from initial targeted sequencing studies. Notably, we did not nd a high frequency of SMARCB1 mutations that have been previously identi ed, and only noted one IDH2 mutation in our samples, suggesting genetic diversity for SNUC etiology. However, our data, identi ed not only copy number alterations in SMARCB1, but also to additional SMARC family members, and suggest that deregulation of the SWI/SNF nucleosome remodeling complex (consisting of known tumor suppressors SMARCB1, SMARCA4, PBRM1, ARID1B, and ARID2), through one of its many components, is a critical step in disease progression in SNUCs (37). SMARCB1 has been implicated in numerous other solid cancers as a tumor suppressor gene, including sarcomas, carcinomas and rhabdoid tumors of varying sites (38). It appears to have tumor suppressor functions in inhibiting cell cycle and proliferation via the p16-Rb-E2F and Wnt/Beta-catenin pathways, among others (38). SMARCA2 function and expression may also play a critical role in response to speci c targeted therapies (particularly with EZH2 inhibition) in tumors with SWI/SNF dysregulation (39), suggesting a potential role for EZH2 inhibitors in SNUCs.
Previous isolated reports of SNUCs have identi ed overexpression or ampli cation of growth factor receptors (18,19,40) and in this study, we have identi ed genetic alterations in ERBB2 and FGFR family growth factor receptor genes as well as ALK, suggesting potential targetable option in SNUCs. In a previous study of a SNUC cell line, high ERBB2 expression was identi ed with a notable response to ERBB2 inhibition (18). FGFR3 alterations have been implicated in head and neck cancer and in vitro and in vivo studies suggest a promising role for FGF inhibition in head and neck tumors (41)(42)(43). Given the diverse, but potentially actionable set of alterations that our data de ned, these results suggest a role for in depth molecular analysis of this rare disease in order to gain insight into molecular alterations that may drive discovery of future therapeutics, and potentially guide individual patient treatment options.
Finally, this study identi es a novel fusion of PGAP3-SRPK. SRPK1 has been previously characterized to drive cell proliferation, migration, and invasion in colorectal and gastric cancers (44)(45)(46)(47) suggesting that the fusion protein may have oncogenic consequences in the SNUC cell line. CNV analysis additionally revealed copy gain in one tumor in the SPRK1 gene. Knockdown of the PGAP3-SRPK1 fusion gene resulted in changes in expression of CCND1, FOXO4 and most signi cantly a decrease in HSDL2 and NAGK suggesting a functional role for this novel fusion gene. Unfortunately, insu cient RNA prevented evaluation of SNUC tissues for presence of the fusion.

Conclusion
Given the rarity of this tumor, it will be challenging to characterize a large cohort of patients. Nevertheless, we believe this initial analysis of ve SNUCs represents a valuable preliminary guideline of the mutational landscape of SNUCs and identi es multiple recurring mutations and pathway alterations. These may be of particular interest both as prognostic biomarkers in larger cohort studies, and as potentially targetable therapeutic options. Consequently, the alterations identi ed here represent promising targets for future SNUC studies and support a potential pathogenic role in other cancers. Due to both the infrequency and