RNA-Seq Reveals Allele-Specific Expression of Somatic Mutations in Neuroblastoma


 Neuroblastoma (NB) is one of the most common solid tumors in children, accounting for approximately 8% of all pediatric malignancies and 15% of childhood cancer deaths. Somatic mutations in several genes such as ALK have been associated with neuroblastoma progression and can facilitate the discovery of novel therapeutic strategies. However, differential expression of mutated and wild-type alleles on the transcriptome level is not well studied. In this study, we analyzed 219 whole-exome sequencing datasets (with somatic mutations detected by MuTect from paired normal and tumor samples) and prioritized mutations in eight candidate genes as potential driver mutations. Meanwhile, we analyzed 127 RNA-seq samples (of which 85 also had DNA-seq data available) for allele-specific expression levels of each mutation. Our integrated analysis of somatic mutations and allele-specific expression levels confirmed the presence of allele-specific expression of somatic mutations in neuroblastoma including MYCN, ALK and PTPN22. The allele-specific expression of mutations suggests that the same somatic mutation may have different effects on clinical outcomes of tumors. Our study also suggests possible involvement of ZNF44 as a candidate driver gene for neuroblastoma. In summary, this study demonstrates the value of examining allele-specific expression levels of somatic mutations through the analysis of RNA-Seq data to assess the effects of somatic mutations in different patients. Improved understanding of allele-specific expression of somatic mutations can facilitate development of personalized treatment for neuroblastoma in precision medicine.


Introduction
With the development of precision medicine, discriminating genomic factors have gained powerful prognostic and therapeutic implications. Nowadays, numerous gene mutations, including somatic and germline mutations, are identi ed using whole-exome, genome or transcriptome sequencing and have both improved our understanding of carcinogenesis and in uenced the development of treatment plans for cancers including neuroblastoma. Neuroblastoma (NB) is one of the most common solid tumors in children, accounting for approximately 8% of all pediatric malignancies and 15% of childhood cancer deaths 1 . Although genomic studies have revealed speci c biological features of neuroblastoma, the genes mutated in high-risk NB and their complicated molecular pathways remain unclear 2 .
Whole exome sequencing (WXS, also known as WES) is a genomic technique that is gradually being optimized to identify mutations in increasing proportions of the protein-coding regions of genes 3 . It is now routinely used and has revealed some rare and common gene variants in NB 4 . The high-level ampli cation of MYCN on chromosome 2p24 was found in NB previously 5 . It occurs in approximately 20% of NB patients, and indicates aggressive disease progress and a poor prognosis. Inhibitors that downregulate MYCN/MYC proteins can suppress NB tumor growth 6 . Mutations in the tyrosine kinase domain of anaplastic lymphoma kinase (ALK) have been identi ed in NB as well 7 . A series of ALK tyrosine kinase inhibitors (TKIs) have been approved for use in ALK-driven cancers including NB 8,9 . ALK proteins can mediate different signaling outputs due to various properties such as subcellular localization and protein stability 10 . Although most ALK-driven tumors dramatically respond to ALK-TKIs, most patients eventually develop drug resistance 11 . However, the details of the resistance mechanism are not very clear. One thing for sure is that this phenomenon is not entirely due to the status of DNA mutation. That is to say, individual patients with the same somatic mutation can respond differently to targeted therapy. One of the potential reasons for variation in treatment response is the allele-speci c expression of somatic mutations. Patients carrying the same mutations may express the mutated alleles at different levels. Therefore, it is essential to investigate the expression levels of speci c alleles at the mRNA level. This is because proteins are made from mRNAs. Mutations in the DNA may result in very different levels of allele expression patterns, ranging from no expression to dominance of one allele or the other. Whole transcriptome sequencing (WTS, also known as RNA-sequencing or RNA-seq) is an emerging tool for pro ling gene transcription and has received wide adoption in cancer genetics, with signi cant prognostic and therapeutic implications 12 . Therefore, we conducted this study to explore allelic expression of somatic mutations in NB. The mutations of neuroblastoma identi ed through DNA-seq and RNA-seq were compared to identify novel mutations and their allelic expression patterns, enabling us to propose new risk factors and potential therapeutic targets for NB. for which the maximum frequency in the population is >0.001 in gnomAD 15 . We also ltered out those variants which had ags such as 'alt_allele_in_normal', 'panel_of_normals', or 'germline_risk' in MuTect2 13 , and required the remaining variants to have 'PASS' ags in more than one of the 219 WXS samples. Ultimately, using these criteria we obtained 9 variants of 8 genes.

Methods
Variant analysis for RNA-seq samples As shown in Fig 1b, we wrote a python pipeline to call variants from 127 RNA-seq samples with BAM les as input. We rst used picard 16 to mark duplicates and then used GATK 17 to split Reads with N in Cigar, generated and applied recalibration table for Base Quality Score Recalibration (BQSR), and used HaplotypeCaller to call variants. After that, variants were ltered using GATK 17 with the options '-window 35 -cluster 3 -lterName Filter -lter "QD < 2.0" -lterName Filter -lter "FS > 30.0"'. The ltered variants were annotated with ANNOVAR 14 for further analysis. Similarly, we ltered out non-exonic or synomymous SNV variants with >0.001 maximum frequency in population.
Additional methods are needed, for example, how to assess the allele-speci c or allele-imbalanced gene expression levels, how to examine the alignment les to manually examine candidate locations, where to get the GTeX data, where to get the cbioportal data for comparative analysis, etc.

Clinical characteristics
We extracted 219 WXS-identi ed samples and 127 WTS-identi ed samples from the TARGET database.
There were 134 independent NB patients in the WXS-identi ed group and 42 independent individuals in WTS-identi ed group, with 85 patients for which we were able to obtain both WTS and WXS data. As shown in Table 1, clinical characteristics such as gender, race and ploidy were similarly distributed in each group. However, MYCN status, COG risk, stage, and age at diagnosis were quite differently distributed. All 85 patients for whom we obtained both WTS and WXS results had stage 4 disease classi ed as high-risk, and were older than 18 months at diagnosis.  13 . The purpose was to identify a list of potential disease-predisposing variants and genes. We focused on the list of non-synonymous SNVs and indels in exonic regions with a maximum frequency in the population of < = 0.001 in gnomAD 15 , since these variants might be more interpretable and perhaps more likely to be disease-associated. Ultimately, we identi ed nine variants in eight candidate genes: RIMS4, RUSC2, ALK, MYCN, PTPN11, ALOX12B, ZNF44 and CNGB1. Among them, one variant of RIMS4 mutated at Chr 20:44758168/C > G was shared by 19 patients; one variant of RUSC2 mutated at Chr 9:35560530/A > G was shared by 8 patients; two ALK variants mutated at Chr 9:29209798/C > T and Chr 9:29220829/G > T were shared by 6 and 7 patients, respectively; and the remaining variants of MYCN, PTPN11, ALOX12B, ZNF44, and CNGB1 were shared by 2 to 3 patients each ( Table 2). We further assessed their allelic expression levels in NB with RNA-seq analysis.

Rna Sequencing Analysis Of Candidate Gene Allelic Expression
After these candidate genes were found by whole exome sequencing, we further used RNA-seq to check their expression levels in the patients for which we had RNA-seq results, and to check whether other nonoverlapping samples had somatic mutations from the RNA-seq data. Somatic mutations detected in WXS could be expressed at various levels and have different effects on cellular function. Therefore, we further investigated the allelic expression of those somatic mutations by analyzing a cohort of 127 NB patients with RNA-seq data including 85 patients for whom we also have WXS data. We used GATK to detect variants, and then ran ANNOVAR on mutations from the RNA-Seq VCF les to prioritize variants that occurred in the 8 genes listed in Table 2, yielding the results = shown in Table 3. It can be seen that MYCN and ALK, two well-known NB genes, had more variants in RNA-seq samples, as expected. Among them, the ALK mutation at chr2:29209798/ C > T was predominant (7/127, 5.5%). In addition to the 7 samples with chr2:29209798/C > T ( Fig. 2a and Suppl Fig. 4), there were 2 samples with chr2:29220829/G > T (Fig. 2b). MYCN has a variant at chr2:15942195 which occurred in 3 WXS samples and 2 RNA-seq samples (Suppl Fig. 5). In particular, the two RNA-seq samples both had higher levels of variant alleles than normal alleles (~ 3 fold for one sample, and ~ 1.5 fold for the other sample), as shown in Table 3, indicating allele-speci c expression of these variants. Besides MYCN and ALK, RUSC2 (RUN and SH3 Domain Containing 2), CNGB1 (Cyclic Nucleotide Gated Channel Subunit Beta 1) and ZNF44 (Zinc Finger Protein 44) also had variants that met our criteria (AF_popmax < 0.001, exonic, not synonymous_SNV). However, besides those variants which also occurred in WXS, all other variants only occurred once in the RNA-seq data (Table 3). Interestingly, in one patient (TARGET-30-PATGJU) who harbored two different RUSC2 variants, the allelic expression of normal vs. mutated RUSC2 at chr 9:35547078/G > C was 28 vs. 3, which is quite different from another SNV at chr 9: 35547698/A > T with a ratio of 4 vs. 100. Therefore, the allele-speci c expression of mutated SNVs cannot be explained by the ratio of normal to tumor cells. Besides, among these mutated sites, we observed two novel variants of ZNF44 which were not reported previously as associated with NB. Importantly, the prevalence of the ZNF44 mutated allele at chr19: 12,273,632/ C > CA (Fig. 3) was much higher (8/127, 6.3%) than other sites. After analysis of clinical data, the average of tumor percentage in all samples was 80%, ranging from 60-90%, and the average of stroma ratio was approximately 20%, ranging from 10-40%. Thus, the change of ZNF44 expression was comparable and reliable across patients. As shown in Table 3

Discussion
Neuroblastoma is a solid tumor that can develop from immature nerve cells in several areas of the body.
It most commonly affects children and rarely occurs in adults 1 . In this study, we analyzed both WXS and RNA-seq data from neuroblastoma patients to identify somatic mutations and their allele-speci c expression. Because the majority of somatic mutations are identi ed from DNA-seq techniques such as whole exome sequencing, the allelic expression of those mutations is often not known. Proteins, the functional units of a live cell, are made from mRNA, so a somatic mutation may have very different effects on cellular function that vary with its allelic expression pro le. In our study, we con rmed multiple known neuroblastoma mutations and also identi ed ZNF44 (zinc nger protein 44) as a potentially actionable somatic mutation.
Our study explored two cohorts of neuroblastoma patients with either WXS or WTS data available. The overlapping rates of the two cohorts were high, 38.8% (85/219) of WXS group and 66.9% (85/127) of WTS group. It has been reported that, using WXS identi cation, mutation frequencies of somatic genes including ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%), MYCN (1.7%), and NRAS (0.83%) are signi cant in neuroblastoma 18 . As expected, WXS revealed mutations in some of these genes, including ALK, MYCN and PTPN11. The MYCN mutation rate (1.4%, 3/219) was similar to a previous report. However, ALK (3.2%, 7/219) and PTPN11 (1%, 2/219) mutation rates were lower than previously reported 19 . Notably, in our study, RUSC2 presented the second highest mutation frequency (3.7%, 8/219) in analysis of WXS, and was identi ed by WTS as well. RUSC2 is a protein-coding gene which has mutations associated with mental retardation and microcephaly. CNGB1 and ZNF44 variants were revealed by both WXS and WTS as well. Although CNGB1 and ZNF44 are not commonly associated with neuroblastoma, they both presented similar mutation frequencies (1%, 2/219) to that of MYCN in the WXS analyzed in this study. ZNF44 encodes a zinc nger protein also known as gonadotropin-inducible transcription factor (GIOT-2). ZNF44 is expressed in human organs and tissues at various levels (Suppl Fig. 1). Mutations in ZNF44 have been detected cancers of the uterus, stomach, ovaries, and more (Suppl Fig. 2). However, mutations of this gene have not been reported in neuroendocrine tumors before; to the best of our knowledge, it is the rst report of ZNF44 mutation in neuroblastoma (Suppl Fig. 3). ZNF44 is reported to be involved in epilepsy susceptibility, and binds a factor which is abundant in developing nervous tissue 20 . Thus, ZNF44 may play an as-yet-undetected role in neuroendocrine tumors like neuroblastoma. In our study, ZNF44 mutations were discovered in neuroblastoma by both WXS and RNA-sEq. Speci cally in the analyzed RNA-seq data, the prevalence of ZNF44 variant chr19: 12,273,632/ C > CA, was 6.3%. Given that the average tumor and stroma percentages in samples were highly consistent at 80%/20%, respectively, the ZNF44 normal allele was expressed an average of 3.0 fold higher than the mutated allele, suggesting the ZNF44 variant might be a potential disease-related site.
As expected, there were some differences between WTS and WXS analysis. WXS identi ed a gene RIMS4 (Regulating Synaptic Membrane Exocytosis 4) with a high mutation frequency (8.7%, 19/219), but this was not detected by WTS. Notably, more variants in known disease-related genes ALK (Suppl Fig. 4) and MYCN (Suppl Fig. 5) were detected in the WTS group, but far fewer were found by WXS, suggesting WTS provided more information on gene variation. It can be understood that WXS and WTS are both useful bio-technological methods with their own advantages. WXS is considered the gold standard method and is routinely used in oncology 4 . However, it cannot re ect gene expression levels. WTS has been hailed as a promising approach that presents distinct advantages, especially for determining transcriptome characteristics 21 . However, WTS is not suitable for discovery of DNA mutations. Thus, the combination of WXS and WTS can provide complementary perspectives on gene mutations. In our study, an interesting nding from RNA-seq analysis was that, in one patient sample harboring two different RUSC2 variants, the allelic expression levels of the normal vs. mutated SNVs were quite different. Although the samples were from the same patient, the results were completely opposite, suggesting that the allelic expression levels of the two SNVs were not due to different ratios of normal and tumor cells in the sample, but due to allele-speci c expression. Proteins play their bio-functional roles via both biological structure and expression levels. Therefore, both the mutation locations and their allelic expression levels may be related to response to targeted therapy. However, the limitations of our study include the small sample size, limited clinical information, lack of original raw FASTQ les to con rm indel alignment errors, and lack of original samples for clinical validation. A further, well-designed study with a larger number of samples and clinical details is planned for the future.
In summary, to date few studies have explored gene mutations in neuroblastoma using both WXS and WTS. Our study revealed that these two methods present different perspectives and meaningful results. Speci cally, we found that allele-speci c expression assessed by RNA-seq can be quite different even for the same gene mutations, which underscores the importance of WTS in cancer research. Furthermore, we identi ed gene mutations through both methods, validating some well-known NB genes like MYCN and ALK, but also discovering novel candidate genes such as ZNF44. Importantly, mutations identi ed in this study in genes such as ZNF44, which have not previously been reported in neuroblastoma, may provide new opportunities for diagnosis and treatment that are worthy of further investigation.