Study population and data collection
From November 2015, tissue and blood samples were collected prospectively from NB patients undergoing biopsy. Samples from patients who were diagnosed before November 2015 that had been deposited at the Samsung Medical Center Biobank were also included. Medical records regarding age, sex, stage, risk group, pathology, and outcome were collected. Tumour staging was determined by following the International Neuroblastoma Staging System standards [2]. MYCN amplification was determined by performing interphase fluorescence in situ hybridization on tumour tissues. Patients older than 18 months and in stage four malignancy and patients with MYCN-amplified tumours were stratified as high-risk patients.
DNA and RNA extraction
All tumour specimens were reviewed by a pathologist to determine the percentage of viable tumours and their adequacy for sequencing. Genomic DNA from the tissue and blood was extracted using a QIAamp DNA Mini Kit (Qiagen, Valencia, CA, USA). The total RNA from the same fresh frozen tumour tissues was extracted with an RNeasy Mini Kit (Qiagen, Valencia, CA, USA), according to the manufacturer’s instructions. The quality and quantity of extracted nucleic acids were evaluated using Nanodrop 8000 UV-Vis spectrometer (NanoDrop Technologies Inc., Wilmington, DE, USA), Qubit ® 3.0 Fluorometer (Life technologies Inc., Carlsbad, CA, USA) and 4200 TapeStation (Agilent Technologies Inc., Santa Clara, CA, USA). Specimens with a yield over 100 ng were selected for whole exome sequencing (WES) and whole transcriptome sequencing (WTS). Those with a median DNA fragment size of 350 bp and an RNA integrity number (RIN) of 5 were selected.
WES and variant calling
Tumour and matched normal DNA were enriched for exon regions, using the SureSelect XT regent kit (Agilent Technologies Inc., Santa Clara, CA, USA) and SureSelect XT Human All Exon V5 kit (Agilent Technologies Inc., Santa Clara, CA, USA). The libraries were pooled, denatured, and sequenced in 100-bp paired-end mode using the HiSeq Rapid SBS Kit v2 (200 Cycles) and HiSeq® Rapid PE Cluster Kit v2 in Illumina HiSeq 2500 platforms (Illumina Technologies Inc., San Diego, CA, USA). The mean target coverages were 166x in tumours and 104x in normal blood. Reads were aligned to the human reference genome (hg19) using the Burrows-Wheeler Alignment tool (BWA) version 0.7.5a [21]. Sequence Alignment and Mapping (SAM) files were converted to Binary Alignment and Mapping (BAM) files using SAMtools (v0.1.19) [22]. Duplicate reads were removed using Picard (version 1.128), base quality was recalibrated, and local realignment was optimized using The Genome Analysis Toolkit (GATK) version 3.5 [23]. Single nucleotide variants (SNVs) and indels were identified using MuTect2 version 3.8.0 [24], Strelka2 version 2.8.2 [25], and Pindel version 0.2.5b9 [26]. Germline variants were identified using HaplotypeCaller version 3.8.0 [27]. Variants were annotated using Ensembl Variant Effect Predictor (VEP) version 87 [28]. Variants located in exons with sufficient coverage (minimum depth of coverage ≥ 8) and a significant variant allele frequency (VAF ≥ 1%) were chosen for further statistical analyses. Synonymous variants were filtered out. Read alignments were manually examined using Integrative Genomic Viewer (IGV) (http://www.broadinstitute.org/igv/).
WTS and data processing
Sequencing libraries were prepared using TruSeq RNA Sample Preparation kit v2 (Illumina Technologies Inc., San Diego, CA, USA). RNA libraries were sequenced in 100-bp paired-end mode using TruSeq Rapid PE Cluster kit and TruSeq Rapid SBS kit v2 in Illumina HiSeq 2500 (Illumina Technologies Inc., San Diego, CA, USA). Unresolved bases in FASTQ files were trimmed, reads were aligned to the human reference genome, hg19, using TopHat version 2.0.6 [29], and reference-guided assembly of transcripts was performed using Cufflinks version 2.1.1 [30]. Alignment quality was verified with SAMtools version 0.1.19 [22]. Gene expression was estimated from the RNASeq data of 56 patients using a count-based method with RSEM [31]. In total, 20,345 protein-coding genes were selected. Further, genes that were expressed in at least three samples were retained. A total of 16,120 genes were analysed. Gene counts were used as the input for Trimmed Mean of M value (TMM) normalization in the R package, edgeR [32], and normalized counts were transformed to log2-counts per million (logCPM) using the voom application in the R package, limma [33].
Gene fusions were predicted by several algorithms, such as ChimeraScan [34], deFuse [35], FusionMap [36], MapSplice [37], and TopHat [29]. Fusions predicted by more than three algorithms were considered further. The putative fusions were manually investigated using IGV.
Validation of fusions by RT-PCR and Sanger sequencing
The putative gene fusions, detected by RNA-Seq, were verified by reverse transcription PCR (RT-PCR), followed by Sanger sequencing. cDNA was synthesized from 2 µg total RNA using a QuantiTect Reverse Transcription kit (Qiagen Inc., Hilden, Germany) with primers that flank the breakpoint of the fusion, in DNA Engine Tetrad 2 Peltier Thermal Cycler (BIO-RAD, Hercules, CA, USA) with the following cycling conditions: one cycle of 5 minutes at 95 °C, followed by three-step cycles of 30 seconds at 95 °C, 30 seconds at 62 °C, 10 minutes at 72 °C, and a final extension for 20 minutes at 72 °C. PCR products were purified using a Multiscreen filter plate (Millipore Corp., Bedford, MA, USA) and sequenced in an ABI prism 3730XL Analyzer (Thermo Fisher Scientific, Waltham, MA, USA) using a BigDye (R) Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems Inc., Foster City, CA, USA). The results were accessed by Variant Reporter Software v 1.1 (Applied Biosystems Inc., Foster City, CA, USA).
Mutational signatures and tumour mutation burden (TMB)
A set of 30 mutational signatures, which represent distinct characteristics of human cancer types based on base substitutions at the site of mutation, was obtained [16]. To calculate mutational signatures from each sample, deconstructSigs (R package) was used, and the weighted combination of predefined signatures was identified to comprehend the mutational profiles [38].
TMB, defined as the number of somatic variants per megabase (Mb), was calculated by dividing the total number of mutations from WES by the size of the target coding region.
Gene-set enrichment analyses (GSEA)
Gene-set enrichment analyses (GSEA), based on gene expression data for each sample, were performed using R package, GSVA [39] on 17,810 annotated gene sets from the Molecular Signatures Database (MSigDB v6.2, http://software.broadinstitute.org/gsea/msigdb/index.jsp).
Statistical analyses
All statistical tests were performed using R software v.3.4.2 (https://www.r-project.org/). The associations between risk-group and genomic information, including the frequency of mutation, TMB, mutational signature, gene expression, and gene-set expression, were examined using the T-test or Fisher’s exact test. Multiple test correction with false discovery rate (FDR) was applied to the expression data analyses. P-value < 0.05 was considered as significant.