Sample acquisition
This study, including 99 tumor samples from 68 patients with a diagnosis of synchronous BWT (n = 18 SJCRH, n = 50 COG), was approved by the SJCRH institutional review board (IRB# XPD17-052) and approved by the COG as study AREN18B5-Q. Prior to nucleic acid isolation, sections corresponding to all frozen biospecimens were reviewed by a pathologist and determined to have greater than 50% viable tumor. Adjacent non-diseased kidney tissue was confirmed to contain histologically normal kidney and to be tumor-free. We first performed analysis on the discovery cohort of SJCRH BWT specimens from 18 patients. Using preliminary data from this SJCRH discovery analysis, we applied for additional specimens from the COG to establish an expansion cohort. Of note, after determining that somatic variants were almost never shared among synchronous BWT in the SJCRCH cohort using whole exome sequencing, we switched to whole genome sequencing of the COG samples (n = 50 patients) to determine if variants outside coding regions were shared among synchronous BWT.
Genomic DNA and total-RNA were isolated by the SJCRH Biorepository or the COG Biopathology Center. Qiagen DNA extraction kits were used for DNA isolation and Trizol for RNA isolation. DNA was quantified using PicoGreen and visualized in agarose gel for quality control. RNA was quantified using Qubit fluorometry assay and quality and integrity were evaluated using RNA integrity number (RIN) measurements performed using an Agilent Bioanalyzer System.
Tumor RNA was used for total strand RNA-seq and tumor DNA for methylation analysis using the 850K methylationEPIC beadchip array (Illumina). Tumor DNA with available matched germline DNA from peripheral blood lymphocytes was used for whole exome (SJCRH) or whole genome sequencing (COG). Germline DNA derived from peripheral blood lymphocytes and adjacent histologically non-diseased kidney was used for whole exome (SJCRH) or whole genome sequencing (COG), and methylation analysis. A detailed account of specimens and sequencing or array modalities utilized in this study is shown in Fig. 2.
Clinical Data
Clinically relevant details were obtained from each BWT case including patient age at diagnosis, biological sex, tumor laterality, associated congenital anomalies or syndromes, neoadjuvant chemotherapy received, tumor histology, SIOP (Societe Internationale D'oncologie Pediatrique) post-treatment pathology risk stratification, and presence of nephrogenic rests.25
Whole exome and whole genome sequencing
Whole exome sequencing (SJCRH) or whole genome sequencing (COG) were performed on BWT DNA with available paired germline DNA from peripheral blood (n = 61 patients; n = 87 total tumor samples; Fig. 2). For variant discovery, a paired analysis was performed comparing tumor-derived DNA to germline DNA obtained from peripheral blood leukocytes. Single nucleotide variants, insertion/deletion/frameshift, and noncoding variants calls were made as previously described.26 To analyze the pathways affected by somatic variants in BWT we used the functional annotation tool DAVID to generate a set of enriched pathways combining KEGG, Reactome, and Wikipathways.27 Using whole exome sequencing or whole genome sequencing data, a copy number variant (CNV) analysis was performed using a threshold of CNV \(\ge\) 0.5 or \(\le\) -0.5 for full copy number gain or loss at a given chromosomal locus, respectively. Low-level copy number gain or loss was defined as CNV\(\ge\) 0.1 and \(\le\) 0.5 or CNV \(\le\)-0.1 and \(\ge\)-0.5 respectively. Areas of copy neutral loss of heterozygosity (cnLOH), loss of heterozygosity (LOH) due to copy loss, or copy number gain were ascertained using the CONSERTING algorithm.28
Total-strand RNA-seq
Total-strand RNA-seq was performed on all BWT samples in the study (n = 99). Total RNA-seq library preparation, sequencing, read mapping, and generation of gene level read counts and Fragments per kilobase million (FPKM) values were generated as previously described.26
Methylation analysis
Genomic tumor and germline DNA were bisulfite converted using the EZ DNA Methylation kit (Zymo Research Corp). Converted samples were processed and hybridized to the Infinium MethylationEPIC Beadchip (850K) array (Illumina) according to the manufacturer’s instructions. Raw IDAT files containing summarized information from the beadchip array were pre-processed using subset-quantile within array normalization (SWAN) function as previously described.29 The methylation score of each CpG site in the array is represented as a beta (β) value (methylated signal/methylated + unmethylated signals) and was computed using the R package minifi.30 Methylation M values (log2 ratio of the intensity of methylated signal/unmethylated signal) were also computed using the R package minifi and used for EPIC-based differential methylation analyses including unsupervised hierarchical clustering, TSNE (t-distributed stochastic neighbor embedding), and Spearman correlation matrix analyses.30,31
The imprinting status at the chromosome 11p15.5 locus was determined using methylation data as previously described.32 Briefly, the average β value for H19/ICR1 was calculated using CpG probes located within the chr11:2,019,974-2,024,738 (GRCh38/hg38) range and the average β value for 11p15.5 KCNQ1OT1/ICR2 was calculated using probes located within the chr11:2,721,228-2,722,228 (GRCh38/hg38) range. Samples with average β value H19/ICR1 < 0.7 and KCNQ1OT1/ICR2 > 0.3 were determined to have normal retention of imprinting (ROI), samples with H19/ICR1 > 0.7 (hypermethylation) and KCNQ1OT1/ICR2 > 0.3 were determined to have loss of imprinting (LOI) at H19/ICR1, and samples with H19/ICR1 > 0.7 (hypermethylation) and KCNQ1OT1/ICR2 < 0.3 (hypomethylation) were determined to have loss of heterozygosity (LOH) at 11p15.5. LOH at 11p15.5 was also designated if samples had LOH or partial LOH detectable using the CONSERTING algorithm from whole genome sequencing data.28
Unsupervised hierarchical clustering, TSNE clustering, Spearman Correlation Matrix
Unsupervised hierarchical clustering was performed using methylation M values from the 850K MethylationEPIC beadchip array data set using all samples from the current BWT data set and BWT and unilateral samples from our prior WT xenograft analysis.26,31 The top 10,000 most variable probes in the data set were used for clustering analysis. Probes located on the X and Y chromosomes were excluded to reduce biological sex bias.
Germline genomic analysis
Germline genetic variants were queried for all patients with an available leukocyte-derived DNA sample from peripheral blood (total n = 61; SJCRH n = 11, COG n = 50). Analysis was performed to query for single nucleotide substitution, nonsense, and insertion/deletion variants in 565 previously described cancer-related genes which specifically include the WT predisposition genes DICER1, IGF2, TP53, WT1, ASXL1, BRCA2, CDC73, FBXW7, PIK3CA, BLM, BUB1B, CTR9, DIS3L2, GPC3, KDM3B, NYNRIN, PALB2, REST, TRIP13, TRIM28, and TRIM37.33,34 All insertion/deletion and nonsense variants were included in germline predisposition variant counts. The Clinvar database (https://www.ncbi.nlm.nih.gov/clinvar/) was queried to determine pathogenicity of single nucleotide variants.35 Variants reported as benign in Clinvar were excluded and those reported as pathogenic or probably pathogenic were included in germline predisposition variant counts. Variants of uncertain significance (VUS), reported with no assertion, or unreported variants in Clinvar were further analyzed using the PROVEAN and PolyPhen2 algorithms.36–38 Variants classified as deleterious by PROVEAN, possibly damaging or damaging by PolyPhen2 prediction score, were included in germline predisposition variant counts. Furthermore, unreported variants or VUS were included in germline predisposition variant counts if a significant increase in variant allele frequency in the tumor was identified compared to the germline tissue consistent with retention of mutated allele in the tumor (loss of heterozygosity).
Long-term Survivorship cohort analysis
Blood-derived germline DNA methylation data from the 61 patients in the current study were combined with blood-derived germline DNA methylation data from 282 healthy community controls and 171 long-term Wilms tumor survivors (> 5 years from cancer diagnosis; n = 154 unilateral, n = 17 bilateral) from the St. Jude Lifetime Cohort Study (SJLIFE).39 These data were normalized and processed with the subset-quantile within array normalization (SWAN) method using the R package minifi. The average β values from the 11p15.5 H19/ICR1 and KCNQ1OT1/ICR2 regions defined above were computed as detailed above. Using these data, the relationship between age and methylation at 11p15.5 H19/ICR1 was determined using a linear regression model. The relationship between age and methylation from unilateral WT and BWT samples were compared against the learned model from the healthy community control population.