Patient Identification
A total of 4,581 cases diagnosed as RCC who underwent surgery for the kidney at our center between 2009 and 2019 were reviewed by two experienced uropathologists (Ni Chen and Mengni Zhang). Among them, 1,006 suspicious non-clear cell RCC cases were identified via morphological evaluation and were selected for further TFE3 immunohistochemistry (IHC). As a result, 68 TFE3 positive cases were confirmed as TFE3-tRCC by break-apart FISH assay (Figure S1). Untreated primary tumor tissues and adjacent normal samples were collected. RNA-seq and WES of formalin-fixed paraffin-embedded (FFPE) tissues were subsequently performed. RNA-seq was performed on 63 tumors and adjacent normal tissues. WES was performed on 53 FFPE tumor tissues and matched adjacent normal/blood samples. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of West China Hospital of Sichuan University. All patients or family members provided written consent for genetic analysis.
Clinicopathological characteristics and outcomes
Clinicopathological data were retrospectively collected, including age, gender, tumor size, TNM stage, morphological features, international society of urological pathology (ISUP) grade and systemic treatment strategies. For patients with localized disease, regular evaluations were performed every 6-12 months post-surgery. For those with metastatic disease, regular evaluations were carried out every 4-6 weeks after receiving systematic therapy. For patients receiving systemic treatments, the first-line progression-free survival (PFS) was defined as the time from receiving first-line systemic treatment to disease progression or death. Tumor response was defined by Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1 31. Overall survival (OS) was defined as the time from surgery to the date of death.
Immunohistochemistry (IHC) and multiple immunofluorescence
IHC and multiple immunofluorescence were performed as previously described32. Commercially available primary anti-TFE3 (clone MRQ-37, MXB biotechnologies, Fujian, China), CD8 (clone C8/144B, Dako, Copenhagen, DEN) and PD-L1 (clone 22C3, Dako) were used in this study. Multiplex immunofluorescence staining was performed with an PANO 7-plex kit (0004100100, Panovue, Beijing, CHN). CD8, macrophage and NK cells expression were quantified as positive cell density (cell number per mm2). PD-L1 expression was assessed by tumor proportion score, which was defined as the percentage of tumor cells with membranous PD-L1 staining. PD-L1 expression > 1% was defined as positivity.
Fluorescence in situ hybridization (FISH) assay
FFPE tissue sections were examined by using interphase FISH to investigate the rearrangement of the TFE3 region with an LSI dual-color break-apart probe (GSP TFE3, Anbiping company, Guangzhou, China). TFE3 gene rearrangement would result in break-apart of the normal fused green-red signals, resulting in one green/one red break apart signal pattern (male), or one green/one red break apart signal and one remaining normal fused green-red signal pattern (female) in a normal cell. One hundred interphase nuclei were evaluated. The cut-off value was 10% as previously described33.
Fusion validation by RT-PCR
cDNA was synthesized using PrimeScriptTM RT Master Mix (RR036A, Takara, Shiga, Japan) according to the manufacture’s instruction and then subjected to PCR reactions. Primer sequences for fusion validations are listed below. ZC3H4 exon 14 primer 5’- CGGTGTCCCTGACTTCCTG-3’ (forward primer) and TFE3 exon 7 primer 5’-GCCTTTGCCTCGGTCT-3’ (reverse primer) were used to detect ZC3H4-TFE3. FUBP1 exon 15 primer 5’-ACCAGGCCCGGCTCCTCA-3’ (forward primer) and TFE3 exon 3 primer 5’-GCCTGTTCCCGACGCTC-3’ (reverse primer) were used to detect FUBP1-TFE3. SETD1B exon 4 primer 5’- GCGTGGGGCGAGTCTCAT-3’ (forward primer) and TFE3 exon 4 primer 5’- GGACCCGATGGTGAGCAGC-3’ (reverse primer) were used to detect SETD1B-TFE3. PCR was performed with the following thermal cycling conditions: a 98°C for 10 s; 35 cycles of 95°C for 10 s, 52°C for 30 s and 72°C for 1 min; and a 72°C for 5 min. RCP products were separated by electrophoresis in agarose gels, purified with ChargeSwitch™ PCR Clean-Up Kit (CS12000, Invitrogen, Oberhausen, GER) and then sequenced by ABI 3730XL automatic sequencer (Life Technologies).
Total RNA isolation, library preparation and sequencing
Total RNA was isolated from each sample (63 tumor samples and 24 paired adjacent normal samples) using the Qiagen RNeasy formalin-fixed paraffin-embedded (FFPE) Kit (73504, Qiagen, Hilden, Germany), following the protocol from the manufacturer. Purity and quantity of total RNA were measured by Nanodrop. Integrity of RNA was evaluated using the RNA Nano6000 Assay Kit on the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). 1μg RNA of per sample was used as input for the RNA sample preparations. Strand-specific RNA sequencing libraries were generated using the Whole RNA-seq Lib Prep kit for Illumina (RK20303, ABclonal, Shanghai, China). Library quality was evaluated on the Agilent Bioanalyzer 2100 system (Agilent, USA). Final libraries were sequenced at the Novogene Bioinformatics Institute (Beijing, China) on an Illumina Hiseq X10 platform by 150bp paired-end reads.
Gene expression quantification and Fusion Detection
The raw RNA-sequencing reads were filtered by FastQC, Reads were aligned using STAR v2.7.0f34 with default parameters to the Ensembl human genome assembly GRCh37. Gene expression levels were estimated by raw count and Transcripts Per Kilobase Million (TPM). Annotations of mRNA in the human genome were retrieved from the GENCODE (v19) database. Paired trimmed/clipped and de-duplicated RNAseq reads were used to identify gene fusion events, and the aligned output was used as input to STAR-Fusion v1.9.1 using the developer-supplied gencode v33 CTAT library from April 6, 2020. Fusion gene supported by at least two reads were selected.
Analysis of Differentially Expressed Genes (DEG) and enrichment of signaling pathways
DEGs were determined using the R package “limma” with cutoff p-value < 0.0535. Up-regulated genes and down-regulated genes were used to perform ontology and pathway enrichment analysis based on Gene Ontology and KEGG databases using R package “ClusterProfiler” 36.
Gene Set Enrichment Analysis
DEG analysis results were used in GSEA analyses. For gene set analysis, hallmark genesets from Msigdb database 37 were collected. The top 1,500 DEGs were chosen by p-value ranked by log2 fold change, and then fed into “fgsea” R packages 38.
Non-negative Matrix Factorization (NMF) Clustering
We selected 1,500 genes with the highest variability across tumors, using Median Absolute Deviation (MAD) analysis. Subclasses were then computed after reducing the dimensionality of the expression data from thousands of genes to a few metagenes using consensus NMF clustering. This method computes multiple k-factor factorization decompositions of the expression matrix, and evaluates the stability of the clustering result using the cophenetic coefficient index. The robust consensus NMF clustering of 54 patient tumors using the top 1500 variable genes, and k=5 was identified through testing k=2 to k=10.
Gene signatures
Marker genes for immune cell types were obtained from Bindea et al39. As previously published by Şenbabaoğlu et al40. The T cell infiltration score (TIS) was defined as the mean of the standardized values for nine T cell subtypes, and the immune infiltration score (IIS) for a sample was similarly defined as the mean of the standardized values for innate and adaptive immune scores. A previously published angiogenesis signature was utilized to measure an angiogenesis score41. And epithelial–mesenchymal transition (EMT) score was calculated following the method mentioned in Gibbons et al.42.
Implementation of single-sample gene set enrichment analysis (ssGSEA)
ssGSEA was used for quantifying immune infiltration and activity in tumors using bulk RNAseq data43. The ssGSEA method is an extension of the GSEA44 method, which works at single sample level rather than a sample population. Normalized RNA-Seq data was used as input without further processing (i.e. no standardization or log transformation).
Quantitative Set Analysis for Gene Expression (QuSAGE)
To reveal latent biological pathways of NMF clustering, we conducted QuSAGE (R qusage v2.20.0) to compare each cluster to all others, leveraging MSigDb hallmark gene sets to identify enriched pathways within each cluster.
DNA extraction
Representative sections of formalin-fixed paraffin-embedded (FFPE) tumor (n=53) and matched normal tissues (n=42) or blood samples (n=11) were collected. Tumor sections were reviewed by a pathologist to ensure tumor sections with > 70% tumors and < 10% necrosis. High-quality genomic DNA was extracted by using the GeneRead DNA FFPE Kit (180134, QIAGEN, Hilden, GER) according to the manufacturer’s instructions. Germline DNA (gDNA) was extracted from white blood cells using the Blood Genomic DNA Mini Kit (CW2087, Cwbiotech, Beijing, China).
Whole-exome sequencing
WES was used to determine the mutational landscape of TFE3-tRCC. Exome capture was performed using the Agilent SureSelect Human All ExonV5 kit (Technologies, CA, USA) according to the manufacturer’s instructions. This was followed by paired-end sequencing using Illumina Novaseq6000 sequencer (Illumine Inc, CA, USA). Mean clean sequence data obtained was 38.30 Gb for tumor samples and 14.95 Gb for normal tissue/blood samples. Details of somatic and germline mutation analysis are described below.
Read alignment and BAM file generation
Clean reads were aligned to the reference human genome hg19 (Genome Reference Consortium GRCh37) using the BWA v0.7.17 (Burrows-Wheeler Aligner) MEM algorithm with default parameters. BAM was coordinate sorted and PCR duplicates were removed with Sambamba version 0.6.8.
Post-alignment optimization
After the initial alignment of WES data, we followed GATK v3.8 Best Practice to process all BAMs from the same patient together for a post-alignment optimization process called “co-cleaning” which includes GATK IndelRealigner and BaseQualityScoreRecalibration (BQSR). IndelRealigner performed local realignment to further improve mapping quality across all reads at loci close to indels, and BQSR detected and fixed systematic errors made by the sequencer when it estimated the quality score of each base call 45, 46, 47.
Somatic mutations analysis
The GATK MuTect2 pipeline was run for paired tumor-normal somatic mutation calling with gnomAD database and a panel of normal made from all normal samples to filter common germline mutations and recurrent technical artifacts. The resulting VCFs were filtered by Mutect2 FilterMutectCalls module, variants outside of the capture kit were removed, and FilterByOrientationBias module was used to filter out false-positive calls from OxoG and FFPE. The resulting somatic SNVs and indels were further filtered according to the flowing criteria 48, 49, 50, 51: read depth ≥ 10 in both tumor and normal samples, mapping quality ≥ 40 and base quality ≥ 20, variants allele frequency (VAF) ≥ 5% and supporting reads ≥ 5 in tumor, VAF in tumor was ≥ 5 times than that of the matched normal VAF. Variants were annotated with Oncotator v1.9.9.0. To further avoid miscalling germline variants at least 19 read depth in the normal sample in dbSNP sites. Variants were excluded with minor allele frequency (MAF) > 0.01 in public databases, including 1000G (May 2013), Exome Sequencing Project (ESP6500) and Exome Aggregation Consortium (ExAC version 0.3.1). Variants were filtered which annotated as 3'UTR, 5'UTR, 3'Flank, 5'Flank, IGR, Intron, lincRNA, Silent, RNA.
Somatic mutation signature profiling
The R package SomaticSignatures 52 v2.18.0 was used to extract the somatic motifs of these samples. In brief, the somatic motifs for each variant were retrieved from the reference sequence and converted into a matrix. Non-negative Matrix Factorization (NMF) was used to estimate the somatic signature and then plotted. The set of COSMIC signatures [cancer.sanger.ac.uk/cosmic/signatures] active in each sample were identified.
Somatic copy number alterations (SCNA) analysis
SCNA analysis was performed using CNVKit v0.9.6 53 with default parameters. To identify significantly recurrent SCNAs, the GISTIC2 v2.0.23 54, which considers both the frequency and amplitude of every SCNA, was employed with the following modified parameters “-smallmem 1 -broad 1 -brlen 0.7 -cap 1.5 -conf 0.99 -ta 0.2 -td 0.25 - armpeel 1 -genegistic 1 -savegene 1 -gcm extreme -js 4 -maxseg 2000 -qvt 0.25 -rx 0”.
Copy Number (CN) gains were defined as alterations showing 0.1< log2 CN ratio < 0.7 and CN losses were defined as alterations showing -1.1< log2 CN ratio < -0.1. Amplifications are defined by log2 CN ratio ≥ 0.7. Deletions are defined by log2 CN ratios ≤ -1.1. Log2 CN ratio between -0.1 and 0.1 was considered as copy number neutral. To measure CNV burden, fraction of copy number‐altered genome (FCA) was calculated by dividing the number of bases in segments with mean log2 CN ratio >0.1 or <−0.1 by the number of bases in all segments.
Tumor suppressor gene screening
Tumor suppressor genes (TSGs) were obtained from TSGene v2.0 (https://bioinfo.uth.edu/TSGene/) and IntOGen (https://www.intogen.org) database. Genes mutated in at least two samples were shown in barplot and TSGs were marked with bold font. KEGG enrichment analysis was done with those genes.
Data availability
The WES data generated in this paper have been deposited at the NCBI Sequence Read Archive (SRA) hosted by the NIH (PRJNA701236). The gene expression data reported in this paper was deposited in the Gene Expression Omnibus (GEO) database (accession numbers GSE167573). Clear cell renal cell carcinoma, papillary renal cell carcinoma, and chromophobe renal cell carcinoma sequencing data were obtained from the Cancer Genome Atlas (https://portal.gdc.cancer.gov/projects/).
Statistics
All analyses were conducted using R software v3.6.0 and SPSS v.16.0. All comparisons for continuous variables were performed using the two-sided Mann-Whitney test for two groups and the Kruskal-Wallis test for more than two groups. For categorical variables, Pearson’s Chi-squared test with continuity correction or Fisher’s exact test was used. Survival analyses were conducted using Kaplan–Meier method and the difference was tested using log-rank. Cox proportional hazards regression (forward likelihood ratio model) was used to determine the independent predictor of OS. A p-value less than 0.05 was considered statistically significant.