Sample Collection and Processing
Ascites or pleural effusions were drained and collected from nine ovarian cancer patients longitudinally over the course of patient treatment. Samples were collected under IRB # 07047 & 17334 (City of Hope), 41030 and 89989 (University of Utah), or HREC # 01/60, 16/161 by the Australian Ovarian Cancer Study (AOCS) which were analyzed under HREC # 15/84 (Peter MacCallum Cancer Centre). Malignant fluids were centrifuged at 500 x g for 5 minutes to pellet cells. Red blood cells were removed by lysis in Tris-Ammonium Chloride Buffer (17mM Tris, pH 7.4, 135mM Ammonium Chloride) and incubated for 5 minutes in a 37°C water bath. Cells were then centrifuged at 500 x g for 5 minutes at room temperature and repeated until red blood cells were removed. Cells were washed in 1x PBS (Gibco, Cat # 10010) before frozen viably in 50% RPMI-1640 (Gibco, Cat # 11875) + 40% Fetal Bovine Serum (FBS, Sigma, Cat# 12306C) + 10% DMSO (Fisher Scientific, Cat# D2650). Ascites fluid collected by the Australian Ovarian Cancer Study (AOCS) was centrifuged at 1500rpm for 10 minutes at 4ºC. Red blood cells were removed by incubation in ice-cold lysis buffer (14.4uM NH4Cl, 1uM NH4HCO3) at room temperature for 10 minutes. Cells were centrifuged at 1500rpm for 10 minutes at 4ºC, washed in 10% FBS in 1x PBS, and centrifuged again. Cells were frozen viably in FBS + 10% DMSO.
Cancer Cell Isolation
Frozen viable ascites or pleural effusion cells were thawed, centrifuged at 300 x g, and resuspended in 1x PBS to determine, concentration, viability, and cancer cell purity by trypan blue staining. In some cases, cancer cells were purified by Miltenyi Biotec QuadroMACS by negative selection of CD45+ (CD45 MicroBeads, Miltenyi Biotec, Cat# 130-045-801), CD90+ (CD90 MicroBeads, Miltenyi Biotec, Cat# 130-096-253), and podoplanin expressing cells (biotinylated anti-podoplanin antibody, Biolegend, Cat# 3370015 and Miltenyi Biotec anti-biotin MicroBeads Cat# 130-105-637) using LD columns (Miltenyi Biotec, Cat# 130-042-901) according to manufacturer instructions. The samples were processed using the StemCell EasyEights EasySep column-free magnet to remove CD45+ (EasySep CD45 Depletion Kit II, Cat# 17898), and/or dead cells (EasySep Dead Cell Removal (annexin V) Kit, cat# 17899) as appropriate. To isolate cancer cells using StemCell EasySep antibody Kits, cells were centrifuged and resuspended in 1x PBS + 2% FBS + 1mM Calcium Chloride (G-Biosciences, Cat# R040) to a concentration of <108 cells per 2mL total volume and transferred to a round bottom 5mL FACS tube. Sequentially Dead Cell Removal Cocktail (50µL/mL sample), and Biotin Selection Cocktail (50µL/mL sample) was added and incubated at room temperature for 3 minutes, followed by CD45 Depletion Cocktail (50 µL/mL sample) and incubated at room temperature for 5 minutes. StemCell RapidSphere magnet beads were added (75µL/mL for CD45 RapidSpheres and 100µL/mL for Dead Cell RapidSpheres) and incubated at room temperature for 3 minutes off the magnet. Cell samples were then incubated on EasyEight magnet for 5 minutes, collected supernatant, and repeated additional EasyEight magnet column cleanup. Collected cells were then centrifuged and resuspended in 1x PBS and maintained at 4°C.
Nuclei Isolation
After cancer cell isolation, patient samples that did not dissociate into single-cell suspensions, or had a high proportion of cancer cell clusters, were then processed to isolate single nuclei suspensions. To isolate nuclei, cells were resuspended in (4:1) Lysis Buffer (10mM Tris-HCl, pH 7.8 (Teknova, Cat# T1078), 146mM NaCl (Alfa Aesar, Cat# J60434AK), 1mM CaCl2 (G-Biosciences, Cat# R040), 21mM MgCl2 (G-Biosciences, Cat# R004), 0.05% BSA (EMD Millipore, Cat # 12661525), 0.2% Igepal CA-630 (MP Biomedicals, Cat# 198596), DNase/RNase free water (Gibco, Cat# 10977)) : DAPI Buffer (106mM MgCl2, 50 µg/mL 4’, 6-diamidino-2-phenylindole (DAPI, Invitrogen, Cat# D1306), 5mM Ethylenediaminetetraacetic acid (EDTA, Quality Biological Inc., Cat# E522100ML), DNase/RNase free water)) supplemented with fresh 0.2 U/µL SUPERase∙In RNase Inhibitor (Invitrogen, Cat# AM2694). Cells were incubated for 15 minutes at 4°C to release nuclei. The lysate was then filtered through a 40 µm mesh filter (Falcon, Cat# 352340) collecting nuclei in flow-through. All downstream nuclei processing utilized Eppendorf LoBind DNA tubes to prevent nuclei loss. Nuclei were centrifuged 500 x g, at 4°C, for 5 minutes and washed two times with 500 µL of 1x PBS + 1% BSA + 0.2U/µL SUPERase∙In RNase Inhibitor. Nuclei were resuspended in 1x PBS + 1% BSA + 0.2U/µL SUPERase∙In RNase Inhibitor at a target of 1,000 cell/µL, re-filtered using a 40 µm mesh filter, and counted on a hemocytometer by DAPI fluorescence using an Invitrogen Countess equipped with DAPI filter cube and maintained at 4°C.
Single-cell RNA Sequencing (scRNA-seq)
Single-cell RNA-Sequencing (scRNA-seq) was performed on single cell or single nuclei suspensions using either the Takara Bio ICELL8 Single-Cell System or the 10X Genomics Chromium to prepare cDNA sequencing libraries. Samples processed on the ICELL8 Single-Cell System (Takara Bio) were prepared using the SMARTer ICELL8 3’ DE Reagent Kit V2 (Takara Bio, Cat # 640167) from isolated nuclei. DAPI stained nuclei were diluted to a concentration of 60,000 cell/mL in 1x PBS + 1% BSA + 1x Second Diluent + 0.2U SUPERase∙In RNase Inhibitor and dispensed onto the ICELL8 3’ DE Chip (Takara Bio, Cat# 640143) using the ICELL8 MultiSample NanoDispenser. Single nuclei candidates were selected using the ICELL8 imaging system with CellSelect Software (Takara Bio) selecting for DAPI positive nuclei and reverse transcription and sequencing library preparation was performed according to manufacturer instructions. ICELL8 cDNA sequencing libraries were sequenced at a depth of 200K reads/cell on Illumina HiSeq 2500, read #1= 26nt, and read #2= 100nt.
Samples processed on the 10X Genomics Chromium were processed using the Chromium Single Cell 3’ V3 Kit (10X Genomics, Cat # 1000075) using whole cells or isolated nuclei. Single cells or nuclei were diluted to a target of 1,000 cell/µL in 1x PBS (whole cells) or 1x PBS + 1.0% BSA + 0.2U/µL SUPERase∙In RNase Inhibitor to generate GEM’s prepared at a target of 5,000 cells per sample. Barcoding, reverse transcription, and library preparation were performed according to manufacturer instructions. 10X Genomics generated cDNA libraries were sequenced on Illumina HiSeq 2500 or NovaSeq 6000 instruments using 150 cycle paired-end sequencing at a depth of 10K reads/cell. scRNA-seq was performed at the Integrative Genomics Core at City of Hope, Fulgent Genetics, and the High Throughput Genomics Core at Huntsman Cancer Institute (HCI) of University of Utah.
Genomic DNA Isolation and Whole Genome Sequencing (WGS)
Genomic DNA was isolated using the QIAamp DNA Micro Kit (Qiagen, Cat # 56304) according to manufacturer instructions for isolated cancer cells and nuclei suspensions from scRNA-seq, as well as patient-matched buffy coat for germline DNA. Germline DNA was also isolated from patient matched isolated peripheral lymphocytes using the salting-out method. Briefly, lymphocytes were resuspended in nuclei lysis buffer (0.1M Tris pH8, 2mM EDTA pH8, 0.1M NaCl, proteinase K & SDS), and incubated at 56°C for 1hr followed by 37°C for 3hrs. Saturated salt solution (~6M NaCl) was added to lysed cells, which were centrifuged at 14000rpm for 15min at 4°C after vigorous mixing. The supernatant was transferred to ice-cold 100% ethanol and the tubes were rocked gently until the DNA precipitated. After overnight incubation in ethanol at -20°C, DNA was rinsed twice by placing in 70% ethanol, centrifugation, and removing the ethanol. DNA was air-dried and resuspended in sterile water. WGS DNA libraries were prepared using either NEBNext Ultra II DNA Library Prep Kit (New England Biolabs), KAPA Hyper Prep PCR Free Library Prep Kit (Roche), or Nextera DNA Flex Library Prep Kit (Illumina), and sequencing performed on Illumina NovaSeq 6000 instruments at 150 cycles and paired-end sequencing to read depth of 40-60X coverage. Sequencing was performed at Admera Health, Fulgent Genetics, and the High Throughput Genomics Core at HCI of University of Utah.
Cell Culture
To create stable patient-derived primary cell lines, frozen patient ascites were processed and then immediately placed in media as specified below. All cells were maintained in RPMI 1640 (Gibco; cat # 11875085) supplemented with 10% heat-inactivated FBS (Sigma, cat # 12306C) and 1% antibiotic/antimycotic solution (Gibco; cat #15240062) in uncoated filter top polystyrene flasks and maintained at 37°C in 5% CO2, patient cells were additionally kept in 5% 02 hypoxic simulated humidified air.
Metabolic Assays
ATP production rates were assayed with the XF Real-Time ATP Rate Assay Kit (Agilent, cat # 103592-100) as per the manufactures’ instructions. Briefly, cells were plated down in the Seahorse XF96 cell culture microplates at 10000 cells/well/80 µL and placed back in 37°C, 5% CO2 incubator. After 24 hours cells were washed in assay media made up from Seahorse XF RPMI Media, pH 7.4 (Agilent, cat # 103576-100) containing 10 mM glucose (Agilent, Cat# 103577-100), 1 mM pyruvate (Agilent, cat # 103578-100), and 2 mM L-Glutamine (Agilent, Cat# 103579-100) and incubated for 1 h in a non-CO2 incubator at 37°C before a final wash in the assay media. The Seahorse XFe96 analyzer was calibrated and the assay run using a standard XF Real-Time ATP Rate template created using the WAVE software (V2.6.1) and assay standard drug injections were used of 1.5 uM Oligomycin in port A and 0.5 uM Rotenone/Antimycin A in port B.
Results for each well were normalized by cell count using 1ug/mL Hoechst that was added to port B with the Rotenone/Antimycin A cocktail and injected automatically, then visualized by imaging the wells at 4X on the Cytation5 multimode cell imager (BioTek) and analyzed with GEN5 software (BioTek; V3.0.5) for cell count. If multiple plates were needed for comparison, OAW42 cells were plated down at 5000 cells/well in triplicate 24 hours before the assay for environmental variable normalization between plates. Analysis for the ATP rate assay was performed using the Agilent ATP report generator as per manufacturer's recommendations.
Cell Growth and Viability Assays
Cell viability of the matched samples from patient 4 (2 samples) and patient 8 (3 samples) was assessed by the CellTiter-Glo 2.0 cell viability assay (Promega; cat # G9241) as per the manufacturer's instructions. Briefly, 1000 cells/well were plated in triplicate, flat-clear bottom 96 well plate with media as previously described. After 12 days the cells were equilibrated to room temperature for 30 mins and then equal volumes of the CellTiter-Gloe reagent to media was added to each well and placed on an orbital shaker for 2 mins, then allowed to incubate for a further 10 mins at room temperature and luminescence was read on a plate reader (Tecan infinite M1000). The growth of the cells in the 96 well plates was also assessed by imaging each well every 24 hours in a Cytation5 multimode cell imaging system (BioTek). Specifically, a phase-contrast image was taken with a 4 x 4 montage, and then the GEN5 software (BioTek, V3.0.5) was used to stitch the image together and cell analysis calculated the cell count of each well. Both cell growth and viability were plotted with GraphPad (Prism V8.4.3).
Single-cell RNA sequencing analysis
Raw scRNA-seq data were pre-processed in the Bioinformatics ExperT SYstem (BETSY) (56) using the Cell Ranger v2.1.1 pipeline for 10X data, aligned to the hg19 reference genome using the STAR aligner (57), followed by extraction of read counts using featureCounts (58). The resulting count matrix of cells was used for downstream analysis using the R package Seurat v3 (59). High-quality cells were identified based on the following criteria: a minimum of 1000 total number of expressed genes per cell, a minimum of 2000 UMIs per cell, and a percentage of mitochondrial genes less than 25%. Counts matrix from individual patient samples were normalized and integrated using the canonical correlation analysis (CCA) algorithm for batch correction (59). This was followed by a principal component analysis of the variable genes in the integrated dataset, clustering using unsupervised graph-based clustering and dimensionality reduction using uniform manifold approximation (UMAP) or t-distributed stochastic neighbor embedding (T-SNE).
The cell-type identities of the clusters were determined using a two-step approach. A first pass prediction was performed using the SingleR reference-based classification approach (23) using references based on the ENCODE (60) and HPMC (61) datasets. Next, the individual markers corresponding to predicted cell-types were mapped on to the clusters to confirm their classification. Additionally, we classified malignant epithelial cells and normal cells by inferring chromosomal copy number aberrations (CNAs) from the scRNA-seq data using the method described by Patel et al. (16). The copy numbers were inferred using the R package InferCNV, using predicted fibroblasts as reference. For pathway enrichment, raw counts were first normalized using the method proposed by Rizzo et al. (62). Then, a single sample gene set enrichment scores were calculated for hallmark (63) and curated molecular signature (64) gene sets using the GSVA package for R (65).
Whole-genome sequencing analysis
Germline and tumor WGS sequencing raw reads were pre-processed using the Bioinformatics ExperT SYstem (BETSY) to add read-groups, mark duplicates, perform indel realignment, base quality recalibration, sorting and indexing, and alignment to the hg19 reference genome using BWA MEM to generate BAM files. Allele-specific CNVs calls, along with ploidy and cellularity estimates were called from the BAM files using Sequenza (66) or Facets (67) CNV callers using the corresponding germline BAM files of that patient as reference. For each sample, the CNV calls were z-transformed (allele-specific copy number – mean sample copy number / standard deviation of sample copy number) and rounded to the nearest integer for comparison. Copy number alterations were defined as z-transformed copy numbers of ≥2 for gains and £-2 for losses.
Germline variants in homologous repair genes (ATM, ATR, CHEK1, CHEK2, BRCA1, BRCA2, BARD1, BRIP1, FAM175A, MRE11A, NBN, PALB2, RAD51C, RAD51D) along with TP53 and RB1 were determined by genotyping the germline BAM files using GATK, platypus, varscan and freebayes. Variants detected by at least two callers and with a VAF ≥0.05 were retained and annotated using SnpEff (68) to determine non-synonymous variants. Somatic SNVs and small insertions or deletions were determined from the BAM files using strelka (69), mutect2 (70) and muse (71) variant callers. Genes with a variant allele frequency ≥ 0.05 determined by at least two callers were retained for further analyses after adjusting for cellularity as determined from the CNV callers. Non-synonymous variants were first determined using SnpEff. Cancer genes were defined based on the list of cancer census genes from COSMIC (72). Potential driver mutations were defined based on the list of known or predicted drivers in the IntoGen database (73). Structural variants, including insertions, deletions, and breakpoints were called and annotated using SvABA (74). CNV and SVs were visualized as circos plots using the R package RCircos (75).
Archetype analysis and biological task classification
We analyzed the HGSOC scRNA-seq transcriptomes intending to identify distinct biological tasks that each of the cells need to perform and face evolutionary trade-offs (24). Based on the theory proposed by Shoval et al. (21), we seek to represent the transcriptome datasets as a Pareto-optimal situation by identifying that encloses the data with the vertex of the polygon representing task-specific archetypes. For this analysis, we used the first 5 principal components of the CCA-normalized scRNA-seq data from the longitudinal cohort, individual patient samples from the longitudinal and the early (treatment naïve), or late (multiline treatment) cohorts. We used the ParTI package for R (https://github.com/vitkl/ParetoTI) to first determine the minimum number of vertices required to enclose the transcriptome data based on the principal convex hull algorithm (76). Simulations with an increasing number of vertices revealed three vertices (triangle) were sufficient to enclose the data in each case, with additional components resulting in no gain in the proportion of variance explained by the resulting polygon. Subsequently, the polygon fit and archetype scores, or standardized Euclidean distance of each cell to the nearest vertex, were determined. For each archetype, specialist cells were defined as cells above the 80th percentile of archetype scores, while cells that did not meet this criterion for any archetype were classified as non-specialists. The evolution of the archetypes was represented as the percentage of specialists at each time point using the R package fishplot (77).
To determine the biological tasks that described each archetype, we used a gaussian multi-task or multinomial model with the set of archetype scores as the outcome variable and the hallmark gene set enrichment scores or CCA-normalized gene expression of each cell as the set of predictors. The multi-task model was fit using a group-lasso penalty using the R package glmnet (78). Briefly, 10-fold internal cross-validation was performed with a lasso penalty (alpha = 1) to determine the multi-task model error over varying penalty parameter (lambda) values. The contribution (coefficients) of each pathway to the model based on the fraction of deviance explained by the was also assessed over varying levels of degrees of freedom. Top pathway phenotypes contributing to the model were used to define the phenotypes associated with each archetype. Subsequently, the model was fit using a lambda value within one standard error of the minimum. The group-lasso coefficients of each hallmark pathway were then analyzed using hierarchical clustering and correlation analyses to determine clusters of related pathways that were associated with each archetype. Further, the identities of the archetypes were validated based on repeated clustering pattern of the pathway coefficients determined using multi-task learning analysis of the individual patient archetypes, co-clustering of coefficients from related pathways, and expression levels of key genes that were available in the normalized scRNA-seq dataset.
Data availability and software
Single-cell RNA-Seq data generated and analysed during this study are available from the GEO database under accession GSE158722 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158722)
The following secure token has been created to allow review of record GSE158722 while it remains in private status:kzonuiyutvkzlgp
Whole genome sequencing and raw scRNA-seq data are available under controlled access from dbGaP. The BETSY software environment (56) used for the bioinformatic analyses is available at https://github.com/jefftc/changlab. Custom pipelines for the pre-processing of scRNA-seq, WGS, and gene set enrichment analyses with BETSY, and Seurat, archetype, and multi-task learning analyses with R are available at https://github.com/U54Bioinformatics. Analyses with R-packages were performed in R-Studio (version 1.2.5033; R version 3.6.3.).