Experimental design
Our overall experimental design is depicted in Figure S1. It entails subjecting patient AML samples to genomic and proteomic analysis and ex-vivo drug screening followed by the construction of predictive models of drug response for each type of data collected. We then use the signatures determined by the model to assess their performance in cross-validation experiments, explore their role in biological networks, and then validlate them in cell lines.
Sample collection
Samples were collected and processed as described in detail previously3. Briefly, all patients gave informed consent to participate in the Beat AML study, which had the approval and guidance of the Institutional Review Boards (IRB) from participating institutions. All samples used in this manuscript were collected at Oregon Health & Science University with a broad ‘research use’ clause. Mononuclear cells (MNCs) were isolated from freshly obtained bone marrow or peripheral blood samples from AML patients via Ficoll gradient centrifugation. Isolated MNCs were utilized for genomic (500x WES; RNA-seq) and ex vivo functional drug screens. WES and RNA-seq were performed using standard methods and data analysis was performed as previously described3. Clinical, prognostic, genetic, cytogenetic and pathologic laboratory values as well as treatment and outcome data were manually curated from the patient electronic medical records (EMR). Patients were assigned a specific diagnosis based on the prioritization of genetic and clinical factors as determined by WHO guidelines. We selected 38 unique patients from our ongoing study that had complete proteomic and phosphoproteomic measurements.
Ex vivo drug screening analysis
For drug sensitivity assays, 10,000 viable cells were dispensed into each well of a 384-well plate containing 7 point, 3 fold dilution, drug concentration series from a library of small molecule inhibitors. Cells were incubated with the drugs in RPMI media containing 10% FBS without supplementary cytokines. After 3 days of culture at 37 °C in 5% CO2, MTS reagent (CellTiter96 AQueous One; Promega) was added, the optical density was measured at 490 nm, and raw absorbance values were adjusted to a reference blank value and then used to determine cell viability (normalized to untreated control wells). Ex vivo functional drug screen data processing was performed as described, and drug fitting was carried out using the probit regression on quality-controlled data as in our previous work3.
Protein digestion and tandem mass tag (TMT) labeling
Sample preparation for proteomics was based on the protocol developed under the CPTAC consortium with minimal modifications28. Patient cell pellets were lysed with 500 mL fresh lysis buffer, containing 8 M urea (Sigma-Aldrich), 50 mM Tris pH 8.0, 75 mM sodium chloride, 1 mM ethylenediamine tetra-acetic acid, 2 mg/mL Aprotinin (Sigma-Aldrich), 10 mg/mL Leupeptin (Roche), 1 mM PMSF in EtOH, 10 mM sodium fluoride, 1% of phosphatase inhibitor cocktail 2 and 3 (Sigma-Aldrich), 20 mM PUGNAc, and 0.01 U/ m/mL Benzonase. The samples were then vortexed for 10 seconds and placed in a thermomixer for 15 minutes at 4°C and 800 RPM. Vortexing was repeated and the samples incubated again for 15 minutes utilizing the same settings. After incubation, the samples were centrifuged for 10 minutes at 4°C and 18000 rcf to remove cell debris. The supernatant was then transferred to a fresh tube. A BCA (ThermoFisher) assay was performed on the supernatant to determine protein yield.
Protein concentrations were normalized to the same concentration prior to beginning digestion. The sample was reduced with 5 mM dithiothreitol (DTT) (Sigma-Aldrich) for 1 hour at 37°C and 800 rpm. Reduced cystines were alkylated with 10 mM iodacetamide (IAA) (Sigma-Aldrich) for 45 minutes at 25°C and 800 rpm in the dark. The sample was diluted fourfold with 50 mM Tris HCL pH 8.0 and then Lys-C (Wako) was added at a 1:20 enzyme:substrate ratio, followed by incubation for 2 hours at 25°C, shaking at 800 rpm. Trypsin (Promega) was then added at a 1:20 enzyme:substrate ratio, followed by a 14-hour incubation at 25°C and 800 rpm. The sample was quenched by adding formic acid to 1% by volume, and centrifuged for 15 minutes at 1500 rcf to remove any remaining cell debris. Peptides samples were desalted using a C18 solid phase extraction (SPE) cartridge (Waters Sep-Pak).
After drying down SPE eluates, each sample was reconstituted with 50 mM HEPES, pH 8.5 to a concentration of 5 mg/ m/mL. Each isobaric tag aliquot was dissolved in 250 mL anhydrous acetonitrile to a final concentration of 20 mg/ m/mL. The tag was added to the sample at a 1:1 peptide:label ratio and incubated for 1 hour at 25°C and 400 rpm and then diluted to 2.5 mg/mL with 50 mM HEPES pH 8.5, 20% acetonitrile (ACN). Finally, the reaction was quenched with 5% hydroxylamine and incubated for 15 minutes at 25°C and 400 rpm. The samples were then combined per each plex set and concentrated in a speed-vac before a final C18 SPE cleanup. Each 11-plex experiment was fractionated into 96 fractions by high pH reversed phase separation, followed by concatenation into 24 or 12 global fractions for MS analysis.
Phosphopeptide enrichment using IMAC
The global samples were further concatenated to create 6 samples per plex for further enrichment. Fe3+-NTA-agarose beads were freshly prepared using Ni-NTA-agarose beads (Qiagen). Sample peptides were reconstituted to a 0.5 mg/mL concentration with 80% ACN, 0.1% TFA and incubated with 40 mL of the bead suspension for 30 minutes at RT in a thermomixer set at 800 rpm. After incubation the beads were washed with 100 mL 80% ACN, 0.1% TFA and 50 mL 1% FA to remove any non-specific binding. Phosphopetides were eluted off beads with 210 mL 500 mM K2HPO4, pH 7.0 directly onto C18 stage tips and eluted from C18 material with 60 mL 50% ACN, 0.1% FA. Samples were dried in speed-vac concentrator for storage and reconstituted with 12 mL 3% ACN, 0.1% FA immediately prior to MS analysis.
LC-MS/MS analysis
Proteomic fractions were separated using a Waters nano-Aquity UPLC system (Waters) equipped with a 75 um I.D. x 25 cm length C18 column packed in-house with 1.9 um ReproSil-Pur 120 C18-AQ (Dr. Maisch GmbH). A 120-minute gradient of 95% mobile phase A (0.1% (v/v) formic acid in water) to 19% mobile phase B (0.1% (v/v) FA in acetonitrile) was applied to each fraction. The separation was coupled to either a Thermo Orbitrap™ Fusion Lumos™ (patient samples) or Q Exactive™ HF (cell lines) Hybrid Quadrupole-Orbitrap™ mass spectrometer for MS/MS analysis. MS Spectra were collected from 350 to 1800 m/z at a mass resolution setting of 60,000. A top speed method was used for the collection of MS2 spectra at a mass resolution of 50K. An isolation window of 0.7 m/z was used for higher energy collision dissociation (HCD), singly charged species were excluded, and the dynamic exclusion window was 45 seconds. For the Fusion Lumos™, a top speed method was used for the collection of MS2 spectra at a mass resolution of 50K. For the Q Exactive™ HF experiments, a top 16 method was used for the collection of MS2 spectra at a mass resolution of 30K.
TMT global proteomics data processing
All Thermo RAW files were processed using mzRefinery to correct for mass calibration errors, and then spectra were searched with MS-GF+ v988129-31 to match against the human reference protein sequence database downloaded in April of 2018 (71,599 proteins), combined with common contaminants (e.g., trypsin, keratin). A partially tryptic search was used with a ± 10 parts per million (ppm) parent ion mass tolerance. A reversed sequence decoy database approach was used for false discovery rate calculation. MS-GF+ considered static carbamidomethylation (+57.0215 Da) on Cys residues and TMT modification (+229.1629 Da) on the peptide N terminus and Lys residues, and dynamic oxidation (+15.9949 Da) on Met residues. The resulting peptide identifications were filtered to a 1% false discovery rate at the unique peptide level. A sequence coverage minimum of 6 per 1000 amino acids was used to maintain a 1% FDR at the protein level after assembly by parsimonious inference.
The intensities of TMT 11 reporter ions were extracted using MASIC software32 . Extracted intensities were then linked to PSMs passing the confidence thresholds described above. Relative protein abundance was calculated as the ratio of sample abundance to reference channel abundance, using the summed reporter ion intensities from peptides that could be uniquely mapped to a gene. The relative abundances were log2 transformed and zero-centered for each gene to obtain final relative abundance values.
TMT phosphoproteomics data processing
IMAC enriched fraction datasets were searched as described above with the addition of a dynamic phosphorylation (+79.9663 Da) modification on Ser, Thr, or Tyr residues. The phosphoproteomic data were further processed with the Ascore algorithm33 for phosphorylation site localization, and the top-scoring assignments were reported. To account for sample loading biases in the phosphoproteome analysis, we applied the same correction factors derived from median-centering of the global proteomic dataset for normalization.
All proteomic data can be found on our synapse site. The cohort is spread across three tranches, as described in Table 1 below.
Patients
|
Data type
|
File
|
Table
|
Primary patient cohort
|
Proteomics
|
syn22130778
|
syn22172602
|
Patients with Sorafenib treatment
|
Proteomics
|
syn22313435
|
syn22314121
|
Patients with drug combination
|
Proteomics
|
syn25672089
|
syn22156810
|
Primary patient cohort
|
Phosphoproteomics
|
syn24610481
|
syn24227903
|
Patients with Sorafenib treatment
|
Phosphoproteomics
|
syn24227680
|
syn24228075
|
Patients with drug combination
|
Phosphoproteomics
|
syn24240156
|
syn24240355
|
Table 1: Location of processed proteomics files on Synapse.
Identifying drugs and samples for analysis
The list of available data for each patient is in Table S1. Although ~145 total compounds were tested in the drug panels, we filtered the drugs in this study to collect those that exhibited a range of responses across the 38 patients as determined by area under the curve (AUC) of the dose response. AUC represents the amount of drug required to reduce cell viability, so higher AUC values mean the samples are less sensitive to the drug, and lower AUC values indicate the samples are more sensitive. Specifically, we selected drugs for which at least 10% or 2 (whichever was greater) samples exhibited an AUC less than 100 (determined to be sensitive in previous work3). This selection produced a “balanced” distribution of AUC scores to enable our downstream analysis. We also added Gilteritinib (ASP-2215) to the panel as it is currently being evaluated in numerous clinical trials. The full range of drug responses across patients is shown in Figure 2A. While some patient samples lacked data on all 26 drugs (indicated in blue in Figure 2A), we were still able to use these samples to compare the efficacy of genomics, transcriptomics, proteomics and phosphoproteomics to model drug sensitivity based on the available data.
Linear models of proteomics and drug response
We constructed linear models for each of the 26 different drugs across up to 38 patients (depending on how many patient samples were evaluated with that drug) by regressing the AUC values (which ranged between 0 and ~300, as depicted in Figure 2A) against the molecular data as shown in Table S1. The input data for each model were each scaled slightly differently: the genetic mutations were represented as a binary matrix in which a 1 represented the presence of a somatic mutation and a 0 represented no mutation, the transcriptomics was represented by Counts per Million (CPM) of gene expression values, while proteomics and phosphoproteomics were represented as the log ratio of gene/phosphosites compared to the reference sample described above.
For each combination of drug and data type, we constructed a linear model Y~X where Y represents the vector of AUC values and X represents the molecular measurements for that patient. We used three different linear modeling approaches to reduce the number of features selected by the model: LASSO regression34, Elastic Net Regression35, and logistic regression as implemented by the `glmnet` package36. For the logistic regression, we discretized the AUC by representing Y as a binary variable, where 1 represented an AUC greater than 100 (patient is resistant to drug) and 0 if the AUC is less than 100 (patient is sensitive to drug).
For each model, we employed K-fold cross validation with K=5 on each type of data (e.g. mutations, proteomics, etc.) to assess performance. Within each K, we used leave-one-out cross-validation for each combination of data to select the alpha parameter that minimized cross-validation error. The model performance scores in Figure 2B and Table S2 represented the average correlation between predicted and actual values across all 5 models for each drug/data type. All of our analysis can be found in the `amlresistancenetworks` package we built at http://github.com/PNNL-CompBio/amlresistancenetworks and implemented at https://github.com/PNNL-CompBio/beatamlpilotproteomics. Those models that failed to select any molecular features were not included in our final analysis. The results are depicted in Figures 2B.
Signature interpretation using pathway annotations and statistical enrichment
To identify patterns in the features selected by the LASSO, Elastic Net, and logistic models we employed three main approaches. For gene, transcript, and proteomic signatures, we first used the `clusterProfiler` package37 to identify GO biological process tools that are enriched for the specific genes, transcripts, or proteins selected by the model. The results are listed in Table S2. In cases where there were no significant (corrected p<0.01) terms, the column is blank. For phosphoproteomic features, we used the `leapR` R package38 to identify specific kinases that were over-represented among the selected substrates, though none were identified. We believethis is due to the sparsity of the signatures as well as the lack of information about the kinase-substrate interactions.
Supplementing sparse proteomic signatures with interaction networks
To provide further context for the phosphoproteomic features selected by the models, we mapped selected proteins and/or selected phosphosites to published protein-protein39 and kinase-substrate40,41 interactions and then reduced this network to identify subnetworks using the Prize Collecting Steiner Forest (PCSF) R package42,43. Specifically, we used the STRING database39 together with networkKin40 and PhosphoSitePlus41 predictions of kinase substrate interactions to build a graph that combined protein-protein interactions with kinase-substrate interactions. To do this we added each phosphosite as its own node in the underlying graph. We weighted each edge from the node representing the substrate gene to the phosphosite with a cost of m/4 where m represents the mean cost of all the edges in the graph. The weight of each edge between the phosphosite node and the kinase gene was weighted with a cost of 3m/2 where m represents the mean cost of all edges in the graph. We then ran the PCSF algorithm42,43 over 100 randomizations using phosphosites, proteins, or genes from a single drug model. The results for the quizartinib proteomic and trametinib genomic logistic signatures are in Figure 3.
Using the proteins selected by the PCSF algorithm, which are a combination of those selected by the linear model as well as those selected by the PCSF algorithm, we used Cytoscape44 and the BinGO45 application to identify which GO biological process terms were enriched. The results are depicted in Tables S4.
Trametinib resistant cell line cultures
Human MOLM13 cells with FLT3-ITD mutation, were obtained from the Sanger Institute Cancer Cell Line Panel. Cell lines were maintained in RPMI 1640 (Gibco) supplemented with 20% Fetal Bovine Serum (HyClone), 2% L-glutamine, 1% penicillin/streptomycin (Life Technologies).Trametinib-resistant MOLM13 cell lines were generated by culturing MOLM13 cells in increasing concentrations of trametinib (Selleck). Cell viability was measured bi-weekly and cells were replenished with new media and trametinib. Resistance was assessed using the MTS assay for drug sensitivity. Once resistant, cell lines were maintained in 50nM trametinib added bi-weekly. Cell lines were screened for mycoplasma contamination on a monthly schedule.
For proteomic and phosphorproteomic profiling, 5 million parental MOLM13 (N=3) and resistant MOLM13 (N=3) cell lines were starved overnight in starvation media (RPMI supplemented with 0.1% BSA). Trametinib (50nM) was added to the starvation media of the resistant cell lines. Cells were washed three times in PBS, pelleted and flash frozen.
Quizartinib resistant cell line cultures
Human MOLM14 cells were generously provided by Dr. Yoshinobu Matsuo (Fujisaki Cell Center, Hayashibara Biochemical Labs, Okayama, Japan). Cells were grown in RPMI (Life Technologies Inc., Carlsbad, CA) supplemented with 10% FBS (Atlanta Biologicals, Flowery Branch, GA), 2% L-glutamine, 1% penicillin/streptomycin (Life Technologies Inc.), and 0.1% amphotericin B (HyClone, South Logan, UT). Cell line authentication was performed at the OHSU DNA Services Core facility.
To establish resistant cultures, 10 million MOLM14 cells were treated with 10 nM of quizartinib (Selleck Chemicals, Houston, TX) in media alone (N = 4) or in media supplemented with 10 ng/mL of FGF2 (N = 4) or FLT3 ligand (N = 4, FL; PeproTech Inc., Rocky Hill, NJ)46. All cultures were maintained in 10 mL of media. Every 2 or 3 days, recombinant ligands and quizartinib were replaced and cell viability was evaluated using the Guava cell counter (Millipore Inc., Burlington, MA). Following ligand withdrawal, quizartinib and media were similarly replenished and viability was monitored every 2 to 3 days. All cell lines were tested for mycoplasma on a monthly schedule.
For proteomic and phosphoproteomic profiling, naïve MOLM14 (N = 4), quizartinib-resistant parental (N = 2, no ligand), early (N = 4/ligand) and late (N = 4/ligand) cultures were washed three times with PBS to remove any trace of fetal bovine serum, pelleted, and flash frozen.