Cell experiments, transcriptomic analysis and metabolomic data preparation
The DU145 prostate cancer cells were obtained from the American Type Culture Collection. Aldrin, cell culture media and reagents were obtained from Sigma. DU145 cells were cultured in RPMI 1640 medium supplemented with 10% heat inactivated fetal bovine serum, 100 µg/ml streptomycin and 100 U/ml penicillin, the cell culture was in a humidified with 5% of CO2 at 37°C.
The DU145 PCa cells were treated as described in Bedia et al 2015 [5]. After the 50-days periode treatment, samples of both, Aldrin-exposed and non-exposed cells,were collected at two time points, at 0 and 5 hours.
- Transcriptomics: For transcriptomic analysis the Aldrin-exposed and non-exposed DU145 cells were harvested at 85% confluence using a rubber scraper into 2 ml of ice-cold PBS. Cell were centrifuged at 1300 rpm for 3 minutes at 4°C, and washed twice with cold PBS. The NucleoSpin RNA kit (Macherey-Nagel) was used to extract the total RNA [5]. Agilent 2100 Bioanalyzer platform (Agilent Technologies) to determine RNA quality. Next, 2 µg of were taken RNA and Transcriptor First Strand Synthesis Kit (Roche) was used to retrotranscribed to cDNA [5]. Finally, gene expression profiles from Aldrin-exposed and non-exposed cells were obtained by using HG-U219 array plate (Affymetrix inc. California, USA) and cDNA samples obtained from the experiment after 50 days exposure. Microarray data were normalized by using the RMA method [23] (GSE132063- https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132063).
- Metabolomics: For metabolomics, cells and culture media were used for analysis. Cell media was collected in 1.5 ml tubes, centrifuged and supernatants were lyophilized. Cells for lipidomics and metabolomics were harvested using a rubber scraper into 2 ml of ice-cold PBS. Next cell centrifuging was done for 3 minutes at 1300 rpm and 4°C and were washed twice with cold PBS.
For metabolite extraction, 1 ml of 90% chloroform/methanol mixture 1:9 in water was added to the cell pellet or media dried samples. This mixture was fortified with 500 pmol 13C D-Glucose standard (CLM-420-PK, Cambridge Isotope Laboratories). The mixture was vortexed vigorously, sonicated for 5 min and centrifuged 10 minutes at 15000 rpm. The supernatant was transferred to another tube and solvent was evaporated under N2 stream. Next, samples were resuspended in 150 µl of methanol, centrifugated again 10 minutes at 15000 rpm, and 100 µl was transferred to glass vials for injection. For quantitative purposes, home-made standard mix solutions of metabolites (including amino acid and nucleoside commercial mixtures of Sigma and the selection of metabolites introduced in the model) at different concentrations ranging from 2.5 to 20 ppm were prepared in methanol.
Chromatographic separations were carried on an Accela UHPLC system (Thermo Scientific) using a hydrophilic interaction liquid chromatography (HILIC) column (TSK Gel Amide-80 column: 250 x 2.1 mm, 5 µm) from Tosoh Bioscience at room temperature.Two solvents were use to perform the elution gradient: acetonitrile (A) and 5 mM of ammonium acetate adjusted to pH 5.5 with acetic acid (B). Solvents A and B were mixed as follows: 0–8 min, linear gradient from 25 to 30% B; 8–10 min, from 30 to 60% B; 10–12 min, 60% B; 12–14 min, back from 60–25% B; and from 14 to 20 min, 25% B. The mobile phase flow rate was set to 0.15 mL·min − 1 and the injection volume was 5 µL. Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific) with heated electrospray (HESI) as ionization source was used. HESI was used separately in positive and negative mode and metabolites are fragmented in HCD collision cell by alternating MS scans of the precursor ions and all ion fragmentation (AIF) scans. Mass spectra acquisition was done in profile mode at a resolution of 50,000 full width half maximum (FWHM) at m/z 200. The following parameters were used: sheath gas flow rate, 45 arbitrary units (a.u.); electrospray voltage, 3.0 kV; auxiliary gas flow rate, 10 a.u.; heated capillary temperature, 300°C. Range of the full scan mass range was set between m/z 80 and m/z 1000. A normalized collision energy (NCE) of 25 eV was used to perform the AIF. Raw data was exported to a cdf format using the Xcalibur file converter tool.
The cdf files from metabolomics and lipidomics were subjected to the ROI procedure [24] to extract the m/z features detected in each chromatographic run. Data were normalized by the number of cells and the amount of internal standard in each sample. Only the lipids and metabolites present in the GEM and that could be detected under the metabolomics/lipidomics analytical procedure carried out, were quantified using the appropriate calibration curves. The annotation of metabolites and lipids was performed by exact mass matching in Human Metabolome Database [25] and LipidMaps [26] online databases, and the information about retention times under the same chromatographic conditions collected in in-house databases from previous works. Further details about the analytical methodologies used for the metabolomics and lipidomics used in this work are given in [27–29]. The a resulting data was integrated into the GEM reconstruction analysis of Aldrin-exposed and non-exposed PCa cells.
Metabolic model readjustments/refinement
In this study we have used one of the most widely used reconstructions of human metabolism [12] as platform to integrate the metabolomic, lipidomic and transcriptomic data. The GEM (Recon 2.2) accounts for 1,789 enzyme-encoding genes, 7,440 reactions and 5,063 metabolites distributed over eight cellular compartments. In order to improve the omic data integration and analysis further adaptation of the model are required. More specifically, lipid associated metabolism was expanded and reactions that do not display a steady-state flux other than zero (blocked reactions) were eliminated. These steps are detailed below.
- Enabling GEM for lipidomic data integration: The intracellular lipidic profiles of control and Aldrin exposed DU145 cells were measured at two time points (0 and 5 hours). Since these measurements correspond to the whole-cell extracts/lysates, the contribution of each cellular compartment defined in the model is unknown. Thus, in order to integrate the lipidomic data into our computational analysis, the metabolic model was expanded as follows: i) first an additional boundary compartment and the corresponding boundary metabolites that gather all the measured lipids in more than one intra-cellular compartment (ie cytosolic, mitochondrial, lysosomal, etc.) were defined; ii) next, transport reactions between the intra-cellular lipids and the corresponding boundary lipid were defined ii) as well as the exchange reaction associated to each boundary lipids and iii) finally, a sink reaction associated to the measured lipids that are defined only in one intra-cellular compartment were also defined. This expanded GEM includes a boundary compartment accounting for all the measured lipids annotated in the metabolic model. Both, exchange reactions associated to boundary lipids and sink reactions enabled the integration of the lipidomic data in our model-driven analysis in the form of constraints.
- Model reduction: In order to reduce the computational time necessary to perform the analysis, the metabolic model (Recon2.2) [12] was reduced. It was achieved by removing the blocked reactions in the model. These reactions are those incapable of carrying any metabolic flux [30]. To this aim a Flux Variability Analysis (FVA) [31–33] was implemented using the 'fluxVariability' function in COBRA toolbox [21, 22]. This analysis computes the spectrum of fluxes that each reaction can carry while the value of the objective function is optimal. Thus, the reactions in which there is no feasible solution are considered as blocked reactions [30] and they are removed from the model. Finally, those metabolites that are neither a product nor a substrate of any reaction were eliminated from the model (dead-end metabolites).
Characterize the metabolic flux profile of Aldrin-expose and non-expose cells by applying Flux-Balance analysis
Before the metabolomics network reconstruction can be used to compute its properties, an important step must be taken in which the network reconstruction is mathematically represented. This conversion translates the reconstructed network into a chemically accurate mathematical format – the stoichiometric S matrix that becomes the basis for the genome-scale model. In the S matrix the reactions are in the columns and the metabolites in the rows. Each metabolite’s entry has its stoichiometric coefficient in the corresponding reaction. The stoichiometric matrix describes the quantitative relations between metabolites through the metabolic reactions and the steady state assumption imposes flux balance constraints on the network, ensuring that the overall amount of any compound being produced is equal to the overall amount being consumed. The next step in FBA analysis is to define the objective function, usually the biomass production. The biomass production represents the rate at which metabolic compounds are converted into biomass constituents such as nucleic acids, proteins and lipids. The objective of biomass production is mathematically represented by a global biomass reaction that becomes an extra column of coefficients in the stoichiometric matrix.
Once imposed on a network reconstruction, these balances and bounds define a space of allowable flux distributions in a network describing the possible rates at which every metabolite can be produced/consumed in the network. Here, the data integration plays an important part, in which FBA is able to perform simulations under different conditions by altering the constraints of the model.
Enhancing GEMs predictive capabilities via Omic data integration
The integration of biological information from different “omics” into a metabolic reconstruction analysis allows to further constraint the space of feasible solutions and provides a metabolic flux profile specific to a particular event, condition or environment. In this sense GEMs contain all known metabolic reactions encoded by an onganism's genome and provide an appropriate platform to integrate a large variety of omic data such as trancriptomic and metabolic data. In this work transcriptomic metabolomic and lipidomic data have been integrated in a metabolic network reconstruction analysis as follows.
- Transcriptomic data integration into GEMs: Gene expression profiles can be mapped onto a GEM via Gene-Protein-Reaction associations (GPR). These associations enable each reaction to be connected to a single or multiple genes associated with proteins/enzymes linked to reaction. Thus, gene expression levels can be propagated through GPRs to infer the metabolic reactions activity state with the aim of defining a metabolic flux profile that better describes the phenotype associated with a given gene expression profile.
In this work, the transcriptomic data was integrated by applying the iMAT algorithm, which incorporates gene expression data into a GEM via GPRs to predict a global metabolic flux activity [20]. This approach aims to maximize the similarity between gene expression and the activity state of the metabolic network. The genes are set to be highly or lowly expressed by imposing an upper and lower threshold (i.e. 66th and 33th percentiles as upper and lower threshold respectively). Thus, those genes with an expression level above the upper threshold are considered as highly expressed, the genes below the lower threshold are considered as lowly expressed, and the genes between these thresholds were considered as moderately expressed. On the other hand, reaction values were inferred by propagating gene expression throughout GPRs. Based on reaction values two sets of reactions were generated that consist of highly RH and lowly RL expressed reactions.
This information is used to formulate a mixed integer linear problem (MILP) [34] which is explained in more detail below:
(1) max(∑iϵRH (yi+ + yi- ) + ∑iϵRL (xi))
(2) S · v = 0
(3) vmin ≤ v ≤ vmax
(4) vi + yi+(vmin, i – ε) ≥ vmin,i; i ϵ RH
(5) vi + yi-(vmax, i + ε) ≥ vmax,i; i ϵ RH
(6) (1 – xi)vmin,i ≤ vi ≤ (1 – xi)vmax,i; i ϵ RL
In the first equation a MILP is represented which maximizes the number of reactions whose activity is consistent with their expression state. The mass balance constraint is enforced in second equation where v is the flux vector and S is a stoichiometric matrix. In the third equation maximum and minimum allowable flux is defined for each reaction by setting lower and upper flux bounds ( vmin and vmax respectively). For each highly expressed reaction the Boolean variables y+ and y- represent whether the reaction is active (in either direction, thus either y+ or y- is assigned the value 1) or inactive (when both y+ and y- are equal to 0). For each lowly expressed reaction, a Boolean variable x representing whether a reaction is inactive or active (1 or 0 respectively) is defined. A reaction associated to a highly expressed gene reaction is considered to be active if its predicted metabolic flux above or below the positive (ε) or negative (-ε) thresholds respectively (equations (4) and (5), respectively). Lastly, a lowly expressed reaction is considered to be inactive if it does not carry a flux that is greater than 0 in either direction (Eq. (6)). The optimization maximizes the number of reactions whose activity is similar to their expression state.
This approach considers the mRNA levels as clues for the likelihood that the enzyme in question carries a metabolic flux in its associated reaction(s), and then solves a constraint-based modeling optimization problem (in this case a MILP) to find a steady-state metabolic flux distribution by assigning permissible flux ranges to all the reactions in the network consistent with their expression state. As final result, iMAT algorithm provides a steady-state flux profile that maximizes the number of reactions associated to highly expressed genes and the number of inactive reactions associated to lowly-expressed genes, while the thermodynamic and stoichiometric constraints embedded into the model are still satisfied.
- Metabolomic and lipidomic data integration: After metabolomic data analysis, the intracellular and extracellular concentrations of measured lipids and metabolites were obtained. Based on this metabolomic and lipidomic data the upper and lower reaction bounds of exchange reactions were calculated to further constrain the model. Each reaction in a reconstruction has an upper and lower bound, which is a form of constraint that define the maximum and minimum allowable fluxes through the reactions. In order to calculate the upper and lower bounds, the sampling and measurements were conducted once after 50-days exposure and once again after 5 hours counting from initial measurement. Measuring the intracellular and extracellular concentrations at two sequential time points provided us with necessary metabolomics data for constraining the exchange reactions. Metabolites/lipids enter and leave the systems through exchange reactions and one of the most basic constraints imposed on genome-scale models of metabolism is that of substrate availability and its uptake rate via exchange reactions.
The metabolomic/lipidomic data-set considers two different time points with triplicate sample measurements (sampling times were at 0 and 5 hours, Supplementary material 1), allowing to determine the increase/decrease of quantity of measured species between time points [30]. This experimental omics data was used to calculate the lower and upper bounds represented in formula 7 and 8.
(7) ub = (c5 + Sd5) - (c0 - Sd0)
(8) lb = (c5 - Sd5) - (c0 + Sd0)
Here ub and lb are the upper and lb respectively, while c0 is the average concentration of a specie measured in triplicate samples at time zero and c5 the average concentration of the same specie measured in triplicate samples after 5 hours. Also, Sd0 is a standard deviation of triplicate data at time zero, while Sd5 is a standard deviation of triplicate data at 5 hours.
The metabolomic and lipidomic data were integrated as exchange reaction lower and upper bounds into our metabolic network reconstruction analysis by applying COBRA toolbox function 'changeRxnBounds' [21, 22].
Identify the activity state of the metabolic network by applying Sensitivity and Robustness analysis
- Sensitivity analysis: In the gene expression data integration analysis an optimal solution in terms of the objective function maximization is found, but this solution may not be unique. Actually, it may exist a space of alternative optimal solutions that represents alternative steady-state flux distributions obtaining the same similarity with the gene expression data (the same objective function value). To account for these alternative solutions, we employed Sensitivity analysis [20]. This is performed by solving two MILP problems (as is described in Transcriptomic data integration into GEMs sub-section) for each reaction to find the maximal attainable similarity with the expression data when the reaction is forced to be either active and inactive. Then, depending if the higher similarity with the expression data is achieved when the reaction is forced active or inactive (objective function optimality is achieved) the reaction will be considered to be active or inactive respectively. Conversely, a reaction activity state is considered undetermined if both simulations provides the same value for the objective function. From this analysis we could infer which reactions and pathways are more active in each cell group.
- Robustness analysis: The algorithm used to integrate the information from gene expression levels into a Genome-scale metabolic network reconstruction defines a threshold above which the expression of the genes is considered high and other bellow which the expression of the genes is considered low. It raises the necessity to perform a robustness analysis in order to demonstrate the lack of dependency of model’s predictions to the thresholds used in the analysis. In order to determine the robustness of model’s predictions a sensitivity analysis using the following pairs of thresholds defining the upper and lower boundaries respectively: percentiles 30thth and 70th, 33th and 66th and 40th and 60th.Thereby, a set of reactions unambiguously predicted to be either active, inactive or undetermined is defined.
Unveiling potential gene regulation and the potential effect on metabolism by analyzing gene expression data via GSEA
Transcriptomic expression data for Aldrin-exposed and non-exposed DU145 cells (GSE132063) was analyzed by using the list of curated gene signatures (Molecular Signatures Database V5.1)). Enrichment in gene set was considered significant for FDR values ≤ 0.1, NES > 1.5 and a leading edge signal > 50%.
Next, the resulting relevant gene sets were evaluated in order to identify the genes contributing to the significance of the gene set. Finally, a literature-based analysis was performed in order to connect the relevant genes extracted from the GSEA analysis and the metabolic reactions with different activity state between conditions from the model-driven analysis via signaling/regulatory mechanisms.
Using GSEA to identify correlations between the effects of the chronic exposure to Aldrin in DU145 PCa cells and PCa progression and metastasis
A metabolic gene set with significant differential expression between Aldrin-exposed and non-exposed DU145 cells was generated by applying the multivariate discriminant PLS-DA method [36–38]. Relevant genes were those having a VIP [39, 40] value equal or higher than 1.5. The resulting Aldrin-exposed-enriched metabolic gene set was used in a GSEA analysis of expression data sets for tumor types (Supplementary material 2) and retrieved from The Cancer Genome Atlas (https://tcga-data.nci.nih.gov/tcga/). For this analysis, different data sets were selected from those containing clinical tumor stage information, with samples grouped as T1,T2, T3, or T4 stages and metastatic. GSEA was performed on these data sets by applying a weighted scoring scheme and a Pearson metric with 1,000 phenotype permutations. Enrichment along tumor progression of the Aldrin-exposed DU145 cells metabolic gene set in these data sets was considered significant for FDR values ≤ 0.250.
Effects of INSIG2 inhibition on HMGCoA levels and cholesterol metabolism
- siRNA experiments: siRNA sequences (ON-TARGET plus SMART pool, 5nmol) against INSIG2 (L-021039) and non-targeting sequences (D-001810) and the transfection reagent DharmaFECT1 were purchased from Dharmacon Thermo Scientific.
- Western blot: Cells were harvested by trypsinization, centrifuged and washed twice with PBS 1X. Next, cell lysed was performed by using mammalian lysis buffer 1X (ab179835, Abcam) together with the cocktail protein inhibitor 1X (Thermo Scientific). Proteins in cell lysates were quantified using the BCA assay (Thermo Scientific) and 50 mg of proteins per sample was resolved by 10% SDS-PAGE. Next PVDF membranes (Roche) were used to transfer the proteins. Membranes were blocked with TBS 1X containing 0.1% Tween 20 and probed with the antibodies to INSIG2 (rabbit polyclonal antibody, 24766-I-AP, Proteintech), HMGCR (rabbit polyclonal antibody, ab174830, Abcam) and ∝-actin (ab8227, Abcam). Membranes were developed using the chemiluminescent signal detection kit ECL Prime Western Blotting detection reagent (GE Healthcare) and visualized using a LICOR C-DiGit blot scanner. Relative quantification of Relative western blot band intensity quantification was determined by using the software Image Studio Lite version 5.0; values were normalized to those of -actin, and differences between samples were calculated.