Study Population
Forty-three women were recruited from gynecology clinics in Winnipeg, Canada as part of the Vaginal Mucosal Systems (VMS) study. Enrollment criteria included age over 18 years, HIV negative, and no current or suspected pregnancy. All women provided written informed consent and the study was approved by the Institutional Review Board at the University of Manitoba.
Data and Sample Collection
Socio-demographic, obstetric, and gynecological data was collected by structured questionnaire. Physicians performed pelvic examinations and collected an APTIMA multi-test swab, 2 mid-vaginal swabs (Starplex Scientific Inc., Etobicoke, ON), cervicovaginal lavage (CVL), 2 endocervical cytobrush samples, and touched a pH strip to vaginal wall. The APTIMA swab was sent to CADHAM provincial laboratory in Winnipeg, Canada for chlamydia and gonorrhea testing. All other samples were immediately stored on ice and transported to the lab within 3 hours of collection.
Cervicovaginal lavage
CVL was collected by bathing the cervical os with 10mL of sterile phosphate buffered saline (PBS). CVL was aliquoted and stored at -80°C until use.
Cytokine Array
25µL CVL was analyzed using EMD Millipore’s MILLIPLEX MAP 30-plex Human Cytokine/Chemokine Magnetic Bead Panel following the manufacturer’s procedures. Samples were randomized across the assay plate, and 25 samples were randomly selected to be analyzed in duplicate. Cytokines and chemokines that were below the limit of detection in ≥ 40% of samples were removed from analysis. A curated list of 10 inflammatory cytokines (interferon α2 (IFN-α2), interleukin (IL)-1α, IL-1β, IL-6, IL-8, IFN-γ inducible protein 10 (IP-10), monocyte chemoattractant protein 1 (MCP-1), macrophage inflammatory protein (MIP)-1α, MIP-1β, and Regulated upon Activation, Normally T cell Expressed and Secreted (RANTES)) (2, 4, 5) were used to define inflammation within this study.
Endocervical cytobrush: Cytobrushes were vortexed for 45 seconds in PBS to dislodge cells, washed twice in fresh PBS and filtered twice prior to staining with an optimized antibody cocktail: CD19-APC, CD56-PE-Cy7, CD4-APC-H7, CD3-v500, CD8-BV605, CD49d-PE-Cy5, CD14-PerCP-Cy5.5, CD16-Alexa700, CD15-BUV395, Fixable viability stain 570 (all antibodies obtained from BD Biosciences). Samples were fixed in 1% paraformaldehyde prior to acquisition on an LSR II (BD Biosciences). Human peripheral blood mononuclear cells were used for staining controls and fluorescence minus one (FMO) controls. Data was analyzed using FlowJo v10 (Treestar).
Diagnosis of bacterial vaginosis
A wet mount microscopy slide was created using 30µL vaginal swab eluate and observed at 400x magnification for presence of Clue cells (57). A Whiff test was performed by mixing 30µL eluate with 30µL of 10% KOH, with detection of a strong amine odor indicating a positive result (57). A pH strip that had been touched to the vaginal wall was used to measure vaginal pH. A participant was diagnosed with bacterial vaginosis (BV) if they had 3 out of 4 Amsel criteria present.
Sample preparation for proteomics analysis
CVL supernatants were prepared for mass spectrometry as previously described (58). Briefly, BCA assay (Novagen) was used to quantify protein and then equal volumes were used. Samples were denatured with 8M urea, reduced using diothiothreitol, alkylated using iodoacetamide, and digested into peptides using trypsin. CVL pellets were collected by centrifuging 2mL CVL at 21,000 x g for 5 minutes at 4°C. Pellets were resuspended in 350µL lysis buffer (2% SDS, 0.1 M dithiothreitol, 0.5 M HEPES) then 50µL 0.1 mm glass beads were added. Samples were vortexed, heated at 95°C for 5 minutes, then vortexed for 3 minutes. Samples were centrifuged at 3,000 rpm for 3 minutes to pellet beads and cells, and the supernatant was transferred to a fresh tube on ice. This was repeated 3 times. Protein quantification was determined using 2-D Quant assay (GE Healthcare, NJ, USA), and protein was denatured using 8M urea, alkylated using iodoacetamide, then treated with benzonase solution (250 U/µL benzonase, 50 mM MgCl2, 50 mM HEPES), before being digested by trypsin. Peptides were cleaned of salts and detergents by reverse-phase liquid chromatography (LC) using the step-function gradient. Peptides were then quantified using a LavaPep Fluorescent Peptide and Protein Quantification Kit (Gel Company, CA, USA) following the manufacturer’s protocol. One microgram of peptide per sample was re-suspended in 2% acetonitrile with 0.1% formic acid submitted for nanoflow-liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis.
LC-MS/MS setup for proteomics analysis
Each sample was separately analysed using a nano-flow Easy nLC 1000 connected in-line to an Q-Exactive Plus mass spectrometer with a nanoelectrospray ion source at 2 kV (Thermo Fisher Scientific, San Jose, CA, USA). The peptide samples were loaded (1 µg) onto a C18-reversed phase Easy Spray column (50 cm long, 75 µm inner diameter, 2 µm particles, (Thermo Fisher Scientific, San Jose, CA, USA) with 100% buffer A (2% acetonitrile, 0.1% formic acid) for a total volume of 10 µl, and then separated on the same column. Peptides were eluted using a linear gradient of 5–22% buffer B (98% acetonitrile, 0.1% formic acid) over 100 min, 22%-32% buffer B for 15 minutes, 32%-90% buffer B for 5 minutes and a wash at 90% B for 10 minutes at a constant flow rate of 200 nl/min. Total LC/MS/MS run-time was about 180 minutes, including the loading, linear gradient, column wash, and the equilibration.
Data acquired using these setting: Dynamically choosing the top 15 abundant precursor ions from each survey scan for isolation in the quadrupole (1.4 m/z isolation width) and fragmentation by HCD (28% normalized collision energy). The survey scans were acquired in the orbitrap over m/z 375–1500 with a target resolution of 70000 at m/z 200, and the subsequent fragment ion scans were also acquired in the orbitrap with a resolution of 17500 at m/z 200. The lower threshold for selecting a precursor ion for fragmentation was 1.9e4. Dynamic exclusion was enabled using a list size of 500 features, a m/z tolerance of 15 ppm, a repeat count of 1, and an exclusion duration of 20 s.
Human proteome data analysis
Raw MS spectra were imported into Progenesis QI software (v2.0, Nonlinear Dynamics), aligned and filtered as described previously (21, 59). Filtered peptides were annotated using Mascot Daemon (v.2.4.0, Matrix Science), and searched against the SwissProt/UniProtKB (2015) human database, with a decoy database included to determine the rate of false discovery. Protein identifications were confirmed using Scaffold software (v4.8.3, Proteome Software) with confidence thresholds set at ≤ 1% FDR protein at the protein level, ≤ 0.1% FDR at the peptide level, and had ≥ 2 unique peptides identified per protein. Normalized relative abundances of each protein within each sample were obtained from Progenesis QI. Only proteins that had an average covariance of < 25% (925 proteins) as determined through measurements of a standard reference sample run at 10 sample intervals (n = 7) were used in downstream analysis to exclude proteins with higher technical measurement variability. Pathway analysis of host proteins was performed using ConsensusPathDB-human (60–63).
Microbial proteome data analysis: Protein database searches were conducted against an in-house vaginal metaproteome (VMP) database (64) using Mascot (v.2.4.0, Matrix Science), and included human proteins from the SwissProt/UniprotKB database to limit potential homologous identifications. Search results were then imported into Scaffold software to validate these protein identifications, filtered using the following criteria: <0.1%FDR for peptide identification, \(\le\)1% FDR for protein identification, and at least two unique peptides identified per protein, and protein spectral counts were normalized to total protein detected. Microbial taxa abundance was estimated by taking the sum of normalized total spectral counts from Scaffold for all proteins associated with each genus. Scaffold accession reports containing homologous protein information were used to identify proteins that matched to more than one genus, which were binned into an “undistinguishable” category.
Functional microbiome analysis
Non-homologous bacterial proteins identified in each participant were annotated for known biological functions using the KEGG ontology database with GhostKOALA (v2.2, Kyoto University Bioinformatics Center) (65). Proteins were binned to the pathway level; a total of 33 pathways were analyzed for interactions with inflammation to preserve experimental power.
Sample preparation for metabolomics analysis: Metabolomics procedures were adapted from Srinivasan et al (52). Metabolites were extracted from CVL using a 1:4 ratio of CVL to methanol, followed by vortexing and centrifugation at 16,000 x g for 30 minutes at 4°C. The supernatants were transferred to new tubes and dried using vacuum centrifugation. Metabolite samples were resuspended in 200µL of sample buffer (30% Buffer A: 70% Buffer B containing 53.1 µM 13C5-15N1-glutamic acid and 58.2 µM 13C2-succinic acid. Buffer A = 5 mM ammonium acetate in 0.1% acetic acid, Buffer B = 0.1% acetic acid in acetonitrile), then vortexed for 15 minutes and centrifuged at 16,000 x g for 30 minutes to remove particulate matter. A standard sample containing a mixture of 10 known metabolites, along with a mix sample consisting of extracted metabolites from a mixture of all samples, was injected periodically throughout each batch run for QC and monitoring LC-MS/MS conditions.
Targeted LC-MS/MS metabolomics analysis: Metabolites were separated using an Agilent 1200 binary pump HPLC system equipped with a ZIC-cHILIC column (150 x 2.1 mm, 3.0 µm particle size) (Millipore). The flow rate was set to 200 µL/min and the column temperature was kept at 4°C. The separation gradient was set as follows: 0–3 min: 70% B, 3-7.5 min: 70 − 30% B, 7.5–13.5 min: 30% B, 13.5–16.5 min: 30–70% B, 16.5–27 min: 70% B. Eight microlitres of each sample was injected for simultaneous positive and negative mode analysis on a Fusion Lumos Tribrid mass spectrometer. Targeted quantification of metabolites was accomplished in parallel reaction monitoring (PRM) mode, targeting 121 different metabolites (66 in positive mode, 55 in negative mode). Source conditions for positive mode were set as follows: Spray Voltage: 3500 V, Sheath Gas: 3 (Arbitrary Units), Aux Gas: 1.2 (Arbitrary Units), Ion Transfer Tube Temp: 275°C. Source conditions for negative mode are the same as for positive except the Spray Voltage is set to 2100 V. Mass spectrometry data is acquired by alternating MS1 and MS2 scans in both positive and negative mode. MS1 scans are collected with the following parameters (the same for positive and negative unless listed otherwise): Scan range: 55–280 m/z (positive), 65–500 (negative), Orbitrap resolution: 30000, RF Lens (%): 30, AGC Target: 4.0e5, Max Injection Time: 50 ms. MS2 scans are collected with the following parameters (the same for positive and negative mode unless listed otherwise): Isolation Window: 1.6 m/z, HCD Collision Energy: 30%, Stepped Collision Energy: +/- 10%, Orbitrap Resolution: 15000, Maximum Injection Time 22 ms. For MS2 scans the mass range was set from 50 m/z to the mass of the targeted metabolite + 10 m/z. Raw files obtained are subsequently uploaded into Skyline for metabolite peak integration and quantification using a custom method.
Statistical analysis
Statistical analysis and data visualization was performed with GraphPad Prism 8 (v.8.3.1 (332)) or R with plugins ‘NMF’, ‘RColorBrewer’, ‘dendextend’, and ‘ggplot2’. Demographic and gynecological data was analyzed using Chi Square or Kruskal-Wallis with Dunn’s multiple comparison test. Multi-‘omics data was analyzed using Kruskal-Wallis with Benjamini-Hochberg correction for multiple comparisons.
Bayesian Network analysis
Random forest models were created for each multi-‘omic dataset (vaginal metaproteome taxa, vaginal metaproteome functions, vaginal metaproteome proteins, metabolome, host proteome, cytokine, and flow cytometry) using the 3-level inflammation variable as the classifier. All continuous data was discretized using above/below median prior to network creation. Datasets were first screened using Random Forest models (66) in R to assess the top 5 most important variables based on Gini importance, which were selected as features for the network. The inflammation variable was also included in the network, resulting in 36 features chosen as nodes. Four participants were removed from this analysis as they did not have complete information across datasets. Discretized data from the 36 selected features across 39 participants were used as input for the Bayesian Network, which was constructed using the BNlearn package in R. To avoid getting stuck at a possible local maximum, 1,000 networks were bootstrapped and averaged for 1,000 random restarts, resulting in 1,000,000 total networks. The final consensus network was calculated using the average.network function in BNlearn (67). For each network created, the structure of the network was learned using the Hill-Climbing algorithm with the likelihood-equivalence Bayesian Dirichlet (BDe) scoring method. The optimal imagery sample size was calculated with the alpha.star function. As the network is intended to be a hypothesis-generating tool, the presence of an edge describes a relationship between two nodes without directionality. The final graph was visualized in Gephi using the Fruchterman Reingold layout. The nodes are coloured by data type and size indicates degree of the node.