3.1 Patient Characteristics
Data from 500 patients, comprising 2,000 individual UHPLC-QTOF-MS assessments (two ESI positive and two ESI negative analyses per study participant), were available and were merged with the database of patient clinical covariates obtained from the Manitoba Tumour Bank. Four patients were excluded because they had not consented to disclosure of the clinical variables annotated to their samples, and 11 patients were excluded because they had provided more than one sample to the Manitoba Tumour Bank during the study period (duplicate patients). A further 10 patients were excluded because loss of metabolomic data fidelity was detected in their samples.
Thus, data from 475 unique patients (241 lung cancer cases and 234 cancer-free controls), comprising 1,900 individual UHPLC-QTOF-MS assessments, were included in the final analysis. Across these assessments, 2,032 metabolic entities were detected in the ESI positive mode and 1,667 in the ESI negative mode, with 1,529 entities detected in both ESI modes. After low-prevalence entities were filtered out, 676 metabolites remained (353 from the ESI positive mode and 323 from the ESI negative mode), and these were used for the subsequent classification assessments.
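The prevalence threshold used for filtering is not stated here. As a minimal sketch, assuming a hypothetical rule that keeps an entity only if it is detected (non-missing and above zero) in at least 80% of samples, the filtering step could look like the following. The function name `filter_low_prevalence` and the 0.8 cut-off are illustrative, not the study's.

```python
import numpy as np

def filter_low_prevalence(intensities, min_prevalence=0.8):
    """Keep entities detected (non-missing and > 0) in at least min_prevalence of samples.

    intensities: 2-D array, rows = samples, columns = metabolic entities,
    with np.nan or 0 marking a non-detection. The 0.8 default is a
    hypothetical threshold, not the one used in this study.
    """
    detected = np.nan_to_num(intensities, nan=0.0) > 0
    prevalence = detected.mean(axis=0)  # fraction of samples in which each entity was detected
    keep = prevalence >= min_prevalence
    return intensities[:, keep], np.flatnonzero(keep)

# Toy example: 5 samples x 4 entities
X = np.array([
    [1.2, np.nan, 0.0, 3.1],
    [0.9, 2.2,    0.0, 2.8],
    [1.1, np.nan, 0.5, 3.3],
    [1.0, 2.0,    0.0, 2.9],
    [1.3, 2.1,    0.0, 3.0],
])
filtered, kept = filter_low_prevalence(X, min_prevalence=0.8)
print(kept)  # [0 3]
```

Entity 1 (detected in 3 of 5 samples) and entity 2 (1 of 5) fall below the hypothetical 80% cut-off and are dropped.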
Baseline clinical characteristics of the 475 included patients are shown in Table 1. Amongst the NSCLC cases, 177 (73%) had adenocarcinoma and 64 (27%) had squamous cell carcinoma. The final postoperative pathological stage (AJCC 6th edition) of the NSCLC cases was Stage I in 60%, Stage II in 21%, Stage III in 17%, and Stage IV in 2%. NSCLC cases had a median age of 69 years (range 49-88) versus 55 years (range 20-89) for cancer-free controls (p<0.001). Males comprised 46% of cases versus 29% of cancer-free controls, whereas median body mass index was similar between cases (27.2, range 14.8-49.5) and controls (27.0, range 16.4-49.6). NSCLC cases had higher proportions of significant comorbidities, including diabetes, cardiovascular disease, dyslipidemia, and hypertension. NSCLC cases also had substantial smoking histories: 27% were current smokers, 65% were former smokers, and 8% were never-smokers. By contrast, cancer-free controls had less smoking exposure: 6% were current smokers, 22% were former smokers, and 48% were never-smokers.
3.2 Cluster Analysis
Metabolic entities were clustered by their correlation to one another, using 1 - correlation as the distance in vector space and the complete linkage method. Samples were clustered by Mahalanobis distance, also using complete linkage. These clusterings generated the cluster analysis and heat map visualizations for the ESI positive (Figure 1A) and ESI negative (Figure 1B) metabolites.
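The two distance measures and the complete linkage rule can be illustrated with a small NumPy sketch. It is a hypothetical, self-contained illustration rather than the authors' code: the function names are invented, the inverse covariance is taken as a pseudo-inverse to tolerate a near-singular covariance matrix, and a naive merge loop is used for clarity where a library routine such as `scipy.cluster.hierarchy.linkage(..., method="complete")` would normally be applied.

```python
import numpy as np

def correlation_distance(X):
    """1 - Pearson correlation between the columns (metabolic entities) of X."""
    return 1.0 - np.corrcoef(X, rowvar=False)

def mahalanobis_distance(X):
    """Pairwise Mahalanobis distance between the rows (samples) of X."""
    VI = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse guards against a singular covariance
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def complete_linkage(D, n_clusters):
    """Agglomerative clustering with complete linkage on a square distance matrix D.

    The distance between two clusters is the largest pairwise distance between
    their members; the closest pair of clusters is merged until n_clusters remain.
    Returns an integer cluster label per item.
    """
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(D.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Toy data: 40 samples, 6 entities forming two correlated groups of three
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 2))
X = np.repeat(base, 3, axis=1) + 0.1 * rng.normal(size=(40, 6))

entity_labels = complete_linkage(correlation_distance(X), n_clusters=2)
sample_labels = complete_linkage(mahalanobis_distance(X), n_clusters=3)
print(entity_labels, sample_labels)
```

Entities built from the same underlying signal end up in the same cluster, because their 1 - correlation distance is close to zero while distances across groups are near one.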
3.3 Cluster Representative Metabolites
Table 2 shows the representative metabolite from each of the 12 clusters of metabolic entities identified in each ESI mode. The complete list of metabolites in each cluster is provided in a downloadable file (Supplemental File 1). Amongst the ESI positive entities, metabolites of lipid metabolism (including phosphosphingolipids and glycerolipids) and fatty acid metabolism, together with steroids, predominated as cluster representative metabolites. Amongst the ESI negative entities, the cluster representative metabolites spanned a greater variety of classes, including organic acids (keto acids and carboxylic acids), steroids, fatty acyls, and hydroxyindoles.
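The text does not state how the representative metabolite of each cluster was chosen. One common rule is to take the most central member, i.e. the entity with the highest mean absolute correlation to the other entities in its cluster. The sketch below assumes that hypothetical rule, an intensity matrix `X` (samples x entities), and a per-entity cluster `labels` array; none of these names come from the study.

```python
import numpy as np

def cluster_representatives(X, labels, names):
    """Pick one representative entity per cluster.

    Hypothetical rule (an assumption, not the study's stated criterion): the
    member with the highest mean absolute Pearson correlation to the other
    members of its cluster, i.e. the most "central" entity.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))
    reps = {}
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        if idx.size == 1:  # singleton cluster: its only member is the representative
            reps[int(k)] = names[idx[0]]
            continue
        sub = R[np.ix_(idx, idx)]
        mean_r = (sub.sum(axis=1) - 1.0) / (idx.size - 1)  # exclude the self-correlation of 1
        reps[int(k)] = names[idx[np.argmax(mean_r)]]
    return reps

# Toy data: "b" is the shared signal, "a" and "c" are noisy copies of it, "d" is unrelated
rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([x + rng.normal(size=n), x, x + rng.normal(size=n), rng.normal(size=n)])
labels = np.array([0, 0, 0, 1])
reps = cluster_representatives(X, labels, ["a", "b", "c", "d"])
print(reps)  # {0: 'b', 1: 'd'}
```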
3.4 Principal Component Analysis
Principal component analysis of the cohort using the 12 cluster representative metabolites from the ESI positive and negative modes showed separation of the cohort along the first two principal components in each ESI mode (Figure 2).
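A minimal PCA sketch follows, assuming the representative metabolite intensities are autoscaled (mean-centred and scaled to unit variance) before decomposition by singular value decomposition. The scaling choice, the function name `pca_scores`, and the simulated case/control shift are assumptions, not details given in the text; the returned scores on the first two components are what a score plot such as Figure 2 would display.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples onto the leading principal components of autoscaled X.

    Returns the component scores and the fraction of variance each explains.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscaling (assumed)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # equivalent to Z @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Hypothetical data: 100 samples x 12 representative metabolites,
# with the first 50 samples ("cases") shifted upwards
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 12))
X[:50] += 1.0
scores, explained = pca_scores(X)
print(explained)
```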
3.5 Assessment of Volcano Plots
Assessment of the volcano plots (Figure A1) revealed that most of the cluster representative metabolites identified by the cluster analyses were not outliers. However, a number of cluster representative metabolites lay at or below the threshold of 0.5, indicating smaller mean differences in concentration between cases and controls.
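Volcano plot coordinates can be sketched as follows. This assumes (hypothetically) that the x-axis is the log2 fold change of mean intensity in cases versus controls and that the 0.5 threshold applies to its absolute value; the text specifies neither choice nor the test behind the p-values. The sketch uses Welch's t statistic with a normal approximation to its null distribution, which is reasonable at roughly 240 samples per group; `scipy.stats.ttest_ind(..., equal_var=False)` would give the exact t-based p-value.

```python
import math
import numpy as np

def volcano_coordinates(cases, controls):
    """Per-metabolite log2 fold change and -log10 p-value for a volcano plot.

    cases, controls: 2-D arrays (samples x metabolites) of positive intensities.
    p-values: two-sided Welch t statistic referred to a standard normal
    distribution (an approximation assumed for this sketch).
    """
    m1, m0 = cases.mean(axis=0), controls.mean(axis=0)
    v1, v0 = cases.var(axis=0, ddof=1), controls.var(axis=0, ddof=1)
    n1, n0 = cases.shape[0], controls.shape[0]
    log2_fc = np.log2(m1 / m0)
    t = (m1 - m0) / np.sqrt(v1 / n1 + v0 / n0)
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])  # P(|Z| > |t|)
    neg_log10_p = -np.log10(np.maximum(p, 1e-300))                   # avoid log10(0)
    return log2_fc, neg_log10_p

# Simulated intensities: metabolite 0 doubled in cases, 1 unchanged, 2 raised by 20%
rng = np.random.default_rng(2)
controls = rng.lognormal(mean=0.0, sigma=0.3, size=(234, 3))
cases = rng.lognormal(mean=0.0, sigma=0.3, size=(241, 3))
cases[:, 0] *= 2.0
cases[:, 2] *= 1.2
log2_fc, neg_log10_p = volcano_coordinates(cases, controls)
below = np.abs(log2_fc) <= 0.5  # hypothetical threshold on |log2 fold change|
print(log2_fc, neg_log10_p, below)
```

Metabolite 2 illustrates the situation described above: a real but small difference that sits below the fold-change threshold.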
3.6 Logistic Regression Modelling
Multivariable logistic regression for the endpoint of NSCLC case status using the 12 cluster representative metabolites showed that some cluster representative metabolites were statistically significant predictors of NSCLC case status while others were not (summarized as forest plots in Figures 3A and 3B).
The ESI positive cluster representative metabolites significantly associated with NSCLC case status in the metabolite-only logistic model were: MG(0:0/18:1/0:0) (OR 1.33, 95% CI 1.15-1.55), calcidiol (OR 1.51, 95% CI 1.25-1.85), 3-methoxybenzenepropanoic acid (OR 1.38, 95% CI 1.21-1.59), glycocholic acid (OR 1.21, 95% CI 1.01-1.46), pyridoxamine 5’-phosphate (OR 0.90, 95% CI 0.85-0.95), sphinganine 1-phosphate (OR 0.84, 95% CI 0.75-0.92), gamma-CEHC (OR 0.40, 95% CI 0.31-0.51), and 1b,3a,12a-trihydroxy-5b-cholanoic acid (OR 1.12, 95% CI 1.03-1.23). Adjustment for clinical covariates had a substantial impact, with calcidiol, 3-methoxybenzenepropanoic acid, and glycocholic acid losing statistical significance. The following ESI positive cluster representative metabolites retained statistical significance after adjustment for clinical covariates: MG(0:0/18:1/0:0) (OR 1.33, 95% CI 1.10-1.60), pyridoxamine 5’-phosphate (OR 0.86, 95% CI 0.80-0.93), sphinganine 1-phosphate (OR 0.87, 95% CI 0.76-0.99), gamma-CEHC (OR 0.48, 95% CI 0.34-0.66), and 1b,3a,12a-trihydroxy-5b-cholanoic acid (OR 1.19, 95% CI 1.05-1.35).
The ESI negative cluster representative metabolites significantly associated with NSCLC case status in the metabolite-only multivariable logistic model were: 20-carboxy-leukotriene B4 (OR 1.51, 95% CI 1.30-1.77), 11-beta-hydroxyandrosterone-3-glucuronide (OR 1.36, 95% CI 1.14-1.63), lithocholic acid glycine conjugate (OR 0.75, 95% CI 0.63-0.89), 18-hydroxycortisol (OR 0.48, 95% CI 0.35-0.63), formaldehyde (OR 1.11, 95% CI 1.01-1.21), isodesmosine (OR 1.16, 95% CI 1.09-1.24), 3-methyl-2-oxovaleric acid (OR 0.92, 95% CI 0.89-0.95), and deoxycholic acid 3-glucuronide (OR 0.93, 95% CI 0.89-0.97). After adjustment for clinical covariates, 20-carboxy-leukotriene B4 and 11-beta-hydroxyandrosterone-3-glucuronide lost statistical significance, whereas the following ESI negative cluster representative metabolites retained it: lithocholic acid glycine conjugate (OR 0.65, 95% CI 0.51-0.82), formaldehyde (OR 1.12, 95% CI 1.00-1.25), isodesmosine (OR 1.16, 95% CI 1.07-1.27), 18-hydroxycortisol (OR 0.63, 95% CI 0.42-0.91), 3-methyl-2-oxovaleric acid (OR 0.93, 95% CI 0.88-0.98), and deoxycholic acid 3-glucuronide (OR 0.91, 95% CI 0.85-0.98). The complete logistic regression results are available in Supplemental Tables 2 (ESI positive) and 3 (ESI negative).
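The odds ratios and 95% confidence intervals reported above follow the standard relations OR = exp(β) and 95% CI = exp(β ± 1.96 × SE) for a logistic regression coefficient β. A minimal NumPy sketch of fitting such a model by Newton-Raphson and converting its coefficients to ORs is shown below. It is illustrative only (a package such as statsmodels would normally be used); the function names, the simulated data, and the effect sizes are hypothetical. Adjusting for clinical covariates amounts to appending their columns to the design matrix.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (equivalently IRLS).

    X: samples x predictors (an intercept column is added here); y: 0/1 labels.
    Returns coefficients and their standard errors, intercept first.
    """
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        H = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # Fisher information matrix
        step = np.linalg.solve(H, Xd.T @ (y - p))   # Newton step on the log-likelihood
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    H = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

def odds_ratios(beta, se, z=1.959964):
    """OR = exp(beta) with 95% Wald CI exp(beta -/+ z * SE); the intercept is dropped."""
    b, s = beta[1:], se[1:]
    return np.exp(b), np.exp(b - z * s), np.exp(b + z * s)

# Simulated data: metabolite 0 has a true OR of 1.5, metabolite 1 has no effect,
# and a standardized age covariate also affects case status
rng = np.random.default_rng(3)
n = 2000
metabolites = rng.normal(size=(n, 2))
age = rng.normal(size=n)
logit = -0.2 + np.log(1.5) * metabolites[:, 0] + 0.8 * age
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

or_, lo, hi = odds_ratios(*fit_logistic(metabolites, y))                       # metabolite-only model
or_adj, lo_adj, hi_adj = odds_ratios(*fit_logistic(np.column_stack([metabolites, age]), y))  # adjusted
print(np.round(or_, 2), np.round(or_adj[:2], 2))
```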
3.7 Classification Performance of Cluster Representative Metabolites
The classification performance of the ESI positive and ESI negative models, with and without covariates, is detailed in Supplemental Table A4. Using the metabolites alone, diagnostic accuracies of 75% (ESI positive) and 82% (ESI negative) were observed. Adding the clinical covariates (age, sex, and smoking history) notably improved diagnostic accuracy, to 90% (ESI positive) and 94% (ESI negative). Receiver operating characteristic (ROC) curves of the logistic regression models (Figure 4) showed an area under the curve (AUC) of 0.94 for both the ESI positive and ESI negative metabolite models when clinical covariates were included.
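Diagnostic accuracy and AUC can be computed from a model's predicted probabilities as in the sketch below. The 0.5 classification threshold is an assumption (the text does not give one), and the function names are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control, with ties counted as one half.

```python
import numpy as np

def accuracy(y, prob, threshold=0.5):
    """Fraction of samples whose thresholded predicted probability matches the label."""
    return float(np.mean((np.asarray(prob) >= threshold) == np.asarray(y)))

def roc_auc(y, score):
    """AUC = P(score of a random case > score of a random control), ties counted as 1/2."""
    y, score = np.asarray(y), np.asarray(score)
    pos, neg = score[y == 1], score[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Toy predictions: 3 cases (label 1) and 3 controls (label 0)
y = [1, 1, 1, 0, 0, 0]
prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
acc, auc = accuracy(y, prob), roc_auc(y, prob)
print(round(acc, 3), round(auc, 3))  # 0.667 0.889
```

Because accuracy depends on a single threshold while AUC summarizes ranking over all thresholds, two models can share the same AUC yet differ in accuracy, as with the covariate-adjusted ESI positive and negative models above.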