Study Design and Quality Evaluation of the Plasma Proteome Analysis
In this study, we present a streamlined workflow to investigate the impact of FOLFOX treatment on the plasma proteome of colorectal cancer (CRC) patients. In the discovery phase, we collected plasma samples from 90 CRC patients (Fig. 1). These patients were categorized into two groups: the sensitive group (SENS), consisting of 60 individuals who showed stable recovery and no relapse after surgery, and the no-impact group (NONE), comprising 30 patients whose tumors metastasized. Table 1 provides the basic clinical characteristics of the patients, including age at diagnosis, gender, and body mass index (BMI), calculated from height and weight. Normally distributed values (age and BMI) are reported as mean ± SD and were compared between groups using appropriate statistical methods. We also collected the levels of the traditional clinical tumor markers CEA, CA19-9, and CA125, which are commonly associated with colorectal cancer (Table 1). CA125 levels across all patients ranged from 1.6 to 89.68 U/mL, with a median of 13.54 U/mL. As expected, the NONE group exhibited a higher concentration range of CA125, CEA, and CA19-9, but no significant difference was observed between the two groups (Table 1). This suggests the need for complementary markers to increase prediction accuracy.
Table 1
Baseline characteristics of CRC-FOLFOX plasma proteome profiling cohort
Characteristics | Total | SENS | NONE | P-value |
Number of Samples (n) | 90 | 60 | 30 | -- |
Age (years, mean ± SD) | 57 ± 22 | 56 ± 20 | 56 ± 21 | 0.14 |
Gender (male, %) | 57 (63%) | 38 (63%) | 19 (63%) | -- |
BMI (kg/m², mean ± SD) | 23.34 ± 2.48 | 23.34 ± 2.5 | 23.62 ± 1.5 | 0.300 |
CA125 (U/mL) | 13.54 (10.01–17.92) | 13.10 (10.08–17.84) | 13.72 (9.77–21.36) | 0.494 |
CEA (ng/mL) | 3.11 (2.14–5.09) | 2.61 (1.88–3.93) | 3.74 (2.80–16.45) | 0.499 |
CA19-9 (U/mL) | 11.19 (7.76–17.81) | 10.62 (7.68–13.74) | 12.83 (7.62–26.15) | 0.500 |
We employed MS-based proteomics for plasma proteome profiling and subsequent screening of diagnostic markers (Fig. 1). To achieve this, we processed the plasma samples from the discovery cohort using SISPROT, a highly reproducible 3-hour proteomics sample preparation method [16]. The resulting MS-ready peptides were then analyzed by LC-MS in data-independent acquisition (DIA) mode, enabling deep quantitative plasma proteome profiling (Fig. 1). The protein expression matrix obtained from the LC-MS analysis was subjected to functional analysis and used to train machine learning models. This approach enabled us to generate protein panels for prognostic prediction, potentially identifying key markers associated with treatment response and patient outcomes. To validate the potential biomarkers identified in the discovery phase, we used parallel reaction monitoring (PRM), a targeted proteomics method. This validation step allowed us to confirm the presence and abundance of specific proteins of interest, providing additional evidence for the reliability and relevance of our findings (Fig. 1).
To ensure reliable biomarker screening, rigorous quality controls were implemented for LC-MS/MS detection over the extended data acquisition period. Indexed retention time (iRT) perturbation analysis demonstrated the stability of our LC system, with minimal deviations observed at adjusted retention times 40 and 100 (Fig. 2A). Consistent results were confirmed through manual peak comparison, total ion chromatogram, and base peak overlay analyses. The raw total MS intensity was relatively consistent across samples in each group and was subsequently normalized by the median (Fig. 2B). Our single-run shotgun proteomic workflow identified 831 protein groups from 1 µL plasma samples of the 90 CRC patients, covering a dynamic range of 8 orders of magnitude (Fig. 2C). After data filtering and normalization, 536 protein groups were quantified on average per sample, indicating the high quality of our data set (Fig. 2D).
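The median normalization step mentioned above can be illustrated with a minimal sketch. It assumes that each sample is rescaled so that its median intensity matches the median of all sample medians; the exact normalization used in this study is not specified, and the function name, data layout, and toy intensities are hypothetical.

```python
from statistics import median


def median_normalize(samples):
    """Rescale each sample so its median intensity equals the global target.

    `samples` maps a sample ID to {protein: raw intensity}. The target is
    the median of the per-sample medians (an assumption of this sketch).
    """
    sample_medians = {sid: median(vals.values()) for sid, vals in samples.items()}
    target = median(sample_medians.values())
    return {
        sid: {
            protein: intensity * target / sample_medians[sid]
            for protein, intensity in vals.items()
        }
        for sid, vals in samples.items()
    }


# Hypothetical toy intensities: three samples differing only by loading amount.
raw = {
    "S1": {"ALB": 100.0, "APOC3": 10.0, "LGALS1": 1.0},
    "S2": {"ALB": 200.0, "APOC3": 20.0, "LGALS1": 2.0},
    "S3": {"ALB": 400.0, "APOC3": 40.0, "LGALS1": 4.0},
}
normalized = median_normalize(raw)
```

In this toy case the three samples become identical after normalization, because they differ only by a constant loading factor.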
Plasma Proteome Profiling of CRC Patients Undergoing FOLFOX Chemotherapy
The aim of this study was to investigate the impact of FOLFOX chemotherapy on the plasma proteome of colorectal cancer (CRC) patients and identify potential biomarkers associated with treatment response and patient outcomes. We used MS-based proteomics to comprehensively analyze protein expression profiles in CRC patients undergoing FOLFOX treatment. Partial least squares-discriminant analysis (PLS-DA), a supervised classification method, demonstrated a clear separation between the sensitive (SENS) and no-impact (NONE) groups based on their plasma protein expression profiles (Fig. 3A), consistent with the sample grouping. The first component (PC1) accounted for 28.8% of the variance, supporting the discriminatory power of the model. Volcano plots revealed 257 significant proteins with FDR-corrected p-values < 0.01 and 115 dysregulated proteins with at least a 2-fold change (Fig. 3B). Among these, 95 proteins were up-regulated and 20 were down-regulated in the comparison between the SENS and NONE groups. The list of these proteins can be found in Supplementary Table 2.
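The volcano-plot selection described above (FDR-corrected p-value < 0.01 and at least a 2-fold change) can be sketched as follows. This sketch assumes a Benjamini-Hochberg correction and a fold change computed as the ratio of group means; the text does not specify either. The function names, the direction of the ratio, and the toy values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_dysregulated(results, fdr_cutoff=0.01, min_fold=2.0):
    """Keep proteins passing both the FDR and the fold-change cutoffs.

    `results` is a list of (protein, mean_group_a, mean_group_b, p_value).
    Which group serves as numerator is an assumption of this sketch.
    """
    adjusted = benjamini_hochberg([row[3] for row in results])
    hits = []
    for (protein, mean_a, mean_b, _), q in zip(results, adjusted):
        log2fc = math.log2(mean_a / mean_b)
        if q < fdr_cutoff and abs(log2fc) >= math.log2(min_fold):
            hits.append((protein, "up" if log2fc > 0 else "down"))
    return hits


# Hypothetical toy input, not data from this study.
toy = [
    ("P1", 8.0, 2.0, 0.0001),  # 4-fold up, significant
    ("P2", 1.0, 4.0, 0.0002),  # 4-fold down, significant
    ("P3", 3.0, 2.0, 0.0003),  # significant, but only 1.5-fold
    ("P4", 5.0, 5.0, 0.5),     # unchanged
]
hits = select_dysregulated(toy)
```

Only P1 and P2 pass both filters in the toy input; P3 is significant but falls below the fold-change cutoff.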
Gene ontology (GO) enrichment analysis of the 257 significant proteins highlighted immune system process as the most significantly enriched biological process (Fig. 3C), including complement-related proteins such as C1R, CFD, CFB, CFI, C1S, and C6. Response to stimulus, involving proteins known to be involved in cancer development such as immunoglobulin heavy variable chain, serotransferrin, and CD44 antigen [23], was the second most enriched process. Notably, a significant enrichment in metabolic processes was observed when inspecting the child terms of the GO biological process (GOBP) category, and several enzymes were identified in this category. For instance, proteins involved in the pyruvate metabolic process, including ALDOA, ENO1, GAPDH, LDHA, LDHB, PGAM1, PGK1, PKM, and TPI1, were found to be dysregulated. This metabolic process, which is closely related to the glycolytic process, has been reported to have a strong relationship with CRC initiation and cancer progression [24]. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis (Fig. 3D) revealed pathways impacted by the significant proteins, including systemic lupus erythematosus and neutrophil extracellular trap formation, both of which have been reported to promote colon cancer metastasis [25]. The Molecular Complex Detection (MCODE) algorithm [26] grouped the protein-protein interactions into 11 networks (Fig. 3E). The most complex network, MCODE1, represents platelet activation, signaling, aggregation, and degranulation. The second network is related to the initial triggering of complement activation and the complement cascade. Another significant network, MCODE3, is composed of apolipoproteins and is associated with lipid-related processes.
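As an illustration of how enrichment of a GO term or KEGG pathway among the significant proteins can be scored, the sketch below computes a one-sided hypergeometric p-value. This is a common choice for over-representation analysis, but the test actually applied by the enrichment tools used here is not stated, so it should be read as an assumption. The function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of proteins in the reference set
    annotated:  reference proteins annotated to the term
    selected:   number of significant proteins tested
    overlap:    significant proteins annotated to the term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 4 of 10 background proteins carry the term,
# and 2 of the 3 selected proteins are annotated to it.
p_small = hypergeom_enrichment_p(background=10, annotated=4, selected=3, overlap=2)
```

In practice the resulting p-values would be corrected for multiple testing across all terms before ranking.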
To explore patterns of protein expression across the patient cohorts, we performed hierarchical clustering of all 802 filtered, quantified proteins, visualized as a heatmap in Fig. 4A. Interestingly, this unsupervised analysis identified two significant clusters (circled and labeled), demonstrating distinct expression profiles in part of the NONE group and SENS group, respectively. We further examined the correlations between these clustered proteins and patient group classification by pattern search. Cluster 1 exhibited both positive and negative correlations with group classification. For instance, proteins like KRT10 and KRT2 displayed positive correlations, while FABPG, SPRR1B, and DSG1 exhibited negative correlations with patient group classification. Similarly, cluster 2 revealed a variety of expression patterns among its proteins, indicating considerable heterogeneity within these groups. These findings underscore the complexity of the CRC-FOLFOX plasma proteome and emphasize the need for a comprehensive analysis of protein markers to distinguish patients with distinct clinical outcomes.
Furthermore, we observed significant differences in the expression levels of specific up- and down-regulated proteins between the SENS and NONE groups, spanning a wide range of intensities. These highly dysregulated proteins were found to be involved in various biological processes. For example, Galectin-1 (LGALS1) was downregulated in the NONE group (Fig. 4C); it is a carbohydrate-binding protein known to play a role in regulating apoptosis, cell proliferation, and cell differentiation. Previous studies have associated LGALS1 downregulation with poor prognosis in CRC [25]. On the other hand, Apolipoprotein C-III and Apolipoprotein A-II were significantly upregulated in the study (Fig. 4C) and are involved in maintaining blood function, potentially contributing to chemotherapy resistance. We further examined the correlation patterns of these highly regulated proteins. LGALS1 and C1QA, both up-regulated proteins, exhibited a high correlation in the NONE group (R = 0.95) but a weaker correlation in the SENS group (R = 0.27) (Fig. 4D). Additionally, another up-regulated protein, P4HB, demonstrated a strong correlation with LGALS1 in both groups, with Pearson correlation coefficients of 0.91 and 0.83, respectively. In the case of downregulated proteins, APOC3 and DSG1 displayed a positive correlation in the NONE group (R = 0.38) but a negative correlation in the SENS group (R = -0.21). Both APOA2 and APOC2 were downregulated in both groups and exhibited similar correlations (Fig. 4D). These correlation patterns suggest that no single protein consistently changes in response to FOLFOX treatment in CRC patients. However, due to the lack of samples from healthy individuals and limited follow-up data, we were unable to directly assess the survival impact of the corresponding genes. To gain insights into the potential survival impact, we examined the disease-free survival (DFS) curves of these genes using Gene Expression Profiling Interactive Analysis (GEPIA).
The results, shown in Supplementary Figure S1, indicated that low expression of C1QA and LGALS1 was associated with better patient survival, whereas high expression of P4HB was related to longer survival. It is important to note that individual gene expression patterns may not precisely align with the plasma protein expression profiles observed in our study. This discrepancy could be attributed to differences between tissue leakage proteins in plasma and proteins expressed in the solid tumors themselves. Additionally, proteins may be subject to multiple layers of regulation in response to FOLFOX treatment, and the expression of an individual gene may not by itself determine DFS. Further investigations and validations are warranted to understand the potential survival impact and clinical significance of these proteins in CRC patients undergoing FOLFOX chemotherapy.
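The group-wise Pearson correlations reported above for protein pairs such as LGALS1 and C1QA can be computed along the following lines. This is a minimal sketch that assumes the correlation is calculated separately within each patient group on per-sample intensities; the data layout, function names, and toy values are hypothetical and do not reproduce the reported coefficients.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def groupwise_correlation(samples, protein_a, protein_b):
    """Correlate two proteins within each group.

    `samples` is a list of (group, {protein: intensity}); samples missing
    either protein are skipped.
    """
    by_group = {}
    for group, intensities in samples:
        if protein_a in intensities and protein_b in intensities:
            xs, ys = by_group.setdefault(group, ([], []))
            xs.append(intensities[protein_a])
            ys.append(intensities[protein_b])
    return {group: pearson_r(xs, ys) for group, (xs, ys) in by_group.items()}


# Hypothetical toy intensities, not data from this study.
toy_samples = [
    ("NONE", {"LGALS1": 1.0, "C1QA": 2.0}),
    ("NONE", {"LGALS1": 2.0, "C1QA": 4.0}),
    ("NONE", {"LGALS1": 3.0, "C1QA": 6.0}),
    ("SENS", {"LGALS1": 1.0, "C1QA": 3.0}),
    ("SENS", {"LGALS1": 2.0, "C1QA": 1.0}),
    ("SENS", {"LGALS1": 3.0, "C1QA": 2.0}),
]
r_by_group = groupwise_correlation(toy_samples, "LGALS1", "C1QA")
```

Comparing the coefficient for the same protein pair between groups, as in Fig. 4D, then shows whether the two proteins vary together in one group but not the other.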
Prognostic Prediction of FOLFOX-Treated CRC Patients by Machine Learning
In our study, we employed a hypothesis-free machine learning method called Random Forest to explore the possibility of predicting the curative effect of FOLFOX treatment on Stage II/III CRC patients. For this analysis, we utilized the 115 dysregulated proteins as signatures. The samples were randomly divided into two sets, with 40 SENS group and 20 NONE group samples used as the training set, and the remaining samples as the validation set. We generated multiple models with varying numbers of features (1 to 115) based on 5-fold cross-validation (Fig. 5A). The generated models exhibited excellent performance, as evaluated using the receiver operating characteristic (ROC) curve. After thorough evaluation, we selected the model consisting of 25 preferential variables, which achieved an area under the ROC curve (AUC) of 0.908, with a 95% confidence interval of 0.742–0.997. This selected model demonstrated high accuracy, correctly classifying most of the patients into their respective groups. Only 4 SENS group and 2 NONE group patients were misclassified, resulting in over 93% accuracy (Fig. 5B). The top 20 protein signatures of this selected model are shown in Fig. 5C. Among these signatures, protein S100A4 emerged as the most important variable, and it has been previously reported as a prognostic biomarker for colorectal cancer [27]. Another important signature, LGALS1, is known to undergo significant changes during colorectal cancer development and metastasis, and it has been implicated in various normal and pathological processes [25, 28]. FABP5, a fatty acid-binding protein, was also identified as a crucial signature in the model and has been recognized as a novel target for its regulatory role in lipid metabolism in colorectal cancer [29]. Furthermore, a panel of 9 proteins was selected based on their high Gini index (higher than 1.3). 
This panel included highly up-regulated proteins such as LGALS1, S100A4, RPL12, and HSP90AB1, highly down-regulated proteins like FABP5 and KRT16, and slightly down-regulated proteins APOA2, APOC3, and JUP. This biomarker panel holds potential as a prediction model for assessing the curative effect of FOLFOX treatment in CRC patients. Overall, our machine learning approach using the plasma proteome data showed promising results for predicting treatment outcomes in CRC patients undergoing FOLFOX chemotherapy. However, further validation studies with larger patient cohorts are essential to establish the clinical utility and robustness of this prediction model.
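The two performance measures quoted for the selected model, the area under the ROC curve and the classification accuracy, can be illustrated with a short sketch. The AUC is computed here as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which is equivalent to the area under the ROC curve. For the accuracy, the sketch assumes that the 6 misclassified patients (4 SENS and 2 NONE) are counted against all 90 samples; the denominator is not stated in the text. Function names and toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Labels are 1 (positive) or 0 (negative); tied scores count as half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(n_total, n_misclassified):
    """Fraction of samples assigned to the correct group."""
    return (n_total - n_misclassified) / n_total


# Hypothetical classifier scores for four samples.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# 4 SENS + 2 NONE misclassified out of 90 samples (assumed denominator).
model_accuracy = accuracy(n_total=90, n_misclassified=4 + 2)
```

Under this assumption the accuracy is 84/90, or about 93.3%, which agrees with the "over 93%" stated above.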
Parallel Reaction Monitoring (PRM) Validation
Parallel reaction monitoring (PRM) is a targeted mass spectrometry-based method that allows precise and sensitive quantification of specific peptides or proteins in complex biological samples. We used PRM to confirm the significance and reliability of the identified protein panel for predicting the curative effect of FOLFOX treatment in CRC patients. To validate the findings from the discovery cohort, we collected a new cohort of 26 CRC patients, including 13 patients in the SENS group and 13 in the NONE group. We selected targeted peptides for the panel of 9 proteins identified in the discovery cohort. An example of the APOC3 peptide transition peak and quantification analysis is illustrated in Fig. 6A-B. By comparing the protein abundance in the two groups across these 9 proteins (Fig. 6C), we observed significant changes in 6 proteins. Notably, a panel of 5 proteins, namely S100A4, RPL12, KRT16, HSP90AB1, and APOC3, exhibited expression changes consistent with the results obtained from the machine learning analysis, with 3 of these proteins showing statistical significance. The concordance between the machine learning analysis and the PRM validation strengthens the robustness of the identified protein panel as potential biomarkers for predicting the curative effect of FOLFOX treatment in CRC patients and supports its potential clinical utility. However, further validation in larger patient cohorts and additional functional studies will be essential to fully establish the clinical value of these protein markers.