Background The primary focus of this study was the investigation of prognostic biomarkers in colorectal cancer (CRC), since biomarkers are instrumental in clinical decision making and patient management and play a pivotal role in precision medicine.
Methods We analyzed an indigenous dataset from colorectal cancer patients. An exon microarray dataset was used, and 135 genes were identified as novel candidate biomarkers. For each of the 135 genes, expression values were dichotomized into low and high groups using the 'maxstat' algorithm, and the groups were compared using Kaplan-Meier (K-M) analysis and a univariate Cox model; a set of 33 genes was identified as statistically significant (p < 0.05). Furthermore, using the 'VARCLUS' algorithm (a SAS software procedure), a variable-reduction tool based on divisive clustering, a small subset of 5 genes was selected from the 33 significant genes as potential candidates for building survival models. Both parametric and semi-parametric survival models were used to assess whether these 5 genes could serve as prognostic biomarkers.
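The dichotomization and K-M steps described above can be illustrated with a minimal sketch. It is not the 'maxstat' implementation: it simply scans candidate cutpoints and keeps the one with the largest two-group log-rank statistic, without the multiple-testing adjustment that maximally selected rank statistics require. The function names and the toy data are hypothetical and do not come from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (event time, survival probability)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (groups coded 0/1)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutpoint(expr, times, events, min_group=2):
    """Scan cutpoints; return (cutpoint, statistic) with the largest log-rank."""
    best = (None, -1.0)
    for c in sorted(set(expr))[:-1]:
        groups = [1 if x > c else 0 for x in expr]  # 1 = high expression
        n_high = sum(groups)
        if n_high < min_group or len(groups) - n_high < min_group:
            continue
        stat = logrank_chi2(times, events, groups)
        if stat > best[1]:
            best = (c, stat)
    return best


# Toy data (hypothetical): higher expression goes with shorter survival.
expr = [1, 2, 3, 4, 5, 6, 7, 8]
times = [10, 9, 8, 7, 2, 3, 1, 4]
events = [1, 1, 1, 1, 1, 1, 1, 1]
cut, stat = best_cutpoint(expr, times, events)
high_curve = kaplan_meier([t for t, x in zip(times, expr) if x > cut],
                          [e for e, x in zip(events, expr) if x > cut])
```

Because the cutpoint is chosen to maximize the statistic, its unadjusted p-value would be optimistic; that is the reason a dedicated procedure such as 'maxstat' is used in practice.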
Results The HHLA2 (p < 0.01) and NEBL (p < 0.01) genes emerged as potential biomarkers based on the parametric and semi-parametric models (the Rayleigh and exponential probability distributions and the Cox model). They were also validated on independent datasets, by qPCR, and with cell-line and patient data in the laboratory. Model performance was rigorously evaluated using Harrell's concordance index, the Brier score, and Schoenfeld residual plots, as well as by comparing several predictive model plots. We also compared the Akaike and Bayesian information criteria (AIC and BIC) and the log-likelihood estimates. The Tukey-Anscombe plot and the quantile-quantile plot were applied as diagnostic tools to validate the parametric survival models.
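As an illustration of the AIC/BIC comparison between the exponential and Rayleigh models, the following sketch fits both distributions to right-censored data by closed-form maximum likelihood, without covariates. It assumes the one-parameter Rayleigh form with hazard t/σ² and takes the BIC sample size to be the number of subjects; both are assumptions, as are the function names and the toy data, and the study's own models may be parameterized differently.

```python
from math import log


def fit_exponential(times, events):
    """MLE of the exponential rate under right censoring; returns (rate, loglik)."""
    d = sum(events)            # number of observed events
    total = sum(times)         # total follow-up time
    rate = d / total
    loglik = d * log(rate) - rate * total
    return rate, loglik


def fit_rayleigh(times, events):
    """MLE of sigma^2 for a Rayleigh model (hazard t / sigma^2) under right censoring."""
    d = sum(events)
    sum_sq = sum(t * t for t in times)
    sigma2 = sum_sq / (2 * d)
    # Events contribute log f(t); every subject contributes log S(t).
    loglik = (sum(log(t) for t, e in zip(times, events) if e)
              - d * log(sigma2)
              - sum_sq / (2 * sigma2))
    return sigma2, loglik


def aic(loglik, k):
    """Akaike information criterion for k free parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion; n is assumed to be the number of subjects."""
    return k * log(n) - 2 * loglik


# Toy data (hypothetical): follow-up times with 1 = event, 0 = censored.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]

rate, ll_exp = fit_exponential(times, events)
sigma2, ll_ray = fit_rayleigh(times, events)

for name, ll in (("exponential", ll_exp), ("Rayleigh", ll_ray)):
    print(name, round(aic(ll, 1), 3), round(bic(ll, 1, len(times)), 3))
```

Lower AIC or BIC indicates the preferred model. With one parameter each, the comparison here reduces to comparing log-likelihoods.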
Conclusion Based on the predictive models, the novel biomarkers HHLA2 and NEBL were found to be statistically significant.