Purpose: This study addressed the challenges of early detection and prognostic assessment in gastric cancer, a disease often diagnosed at an advanced stage, when survival rates are lower. Using data mining techniques, we sought to identify key biomarkers and clinical indicators in large datasets, both to improve the accuracy and efficiency of early gastric cancer detection and to evaluate potential correlations with prognosis.
Methods: Data from 24,805 patients spanning five years were extracted from electronic medical and laboratory information systems. A total of 505 variables, including epidemiological and test indicators, were analyzed. Machine learning algorithms such as Random Forest and Gradient Boosting, complemented by classical statistical methods, were used to identify and evaluate significant indicators. Model performance was assessed with metrics including the Gini coefficient and the ROC index.
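The two performance metrics named above are closely related, and a small worked example may make that concrete. The sketch below is illustrative only and is not the study's code: it assumes the ROC index is the area under the ROC curve (AUC) and that the Gini coefficient is the rescaled form commonly reported alongside it, Gini = 2 × AUC − 1. The function names `roc_auc` and `gini_from_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    (label 1) receives a higher score than a randomly chosen
    negative case (label 0); ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient as a rescaling of AUC (assumed definition)."""
    return 2.0 * auc - 1.0


# Hypothetical toy data: 1 = malignant, 0 = benign; scores are model outputs.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

auc = roc_auc(labels, scores)
gini = gini_from_auc(auc)
print(f"ROC index (AUC): {auc:.3f}")
print(f"Gini coefficient: {gini:.3f}")
```

Under this definition a model that ranks cases no better than chance has an AUC of 0.5 and a Gini of 0, while a perfect ranking gives 1 for both, so the two metrics order competing models identically.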
Results: The study identified four epidemiological indicators and 14 test indicators that were significantly associated with gastric cancer. The Random Forest model performed best, effectively differentiating benign from malignant gastric conditions.
Conclusion: Data mining techniques proved useful for uncovering significant biomarkers for early gastric cancer diagnosis and prognostic assessment. The findings may help refine clinical decision-making, potentially raising early detection rates and improving patient outcomes. Further studies are needed to validate these markers biologically and to integrate them into clinical workflows.