This retrospective cross-validation study was conducted to investigate the factors that should be supplemented in AI to assist MTB in postoperative gastric cancer treatment recommendations. Only a few studies have analyzed the concordance rate between AI and MTB in postoperative gastric cancer treatment recommendations. Choi et al. reported that stage IV gastric cancer was the only significant factor (6). In contrast, Tian et al. reported that HER-2 positivity was a significant factor, while stage IV was not (8). Surprisingly, in the present study, HER-2 positivity was not significant, but age > 80 years, performance status, and stage IV gastric cancer were all significant factors affecting the concordance rate between MTB and AI (Table 3).
Recommendations pertaining to patients aged > 80 years were less likely to be concordant than those pertaining to patients aged < 80 years (OR 0.175, 95% CI 0.069–0.441, p < 0.001). Elderly patients have more comorbidities than younger patients, a tendency to refuse chemotherapy requiring hospitalization, and fluctuating general status (9, 10). These clinical features could have lowered the concordance rate in patients over 80 years of age.
The higher the performance score, the lower the likelihood of concordance between MTB and AI in postoperative gastric cancer recommendations (performance score 1: OR 0.203, 95% CI 0.072–0.574, p = 0.003; performance score 2: OR 0.191, 95% CI 0.057–0.639, p = 0.007; performance score 3: OR 0.089, 95% CI 0.026–0.301, p < 0.001). In other words, as the performance status score increases, the MTB becomes more likely to select a chemotherapy regimen that does not match the AI recommendation. In general, elderly patients have higher performance scores than younger patients (11).
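As a reading aid for the odds ratios above, the following is a minimal sketch of how a reported OR and its 95% CI can be converted back to a log-odds coefficient, a standard error, and a Wald z statistic. It assumes the intervals are Wald intervals from a logistic regression, which this section does not state explicitly; the function name `wald_from_or` is hypothetical, and only the numbers passed in are taken from the text.

```python
import math

def wald_from_or(or_value, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate (beta, se, z, p) from an odds ratio and its 95% CI.

    Assumption: the CI is a Wald interval, exp(beta +/- z_crit * se).
    """
    beta = math.log(or_value)  # log-odds coefficient
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se  # Wald statistic
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Values reported in the text for age > 80 years.
beta, se, z, p = wald_from_or(0.175, 0.069, 0.441)
print(f"age > 80: beta = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

An OR below 1 corresponds to a negative coefficient, that is, lower odds of concordance than in the reference group. The same function can be applied to the performance score and stage estimates to check that each reported p-value is in line with its interval.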
In this study, there was no significant difference in concordance rates between stage II and stage III when compared to stage I (p = 0.367 for stage II, p = 0.673 for stage III). Interestingly, the concordance rate for stage IV was significantly lower than that for stage I (OR 0.017, 95% CI 0.005–0.055, p = 0.017), because the preferred palliative chemotherapy regimens differed owing to the difference in the local guidelines followed by MTB and WFO. S-1 plus cisplatin is a commonly used regimen in Korea and Japan, following the Japanese guidelines (12, 13). On the other hand, S-1 is an investigational agent in the NCCN guidelines of 2018, so it is not used at the Memorial Sloan Kettering Cancer Center (MSKCC), which follows the NCCN guidelines (14). Therefore, since S-1 plus cisplatin was not included in the WFO chemotherapy regimens based on the MSKCC data, the concordance rate between MTB and AI was significantly lower in stage IV.
The discrepancies between MTB and AI for each gastric cancer stage and their reasons are summarized in the supplementary tables (Supplementary tables S1, S2, S3, and S4).
In stage I, there were five patients in the non-concordance group. AI recommended adjuvant chemotherapy, but MTB selected surveillance as the treatment option after gastrectomy. Unlike in the West, D2 lymph node dissection is commonly performed in East Asia; thus, observation is recommended in the Japanese guidelines for pathologic stage I after curative gastrectomy (13). MTB judged that since all of them underwent D2 lymph node dissection and were over 65 years of age, there was little benefit from adjuvant chemotherapy considering the complications of chemotherapy.
In stage II, five patients belonged to the non-concordance group. For four of them, AI recommended adjuvant chemotherapy, but MTB selected surveillance as a treatment option, considering their age, performance status score, and comorbidities. They were 68, 85, 69, and 72 years old, respectively, and their performance status scores were all grade 3. They also had comorbidities, such as dementia, chronic kidney disease, and chronic heart failure. For the remaining patient, AI recommended FOLFOX as adjuvant chemotherapy, but MTB selected S-1 adjuvant chemotherapy because frequent hospitalization would have been inconvenient given the patient's age of 82 years and performance status score of grade 2.
In stage III, there were six patients in the non-concordance group. For four of them, AI recommended 5-FU or FOLFOX, but MTB selected S-1 adjuvant chemotherapy considering that they were all over 80 years of age, had a performance status of 2 or 3, and would have found frequent hospitalization inconvenient. For the other two patients, AI recommended S-1, capecitabine + radiation, or capecitabine + cisplatin, but MTB selected the XELOX regimen considering that they were young (49 and 64 years old, respectively), had a performance status of 1, and had no comorbidities.
In stage IV, 26 patients with the S-1 plus cisplatin regimen belonged to the non-concordance group.
AI is applied in various areas of medicine, such as robotics, medical diagnosis, and medical statistics. WFO, an AI system for clinical decision support, is expected to have many advantages, such as increased work efficiency and decreased workload of doctors, decision support for junior oncologists, and treatment selection based on the latest medical research, even in hospitals with few or no experts (15, 16). However, several factors lowered the concordance rate between the medical AI and experts in gastric cancer, resulting in the reduced validity of the medical AI.
First, AI lacks a comprehensive understanding of individual patients. The WFO cannot understand the comprehensive status of patients, such as patient compliance and rapport with doctors, comorbidities that may affect chemotherapy, and interpretation of whether biochemical study results are temporary or persistent (5).
However, it is expected that these shortcomings of AI will be compensated for as the technology advances. A wearable device or sensor can check the patient's condition 24 h a day and evaluate the patient's activity. In this way, continuous rather than fragmentary information can be obtained, and accurate individual performance status can be obtained through individual activity history rather than performance scores classified into scores such as 0, 1, 2, and 3 (17, 18). The development and usage of health applications that can be used in portable computer devices such as smartphones are also expanding. If the medical information recorded in a personal device application can be easily linked to the medical information database of a country or hospital, the patient's medical history and comorbidities can also be easily identified and applied in clinical practice (19, 20). In addition, AI assistance for emotional support, such as a robot companion for the elderly with limited cognitive function or activity, is expected (21).
Second, the local guidelines for gastric cancer differ according to race, country, and region. WFO, the medical AI used in this study, is based on the data of MSKCC in the US, but gastric cancer treatment in Korea follows the Korean guidelines, which are closer to the Japanese guidelines than to the NCCN guidelines (12). Local guideline differences are caused by differences in preferred surgical methods, the effectiveness of specific chemotherapy regimens or radiation therapy, and the approval status of chemotherapy drugs by country. In the future, besides resolving these local guideline differences, AI may assist in determining the best treatment plan tailored to individual patients with cancer.
Third, there are several economic factors. Owing to the high cost of cancer medicines, an individual patient’s financial circumstances and coverage of public or private insurance affect the selection of the chemotherapy regimen. Therefore, lowering the price of chemotherapy drugs and expanding insurance coverage will enhance the affordability and accessibility of cancer medicines (22). AI increases the efficiency of clinical trials and research, thereby significantly lowering the cost of drug development, which in turn lowers the price of cancer medicine (23). The increase in the efficiency of chemotherapy drugs is also expected to have a positive impact on social discussions and government approval regarding the insurance coverage of anticancer drugs.
This study had several limitations. First, this study was a retrospective and single-center analysis; therefore, it may have been biased. Second, the results of this study were analyzed based on the treatment consensus for WFO and MTB from 2015 to 2018. If we compare the concordance rates between the latest version of WFO and MTB based on the latest guidelines, the results may differ. However, because S-1 is still considered an investigational agent in the NCCN guidelines, it is most likely that there will be no change in the concordance rate for stage IV. In addition, the NCCN guidelines for gastric cancer, version 2022, also recommend adjuvant chemotherapy if there is LN metastasis after gastrectomy. Thus, the stage I discordant groups will still be discordant. An additional second-blind test for the remaining discordant group could not be performed because of changes in the MTB members. Third, this study analyzed only the concordance rate as a method for evaluating the validity of AI. We did not analyze the extent to which AI influences doctors’ decisions, whether better outcomes such as increased overall survival or disease-free survival can be achieved with AI-recommended treatments, or how much time and cost can be saved by using AI. These factors should be analyzed not only for the validation of AI but also in terms of the usefulness and economic feasibility of AI. However, to analyze these factors, a large-scale prospective study is required, and discussions on the ethics and legal responsibility of AI decisions should be conducted before such a study.
In this study, the factors affecting the discordance between AI and MTB were age, performance status, and stage IV gastric cancer. The effect of stage IV gastric cancer arose from the difference in the local guidelines between AI and MTB, whereas the effects of age and performance status arose from AI's inability to comprehensively understand individual patients.