A Targeted Review of Breast Cancer Studies of Concordance for an Internationally-Implemented Artificially Intelligent Clinical Decision-Support System

Background: Breast cancer has the highest incidence and is the leading cause of cancer-related mortality among women worldwide. IBM Watson® for Oncology (WfO), an artificial intelligence-based clinical decision-support system, provides therapeutic options for consideration to cancer-treating physicians. We conducted a targeted review of studies evaluating concordance of therapeutic options offered by the system with treatment decisions by practicing clinicians in breast cancer. Methods: PubMed, EMBASE, Cochrane, trial registers, conference abstracts, and an internal publication database were searched to identify studies evaluating the concordance of system-generated therapeutic options with treatment decisions by individual clinicians and multidisciplinary tumor boards for breast cancer patients reported in peer-reviewed abstracts or papers published in English between 01/01/2015 and 11/15/2019. Results: Ten breast cancer concordance studies (4703 patients) that met the inclusion criteria were identified and analyzed; the identified studies were from China, India, and Thailand. The weighted mean concordance for all studies was 67.4% (SD 16.0%, range 55.0%-98.0%). The weighted mean concordance of the system with multidisciplinary tumor boards was 88.2% (SD 9.7%, range 76.5%-98.0%), which was substantially higher than concordance between the system and individual clinicians (61.5%, SD 10.1%, range 55.0%-76.0%). Conclusion: Concordance between system-generated therapeutic options and treatment decisions of multidisciplinary tumor boards or individual clinicians for breast cancer demonstrated overall agreement between the system and decisions of practicing cancer-treating physicians in China, India and Thailand.
As multidisciplinary tumor boards may lead to higher quality clinical decision-making compared to those of individual clinicians in practice, the relatively higher concordance of the system with multidisciplinary tumor boards suggests a role for clinical decision support to inform clinicians of evidence-informed treatment options.


Background
Breast cancer is a global health problem, as it is the most common malignancy and one of the leading causes of cancer-related mortality in women. 1,2 Among all invasive malignancies in the United States (US), breast cancer has the highest annual incidence and is the second leading cause of cancer-related death in women. 3 The incidence of breast cancer in the US rose at an annual rate of 0.2% between 2005 and 2011. 4 In 2020, it is estimated that 276,480 new breast cancer cases will be diagnosed in the US, with 42,170 breast cancer-related deaths in females. 3 Moreover, in low- and middle-income countries, breast cancer is a leading cause of morbidity, disability, and mortality as well. 2,5 In 2015, the number of new female breast cancer cases and breast cancer-related deaths in China was 268,600 and 69,500, respectively. 36 Along with a global rise in the burden of breast cancer, a shortage of cancer care services exists. 6 The increasing demand for cancer care, 7 without a proportional increase in services, poses challenges to cancer-care institutions, providers, and patients. Thus, there is a growing need for informatics resources and infrastructure to provide clinical decision-support systems (CDSS) that can help oncologists keep pace with medical advances and rapid practice changes for optimal care of patients with breast cancer.
Over the past decade, results of randomized clinical trials have led to substantial changes in the management of breast cancer. Examples include the avoidance of axillary lymph node dissection in patients with early-stage (i.e., T1-T2) hormone-receptor (HR) positive breast cancer with 1 to 2 positive sentinel lymph nodes. 8 A 21-gene tumor expression assay now guides adjuvant chemotherapy decision making in patients with HR positive, human epidermal growth factor receptor 2 (HER2) negative, axillary lymph node-negative breast cancer. 9 A contemporary first-line treatment option for patients with metastatic HR positive, HER2 negative breast cancer is a selective inhibitor of cyclin-dependent kinases 4 and 6 combined with an aromatase inhibitor. 10 As of December 2019, the US Food and Drug Administration (FDA) listed 69 drugs approved for the treatment of breast cancer. 11 Moreover, an increasing number of novel breast cancer drugs and multimodal treatment strategies are now undergoing evaluation in clinical trials worldwide. The clinical trial registry, ClinicalTrials.gov, listed 487 female breast cancer clinical trials in December of 2019. 12 Consequently, oncologists and their patients face a complex and fast-changing breast cancer management landscape. Through shared decision making, 13 and based on evidence, best practice, and cost-effectiveness considerations, they must choose among diagnostic and staging tests, therapeutic modalities, and their sequence. Unfortunately, limited health resources and variability in breast cancer practice patterns in different regions of the world pose a challenge to the ability of oncologists and their patients to make personalized treatment decisions.
Published studies reporting on the performance and implementation of artificial intelligence (AI)-based CDSS aiding oncologists in cancer treatment decision making are quite limited. Furthermore, there is a dearth of therapeutic AI-CDSS implemented in routine oncology practice that consider key patient attributes and incorporate evidence from peer-reviewed literature and cancer treatment guidelines to support oncologists with personalized treatment plan suggestions. IBM Watson® for Oncology (WfO) is an AI-based CDSS 14 that considers select patient data for a given cancer type and provides a set of evidence-informed therapeutic options for consideration by cancer-treating physicians. The options presented by the system are accompanied by published evidence in the medical literature, facilitating personalized, evidence-informed treatment options for patients. Studies have evaluated the acceptability and validity of therapeutic options suggested by this CDSS, as well as concordance between the tool's treatment suggestions and treatment decisions made by cancer-treating physicians at the point of care for a variety of cancer types. 15-17 To summarize the performance of the CDSS for breast cancer treatment, we conducted a targeted review of peer-reviewed studies evaluating concordance between therapeutic options offered by the system and treatment decisions by cancer-treating clinicians in practice. With our targeted review we hope to advance the field of informatics applied to clinical oncology by providing knowledge about the performance of an AI-based CDSS aiding clinicians in breast cancer treatment decision making.

Aims and Research Questions
The aims of this study were to summarize and analyze the results of a targeted review of peer-reviewed published studies reporting the concordance of therapeutic options from the CDSS with treatment recommendations of individual clinicians and multidisciplinary tumor boards (MTB) for breast cancer. Our specific research questions were:
-What are the overall concordance rates between system-generated therapeutic options and practicing oncologists' treatment decisions?
-Are there differences in concordance rates between system-generated therapeutic options and individual clinicians' treatment decisions compared to concordance rates between system-generated therapeutic options and MTB treatment decisions?
-What are the concordance rates in different subgroups of patients with breast cancer, according to age, menopausal status, cancer stage, HR status, HER2 status, and molecular subtype?
-What are the concordance rates of system-generated therapeutic options with practicing oncologists' treatment decisions by country?

Studies Eligibility Criteria
We included prospective and retrospective studies reporting the concordance of WfO's therapeutic options with practicing oncologists' treatment decisions in breast cancer. We selected studies published in English in peer-reviewed journals and peer-reviewed abstracts from oncology conferences. We excluded studies that reported the concordance of WfO therapeutic options with practicing oncologists' decisions in more than one cancer type without separate results for breast cancer. We did not exclude publications based on the country where the study was performed. We did not find peer-reviewed published studies reporting on the performance of AI-CDSSs other than WfO aiding oncologists in breast cancer treatment decision making. Table 1 summarizes the study inclusion and exclusion criteria.

Study Selection
We used EndNote 18 to manage and remove duplicate references. Two experienced reviewers (KL, LM) screened titles and abstracts from each unique record for relevancy in DistillerSR (Evidence Partners), software designed to support literature reviews. For all records classified as relevant based on the title and abstract, one reviewer assessed the full text report to determine final eligibility for inclusion using established criteria. For this targeted review, a second reviewer assessed all studies based on criteria presented in Table 1. In cases of uncertainty, a consensus decision was reached through discussion.

Data Collection Process
Two additional reviewers (RH and KD) confirmed and extracted data in those studies that were identified by the initial reviewers. The additional reviews were conducted using a pre-defined data collection form that included the data listed below.
-Citation information: study title, authors, and year of publication.
-Study characteristics: country where study was performed, name of institution.
-Study design: retrospective or prospective, number of patients, and date of study initiation and completion.
-Clinical context: mean patient age in years, menopausal status, system use case (MTB or individual clinician), and system version utilized in the study.
-Outcomes: definition and percent of concordance between CDSS therapeutic options and practicing oncologists' treatment decision, concordance according to breast cancer stage by the AJCC staging (edition used according to the year of study completion), HR status, HER2 status, molecular subtype, and reported reasons for discordance.
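The items on the data collection form map naturally onto a typed record. The following is a minimal sketch of such a record, not the form the reviewers actually used; the class name `ExtractionRecord`, its field names, the validation rules, and the placeholder values are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DESIGNS = {"retrospective", "prospective"}
USE_CASES = {"MTB", "individual clinician"}


@dataclass
class ExtractionRecord:
    """Hypothetical extraction record mirroring the form items listed above."""
    # Citation information
    title: str
    authors: List[str]
    year: int
    # Study characteristics
    country: str
    institution: str
    # Study design
    design: str
    n_patients: int
    # Clinical context
    use_case: str
    system_version: Optional[str] = None
    mean_age_years: Optional[float] = None
    # Outcomes
    concordance_pct: Optional[float] = None
    subgroup_concordance_pct: Dict[str, float] = field(default_factory=dict)
    discordance_reasons: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values that fall outside the categories named on the form.
        if self.design not in DESIGNS:
            raise ValueError(f"unknown design: {self.design!r}")
        if self.use_case not in USE_CASES:
            raise ValueError(f"unknown use case: {self.use_case!r}")
        if self.n_patients <= 0:
            raise ValueError("n_patients must be positive")
        if self.concordance_pct is not None and not 0 <= self.concordance_pct <= 100:
            raise ValueError("concordance_pct must lie between 0 and 100")


# Placeholder values for illustration only; not a study from this review.
example = ExtractionRecord(
    title="Example concordance study",
    authors=["A. Author"],
    year=2019,
    country="Exampleland",
    institution="Example Cancer Center",
    design="retrospective",
    n_patients=100,
    use_case="MTB",
    concordance_pct=80.0,
)
```

Validating the categorical fields at construction time is one way to keep records from two independent extractors comparable; other designs (for example, enumerations) would serve equally well.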

Analysis
We analyzed treatment concordance (agreement) between the system's therapeutic options and treatment decisions made by either MTB or individual clinicians in breast cancer patients. Decisions made by MTB or individual clinicians were defined as concordant if they agreed with the CDSS treatment options labeled as "Recommended" or "For Consideration." Concordance was calculated as the number of concordant treatment decisions divided by the total number of patients in each study. Mean concordance was calculated as a weighted average based on the number of patients in each study, assuming no patient was included in more than one study (independent samples) and data were normally distributed. We summarized concordance by patient subgroups according to the AJCC breast cancer stage edition utilized in the study (e.g., 7th), HR status, HER2 status, as well as luminal A, luminal B, and triple negative breast cancer (TNBC).
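The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' analysis code: the function names are hypothetical, the three studies are made-up numbers (the per-study data in Table 2 are not reproduced here), and the patient-weighted population standard deviation is an assumed formula, since the text does not state how the SD was computed.

```python
import math
from typing import Sequence, Tuple


def study_concordance(n_concordant: int, n_patients: int) -> float:
    """Percent of patients whose treatment decision matched a system option."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 100.0 * n_concordant / n_patients


def weighted_mean_sd(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and (assumed) weighted population SD of per-study rates."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


# Hypothetical studies as (concordant decisions, patients); not data from the review.
studies = [(90, 100), (150, 300), (420, 600)]
rates = [study_concordance(c, n) for c, n in studies]
sizes = [n for _, n in studies]
mean, sd = weighted_mean_sd(rates, sizes)
print(f"weighted mean = {mean:.1f}%, weighted SD = {sd:.1f}%")
```

Because each study's weight is its patient count, the weighted mean equals the pooled proportion (660 concordant decisions out of 1000 patients in this made-up example), so larger studies dominate the summary figure.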

Identification and Selection of Studies
Comprehensive searching yielded 1502 total unique records (Fig. 1). After title and abstract screening, we retrieved 211 full text reports. We further screened the 211 studies to identify those reporting concordance between system-generated therapeutic options and treatment decisions of practicing oncologists in breast cancer. Of the 211 studies, we excluded 201 for the following reasons: 89 did not evaluate this system, 14 did not have clinical results, 5 provided only generic information about the device, 56 had different research questions or outcomes, 18 evaluated the system's therapeutic option concordance with practicing oncologists' treatment decisions in malignancies other than breast cancer, 4 used the system outside of the approved indications or populations, 2 utilized attributes not recognized by the system, 19,20 7 were conference abstracts already published in full text, and 6 reported on an ongoing trial where results were not yet available. Of the 56 reports that had different research questions or outcomes, 26 did not report concordance as an outcome, 1 compared concordance of system-generated therapeutic options with real world evidence treatment decisions derived from a cohort of US breast cancer patients, 21 and 1 compared concordance of system-generated therapeutic options with treatments recommended by a breast cancer genomic test. 22 In total, our targeted review included 10 breast cancer concordance studies.
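The screening counts in this paragraph can be cross-checked with simple arithmetic. The short sketch below uses only the numbers quoted above; the dictionary keys are shortened labels chosen for illustration.

```python
# Counts quoted in the paragraph above; the labels are abbreviated paraphrases.
full_text_reports = 211
exclusions = {
    "did not evaluate this system": 89,
    "no clinical results": 14,
    "generic information about the device only": 5,
    "different research questions or outcomes": 56,
    "malignancies other than breast cancer": 18,
    "use outside approved indications or populations": 4,
    "attributes not recognized by the system": 2,
    "conference abstracts already published in full text": 7,
    "ongoing trials without available results": 6,
}

total_excluded = sum(exclusions.values())
included = full_text_reports - total_excluded
print(total_excluded, included)  # 201 10
```

The reasons sum to the 201 exclusions reported, leaving the 10 included studies.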

Breast Cancer Concordance Studies According to Use Case
The 10 concordance studies included in this review were retrospective, enrolling a total of 4703 patients distributed across regions of China, India, and Thailand (Table 2). These studies compared system-generated therapeutic options with treatment decisions made by MTBs (5 studies) or individual clinicians (5 studies). Across all 10 studies, the weighted mean concordance was 67.4% (standard deviation (SD) 16.0%, range 55.0%-98.0%). The weighted mean concordance of the system with MTBs, 88.2% (SD 9.7%, range 76.5%-98.0%), was higher than the weighted mean concordance between WfO and individual clinicians, 61.5% (SD 10.1%, range 55.0%-76.0%). There were substantial differences in concordance by country and CDSS use case (Table 3).
Concordance in patient subgroups, for the studies that reported it, is shown in Table 4. In 3 of these studies, 25,27,29 concordance was higher for patients with non-metastatic breast cancer than for those with metastatic disease. In the study reported by Somashekhar et al., 25 concordance was higher for stages II and III than for stage I. A lower concordance in patients older than 70 was found in 2 studies. 25,29
The reported reasons for treatment decision discordance between MTBs or individual clinicians and the CDSS varied among studies. Documented reasons were related to availability of treatments recommended by the system, the absence of some clinician-preferred treatments as options in the system, patient preferences, and age ≥ 70 years.
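The overall weighted mean sits much closer to the individual-clinician figure than to the MTB figure, which reflects how the patient-count weights are distributed between the two use cases. The sketch below back-calculates the approximate split of the 4703 patients implied by the three reported means. It is a rough consistency check, not a reported result: the function name is hypothetical, the reported means are rounded to one decimal, and the actual per-study counts belong to Table 2, which is not reproduced here.

```python
from typing import Tuple


def implied_group_sizes(total_n: float, overall: float,
                        mean_a: float, mean_b: float) -> Tuple[float, float]:
    """Solve n_a*mean_a + n_b*mean_b = total_n*overall with n_a + n_b = total_n."""
    if mean_a == mean_b:
        raise ValueError("group means must differ to solve for the split")
    n_a = total_n * (overall - mean_b) / (mean_a - mean_b)
    return n_a, total_n - n_a


# Reported figures: 4703 patients, 67.4% overall, 88.2% MTB, 61.5% individual clinicians.
n_mtb, n_individual = implied_group_sizes(4703, 67.4, 88.2, 61.5)
print(round(n_mtb), round(n_individual))  # approximately 1039 and 3664
```

Under these assumptions, roughly a fifth of the patients would come from MTB studies, which is why the pooled 67.4% lies nearer to 61.5% than to 88.2%.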

Discussion
Treatment Concordance by Use Case: MTB vs. Individual Clinicians.
To our knowledge, this study is one of the first to summarize the performance of an oncology AI-based CDSS, measured by concordance of its therapeutic suggestions with treatment decisions made by cancer-treating physicians in practice for female breast cancer patients in diverse, international cancer care settings. We found substantial agreement between system-generated therapeutic options and the treatment decisions of both MTBs and individual clinicians in a large number of patients with breast cancer. Our targeted review demonstrates that the system's suggested treatment options agreed with therapies selected by cancer-treating physicians in China, India, and Thailand, the countries where we identified breast cancer treatment decision concordance studies.
The CDSS exhibited a higher treatment decision concordance with MTBs, as compared to individual clinicians. MTBs provide multidisciplinary team management that generally results in decreased mortality, improved quality of life, and reduced costs in cancer patient care. 33,34 The higher concordance between system-generated therapeutic options and treatments agreed upon by MTB experts is consistent with the quality of therapeutic options suggested by this CDSS. 35 Furthermore, the lower rate of concordance with decisions made by individual physicians as compared to MTBs supports a role for an AI-based CDSS in aiding individual oncologists during the complex clinical task of breast cancer treatment decision making.

Concordance in Different Countries and Breast Cancer Subgroups
According to system use case and country, we identified a study conducted at a tertiary cancer center in India with a higher breast cancer treatment decision concordance between the CDSS and MTB, 25 as compared to similar use case studies in China. 23,24,26,27 Likewise, individual clinicians in Thailand had higher concordance with the CDSS in 3 studies 28,30,32 as compared to 2 large individual-clinician studies from China. 29,31 Differences in breast cancer treatment decision concordance between the system and individual clinicians or MTBs in different countries are multifactorial and likely explained by differences in oncology practice patterns at the institutional and national levels.
Successful implementation of a CDSS in medical practice can be achieved by identifying and addressing barriers to CDSS clinical adoption. Successful CDSS implementation relies on factors such as quality, complexity, usability, learnability, transparency, workflow integration, and cost-effectiveness of a CDSS. Furthermore, there is a need for early involvement of end users in the development and enhancement of these systems. Consideration of regional health regulatory requirements and localization efforts to address regional differences in clinical practice are also important. 37
A lower concordance between system-generated therapeutic options and the decisions of MTBs, as well as individual clinicians, in breast cancer patients ≥ 70 years of age was reported in 2 large studies. 25,29 Age-related differences in patient and cancer care across China, India and the US may account for the lower concordance reported in older patients. Breast cancer stage at presentation, patient functional status, comorbidity burden, socioeconomic support, cultural values and treatment preferences may play a role in concordance, with a need for well-designed studies to promote evidence-informed management of elderly patients with breast cancer. 38 Prospective studies are also needed to evaluate the technical performance and clinical impact of the system in different subgroups of breast cancer patients.

Evaluation of CDSS Clinical Decision Quality and Impact
Concordance between CDSS and clinical decisions made in practice is limited as a measure of decision quality, which should be based on evidence and best practices. We selected concordance as a metric because many early adopters [...] Having a blinded panel of cancer experts re-evaluate treatment decisions by both humans and the CDSS, or measuring adherence of treatment decisions to guideline recommendations, helps reduce bias associated with the source of recommendations. A blinded study comparing WfO therapeutic options and treatment recommendations by individual physicians in breast, colon, lung and rectal cancers at a regional referral hospital in Thailand employed an expert panel of 3 oncologists to evaluate the quality of treatment decisions offered by clinicians and the CDSS. 32 The expert panel, which was blinded to the source of treatment decision, compared treatments recommended by the individual clinicians and system-generated therapeutic options, rating 71% of these paired options as either identical or acceptable. 32
The concordance studies we identified in this targeted literature review reflect the intended use of the CDSS, which is to support cancer-treating physicians by providing therapeutic options that reflect best evidence and current practice. The system was designed according to a premise that humans are more likely to make optimal treatment decisions when supported by a CDSS. For institutions lacking an MTB, a CDSS may help fill this gap by providing individual clinicians with a choice of evidence-informed therapeutic options. Consistent with this idea, use of the system as part of clinicians' overall decision-making process significantly impacted treatment decisions in several studies. A large cross-sectional observational study measured the impact of the CDSS in treatment decision-making by individual clinicians in 1197 patients with breast cancer in China. 29 Participating physicians were initially blinded to system-generated therapeutic options; after the options were disclosed to them, the system had an impact on treatment recommendations in 5.0% of cases.

Conclusion
In summary, this study is one of the first targeted reviews of breast cancer treatment decision concordance studies in women for an internationally-deployed AI-based CDSS. The concordance between the CDSS, MTBs, and individual clinicians demonstrated good system agreement with practicing oncologists in China, India and Thailand. A higher concordance was observed between the system and MTBs than between the system and individual clinicians, likely reflecting a greater agreement of multidisciplinary expert consensus with the evidence- and guideline-informed therapeutic recommendations of the system. This finding suggests a role for the CDSS in treatment decision support in breast cancer practice. Concordance varied across countries, reflecting regional differences in breast cancer practice. Non-concordant treatment decisions were likely related to physician preference, absence of some prescribed cancer therapies in practice as treatment options in the system, as well as differences in treatment due to factors such as patient or family preferences and availability of social support. Prospective randomized clinical trials are needed to assess the usability, workflow integration, user

Declarations
Authors' contributions
YA, RH, KD, SW, WF, ID, KR, and GJ were involved in the conception and study design. RH, KD, LM, and KL were involved in the acquisition of data (systematic and targeted reviews of the literature, study selection, and data collection processing). YA, RH, KD, SW, AP, ID, and GJ were involved in data analysis and interpretation (summarization of results from selected studies, discussion, and interpretation of results). YA, AP, RH, KL, ID, and GJ were involved in writing the manuscript. All authors were involved in the review and revision of the manuscript. All authors read and approved the final version of the manuscript.