On Improving Physicians’ Trust in AI: Qualitative Inquiry with Imaging Experts in the Oncological Domain

Background Although the role of image-based AI in cancer research has been substantial, its impact on the clinical side has so far been limited. Physicians' trust in AI, and its wider acceptability, remain low owing to its "black-box" nature, which raises liability questions concerning its use in the clinical context.

Methods To comprehend the barriers to AI's adoption, and to inform future discourses on the human-centric and ethical design of AI, we designed and conducted semi-structured interviews with 7 imaging experts in the oncological domain.

Results Data saturation was achieved despite the small sample size, gathering concordant emerging needs and recommendations. Our findings demonstrate the divergent nature and focus of clinical and research practices, with differing AI needs. AI is afforded a peripheral, yet crucial, role of a "decision helper", which can enable oncologists and related imaging specialists (i.e. radiologists, radiation oncologists, and nuclear medicine physicians) to push the boundaries of biological reasoning in treating cancers. Furthermore, our interviewees emphasized the need to embody ethics and liability in the design of AI systems, and the development of educational opportunities for AI and cancer experts to enable an integrative vision of image-based AI. To this end, specific design guidelines are provided to inform both Human-Centered Design and AI researchers, in order to meaningfully address the contextually sensitive concerns and challenges around the adoption of intelligent interactive technologies in cancer care.

Conclusions The existing impact of AI in clinical practice is limited as compared to clinical research. In the future, AI is afforded the peripheral role of a "decision helper", which might enable doctors to better understand the peculiarities and subtleties of cancers, and support them in developing novel treatment methods. Finally, in order to develop physicians' trust in AI and its wider acceptability in clinical oncology, designers will have to address the ethical and liability concerns in relation to the use of AI systems.

practices, routines, and policies, and c) the entanglement of their therapeutic role with AI-powered tools and systems and its scale. In this way, we also seek to outline the potential spaces for design interventions which can better augment and adapt to the prevalent practices in cancer therapy. We contribute to the disciplines of AI and Medical Imaging, and more broadly to the emerging domains of Ethical-AI and Human-AI Interaction, by eliciting contextualized insights about physicians' engagements with AI, their perceptions and projections regarding the future role and impact of AI in oncology, and identifying the potential areas where AI's impact is most desired.
It is worth noting that our use of the term "physician" refers specifically to medical imaging experts within the domain of Oncology (for example, Radiologists, Radiation Therapists, Nuclear Medicine Physicians, and Oncologists). In addition, our use of AI-related terms (e.g. AI-powered tools, models, classifiers) is specifically meant to be interpreted within the realms of quantitative imaging analysis and image-based (deep) machine learning models.

Related Work
Our research lies at the intersection of human-centered design and cancer care practices, and seeks to comprehend the role and impact of AI in physicians' diagnostic and therapeutic practices. For the sake of clarity and precision, we specifically examine how image-based AI systems, i.e. based on 3D anatomical and metabolic images at the macroscopic scale, are utilized in the successive phases of cancer therapy. We also study physicians' attitudes and projections regarding the wider acceptability of such systems in clinical research and practice. Therefore, in this section, we review relevant works which a) study the adoption of image-based AI in oncology, and b) examine the role of human-centric research in cancer care.

Image-Based AI in Oncology
The systematic review and meta-analysis of AI and radiomics studies by Sollini et al. [2] provides an interesting snapshot of the field in early 2019. After filtering papers with the highest quality score (based on the QUADAS-2 criteria), they analyzed a total of 171 papers. They found that 147 (86%) papers focused on oncological applications, of which 83 (56%) concerned brain and lung cancers, mainly for predictive outcome modeling and biological characterization. They observed an increase in the overall quality of clinical studies over the years. However, all approaches were still far from clinical adoption, with the black-box effect, limited sample sizes, ethics, and liability pointed out as the major hindering factors.
More recently, Reyes et al. [8] focused on alleviating the black-box effect to improve interpretability, specifically in the context of deep-learning models. Furthermore, the authors categorized the diverse application fields and elaborated the ways in which interpretability applies to each of them, such as lesion detection, computer-assisted diagnosis, and prognosis. Pianykh et al. [7] suggested that the performance of static AI algorithms degrades over time, owing to naturally occurring changes in local data and environment. As a solution, they introduced the principles and early applications of continuous learning for AI algorithms in the daily radiology routine. One key message is that the adoption of AI in clinics should not stop at replicating static models imported from clinical research; rather, the model should be continuously fed and monitored with clinical data to observe how it performs in that particular clinical environment.
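This continuous-monitoring principle can be pictured as a simple feedback loop: instead of freezing a model imported from research, the clinic tracks its accuracy on a rolling window of local cases and flags degradation for re-validation. The sketch below is purely illustrative; the window size, baseline, and tolerance are hypothetical placeholders of ours, not values from the cited work.

```python
from collections import deque

def make_drift_monitor(window=50, baseline_acc=0.90, tolerance=0.05):
    """Track the rolling accuracy of a deployed model on local clinical cases.

    All three parameters are illustrative placeholders, not values
    taken from Pianykh et al. [7].
    """
    recent = deque(maxlen=window)  # 1 = correct prediction, 0 = error

    def record(correct: bool):
        recent.append(1 if correct else 0)
        if len(recent) < window:
            return None  # not enough local cases observed yet
        rolling_acc = sum(recent) / len(recent)
        # Flag drift when local performance falls below the validated
        # baseline: the model should then be re-fed with local data,
        # re-validated, and monitored again.
        drifted = rolling_acc < baseline_acc - tolerance
        return rolling_acc, drifted

    return record
```

For instance, with `window=10`, ten correct predictions yield `(1.0, False)`, while a subsequent run of misclassifications on locally drifted data flips the flag to `True`, signalling that the static model no longer matches its local environment.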
In the specific context of predictive models in oncology, Gatta et al. [6] highlighted the importance of "holomics" (also called "medomics"), i.e. integrating radiomics with other omics (for example, genomics and proteomics) to increase the relevance of models and facilitate their migration to the bedside. Considering optimized radiation therapy planning, Thompson et al. [9] pointed out the necessity to adequately make room for AI, including the education of end-users, data availability, and potential changes in clinical workflows, for a smooth integration of models that optimize intensity-modulated radiation therapy.
In this article, we particularly focus on image-based AI predictive models for oncology. They include both "hand-crafted" radiomics and deep learning models addressing the following five application categories defined by Reyes et al. [8]: a) computer-assisted diagnosis and/or staging, b) prognosis, c) radiation therapy planning, d) computer-assisted monitoring of disease progression, and e) triaging.
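As a minimal illustration of what a "hand-crafted" radiomics model consumes, the sketch below computes a toy subset of first-order intensity features over a segmented tumor region. This is an assumption-laden simplification of ours, not any specific published pipeline; real pipelines, such as those surveyed by Sollini et al. [2], extract hundreds of shape, intensity, and texture features.

```python
import numpy as np

def first_order_features(volume: np.ndarray, mask: np.ndarray) -> dict:
    """Toy subset of first-order radiomics features computed over a
    segmented region of a 3D image volume (e.g. a tumor delineation).

    Illustrative only: real radiomics software computes far richer
    shape, intensity, and texture descriptors.
    """
    voxels = volume[mask.astype(bool)]          # intensities inside the region
    hist, _ = np.histogram(voxels, bins=32)     # discretized intensity histogram
    p = hist / hist.sum()                       # bin probabilities
    p = p[p > 0]                                # drop empty bins before log
    return {
        "volume_voxels": int(mask.sum()),       # crude size surrogate
        "mean": float(voxels.mean()),
        "std": float(voxels.std()),
        "min": float(voxels.min()),
        "max": float(voxels.max()),
        "entropy": float(-(p * np.log2(p)).sum()),  # histogram entropy
    }
```

Feature vectors of this kind, concatenated across modalities and patients, form the input to the predictive models discussed above, whereas deep-learning models learn such representations directly from the images.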

Human-Centric Research in Cancer Care
Seeking to improve the overall quality and experience of cancer care (not just for the patients being treated but also for their family and friends), numerous sociological studies have been conducted in the recent past. These contributions have examined the varied facets of cancer care facilities, such as a) patients' overall experience with cancer care [10,11], b) the perceived sense of stigma, guilt, and depression amongst patients and how it impacts their psychological well-being and social interactions [12,13,14,15], c) the disparity in quality of life amongst cancer patients of different ethnic backgrounds [15,16,17], d) the role of doctor-patient interactions and communication strategies in patients' perceived well-being [18,19], and e) the impact of established decision-making practices and the opportunities and challenges for technological interventions [20,21,22].
Thematically different from the domain of Social Sciences, and closely associated with the study of socio-technical user experiences and the design of interactive and intelligent technologies, is the domain of Human-Computer Interaction (HCI; also referred to as human factors in computing). Extensive research has also been conducted within HCI on the particular role of technology in patients' everyday experiences and their journey while living with cancer. These contributions can be classified into several themes and application areas, for example, a) the design of personalized and data-centric health information management systems for patients and healthcare workers [23,24,25], b) technologies enabling the continued clinical education of caregivers [26] and fostering communication between oncologists and their patients [27], c) facilitating behavior change amongst cancer patients [28], and d) interactive tools assisting cancer patients to seamlessly navigate the overwhelming amount of medical, financial, emotional, and mental-health challenges [29,30].
While a significant amount of HCI research has focused on adult patients, a recent work by Warren [31] also examined the means of utilizing 'social' and 'playful' technologies to improve the emotional and social well-being of child cancer patients, who often feel a sense of isolation and loneliness, and may come to perceive their childhood as abnormal.
A significant amount of research within the Social Sciences and HCI has focused on patients' experiences and their interactions with oncologists and caregivers. However, there is a gap in our understanding of how physicians within the oncological domain use and experience AI-powered systems, what their concerns are regarding the increasing pervasiveness of AI in the clinical context, and the ways in which they navigate the challenges posed by these technologies. This is the subject of our qualitative inquiry.

Methods
In this article, we broadly aim to examine the existing role and impact of AI-powered technologies in cancer diagnosis and treatment, as well as to identify the potential areas within this domain where intelligent and interactive technologies are yet to make an impact. These AI-powered systems are increasingly based on the computerized analysis of imaging modalities, in particular Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance (MR) images, which is often referred to as "quantitative imaging analysis" in the domain of oncology. As mentioned earlier, there exists a disconnect between the current focus of AI research and the existing protocols and practices within medical institutions with established cancer treatment facilities. Moreover, this disconnect manifests in overlooked aspects of medical professionals' practices and experiences, which can be better served through the human-centric design and development of intelligent tools and technologies. These overlooked aspects primarily concern the profound differences between clinical and research practices, and the differential need to support these divergent practices through intelligent, interactive, AI-powered tools.
Particularly, through our research, we aspire to (a) develop a comprehensive understanding of established protocols in cancer treatment facilities, including both the 'clinical' and 'research' aspects; (b) scrutinize the role and impact of AI-powered technologies in cancer diagnosis and treatment, especially their perceived utility, scope, and acceptability by physicians; (c) bring insights about the physicians' interactions with technologies and patients, and the collective entanglement of diverse medical specialities in the effective treatment of cancers; and finally (d) depict the space of design possibilities where intelligent and interactive tools may support the physicians in meaningful ways. In order to address these inquiries, we conducted a qualitative, semi-structured interview study with 7 physicians belonging to different sub-specialities.
We initially intended to conduct face-to-face interviews with the physicians, together with a field study, to gather insights about the established workflows and the varied resources (medical images, reports, and instruments) that are conventionally used during the course of cancer treatment. However, the constraints imposed by the onset of the COVID-19 pandemic steered us to conduct the study over video conferencing, and also curtailed our efforts to conduct a field study. Despite these constraints, the design of our interviews sought to capture the aforementioned aspects of physicians' work practices, experiences, and interactions in a fine-grained manner.

Participants
Initially, a formal email of invitation was sent to physicians at the Centre Hospitalier Universitaire Vaudois (CHUV) in Switzerland who are involved in the diagnosis, treatment, and rehabilitation of cancer patients, and are simultaneously engaged in research activities. In particular, we reached out to medical professionals whose expertise lies within the specialities of oncology, nuclear medicine, radiology, and radiation oncology. These specialists predominantly use different imaging modalities to track the evolution of tumors and metastases. In addition, they also incorporate algorithmic means of analyzing large amounts of image data during the treatment process. Moreover, these experts are accustomed to leveraging AI in their clinical and research activities, which was the primary rationale behind involving them in our study. The invitation letter contained 1) the general objective of our research, without revealing the specifics of the questions we intended to ask in order to prevent biasing their responses, and 2) the description of our research approach, including the approximate time required on the participants' part. In addition, the participants were informed that we would properly acknowledge them in our research publications (along with their affiliation) and that there was no financial compensation.
In addition to the aforementioned specialities, surgery, immunotherapy, and chemotherapy are other sub-specialities involved in the treatment of cancers. However, they were not included in our study because they are less likely to interact with AI systems in the course of treatments. Even though these domains were not part of our study, it is worth noting that the established medical protocols require a collaborative effort among a diverse range of specialities to decide on the most effective and personalized treatment for each patient. Capturing these collaborative aspects was one of the key foci of our inquiry.
Upon receiving an affirmative response, we requested our participants to agree to an informed consent document clearly stating the manner in which we would use and analyze the collected data, and also sought their consent to record the video of the discussions during the interviews. Furthermore, a few international experts were recommended to us by our participants and were also invited to participate in our study. In total, we interviewed 7 experts: 6 from the CHUV (Switzerland) and 1 international expert from the Cleveland Clinic (USA). Six interviews were conducted over the Zoom video conferencing tool, and one expert responded to the interview questions in writing, over Google Docs, due to time constraints. Table 1 lists the participants and their sub-specialties.
It is worth noting that the low sample size of our interview study can be attributed to the following factors: 1) our interview study involved cancer experts who are extremely busy professionals, with additional responsibilities besides cancer care during the (COVID-19) pandemic, and 2) we focused on physicians who frequently use AI in their clinical and research routines and are actively contributing to frontier research in medical imaging and oncology. Despite the small sample size (a potential limitation), we achieved data saturation in terms of gathering concordant experiences with the use of AI-powered systems, emerging needs and expectations, and recommendations for the ethical and responsible development of AI.

Procedure
The interviews were conducted in a semi-structured manner, and the interview questions were organized into the following three themes:

1) Role and Background: The initial set of questions was intended to "break the ice", asking participants about their medical speciality, their role in the diagnosis and treatment of cancers, and the approaches they employ in the treatment process. In addition, we registered their experiences and opinions about precision (or personalized) medicine and the role AI is playing in its realization.

3) Impact and Scope of AI: The last part of the interview focused on the participants' interactions and experiences with AI systems (manifesting as either algorithmic procedures or physical instruments). We asked questions which could provide us with fine-grained insights about participants' practices that are currently supported by AI, and the diverse ways in which technologies impede the seamless attainment of their objectives. In addition, we elicited participants' unmet technological needs in their clinical and research practices, and gathered their perceptions surrounding the design of responsible AI-powered technologies embodying the notions of Fairness, Accountability, Transparency, and Explainability (FATE). Finally, the participants were asked to project the ways in which AI may shape the future of cancer treatments.
The interviews lasted approximately one hour and were video recorded, except for one interview where the participant wrote the responses to our questions in a shared Google Doc. Next, all the interviews were transcribed by one researcher, followed by open coding of the recurrent topics and themes that emerged from repeated readings of the interview transcripts by three researchers. After the open coding phase, the co-authors organized two sessions to interpret and consolidate the codes into relevant categories and themes. During this phase, relevant segments of the interviews were aggregated into categories and further compared in order to identify common approaches to the discussed topics among the participants.

Results
Next, we describe our analysis of the semi-structured interviews, which is organized in the following sections along the four emergent themes.

Clinical vs. Research Routines
In this section, we provide a detailed account of the differences between the clinical and research routines in order to establish that these diverse work practices afford different spaces of design possibilities, which should be accounted for by human-factors, design, and AI researchers when designing intelligent tools for this context.
The interviewees in our study work at university hospitals with well-established and reputed cancer treatment facilities. These centers serve a dual purpose: ensuring an optimal and customized care path for cancer patients, while simultaneously advancing clinical research. The clinical practices can be characterized by their "curative intent" and an "individualistic scope" that puts the spotlight on a single patient, whose treatment is very specific and is based on numerous attributes such as the cancer type, anatomical and physiological peculiarities, and medical history. Depicting the humane nature of the clinical activities, P05 stated that "we are not treating a disease [but] we are treating a human being", where "we try to be very proactive and very preemptive in the management of side effects". Moreover, the temporality of interaction with the patient is significantly higher and may range from a few weeks to several months, depending on the type of treatment (surgery, radiation therapy, or systemic treatment involving chemotherapy or immunotherapy) and the severity of the disease.
This high level of interactivity with the patients further necessitates the establishment of an inclusive environment which does not simply impose the most suitable treatment onto the patient, but engages them in the decision-making process: "the patient is always part of the solution" (P01).
On the other hand, the research practices are more dynamic in terms of outcome and impact. They manifest through the comprehensive analyses of cohorts of patients, involving the development of statistical and predictive models which amalgamate numerous streams of data coming from different diagnosis and treatment modalities. A significant proportion of research activities are retrospective and require minimal to no contact with the patients. In addition, although the research outcomes serve the long-term objective of advancing the domain of cancer therapy in the form of published works, they may also serve the more immediate purpose of "turning these [predictive models and data-centric insights] into clinical action" (P04). However, this latter case corresponds to specific patients "who are failing standard of care therapies" (P04), and may require novel treatments which are highly personalized for the patients and based on the analysis of their molecular and microscopic profiles (such as genomics, proteomics, and histopathology).
These fundamental differences between the clinical and research practices also manifest into differential needs regarding the nature and type of digital tools that are employed by the physicians, particularly, their role and relationship with AI.
The research activities primarily entail the analysis of "100s or let's say 1000s of features which are obtained from one exam or one organ", and consolidate the features corresponding to "imaging, metabolic, and morphological data" (P02). In order to accomplish the analysis of this immense volume of data, P04 elaborated that his team "writes a lot of [their] own code in R and other languages, and [they] use a little bit of Google tool box ", and employ a "mixed bag approach", where "there is not one major library (publicly available APIs) that [they] haven't used ".
On the contrary, the clinical practices employ "very simple tools that enable a kind of home-made analysis", which "is strange because when we go to [medical conferences] we always see 3D images, in color, with multi-parametric quantitative analysis, and it's really not how it is in the clinical routine" (P03). She further illustrated that although analytical practices on the clinical side integrate data from multiple sources (for example, CT, PET, and MRI), still "it's a visual analysis and not a quantitative one" (P03). "We have some very basic parameters that probably are still the most important, size of the tumor is very basic but it works. The tumor border is very important in oncology and quite difficult to assess because it's very subjective, and currently we don't have the possibility to do an automatic segmentation of the whole tumor. Another parameter is vascularization for some specific treatments, and currently we define vascularization subjectively, only through our eyes, and say if it is hyper-vascular or not. Similarly, by looking at images we say if the tumor is aggressive or not." (P03). Underpinning the previous argument, P07 added that "[diagnostic] decisions are mainly taken based on morphological analysis of images, while quantitative parameters serve as complementary information", and since the "interpretation of quantitative parameters may be challenging, [he] uses cutoff values that have been validated in the literature". Additionally, P07 acknowledged the limitations of using thresholds and accounts for them in his diagnosis because "cutoff values are often determined in very specific situations and cannot be extrapolated to all the cases we see in daily practice". Finally, despite the advancements in image analysis approaches, the clinical analysis of medical imaging data is still "archaic" and done "in a cross-sectional manner" because "we know the anatomy, physiology, and their normality" (P01).
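P07's practice of applying literature-validated cutoffs, while distrusting them near the boundary, amounts to a thresholding rule with an abstention region. The sketch below illustrates this pattern only; the function, cutoff, and margin are hypothetical placeholders of ours, not clinically validated values.

```python
def classify_by_cutoff(value: float, cutoff: float, margin: float = 0.1) -> str:
    """Apply a literature-validated cutoff to a quantitative imaging
    parameter, abstaining near the threshold instead of forcing a
    binary call. `cutoff` and `margin` are illustrative placeholders,
    not clinically validated numbers.
    """
    if value >= cutoff * (1 + margin):
        return "above cutoff"
    if value <= cutoff * (1 - margin):
        return "below cutoff"
    # Near the threshold, P07's caveat applies: cutoffs determined in
    # specific study populations may not extrapolate to daily practice,
    # so the call is deferred to expert visual analysis.
    return "borderline: defer to visual analysis"
```

The abstention branch mirrors how the interviewees describe current practice: quantitative parameters complement, rather than replace, the morphological reading.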
The vast amount of multi-modal data that needs analyzing for each patient on the clinical side (for example, each PET/CT exam generates approximately 2000 images), combined with the need to serve a great volume of patients (40-50 per day), adds a significant workload for the physicians and the supporting staff. Still, the preference to rely on one's prior knowledge and experience, as opposed to autonomous AI-powered tools, in preparing a diagnosis and prognosis can be partly attributed to the notion of responsibility. "The directors of hospitals and national health agencies require that a patient has the same care in every hospital, and we cannot base a diagnosis or treatment on two technologies that are not available everywhere ... That's why we cannot use some very innovative technology that is only available in one part of the world. So, it's a political health question and not a question of performance or utility" (P03).
In the following sections, we will utilize the distinction between clinical and research routines as a design lens to ground the collaborative and decision making constructs, as well as the notion of ethics and responsibility in relation to AI-powered technologies.

Decision Support: Reality vs. Expectation
In clinical practice, making a decision on the appropriate treatment course for each patient entails the assessment of numerous aspects and the synthesis of multiple information sources, as illustrated previously in Section 4.1. In addition, choosing from amongst the diverse set of treatment modalities (e.g. surgery, radiation therapy), and more importantly, deciding on their order or combination, which is highly customized for each patient, is a difficult task. "We know now that cancer is a chronic disease, that means that the patient will live a long time with it. So, we need to decide which kind of treatment should be the first, second, or third." (P03). As a result, Multidisciplinary Tumor Boards (MTB) have become a norm and an essential practice for taking decisions in cancer treatment facilities worldwide.
All our participants unanimously affirmed the centrality of MTBs in clinical decision making concerning treatments and follow-ups. "Since the last 10 years, all decisions are made in MTB meetings" (P06). P03 further added that "we cannot decide alone on a treatment plan, so the MTB meeting is mandatory for each new diagnosis of cancer, for each recurrent disease, and for each modification of the treatment". In addition, "in big institutions, [MTB] meetings are something that's very traditional and well established " (P05). Furthermore, depending on the patient's status, the nature of the discussions may vary but their multi-disciplinarity is maintained, as illustrated by P07: "immediate informal discussions happen during the emergency phase to speed up the initial management and treat potential life threatening situations, however, when the patient is stabilized, and after the initial workup has been completed, the discussion is more formal in one or several MTBs".
Owing to their multi-disciplinary and collaborative nature, different specialities are represented within the MTB meetings, such as surgery, oncology, radiology, radiation oncology, internal and nuclear medicine, and pathology. In addition, depending on the peculiarity of the disease other specialists such as neurologists, pneumologists, cardiologists, and pediatricians may also be present during these meetings. Furthermore, "major disease groups have their meetings weekly" (P06).
Moreover, "[MTB] meetings are not patient based, and we generally discuss 10-20 patients in each meeting" (P05). In response to a question about the established practices within these MTB meetings, our participants provided us with the following account. Before each meeting, the participating members draw a diagnostic workup of the concerned patients, which is grounded in their medical specialization and based on the particular examinations they have conducted. During the MTB meeting, the members present these analyses, either in the form of a report or by assembling a set of images from the hospital's Picture Archiving and Communication System (PACS). An assistant is also present in these meetings to register the minutes of the ongoing discussions and the collective decision.
In response to our questions aiming to comprehend the rationale behind the practice of MTB meetings, our interviewees furnished several insights. Illustrating an example, P01 stressed the importance of 'collective validity' of the most efficacious therapy for the patients: "to make the MTB most efficient, each speciality has to say that this (referring to the example) therapy will work ... in a way, every speciality is then hand-in-hand doing the best they can for the patient". This collective endeavor may also lead to "reduced errors" (P06), since multi-disciplinary collaborative analysis could complement individual assessments and assist in the development of a robust treatment plan. Extending his argument further, P06 explained that if each specialist pursued his or her own therapy ("a surgeon would like to operate every time, a radiation oncologist always wants to irradiate, and medical oncologist always wants to do chemotherapy"), the likelihood of reaching a better judgement would be low, as this individualistic approach does not consolidate all aspects of a patient's wellbeing and medical history. Also, the MTB meetings provide a framework for enabling standardized and accepted treatments for the patients while mitigating the adverse effects of some investigational treatments: "I am a radiation oncologist for 25 years, so, I mean, I can feel things, OK! And I can sometimes exaggerate. So, MTB meetings lets you to do, let's say, standard treatments" (P06). Moreover, P05 maintained that the MTB meetings also provide a "great educational opportunity" that is afforded by "the cross-talk between disciplines and to know how others approach this problem". Finally, in terms of logistics, the MTB meetings also make things easier for the patients: "Because if you don't have this common language or cross-talk, it's gonna be very easy for the patient to be lost in between sub-specialities.
And the patient's appointments will not be coordinated. So, we are trying to spare the patients this hassle, the logistics of coming back-and-forth to the hospital " (P05).
Finally, the prevalence of MTBs, combined with the collective validity of their outcomes, affords them the status of legitimacy. Consequently, decisions made in MTBs are recognized and often required by insurance companies, as illustrated by P06: "In some countries, if there is no report of a MTB decision, the patient care, the insurance will not pay the treatment".
With regard to the decision-making practices on the clinical side, particularly in relation to MTBs, our interviewees acknowledged that the role and impact of AI is currently limited and minimal. Although some form of predictive modeling or pattern recognition might be employed to prepare an analysis for presentation within MTBs, its scope remains relatively small and applies to rare diseases and novel treatment modalities. As a result, hospitals are constantly "coalescing and curating huge datasets of 100s, maybe 1000s of patients" (P04, P05) to allow for retrospective analysis and to support future decision making. In order to a) extend this data-informed decision support within MTBs, b) increase its accessibility to a higher number of patients, and c) foster the development of a synergistic educational environment for both physicians and AI experts, P04, who is leading the Precision Oncology program at his institution, is involving AI researchers as an additional speciality within the MTBs.

Attitudes and Projections regarding AI
We asked the interviewees about their perceptions regarding the current role and impact of AI in cancer care, and how it may evolve in the future; in particular, the ways in which AI-powered technologies will shape their work practices and their relationship with patients. They expressed a positive disposition towards the utility of AI and its potential to transform cancer treatments and clinical practices in the future: "the most interesting [AI] tools are going to come and help us in the future" (P01), which will "facilitate faster screenings of patients in noninvasive manner" (P06). P04 held a similar attitude regarding AI's potential: "I think AI has clearly revolutionized oncology already, but the perspective for the future is much bigger. I think we are really in the infancy of what AI can deliver for such a domain". At the same time, our interviewees acknowledged the limitations and shortcomings of AI and discussed the domains where AI-powered tools would be most beneficial.
The role of AI has evidently been more pronounced on the research side than in clinical practice, especially with regard to the realization of precision (or personalized) medicine. P05 argued that "we have seen a lot of papers which promise that AI will be personalizing medicine, but to be honest, so far it has been far from practicality". Indeed, despite their promised performance, these AI models have failed to scale on the clinical side, even when "applied to very simple and basic problems such as diagnosing whether a pulmonary nodule is benign or malignant" (P05).
As a result, these models do not inspire confidence amongst physicians: "although it's tempting to publish papers on some sophisticated cool stuff and important on a conceptual level, in my opinion, I want something that I can use in the clinic and can talk to our patients with confidence that this model can be trusted" (P05). Another obstacle to the seamless integration of AI into the clinical workflow is the lack of effective means of combining multi-modal data and the absence of meaningful ontologies. P04 illustrated this problem by stating that "when you have to integrate the treatment plan, response to treatment, imaging, genomics, proteomics, and pathology data, we simply don't have a system ... and to say that a system will tell patients what to do and it's done, is a bit of science fiction". He further added: "I have worked from the data to the patient with every little step, and it's still impossible to plug an AI system and let it spill out something new because you have to know where the data is, how was it captured, and what are the semantics" (P04).
The interviewees argued that the means of addressing the aforementioned problems with the current state of AI, and of lowering the barrier to its adoption by physicians, are to 1) promote educational programs, not just for physicians but also for AI designers, to develop a better mutual understanding of each other's practices, experiences, and needs; and 2) enable physicians to conduct validation and reproducibility studies with novel AI algorithms to improve the likelihood of their adoption in existing clinical practices.
All our interviewees acknowledged their familiarity with and knowledge of AI (owing to their strong background in statistics), and some even apply advanced AI concepts in their research practices. Still, they emphasized the need to embody fundamental knowledge of AI in the education of physicians: "you've probably heard that AI will not replace physicians, but imaging specialists who are using AI will replace the ones who are not" (P01). P05 further added that "I think it's an unmet need and a must for radiation oncologists to have a minimum acceptable understanding of advanced statistical and AI methods". Extending this line of argumentation, P04 stated that the medical and AI communities have to learn to understand each other better: "It's very important that doctors understand what is AI, because it's fair to say that the level of familiarity of the medical system with AI is not huge, and people can be fooled in thinking that AI is actually smart, but it's clearly not. It's extremely powerful, but not intelligent and can go wrong in every wrong corner possible. Secondly, the problem is that AI does not know enough about doctors". This two-way educational initiative will result in "super doctors", because physicians will be better equipped to employ AI "as a decision help" in their daily work "to have a more meaningful impact" (P04). Simultaneously, such an approach will pave the way for a human-centric design of AI-powered technologies that are more attuned to the practices and needs of physicians (P01, P02).
In addition, AI was referred to as a "tool to improve our basic understanding because we are reaching a plateau in our understanding in the cancer field" (P02), which can enable physicians to "push [their] biological reasoning quite far" (P04). Furthermore, physicians are trained to examine images visually for diagnostic purposes, and the massive number of images they encounter on the clinical side makes a "part of [their] work very repetitive" (P01). Therefore, AI tools will be much needed in this space because "algorithms can look directly at signals and get to some kind of diagnosis" (P01), as well as "help [physicians] to see things which [they] are not seeing because [they] cannot put as much information in [their] analysis" (P02). In this way, "machines can help [physicians] by showing them where to look for new and diverse features" (P01), and "stimulate [them] to think more" (P02).
Another means of developing trust in AI-based systems is through extensive validation studies. Both P03 and P07 suggested that, in order to be comfortable with AI-based systems and to develop trust in AI's capabilities, they need to compare the results coming out of an AI system with their own analysis. Moreover, the "black-box" approach to decision making might hinder the development of trust in AI, as stated by P07: "I am personally less confident with black-boxes", because drawing conclusions about how a certain outcome was produced from raw data is not straightforward. In addition, AI systems trained on a certain population might fail to work in other geographical regions, owing to differences in population. P03 exemplified this aspect by citing her own research on an imaging technique for diagnosing breast cancer called 'contrast mammography': "I was very comfortable to use it and to believe in it, but when I went to [Asia] and I tried this approach, it failed, because [Asian] people don't have the same breasts as [Europeans] and we could not extrapolate the results we obtained in [Europe]". Furthermore, P01 and P02 expressed concern about the mismatch between the speed of advancements in the AI domain and the time it takes to conduct validation studies: "AI algorithms are changing so fast that no one can really take the time and make the validation studies which would be necessary to know the performance and to be able to use it" (P01) and "it's going so fast that by the time you use [AI systems], they are already outdated" (P02). One possible solution to this problem of differing temporalities would be to subject algorithms to the same certification standards (for example, those of the U.S. Food and Drug Administration, FDA) as other medical appliances (P01, P06).
Finally, our interviewees unanimously concurred that, despite the advancements in the AI domain, physicians will not be replaced because "in case there is an error, who will be responsible" (P06). Citing the example of physicians crowdsourcing medical images of patients to developing countries for analysis and reporting, P06 noted that delegating decision making to AI raises similar questions from the legality and liability perspective.

Responsibility, Transparency and Ethics
The consequences of physicians' interactions with AI systems, particularly in relation to the notions of ethics and responsibility, often manifested in our interviews in the phrase "in case there is an error, who will be responsible" (P06). This conditional expression underpins two nuanced but entwined observations: 1) the skepticism about the capabilities of AI systems, particularly on the clinical side, is grounded in potentially unfavourable outcomes for human life, and 2) the impact of failures on the part of AI systems outweighs their perceived strength and utility. Basing his argument on human rights and elaborating on the aforementioned observations, P02 stated that "a patient has the right to be here, and to be treated, we will do our best to provide the maximum attention for medical care", and as a consequence one cannot "put your trust in something that is not trustful" or "bypass the doctors". Concerning the use of AI-powered systems, P02 further argued that "such systems can be present at all times, but, they should always be supervised by the [physicians] ... I cannot imagine, I prefer not to imagine a system where we put the patient on the scanner, AI does the diagnosis, and robot does the surgery".

Sustained interactions between physicians and patients embody transparently communicating the findings from the diagnosis, engaging patients in a discussion about the effective treatment plan, and explaining every detail of the treatment and its impact (also to "debunk some myths" (P05)). However, with regard to AI systems, in particular those employing deep neural networks, the desired notions of explainability and transparency are in conflict with the design and functioning of these systems, as illustrated by P03: "we are not confident using a black box because we don't like it and we don't understand how it works". Highlighting the difficulties around the comprehension of some of the features used to train deep learning algorithms, and how they relate to biological responses, P05 stated: "papers which use generated deep learning radiomics features have a lot of features like wavelet transformations, which as a clinician I struggle to understand, what is the significance and how to correlate this to some tumor features". By contrast, "hand-crafted features, for instance the ones related to the texture can be easily correlated to tumor heterogeneity or the central necrosis, that is something [physicians] understand and can verbalize and explain to the patient" (P05). Owing to these concerns around the use of black boxes in clinical routines, AI-powered tools are perceived as auxiliary tools meant to re-examine physicians' assessments.
"Currently, AI is an additional tool which can provide an additional parameter to help confirm something that we have already assessed subjectively" (P03). This argument is aligned with P04's assertion that AI's future on the clinical side is that of a decision help (see Section 4.3). However, "the problem arises when [physicians'] assessment is opposite to that of AI, in such cases it is not easy to believe in the outcome of AI " (P03).
In order to address the aforementioned problem of selecting meaningful and reproducible features for training AI algorithms, features that are comprehensible for physicians and correlate with the biological functioning of tumors, P05 reported contributing to a standardization effort known as the Image Biomarker Standardisation Initiative (IBSI) [32]. He elaborated: "I think we need to understand the features more, and make sure, for instance, that the features we are extracting are reliable and reproducible" (P05). Similarly, P04 stressed that causality, not correlation, must be accounted for when selecting features: "There is a lot of biology to be understood here. I mean, you can look at basically an indefinite number of covariances, but the secret is to really look at strong signals and consider those that you can hopefully connect with a reasonable biology ... to not be fooled just by correlation but to seek for causality". These arguments surrounding the standardization of features and their relation to biological processes were recognized by our interviewees as essential, and some (P02, P03, P04, and P07) argued that they fall under the banner of 'quality control': "not just of the data quality, but also of what the algorithm is predicting" (P04).
Finally, in response to the earlier question of responsibility, and who bears it in case of a mistake, P01 answered that "a physician is legally always responsible" and "each hospital protects itself" (P06). Extending this question to treatment decisions made by AI algorithms, P01 argued that "it's a bit like autopilots in modern airplanes which are perfectly capable of taking-off and landing, but a pilot is still there to supervise". Furthermore, implying that this remains an open-ended question, P01 provided another example, that of Autonomous Vehicles (AVs): "If you would have one AV hitting another one. Who is responsible for the crash? The vehicle which had the latest update, or which does not have the latest update?" In the same context, P02 cited the example of Mars rovers employing "decade-old technologies, which are now outdated, but still used because they have been extensively tested and can be trusted". These examples, although symbolic of the ethical problems surrounding the use of AI systems to treat humans, also underline the complexity of attributing responsibility when accounting for the collective outcome of a Human-AI entity, which in turn has implications for the wider acceptability of, and trust in, these systems.

Discussion
In this section, we discuss the presented findings and explain their implications for our project context and the overarching role of AI research and development in cancer care.

Clinical vs. Research Practices: Opportunities and Challenges for Design
The clinical and research routines within the framework of cancer care are inherently different. This disparity can be attributed to the divergent nature of their respective intentionality and scope. Clinical practices are guided by their curative intent and their focus on individual patients. Research practices, on the contrary, seek to advance the domain of oncology by adding to the existing knowledge about cancers and their treatment in novel ways. Although seemingly different in both nature and aspirations, the clinical and research routines do interact in a synergistic manner, driving the mutual evolution of cancer care by making it rapid, efficient, and innovative. Our intention in delineating the clinical side from the research side is to provide qualitative insights into the differing needs amongst physicians, and to show that "one size fits all" is not an appropriate approach when designing intelligent technologies for this specific domain: "In the end, everyone is unique, once we understand it, we need technologies to search for this uniqueness" (P02).
The recurrent emergence of this distinction in our interviews led us to utilize it as a design lens to ground our findings, and to inform future discourses in the development of AI-powered tools that are more attuned to the clinical and research routines. On the one hand, research practices are dynamic and leverage a plethora of freely and easily available tools (to analyze the medical data of cohorts of patients). The choice of an appropriate tool is spontaneous, and consequently, in some cases physicians may have to spend a significant amount of time acquiring the necessary skills to utilize these tools (for example, writing a script in R or Python to extract ad-hoc image features). On the other hand, the need on the clinical front is to serve a substantial volume of daily patients. Therefore, AI-powered technologies should be integrated into the existing digital infrastructure, comprising a Picture Archiving and Communication System (PACS) and instruments, and must enable physicians to rapidly make inferences based on the anatomical and physiological peculiarities of lesions and tumors.
In addition, besides being user-friendly, tools on the clinical side should enable physicians to a) rapidly identify, examine, and, if need be, adjust the regions/volumes of interest (e.g. lesions, metastases, tumor boundary) to conduct a detailed analysis, b) seamlessly combine data from different modalities, as well as published research works, to enable a comprehensive analysis of patients' disease, c) seek collaborations around the analysis of medical data, and d) facilitate communication with patients through the sharing of data-centric insights about the diagnosis, treatment, and any perceived side-effects. Furthermore, designing specialized AI tools that are well integrated within the clinical workflow will free physicians from the repetitive manual tasks which currently consume their time and inhibit them from pushing the frontiers of "biological reasoning" (P02) in treating cancers.
Designing for clinical routines also poses significant challenges which, on the one hand, decelerate the rapid proliferation of innovative tools and discourage disruptive changes in cancer care; on the other hand, they ensure the adoption of standardized tools certified by regulatory bodies such as the FDA. These challenges primarily ensue from the humane side of clinical practices, which prioritizes human rights and patients' access to effective and reliable treatments.
In addition, physicians' trust in novel AI-powered tools is currently low, owing to the inherent bias and lack of explainability of "black-box" systems. Consequently, in order to develop trust amongst physicians and to improve the wider acceptability of AI-powered tools, designers could a) engage physicians in the validation of these technologies, whereby physicians can test the tools in real-world settings with their own 'contextually-' and 'culturally-sensitive' data, b) embody the notions of transparency and explainability (opening the black-box) when designing how these tools operate and interact with physicians, and c) leverage features which relate to relevant biological processes, and allow physicians to interact with the input feature space to afford better interpretability of the predictions.

Decision Maker vs. Decision Help: Situating AI
A key component of clinical practice is deciding on a suitable, effective, and personalized treatment plan for each patient. Our interviewees revealed that these decisions are formally taken in MTB meetings. Owing to this established decision-making framework, the role and impact of AI in clinical decision-making practices is currently limited. The (overstated) expectation that AI will examine a patient, form a diagnosis, and suggest a treatment, visualizing AI as a "decision maker", is far from fruition and was referred to as "science fiction" (P04) and "unimaginable" (P02) by our participants.
However, our interviewees afforded AI a more realistic, modest, and peripheral role of a "decision help" in the decision-making process. In this role, AI-powered tools are deemed to support physicians in synthesizing large volumes of multi-modal data, rather than to replace them altogether. In addition, AI could assist physicians in building a robust diagnosis of the patient, and in gaining confidence in making the prognosis that is most likely to be realized. One of our interviewees (P04) affirmed that AI will pave the way for physicians to become "super doctors", well equipped to leverage data-centric insights to find novel treatments (for example, precision medicine) and, in a way, "push the boundaries of biological reasoning". He further added that for the AI-physician collaboration to be successful, "this pair has to learn about each other" (P04), implying a need for educational programs for both physicians and AI researchers to ensure that a) physicians better understand the capabilities and limitations of AI, in order to effectively utilize it in their work and develop trust in it, and b) AI researchers comprehend the contextual subtleties of the oncological domain and are consequently better equipped to design AI-powered technologies for it. Other interviewees also regarded such mutual educational opportunities as an "unmet need" (P01, P05). As a result, in recent years and in specific cases, AI is being incorporated as an additional speciality in MTB meetings.
Through our research, we aspire to provide contextualized insights into the oncological domain and the peculiarities of physicians' individual and collective practices. This, in turn, can inform the Human-Centered Design and AI communities in rethinking the ways of designing intelligent and interactive technologies attuned to the expectations and needs of physicians.

Ethics-Centered Design of AI
A recurrent theme in our interviews concerning the erroneous outcomes of AI systems was the notion of 'responsibility', and who, from a legal perspective, bears the liability: the physician or the programmer. In particular, these concerns are exacerbated with regard to the use of black-boxes on the clinical side. Moreover, the perceived adverse effects of algorithmic mistakes on human life also raise ethical concerns regarding the use of AI in clinical routines, and manifest in a dilemma on how best to deploy AI-powered technologies. Our participants used varied examples, such as AVs and racial bias in algorithmic decision making, to illustrate the urgency and gravity of these concerns.
Existing clinical routines and protocols are designed to moderate the negative impacts of technological failures, for example, through multidisciplinary discussions or through exhaustive and repeated assessment of the medical diagnosis. Still, the developers of AI-powered systems could establish additional measures in their designs to ensure the dependability of their outcomes. These measures could include provisions for 1) quality control assessments of the data, which might comprise steps to identify inherent biases in the datasets and make physicians explicitly aware of them, 2) transparency about the functioning of the algorithms and the conditions under which their outcomes may be questionable, 3) making the confidence intervals of the predicted outcomes more salient for physicians, 4) supporting predicted outcomes with substantial and verifiable evidence, preferably from published research works, and 5) assisting physicians in seeking causation rather than correlation.
Finally, in order to attain wider acceptability of, and trust in, AI-powered technologies amongst physicians in the long term, AI designers might have to rethink ways of assimilating FATE (Fairness, Accountability, Transparency, and Ethics) principles not just in their products, but in the complete design process. Addressing these concerns properly entails a concerted, collective, and multidisciplinary effort on the part of all stakeholders, including physicians, regulators, designers, and Human-Computer Interaction and AI researchers. This is how we believe AI-powered technologies can successfully permeate the oncological domain.

Conclusions
We contribute a qualitative inquiry into physicians' clinical and research practices and the differential role and impact of image-based AI in these practices. We conducted semi-structured interviews with 7 physicians who predominantly rely on image-based analysis to draw the diagnosis, prognosis, and treatment plan. The objective of our analysis was to comprehend the ways in which they currently leverage and experience AI-powered systems in their therapeutic work, and their perceptions and projections regarding the future of AI in the oncological domain.
Despite a relatively small sample size, data saturation was reached, with concordant and complementary reported needs and recommendations. Our findings reveal that the existing impact of AI in clinical practice is limited as compared to clinical research. Moreover, in the future, AI is afforded a peripheral role of a 'decision helper', which might enable doctors to better understand the peculiarities and subtleties of cancers, and support them in developing novel treatment methods. Finally, in order to develop physicians' trust in AI and its wider acceptability in the oncological domain, especially in clinical practice, designers will have to address the ethical and liability concerns in relation to the use of a certain class of AI systems ('black-boxes'). In this article, we provide specific design guidelines which could inform both Human-Centered Design and AI researchers in meaningfully addressing the contextually-sensitive concerns and challenges around the adoption of intelligent interactive technologies in cancer care.

Declarations
Ethics approval and consent to participate
Since no human data (e.g., biometric, personal) was processed in our study, we did not seek ethics approval.
A week prior to each interview, we sent an Informed Consent Document to the interviewees. These documents clearly stated the nature of our research, the type of data we collect, and how we intend to store and analyze our data and publish our findings. All participants were informed that their facial images and the audio/video recordings of the interviews would not be shared in public repositories or future publications; instead, the publications would contain excerpts from the transcribed interviews. Finally, the participants were also informed of their rights, i.e., that we would delete their interview data, or a portion of it, upon request, and that we would seek their further consent if we aimed for a different analysis.

Consent for publication
Not applicable for this paper.

Availability of data and materials
Not applicable for this paper.

Competing interests
The authors declare no conflicts of interest.

Author's contributions
HV was primarily responsible for the design of the interview study and the study materials, which were collectively reviewed by the other co-authors. The interviews were jointly conducted by HV and JR. Furthermore, the interviews were transcribed by HV, and later analyzed collaboratively by HV, RS, FE and AD. This article was collectively written by HV and AD. Since the aforementioned co-authors do not have a background in medicine, and particularly in cancer-related specializations, MJ and JP independently validated our analysis and results (post-interview participation), and provided valuable feedback on the feasibility and applicability of the identified design guidelines.