3.1 Selection of sources of evidence
Of the 12,722 records identified after de-duplication, 81 peer-reviewed articles and 22 grey literature records met the inclusion criteria for a total of 103 records in the scoping review sample (Figure 1).
3.2 Synthesis of Results
Descriptive Analytics
The vast majority of publications had primary authors in the United States (n=42) or the United Kingdom (n=17) (Figure 2), and while our literature search yielded publications between 1989 and 2018, most were published between 2014 and 2018 (Figure 3). The academic and grey literatures addressed numerous AI-enabled health applications, most notably care robots[1] (n=48), followed by diagnostics (n=36) and precision medicine (n=16) (Figure 4).
There were notable differences between the academic and grey literature sources in terms of authorship, AI health applications addressed, and treatment of ethical implications. The academic literature was written by persons primarily affiliated with academic institutions, whereas the grey literature was written by researchers, industry leaders, and government officials, often collaboratively, with authors frequently affiliated with multiple institutions. The grey literature tended to cover a broader range of AI health applications, issues, and trends, and their associated ethical implications, whereas the academic papers typically centered their discussion on one or at most a few topics or applications. The grey literature was oriented more towards broader health and social policy issues, whereas the academic literature tended to focus on a particular dimension of AI in health. Robotics, particularly care robots, was far more heavily represented in the peer-reviewed literature than in the grey literature (48% of peer-reviewed records, n=39, versus 18% of grey literature records, n=4). The academic literature on care robots was most concerned with the ethics of using care robots in health settings (e.g. “How much control, or autonomy, should an elderly person be allowed?”… “Are the safety and health gains great enough to justify the resulting restriction of the individual’s liberty?” (41, pp. 31, 33)), whereas the grey literature tended to emphasize ethical or operational implications of using robots in health settings, such as the potential displacement of human jobs (42).
3.2.1 Common Ethical Themes
Four ethical themes were common across the health applications of AI addressed in the literature: data privacy and security, trust in AI, accountability, and bias. While interconnected in many ways, these issues were identified based on how distinctly they were discussed in the literature.
Privacy and Security
Issues of privacy and data security were raised about the collection and use of patient data for AI-driven applications, given that these systems must be trained with a sizeable amount of personal health information (43,44). Highlighted concerns were that patient data may be used in ways unknown to the individuals from whom the information was collected, and that information collected by and for AI systems may be hacked (45). One illustrative example of this challenge was the diagnostic laboratory database in Mumbai that was hacked in 2016, leaking 35,000 patient medical records, including patients’ HIV status, with many patients never informed of the incident (45). Further noted was that patients may believe their data are being used for one purpose, yet it can be difficult to predict what subsequent uses may arise (46,47). For example, ubiquitous surveillance for use by AI systems through personal devices, smart cities, or robotics introduces the concern that granular data can be re-identified (48,49) and that personal health information can be hacked and shared for profit (49). Of further concern was that these smart devices are often powered by proprietary software, and are consequently less subject to scrutiny (48). The stated implications of these privacy and security concerns were vast, with particular attention given to the possibility of personal data being leaked to employers and insurance companies (46,50–54). A prevailing concern was that population sub-groups may then be discriminated against on the basis of their social, economic, and health statuses by those making employment and insurance decisions (49–51,53).
Trust in AI Applications
The issues of privacy, security, and patient and healthcare professional trust in AI were frequently and closely linked in the literature. Attention was given, for instance, to how individuals must be able to trust that their data are used safely, securely, and appropriately if AI technology is to be deployed ethically and effectively (2,46,55–57). Asserted in the literature was that patients must be sufficiently informed about the use of their data in order to trust the technology and to be able to consent to or reject its use (52,56). One example that highlights these concerns is the data-sharing partnership between Google DeepMind, an AI research company, and the Royal Free London NHS Foundation Trust (49,58). Identifiable data from 1.6 million patients were shared with DeepMind with the stated intention of improving the management of acute kidney injury with a clinical alert app (58). However, questions were raised as to whether the quantity and content of the data shared were proportionate to what was necessary to test the app, and why it was necessary for DeepMind to retain the data indefinitely (49,58). Furthermore, this arrangement has come under question for being made in the absence of adequate patient consent, consultation with relevant regulatory bodies, or research approval, threatening patient privacy and, consequently, public trust (49,58).
HCPs have similarly demonstrated mistrust of AI, resulting in a hesitancy to use the technology (59,60). This was exhibited, for instance, by physicians in various countries halting the uptake of IBM’s Watson for Oncology, an AI-powered diagnostic support system (61). These physicians stated that Watson’s recommendations were too narrowly focused on American studies and physician expertise, and failed to account for international knowledge and contexts (61). Distrust amongst HCPs was also raised with regard to machine learning programs being difficult to both understand and explain (62,63). Conversely, a fear exists that some HCPs may place too much faith in the outputs of machine learning processes, even when the resulting reports, such as brain mapping results from AI systems, are inconclusive (57). One suggestion to improve HCP trust in AI technology was to deploy training and education initiatives so that HCPs gain a greater understanding of how AI operates (43). A further suggestion was to include end-users in the design of the technology, so that end-users not only develop a better understanding of how it functions (64), but user trust also increases through a more transparent development process (47).
Accountability for use of AI technology
Frequently raised was the question of who ought to assume responsibility for errors in the application of AI technology to clinical and at-home care delivery (41,45,58–60,65–67). The question often arose in response to the fact that AI processes are frequently too complex for many individuals to understand and explain, which hinders one’s ability to scrutinize the output of AI systems (2,61,66). Similarly, grounds for seeking redress for harm experienced as a result of its use were noted to be obstructed by the proprietary nature of AI technology: under the ownership of private companies, the technology is less publicly accessible for inspection (2,48,51,68). Further to these questions, debate remains as to whether HCPs ought to be held responsible for the errors of AI in the healthcare setting, particularly with regard to errors in diagnostic and treatment decisions (41,45,57).
Beyond the clinical environment, issues of accountability arose in the context of using care robots. Related questions revolved around the burden of responsibility if a care receiver is, for example, harmed by an AI-enabled robotic care provider (2). Does the burden of responsibility for such harm fall on the robot manufacturer who wrote the learning algorithm (69)? Similarly, the question arose of who is to be held accountable if a care receiver takes their own life, or the life of another, under the watch of a care robot (46). If a care robot is considered an autonomous agent, should the incident then be the responsibility of the robot (46)? While proposed solutions to accountability challenges were few, one suggestion was to build a machine learning accountability mechanism into AI algorithms that could itself perform black box audits to ensure they are privacy neutral (45, p. 18). Also suggested were appropriate training of engineers and developers on issues of accountability, privacy, and ethics, and the introduction of national regulatory bodies to ensure that AI systems have appropriate transparency and accountability mechanisms (45).
Adverse Consequences of Bias
Bias was yet another cross-cutting ethical theme within the literature, notably the potential for bias embedded within algorithms (43,54,59,64,68,70–73) and within the data used to train them (43,45,49,51,55,59–61,63,64,68–70,73–78). The prevailing concern with algorithms was that they are developed by humans, who are by nature fallible and influenced by their own values and implicit biases (68,72). These values were noted to often reflect those that are societally endemic and, if carried into the design of AI algorithms, could consequently produce outputs that advantage certain population groups over others (43,51,54,59,63,68,71,73,75). Bias was indicated to manifest similarly in the data relied upon to train AI algorithms, by way of inaccurate or incomplete datasets (48,51,63,75,78) or unrepresentative datasets (43,76,77), thus rendering AI outputs ungeneralizable to the populations to which they are applied (51,68,75).
Not only were biased data sets noted to potentially perpetuate systemic inequities based on race, gender identity, and other demographic characteristics (48,51,59,63,68,70), but they may also limit the performance of AI as a diagnostic and treatment tool owing to the lack of generalizability highlighted above (43,48,77). In contrast, some noted the potential for AI to mitigate existing bias within healthcare systems. Examples of this potential include reducing human error (50); mitigating the cognitive biases of HCPs in determining treatment decisions, such as recency, anchoring, or availability biases (45,51); and reducing biases that may be present within healthcare research and public health databases (48). Suggestions to address the issue of bias included building AI systems to reflect current ethical healthcare standards (70) and ensuring a multidisciplinary and participatory approach to AI design and deployment (72).
3.2.2 Specific Ethical Themes by AI Application in Health
Three health applications were emphasized in the reviewed literature: care robots, diagnostics, and precision medicine. Each health application raised unique ethical issues and considerations.
Care Robotics
A notable concern regarding the use of care robots was the social isolation of care recipients, with care robots potentially replacing the provision of human care (41,61,79–84). Some asserted that the introduction of care robots would reduce the amount of human contact care recipients receive from family, friends, and human care providers (41,61,79,81–84), with implications including increased stress, a higher likelihood of dementia, and other impacts on the well-being of care recipients (41). Others, in contrast, viewed robots as an opportunity to increase the “social” interaction that already isolated individuals may experience (41,79,85,86). Care robots could, for example, offer opportunities for care recipients to maintain interactive skills (86), and increase the amount of time human care providers spend having meaningful interactions with those they are caring for (79), as opposed to being preoccupied with routine tasks. Yet despite these opportunities, of note was the idea that care robots risk deceiving care recipients into believing that the robots are ‘real’ care providers and companions (41,46,79,81–83,87–89), which could undermine the preservation and promotion of human dignity (41,87).
The issue of deception was often linked to the question of ‘good care’: what the criteria for good care are, and whether robots are capable of providing it. In the context of deceit, some considered it justified so long as the care robot allows care recipients to achieve and enhance their human capabilities (88,90). Also challenged was the assumption that good care is contingent upon humans providing it (46,88,91), for while robots may not be able to provide reciprocal emotional support (88), humans similarly may fail to do so (91). A further aspect of good care illustrated in the literature was the preservation and advancement of human dignity (88), which robots can support insofar as they promote individual autonomy (41,61,69,79,82,83). Some, however, contested this, arguing that care robots may in fact reduce a person’s autonomy if the technology is too difficult to use (82); if the robot supersedes one’s right to make decisions based on calculations of what it thinks is best (61); or if the implementation of robots leads to the infantilization of care recipients, making them feel as though they are being treated like children (83). The promotion of autonomy also appeared controversial, acknowledged at times as the pre-eminent value that robots ought to promote (69,86), while at others autonomy was seen to be in tension with the safety of the care recipient (41,86). For example, with the introduction of care robots, care recipients might choose to engage in unsafe behaviours in pursuit of, and as a result of, their new independence (41,86). A comparable tension exists in the literature between the safety of care recipients, which some believe care robots protect, and the infringement on recipients’ physical and informational privacy (41,46,83,86,92,93).
Diagnostics
Diagnostics was another area that garnered significant attention with regard to ethics. Of note was the ‘black box’ nature of machine learning processes (36,45,51,63,74,94–96), frequently mentioned alongside an HCP’s inability to scrutinize the output (44,51,63,96). With the acknowledgement that the more advanced an AI system is, the more difficult it is to discern its functioning (94), there was also a concern that because it is difficult to understand how and why a machine learning program produces an output, there is a risk of encountering biased outputs (74). Thus, despite the challenge of navigating these opaque AI systems, there were calls for such systems to be explainable in order to ensure responsible AI (45,74). Another pervasive theme was the replacement and augmentation of the health workforce, particularly physicians, as a result of AI’s role in diagnostics (44,59,63,95,97). While few feared the full replacement of physicians in diagnostics (2,63,95), some expected AI’s presence to enhance the effectiveness and efficiency of their work (63,95). Concerns were expressed, however, about how the roles and interactions of physicians may change with its introduction, such as the ethical dilemma encountered if a machine learning algorithm is inconsistent with the HCP’s recommendation, contradicts a patient’s account of their own condition, or fails to consider patients’ non-verbal communication and social context (59).
Precision Medicine
Issues of bias persisted in discussions of precision medicine, with the recognition that biased data sets, such as those that exclude certain patient populations, can produce inaccurate predictions that in turn can have unfair consequences for patients (75). While precision medicine was a less prominent theme than the aforementioned AI applications, questions arose about the accuracy of predictive health information produced at the intersection of AI and genomics, as did uncertainty about where and by whom those data may then be used (98). In the case of AI-assisted gene editing, deep learning holds potential for directing experts to where in the human genome to apply gene-editing technologies such as CRISPR, in order to reduce an individual’s risk of developing a genetic disease or disorder (25). However, deep learning models cannot discern the moral difference between gene editing for health optimization and gene editing for human enhancement more generally, which may blur ethical lines (25). A further tension exists in how the technology is deployed to support human choices; for example, if a person seeks gene editing not only to reduce their risk of inheriting a particular genetic disease, but also to increase their muscle mass, obtain a particular personality trait, or enhance their musical ability (25). Also illuminated were the implications of AI-enabled precision medicine in the global north versus the global south (99). First was the possibility that this technology, given its high associated costs and greater accessibility in the developed world, might leave LMICs behind (99). Second was the awareness that the introduction of genetic testing may undermine low-cost, scalable, and effective public health measures, which should remain central to global health (99).
3.2.3 Gaps in the Literature
Healthcare was the predominant focus in the ethics literature on AI applications in health, with the ethics of AI in public health largely absent from the literature reviewed. One article that did illuminate ethical considerations for AI in public health highlighted the use of AI in environmental monitoring, motor vehicle crash prediction, fall detection, spatial profiling, and infectious disease outbreak detection, among other purposes, with the dominant ethical themes linking to data privacy, bias, and ‘black box’ machine learning models (76). Other articles that mentioned public health similarly illustrated infectious disease outbreak prediction and monitoring (61,78,100), tracking communicable diseases (100), mental health research (101), and health behaviour promotion and management (59,100); however, these applications were only briefly mentioned in the broader context of primary healthcare, and few articles spoke to the ethics of these applications (59,101,102).
In the literature reviewed, there were also evident gaps in the area of global health, with few considerations of the unique ethical challenges AI poses for LMICs. Though there was mention of utilizing AI for screening in rural India (45); genomics research in China (25); facial recognition to detect malnutrition in Kenya (74); and precision medicine in LMICs more broadly (99), among others, there was a significant gap in the literature commenting on the ethics of these practices in the global south. Furthermore, there was little discussion of health equity, including how the use of AI may perpetuate or exacerbate current gaps in health outcomes between and within countries. Instead, references to “global” health were often limited to global investments in AI research and development (R&D) and a number of innovations currently underway in HICs (25,41,49,59,69,85,103–105). The lack of focus on global health was further reflected in the primary authorship of the literature, with a mere 5.8% (n=6) of the reviewed records authored by individuals from LMICs. Furthermore, 33% (n=34) of articles had primary authorship from non-English-speaking countries, which indicates that while the discourse on AI is indeed global in scope, it may only be reaching an Anglophone readership, or at the very least, an educated one.
[1] Robots for the care of the sick, elderly, or disabled bore a number of different labels in the literature; herein they are described as ‘care robots’ in an effort to discuss the associated ethical challenges broadly. ‘Care robots’ as used in this context exclude surgical robots. Only care robots that rely on AI are discussed, such as those that can understand commands, locate and pick up objects, relocate a patient, or perform other tasks requiring machine intelligence.