Many health organizations have begun to strategize how best to incorporate AI into their core functions and have developed AI or data-specific strategies, reports, and guidance documents (Table 1; see Appendix for search strategy). Review of these documents reveals many common priorities and approaches. Informed by this review, we have identified five key priorities for the successful use of AI technologies by public health organizations (Table 2): governance, infrastructure, an upskilled workforce, partnerships, and best practices.
Abbreviations: AI, artificial intelligence; CIFAR, Canadian Institute for Advanced Research; CIHR, Canadian Institutes of Health Research; HHS, Health and Human Services; IBC, International Bioethics Committee; NHS, National Health Service; NIH, National Institutes of Health; PHO, Public Health Ontario; UNESCO, United Nations Educational, Scientific and Cultural Organization
Every public health organization exists within a larger governance context. A comprehensive understanding of the legislation, policies and procedures that govern the use of AI for health is therefore essential to the safe and successful use of AI for public health activities. This governance exists at different levels, from the international and federal level to organization-specific governance frameworks designed to guide the strategic and efficient management of data and AI technologies. All the organizational documents we reviewed discussed organizational governance and the associated challenges. However, we argue that public health organizations need to focus on understanding and operationalizing higher-level governance rather than reinterpreting it into organization-level governance frameworks. Importantly, this should include the close involvement of subject-matter experts in AI, data management and information technology, to help ensure that higher-level governance is interpreted appropriately and operationalized realistically, that both benefits and risks are fully understood, and that unnecessary restrictions are not implemented. It is also important to recognize that the higher-level governance context can change, and that organizational governance must be able to adapt easily. The European Union's General Data Protection Regulation represented a major shift in data protection and privacy and has prompted review of privacy regulation around the world.[65] Canada's Privacy Act, for example, is currently under review.[66]
Table 2
General recommendations to support five strategic priorities for successful use of artificial intelligence by public health organizations
Strategic Priority | General Recommendations

Governance
- Clarify data leadership roles and responsibilities
- Review current organizational governance
- Understand and operationalize higher-level governance
  o Involve subject-matter experts in AI, data management and information technology
  o Establish a mechanism for community and public engagement
- Establish transparent oversight and accountability

Infrastructure
- Assess infrastructural and analytic needs
- Increase data access
- Improve data interoperability
- Increase availability of advanced analytic infrastructure and tools
  o Consider investment in distributed data platforms and cloud computing

Upskilled Workforce
- Identify and forecast desired skills and competencies, and review existing skills and capacity
- Upskill existing staff
  o Increase data literacy across the organization, with a focus on bias and equity considerations
- Recruit new staff with desired skills
- Engage with trainees; consider development of trainee fellowship programs
- Foster multidisciplinary collaboration and diversity

Partnerships
- Identify areas where partnerships may be helpful (e.g., gain expertise, gain or share access to data or infrastructure, engage a wider variety of perspectives)
- Consider partnerships with
  o Local, provincial/state, federal government
  o Educational institutions
  o Private sector

Best Practices
- Consider use of an existing ethical AI framework
- Default to transparent data and analytic processes and follow open science principles whenever possible
- Engage with the public
- Ensure access to practical guidelines for AI development, evaluation, and implementation
Organization-level governance should focus on the development and maintenance of effective and efficient data and information technology (IT) systems within the constraints of higher-level regulation. This should include an emphasis on data procurement, linkage and access; privacy; data and IT interoperability; investment in and maintenance of IT infrastructure; prioritization of AI projects; and workforce management of AI, data, and IT personnel. Common governance priorities identified in the documents we reviewed include clear and transparent definition of roles and responsibilities, and strict oversight and accountability.[48-50, 54, 56] Several organizations have established new roles to lead data governance activities, including a Chief Data Officer at the Public Health Agency of Canada (PHAC)[50] and Health Canada[49], and a Chief Data Strategist at the United States National Institutes of Health.[53] Individuals in these roles are tasked with leading data strategy implementation in collaboration with relevant organizational data councils. Other organizations have prioritized increased communication and coordination between the individuals and councils responsible for data governance activities.[54]
For ethical AI use, the public should be engaged and informed about how their data are used and how AI applications may influence their lives, and should be given space to voice their preferences and concerns.[57, 63, 67, 68] Community governance, which involves participation and engagement of the public in decision-making about their community, is recognized as particularly important when considering First Nations' and Indigenous data and information. The First Nations Principles of OCAP (Ownership, Control, Access, and Possession) were developed to protect Canadian First Nations' data and information and to ensure that they are used and shared in a way that brings benefit to the community while minimizing harm (www.fnigc.ca). Similarly, the CARE (Collective Benefit, Authority to Control, Responsibility and Ethics) Principles for Indigenous Data Governance are global principles for governance of Indigenous data (www.gida-global.org/care), and EGAP (Engagement, Governance, Access and Protection) is a governance framework for health data collected from Black communities (www.blackhealthequity.ca).
Investment in modernized data and analytic infrastructure and procedures
Modernization of organizational data infrastructure and procedures is widely recognized by health organizations as vital to moving forward with AI applications and the strategic use of data. A priority common to all organizational strategies we reviewed was to improve data access[48-55] by reducing administrative barriers, reviewing and revising data use agreements, exploring new data de-identification techniques and establishing remote access to data and analytic tools. Investment in distributed data platforms and cloud computing infrastructure is widely discussed as a means of facilitating rapid and seamless data access, in addition to improving data storage and increasing computational power for advanced analytics.[48, 50, 51, 53, 54, 57] These platforms may also reduce infrastructure and maintenance costs in the long term compared to local data centers.[53] Health Canada additionally provides access to data through application programming interfaces (APIs),[49] which Statistics Canada is also looking to use to provide data access to Government of Canada departments.[48]
Many organizations are also seeking to improve data interoperability. The NHS aims to modernize data infrastructure and increase interoperability through development of a Data Services Platform that will serve as a single place for data collection, processing and management.[55] Similarly, the United States National Institutes of Health (NIH) has goals to connect its data systems and reduce data 'silos'.[53] Interoperability is also a primary goal of the Statistics Canada Data Strategy, which seeks to improve it through the development and use of open data standards.[48] Likewise, Health Canada aims to improve data standardization, consolidation and integration through use of open standards and sharing of expertise.[49] Easily accessible data documentation, which is essential for data interoperability, has also been prioritized in several of the organizational data strategies we reviewed. Examples include the Health Canada Information Reference Model,[49] the United States NIH Data Discovery Index[53] and a data holding inventory by PHAC.[50] Some organizations are also seeking to improve data interoperability through use of common data models, that is, shared schemas for data harmonization and standardization.[49, 51, 53] Using existing commercial tools, technologies and services, as opposed to developing project- or organization-specific data infrastructure internally, is also recognized as a means of improving system interoperability and data integration both within and outside an organization.[48, 49, 59] Increased data linkage is also a common organizational priority.[51-55]
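To make the idea of a common data model more concrete, the minimal Python sketch below maps records from two hypothetical source systems, which encode dates and sex differently, onto one shared schema. Every name, field, code list and source format here (`CaseRecord`, `from_source_a`, `from_source_b`, the letter and numeric codes) is an illustrative assumption and is not drawn from any organization's actual data model; real common data models define much richer vocabularies, metadata and validation rules.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical common schema: each source system is mapped into this one record type.
@dataclass(frozen=True)
class CaseRecord:
    case_id: str       # source-prefixed identifier, unique across systems
    report_date: date  # calendar date, independent of the source's date format
    sex: str           # harmonized vocabulary: "female", "male", "unknown"


# Hypothetical source-specific code lists mapped onto the shared vocabulary.
SOURCE_A_SEX = {"F": "female", "M": "male", "U": "unknown"}
SOURCE_B_SEX = {1: "male", 2: "female", 9: "unknown"}


def from_source_a(row: dict) -> CaseRecord:
    """Hypothetical source A: ISO 8601 date strings and letter codes for sex."""
    return CaseRecord(
        case_id=f"A-{row['id']}",
        report_date=date.fromisoformat(row["reported"]),
        sex=SOURCE_A_SEX.get(row["sex"], "unknown"),  # unmapped codes become "unknown"
    )


def from_source_b(row: dict) -> CaseRecord:
    """Hypothetical source B: DD/MM/YYYY date strings and numeric codes for sex."""
    day, month, year = (int(part) for part in row["rep_dt"].split("/"))
    return CaseRecord(
        case_id=f"B-{row['case_no']}",
        report_date=date(year, month, day),
        sex=SOURCE_B_SEX.get(row["sex_cd"], "unknown"),
    )
```

For example, `from_source_a({"id": "17", "reported": "2021-03-04", "sex": "F"})` and `from_source_b({"case_no": 9, "rep_dt": "04/03/2021", "sex_cd": 2})` both yield a record dated 4 March 2021 with sex "female". Once every source is converted, downstream linkage and analysis can work against a single schema instead of source-specific formats.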
In addition to modern data infrastructure and procedures, successful use of AI also requires advanced analytic infrastructure and tools. Many organizational strategies outline plans to increase organizational capacity for advanced analytics by assessing organizational needs,[49] increasing computational power,[48, 50] facilitating access to new analytic tools,[48-53, 55] and conducting pilot projects using AI methods.[50, 51] It is important to establish which analytic tools are needed to enable AI use, as most traditional public health analytic tools lack the necessary capabilities and/or are unfamiliar to those with AI or machine learning expertise. For example, Python (Python Software Foundation) and R (R Foundation for Statistical Computing) are programming languages commonly used to develop machine learning models, and Git is a popular, free, and open-source version control system that tracks changes to code. TensorFlow, which is also free and open-source, is an end-to-end software library for machine learning that is especially effective for deploying machine learning models efficiently.[69] As public health professionals have not traditionally used these tools, it is important for those with expertise in computer science, AI, and machine learning to determine the appropriate infrastructure, software and tools needed to perform advanced analytics. Several organizations also recognized the importance of flexibility in accessing new analytic tools to enable 'nimble and agile data analytics'.[48-50]
Addressing the skills gap
Successful use of big data, advanced analytic methods and AI requires a workforce with strong data literacy and capacity in data management, statistics, computer science, software engineering, data privacy, bias, and ethics, among other skills. All organizational data strategies we reviewed recognized the importance of building a workforce that is educated in these skills and outlined plans to achieve it through training staff and leveraging existing skills, targeted recruitment, and engagement with trainees and educational institutions. Most of the strategies also discussed the intention to increase organizational skills and capacity in AI specifically.[48-54]
Upskilling existing staff will generally be an important priority for all public health organizations interested in increasing their use of AI. It should first involve identifying and forecasting desired data and analytic competencies and reviewing existing organizational skills and capacity.[50, 52, 55] Data literacy, defined as the ability to collect, manage, evaluate, and critically apply data,[70] is widely recognized as a vital competency to emphasize across health organizations interested in AI.[48-50, 52, 54] Statistics Canada has developed data literacy training products, including the Framework for Responsible Machine Learning Processes at Statistics Canada[71] and introductory training videos on machine learning, data stewardship and data quality, among others.[72] The Government of Canada developed a Digital Academy in 2018 to "help federal public servants gain the knowledge, skills and mindsets they need in the digital age", which includes training on data literacy and competencies, cloud computing, cyber security, AI and machine learning, among other topics.[73] The Digital Academy is being used by PHAC and Health Canada to train existing and new employees.[49, 50] PHAC outlined many additional training strategies, including use of third-party web-based tools, self-directed learning, training customized to specific audiences, development of a Data 101 onboarding package and specific training in innovation.[50] The United States Department of Health and Human Services (HHS) is looking to increase data science and statistical training opportunities and multidisciplinary collaboration across the organization, recognizing that informed data science decisions require a wide range of skills and expertise.[54] The NHS is seeking to leverage existing skills through the creation of teams specializing in particular data skills and through external and internal staff rotation, in addition to the development of training programs.[55] ICES is looking to develop a data science staff education strategy, which will include data science workshops and increased exposure of analysts and methodologists to the R statistical programming language.[51] Statistics Canada is seeking to develop a culture of 'continuous learning'.[48] Continuous learning can be facilitated in part by increased access to scientific publications, a priority of PHAC.[50]
Targeted recruitment of new employees is another means of developing a workforce educated in data science and AI, and is an important component of the workforce development plans of many health organizations.[48-50, 52-55] As individuals with many of the desired skills have not traditionally worked in health, it is important to consider how best to attract and retain this talent. This begins with increasing data literacy across the organization and providing appropriate infrastructure and tools, and is further facilitated by an organizational culture that is receptive to change and risk-taking. A goal of the Health Canada Data Strategy, for example, is to provide employees with "an agile collaborative space for learning and innovative uses of data", and Health Canada is seeking to create a strong data culture and environment that values experimentation and learning from failure.[49] PHAC and PHO have similar innovation and risk-taking goals.[50, 52] The United States HHS strategy outlines four approaches to hiring data scientists: participating in job fairs and industry events, creating intern and fellowship programs, hiring individuals with non-traditional backgrounds into senior positions, and making use of existing specialized hiring programs.[54] The United States NIH is looking to develop a Data Fellows program, in which individuals with desired skills are recruited from the private sector and academia for short-term national service sabbaticals.[53]
It has been suggested that trainees are "the glue that tie researchers together", fostering interdisciplinary research and learning.[58] Engagement with trainees also increases awareness of data science career possibilities within the organization and provides the organization with access to new and developing data science talent. Most of the organizational strategies we reviewed include engagement with trainees and educational institutions as part of their workforce development plans.[48, 50, 51, 53, 54] In Canada, the Health System Impact Fellowship,[74] funded by the Canadian Institutes of Health Research, has an equitable AI stream in which PhD and postdoctoral fellows with skills in computer science, AI and data science are embedded within health system organizations to help solve critical health system challenges.[75] Both PHAC and PHO have hosted fellows through this program. Other organizations, including Statistics Canada[48] and the United States NIH and HHS,[53, 54] have plans for similar organization-specific fellowship programs.
Lastly, it has been recognized that scientific teams greatly benefit from being diverse and multidisciplinary.[34, 58, 63] An AI report from the United States National Academy of Medicine recommends that AI teams be diverse in "gender, culture, race, age, ability, ethnicity, sexual orientation, socioeconomic status, privilege, etc." to promote the development of impactful and equitable AI tools.[63] The United States NIH is looking to increase workforce diversity, in part through its Big Data to Knowledge Diversity Initiative.[53] The United States HHS has goals to promote multidisciplinary data science teams and increase cross-program and interdepartmental collaboration.[54]
Development of strategic collaborative partnerships
Development of collaborative partnerships is an important component of strategic data use and successful AI implementation. Collaboration can take many forms and can be used to gain expertise, gain or share access to data and infrastructure, and engage a wider variety of perspectives. The CIFAR AI for Public Health Equity report recommends collaboration of public health professionals and researchers with computer science and AI researchers, in addition to a wide range of other groups (e.g., sociologists, political scientists, engineers, civil society and citizen scientists, people with lived experience, policymakers), to help ensure health equity when using AI technologies.[58] ICES is looking to facilitate development and implementation of data and computational infrastructure through partnerships, in addition to continued collaboration with external scientists and research institutions for data science and AI expertise.[51] Many governmental organizations plan to collaborate closely with other local, provincial/state or federal government organizations or departments, sharing infrastructure, data and expertise.[48-50, 53, 54] As mentioned previously, engagement with trainees can greatly benefit workforce development and promotes collaboration with educational institutions. Collaboration with the private sector can also be advantageous. The United States NIH, for example, is seeking to leverage private sector infrastructure through strategic collaboration.[53]
Use of AI best practices including explicit consideration of equity
Health equity has been defined to mean that "all people can reach their full health potential and should not be disadvantaged from attaining it because of their race, ethnicity, gender, age, social class, language and minority status, socio-economic status, or other socially determined circumstance".[76] Ethical considerations exist at all stages of the AI development and implementation pipeline, from problem selection and data collection to post-deployment.[34] Because public health professionals are trained to think about bias, generalizability and equity, they are especially well positioned to recognize these issues and, in collaboration with computer science and AI professionals, to inform mitigation strategies for AI use in public health. As mentioned previously, an organization's workforce should also be diverse and educated in bias and equity issues.
Best practices have been established to guide the development, implementation, and evaluation of AI-powered technologies to ensure that they are not only useful but also do not create, sustain, or exacerbate health inequities. Many principles and frameworks have been developed to guide the ethical use of AI.[67] The United Kingdom (UK) National Health Service (NHS)[55, 60] and the UK Academy of Medical Royal Colleges[61] recommend following the 'Guide to Good Practice for Digital and Data-Driven Health Technologies'.[77] This document, developed by the UK Government, outlines ten principles to guide the development and implementation of data-driven health and care technologies. Many organizations[49, 53, 78] refer to the FAIR data principles: research data should be findable, accessible, interoperable and reusable.[79] A report from the United States National Academy of Medicine refers to several existing frameworks and principles, including 'Artificial Intelligence at Google: Our Principles'[80] and the 'AI Now Report 2018'.[81] Principles common to many of these frameworks include transparency, non-maleficence, responsibility and accountability, privacy, freedom and autonomy, beneficence, trust and justice, fairness and equity.[67]
Transparency is intended to foster trust and prevent harm and is one of the most common ethical AI principles.[67] Transparency in AI often refers to efforts to increase explainability and interpretability and generally involves detailed disclosure of how an AI model or technology was developed, how it performs, the data it uses, how it is deployed and used, discussion of limitations, and may involve sharing of source code and data.[67] Transparent AI promotes freedom and autonomy by increasing the public’s knowledge of AI and promoting informed consent.[67] The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network is an international initiative promoting transparent reporting of health research literature by encouraging wider use of robust reporting guidelines (www.equator-network.org). Particularly relevant to AI is the ‘Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis’ (TRIPOD) reporting guideline,[82] with an upcoming extension specifically for models developed using machine learning methods.[83] Health Canada is looking to improve data transparency and access to data as a means of increasing public confidence in decision-making.[49] The NHS has committed to greater transparency in data use and in algorithmic development and deployment.[77]
Closely related to the principle of transparency is open science. Open science is a movement to make scientific research transparent and accessible to all. Open science reduces research waste, facilitates reproducibility, and makes it easier for AI to benefit everyone (related to the ethical principle of beneficence). One of the main recommendations of the CIFAR AI for Public Health Equity report[58] and the NHSX[60] is that organizations should use, and allow for sharing of, datasets, data repositories, and resources. Open science efforts include providing the public with access to data and information, use of open data standards and open-source programs, publication of open-source code, use of open data, and open access publication. A commitment to increased data sharing was stated in several organizational strategies.[49, 50, 53, 54] Several organizations also prioritized increasing their use of open data standards to improve interoperability.[48, 50, 53] Statistics Canada is committed to increased transparency of data use and processes, including through publishing code on the Open Data Portal.[48] The Pan American Health Organization also lists open science and open data as guiding principles.[56] Principles of transparency and open science, however, need to be carefully balanced with privacy and confidentiality through organizational governance. Protections must be in place to ensure data protection and security and to prevent discrimination against individuals and small population sub-groups.[84] The Health Canada data strategy states that "getting privacy and ethics right will actually enable increased use and sharing of data, since data stewards will have knowledge of the data limits and have confidence that they can use and share data without harm."[49]
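One widely used protection when publishing aggregate counts for small population sub-groups is small-cell suppression. The minimal Python sketch below illustrates the idea for a single one-dimensional table; it is an assumption-laden example, not a method described in the strategies reviewed. The threshold of 5, the choice to publish zeros, and the helper name `suppress_small_cells` are all hypothetical, and a real disclosure control process would also need to handle multi-way tables, repeated releases and linkage with other published data.

```python
from typing import Dict, Optional

# Hypothetical threshold; the actual cutoff is set by each organization's privacy policy.
SUPPRESSION_THRESHOLD = 5


def suppress_small_cells(counts: Dict[str, int],
                         threshold: int = SUPPRESSION_THRESHOLD) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` in which risky cells are replaced by None.

    Primary suppression hides non-zero counts below `threshold` (zeros are
    published in this sketch). If exactly one cell is hidden, it could be
    recovered by subtracting the visible cells from a published total, so the
    smallest remaining cell is also hidden (complementary suppression).
    """
    hidden = {group for group, n in counts.items() if 0 < n < threshold}
    if len(hidden) == 1:
        remaining = [group for group in counts if group not in hidden]
        if remaining:
            hidden.add(min(remaining, key=lambda group: counts[group]))
    return {group: (None if group in hidden else n) for group, n in counts.items()}
```

For instance, with counts of 120, 3, 40 and 15 for four sub-groups, the count of 3 is suppressed and the count of 15 is suppressed alongside it, so neither can be derived from the table total.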
As mentioned previously, public engagement is also important for the ethical use of AI.[57, 63, 67, 68] The CIFAR Artificial Intelligence for Health (AI4H) task force report recommends that "members of the public and patients should be included as active partners in the development, governance and evaluation of AI4H policies and strategies".[57] For public health specifically, it has been recommended that rural and remote communities and people with lived experience be engaged in relevant AI research and implementation from project initiation.[58, 59] Another means of engaging with the public is citizen science, in which members of the public lead or participate in scientific research; this approach is recommended by two reports from the United States.[63, 64] The United States NIH is committed to facilitating citizen science in its Strategic Plan for Data Science, through public access to data, tools and education, in addition to exploration of other community engagement models.[53]
In addition to ethical principles and guidelines, there are practical guidelines for developing and reporting prediction models. The detailed explanation and elaboration document for the previously mentioned TRIPOD reporting guideline lists many practical recommendations for developing well-performing models, including predictor measurement and description, outcome definition, handling of missing data and variable preprocessing.[85] We anticipate that the upcoming TRIPOD-ML guideline will be especially useful.[83] Other practical considerations for the use of AI in health include the importance of representative data, cross-validation, data leakage, overfitting, and rigorous model evaluation.[86, 87]
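To illustrate how cross-validation and data leakage interact, the minimal Python sketch below runs k-fold cross-validation in which a preprocessing step (here, standardizing a single predictor) is refit on the training rows of each fold only. Computing the mean and standard deviation on the full dataset before splitting would let information from the held-out rows leak into training and make performance estimates optimistic. The helper names, the one-predictor least-squares model and the unshuffled folds are simplifying assumptions for illustration; in practice, libraries such as scikit-learn provide `Pipeline` objects that apply the same fit-on-training-data-only pattern inside cross-validation.

```python
from statistics import mean, pstdev


def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds (no shuffling, for clarity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_scaler(values):
    """Estimate standardization parameters (mean, SD) from the given values only."""
    sd = pstdev(values)
    return mean(values), (sd if sd > 0 else 1.0)


def standardize(values, mu, sd):
    return [(v - mu) / sd for v in values]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def mse(model, xs, ys):
    """Mean squared prediction error of a (slope, intercept) model."""
    slope, intercept = model
    return mean((slope * a + intercept - b) ** 2 for a, b in zip(xs, ys))


def cross_validate(x, y, k, fit=fit_line, score=mse):
    """k-fold cross-validation in which preprocessing is refit inside every fold."""
    scores = []
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        # Scaling parameters come from the training rows only, so nothing about
        # the held-out rows influences how the model is trained.
        mu, sd = fit_scaler([x[i] for i in train_idx])
        x_train = standardize([x[i] for i in train_idx], mu, sd)
        x_test = standardize([x[i] for i in test_idx], mu, sd)
        model = fit(x_train, [y[i] for i in train_idx])
        scores.append(score(model, x_test, [y[i] for i in test_idx]))
    return scores
```

The same structure applies to any preprocessing learned from data, such as imputation of missing values or variable selection: each must be estimated within the training portion of every fold and then applied, unchanged, to the held-out portion.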
Moving towards an AI-enabled public health organization
Organizations that provided a timeline generally planned to take steps toward all identified priorities in parallel,[48, 50, 55] although progress on governance and infrastructure is likely needed before significant progress can be made on other priorities (Figure 1).
Initial governance activities should include clarification of data and analytic leadership roles and responsibilities, and a review of current organizational governance to ensure alignment with the larger governance context. Existing data and analytic infrastructure should be evaluated in consultation with data management, data science and AI experts to identify priorities for modernization and areas where a small early investment may have a large impact. An early focus on data standardization and documentation may yield long-term benefits for data interoperability within the organization. Pilot projects evaluating new infrastructure and/or advanced analytic methods should be initiated in several application areas. For example, through the NIH Data Commons Pilot, the United States NIH piloted a cloud computing environment with a small number of test datasets to establish the architecture, policies and processes for storage, sharing and analysis of data.[53]
To begin to address the skills gap and establish a workforce educated in data and analytic skills, desired skills and competencies should be identified and forecast, and existing skills and capacity reviewed, to inform the development of employee training and targeted hiring programs. Training of existing and new employees at all levels of seniority in data literacy, bias and equity should be prioritized early, as organizational culture changes slowly. Areas where partnerships may be beneficial should be identified and relationship-building prioritized. Organizations should consider using an existing ethical AI framework to guide AI activities, and should default to transparent data and analytic processes and follow open science principles whenever possible. Access to practical guidelines for AI development, evaluation, and implementation should be ensured.