Racism In the American Healthcare System
Systematic racism, defined as “a system in which public policies, institutional practices, cultural representations, and other norms work in various, often reinforcing ways to perpetuate racial group inequity,” permeates almost every aspect of American society, and healthcare is no exception.[20] While it is difficult to pinpoint when racial bias was first introduced into the field of medicine, racism has been an integral part of the U.S. government’s structuring and financing of the healthcare system since the Jim Crow era.[21]
Jim Crow segregation laws – which legalized racial discrimination – separated America’s Black population from the rest of society. Some of these laws significantly affected the ability of Black Americans to access medical care and necessary medical treatments.[22] Medical institutions, including hospitals, doctors’ offices, and clinics, were segregated by race, and care at Black hospitals tended to be lower in quality due to significant underfunding and lack of resources. Because the education system was also segregated, trained Black medical professionals were scarce, which often meant that when Black patients did receive care, they faced personal discrimination and inhumane treatment. White physicians could care for any patients, while Black physicians (if granted admitting privileges, which they were often denied) were restricted to the Black wards.[23] Fully integrated hospitals were rare, and even in those, Black patients were still required to be treated in separate, subpar wards and could not share a room with White patients.[24] Healthcare segregation had serious clinical consequences, although we may never know their full extent, since data on racial disparities was not collected until the first National Health Interview Survey in 1958.[25]
Legally sanctioned hospital discrimination continued until the mid-1960s, when a Supreme Court decision made segregation in hospitals illegal.[26], [27] For Black Americans who lived through Jim Crow, however, the damage had already been done, creating a legacy of racial health disparities known as the Jim Crow effect. For example, Black women born in Jim Crow states before 1965 were more likely than both White women born in the same era and Black women born later to have estrogen-receptor-negative breast cancer, which is particularly aggressive and difficult to treat. Black women born after the abolition of the racist laws did not experience the same effect, suggesting the racial disparity was a direct product of discrimination.[28] The harm extended through generations: the infants of Black women born in the early 1960s were at a much higher risk of low birth weight than the infants of those born in the late 1960s.[29] Health disparities along racial lines are not due to innate genomic differences, a hypothesis that has been disproven by multiple researchers over the years, but to the influence of inequitable social factors dating back to even before Jim Crow.[30] Bundled with the appalling social and environmental conditions that Black Americans faced post-Reconstruction, the segregated healthcare system worked to recode their poor health outcomes so systematically that it was “as if they were [their] genetic material”.[31]
Despite the elimination of segregation in hospitals in 1965, systematic racism persists on individual, intra-organizational, and extra-organizational levels, still limiting access to care and quality of treatment for patients of color.
Covid-19 Racial Disparities
The COVID-19 pandemic has exposed racist social determinants of health, highlighting the ways historical injustices still linger today. On April 8th, 2020, the CDC published its surveillance data of confirmed COVID-19-associated hospitalizations across 14 states. Black Americans were disproportionately affected by the disease, making up 33.1% of patients despite representing only 18% of the catchment population.[32] The same phenomenon appeared in government statistics from cities across the United States; the racial disparities persisted.
There are several ways social determinants of health have led to racial disparities in COVID-19 infection, morbidity, death, and vaccination rates. Black Americans are more likely to live in densely populated neighborhoods, leading to increased exposure, and in areas of lower socioeconomic status, where they have decreased access to health care and fewer COVID-19 testing sites.[33] Racial minorities more often work in essential settings, such as public transportation, healthcare facilities, factories, and restaurants, where their chances of exposure are higher due to the nature of their work.[34] Minorities may also have a greater risk of severe illness due to comorbidities that exacerbate COVID-19 symptoms, such as hypertension, diabetes, or heart disease.[35] Additionally, historic and current experiences of racism in medicine have built a strong mistrust of the American healthcare system among racial minorities, which may have extended to vaccine uptake. As of April 4th, 2022, only 57% of Black Americans had received at least one COVID-19 vaccine dose, compared to 85% of Asian Americans, 65% of Latinx Americans, and 63% of White Americans.[36]
Algorithmic Bias
New medical technologies, including the machine learning algorithms presented in this paper, are built on a foundation of historical and structural racism that has long produced inequitable health outcomes for marginalized communities. Without a conscious effort to counteract it, racial bias can be transmitted through new, algorithmic channels.
AI systems have successive stages in their development and management that serve as potential entry points for bias, creating a domino effect of prejudices with alarming consequences.[37] The “bias cascade” begins with the process of data collection. Machines are only as accurate as the data they receive, so if an algorithm is trained on data containing disparities, such as underrepresentation of minority groups or inequitable measurement techniques, the outcome will be similarly skewed.[38] The developers may even fall victim to their own biases, further compounding any bias in the way the code is actually written. Lastly, minority groups’ lack of access to healthcare and medical technology introduces another level of inequality for healthcare algorithms. If minority groups are not exposed to these technologies, it becomes more difficult to identify biased results, further adding to the misrepresentation of the data if it is used to train the algorithm at a later date.
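The first step of the cascade can be made concrete with a toy illustration. In the sketch below (all numbers hypothetical, and the "model" reduced to a single decision threshold for simplicity), a cutoff chosen to maximize accuracy on training data dominated by one group performs noticeably worse on an underrepresented group whose measurements are distributed differently:

```python
# Toy illustration of training-data bias: a single decision threshold fit
# to data dominated by majority group A misclassifies minority group B,
# whose disease presents at lower measurement values (cf. inequitable
# measurement techniques). All scores and labels are hypothetical.

def best_threshold(scores, labels):
    """Pick the cutoff that maximizes overall accuracy on the training data."""
    def accuracy(t):
        return sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(labels)
    return max(sorted(set(scores)), key=accuracy)

# Majority group A: 8 patients; minority group B: only 2 patients.
scores_a = [0.2, 0.3, 0.35, 0.4, 0.7, 0.75, 0.8, 0.9]
labels_a = [0,   0,   0,    0,   1,   1,    1,   1]
scores_b = [0.1, 0.32]
labels_b = [0,   1]

t = best_threshold(scores_a + scores_b, labels_a + labels_b)

def group_accuracy(scores, labels, t):
    return sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(labels)

acc_a = group_accuracy(scores_a, labels_a, t)  # 1.0: perfect for group A
acc_b = group_accuracy(scores_b, labels_b, t)  # 0.5: B's sick patient missed
```

Because group B contributes so few training examples, the overall-accuracy objective sacrifices B's sick patient entirely; the skew in the data becomes a skew in the outcome.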
Analytic Framework
This paper aims to answer the following: What harmful biases are present in predictive digital health algorithms, and how can they best be regulated? Using prognostic models that predict a care course for patients diagnosed with COVID-19, this research applies an analytic framework that offers a conceptual map for evaluating automated decision systems, their development, their outcomes, and their risk of bias.
The models will be pulled from COVID PRECISE, a comprehensive, systematic, and continuously updated review of diagnostic, prognostic, and general population prediction models for COVID-19 and their accuracy, quality, and applicability.[39] COVID PRECISE catalogues the three main types of prediction models related to COVID-19: models predicting COVID-19 susceptibility in the population, to guide the use of preventative healthcare resources in high-risk areas; models predicting the presence of COVID-19 in symptomatic patients, to direct diagnostic capacity to patients with a high probability of having the disease; and models predicting a care course in patients diagnosed with COVID-19, to allocate hospital resources to those with an estimated poor prognosis. This thesis will focus only on the third type, prognostic models, which are or have been used in clinical settings.
In statistics, bias refers to any systematic error that may occur when using statistical analyses. Within the context of this thesis, bias will refer to a predicted outcome that is “systematically less favorable to individuals within a particular group and where there are no relevant differences between the groups that justify such harms”.[40] While the main purpose of this paper is to reveal the risk of harmful racial bias, the analytic framework will also lend itself to exposing other forms of bias, such as gender and income-level bias, while paying particular attention to any compounding interaction effects.
There are various types of statistical bias that could give rise to racial discrimination. The following are most likely to be relevant in this context:
- Selection bias occurs when the individuals, groups, or data selected are not representative, so that randomization is not achieved.[41] Any calculation using such a dataset will not represent the whole population.[42]
- Confirmation bias, a subset of selection bias, is the tendency to favor information that confirms an individual’s beliefs.[43]
- Omitted variable bias originates from the absence of a relevant variable in a model, making it inaccurate and underfit.[44]
- Susceptibility bias occurs when cause, effect, and correlation are incorrectly interchanged, ignoring that correlation does not imply causation.[45]
- Funding bias, also known as sponsorship bias, is the tendency to alter a study or its outcome to support a financial sponsor.[46]
- Status quo bias is a cognitive bias that refers to an exaggerated preference for the status quo, where an individual prefers to keep their context or environment as it was before.[47]
- Label bias arises when the outcome variable has different meanings across groups and should be anticipated when a proxy is used.[48]
It is important to note, however, that even if an algorithm’s outcome is statistically perfect and presents no form of statistical bias through these mechanisms, it is still possible for racial bias to occur. Racism is so ingrained in the architecture of American society that it has effectively normalized discrimination as objective reality. Notions of race are embedded in the medical field everywhere from research to clinical practice, from medical school training to insurance claims. Take, for instance, the spirometer, a device for measuring and assessing lung function. Drawing on the once-standard assumption that there are racial differences in lung capacity, the spirometer has a button that produces different measurements of lung normalcy by race.[49] To register at the same level as their White counterparts, Black patients must demonstrate worse lung function and more severe clinical symptoms. According to prospect theory, decision-making is influenced by options that may rest on biased judgment.[50] A measurement from such a spirometer would be technically accurate, but only in the context of a racially biased system. As we observe digital predictive health algorithms for signs of harmful bias, racial or otherwise, it is imperative to consider the broader context upon which the medical field is built.
The analytic framework consists of seven domains that each give a distinct insight into the operation of the algorithm: Constitution, Inputs and Outputs, Training Data, Transparency, Outcome, Scale, and Policy. The framework incorporates certain elements from the Prediction Model Study Risk of Bias Assessment Tool (PROBAST), which assesses both the risk of bias and concerns surrounding the applicability of multivariable diagnostic or prognostic prediction models.[51] The framework expands upon this recently developed risk assessment tool by incorporating details on either side of the algorithm’s development, including developer diversity, where the training data originates, the scale of the application, and any governance attempts. A brief definition and justification for each domain is included below.
I will observe each domain for signs of statistical bias that risk producing inequitable outcomes for historically marginalized groups. Each domain includes a set of descriptive and signaling questions to facilitate structured judgment of the algorithm, designed to reveal flaws in the algorithm’s design, conduct, or analysis.[52] The signaling questions (italicized) are phrased so that the answer ‘yes’ indicates no bias, and the answer ‘no’ indicates a gap in the algorithm’s design allowing for the potential introduction of bias. If the information provided by the model’s developers does not give a concrete ‘yes’ or ‘no’ to a signaling question, I will make a contextual deduction in either direction, with the other possible answers being ‘probably yes’ or ‘probably no’. In terms of my own judgment, I will tend to assume an answer is ‘probably no’ rather than ‘probably yes’ out of an abundance of caution.
1. Constitution
- What is the algorithm and how does it operate?
- Who developed the algorithm?
- Who funded the development of the algorithm?
- Is the team who developed the algorithm representative of traditionally marginalized groups?
Definition
The constitution of an algorithm refers to what makes up its composition and development. It looks at who actually developed the algorithm, what the development team’s diversity make-up is, how the research was funded, how it is supposed to work, and its main objectives.
Justification
As previously mentioned, blind spots or biases that a model presents reflect the priorities and judgments of its creators. If a development team is diverse and representative of marginalized groups, there is a higher probability that potential bias was flagged and acknowledged at the development stage.
2. Inputs and Outputs
- What data is inputted in the algorithm? Was this appropriate?
- What data is outputted from the algorithm? Was this appropriate?
- Were all data exclusions appropriate?
Definition
Every algorithm takes in data (inputs) and produces a different set of data (outputs) based on the specified input values. An input is deemed appropriate if it is defined and assessed in a similar way across patients, has unbiased measurement, and does not act as an unintended race proxy.[53] An outcome is appropriate if it was determined using data collected for the purpose of the algorithm, standardized across all patients, and determined without knowledge of the predictors, all within an appropriate time interval.[54] Exclusions of participants are appropriate if the excluded participants did not meet the developer’s inclusion criteria.
Justification
A model’s inputs and outputs may be observable and controllable; in other cases, they are not. Sometimes a model must use a closely related variable, known as a proxy, as a stand-in for a data point that could not be measured directly. If the relationship between the proxy and the intended measure is imperfect, or the proxy carries an unintended racial element, this can skew the output. In addition, studies that exclude certain participants may produce a biased estimate, as the model would be based on a group that may not be representative of the target population.[55]
3. Training Data
- What training data does the algorithm use?
- Where does the training data come from?
- How often is the training data updated?
- Was the training data representative and appropriate?
Definition
Training data is a set of historical data that facilitates an algorithm’s machine learning. These data sets help the algorithm identify the correct outputs for certain scenarios. Using the training data, the algorithm develops a model that maps the inputs to the outputs and is later applied to other scenarios where it will be used again. Under this domain, appropriate training data is defined as accurate, free from bias or discrimination, and representative of marginalized races and genders. For prospective models, the most appropriate use of training data with the lowest risk of bias is a prospective longitudinal cohort design, where pre-specified and consistent methods ensure that data is systematically and validly recorded specifically for the design of the model.[56]
Justification
An algorithm is at risk of producing biased results if it relies on inappropriate training data. If the training data that an algorithm uses is biased in any way, the algorithm will be biased as well, as that is what it has been taught to replicate.
4. Transparency
- Is information on the algorithm, its decision-making process, its training data, and any decisions made readily available and accessible to the public?
- Is the information presented in an accessible and understandable way?
Definition
The transparency of a model refers to the availability and accessibility of information about an algorithm, its decision-making process, and the decisions that it ultimately makes. Transparency benefits patients, who can better understand why outcomes are the way they are, and developers, who can more easily identify issues and problems.
Justification
AI transparency helps us to understand the inner workings of a particular model and can make instances of bias or inaccurate outcomes more visible and consequently easier to fix. The more transparent a model is, the less likely it is that sources of bias are able to fly under the radar of public judgment.
5. Outcome
- What was the outcome?
- Was there a standard outcome definition used?
- Were there a reasonable number of participants with the outcome?
- Were participants with missing data handled appropriately?
- Was relevant model overfitting accounted for in model performance?
- Was the outcome determined in an identical way for all participants?
- Was the outcome accuracy constant between groups of participants?
- Is there a mechanism for human oversight or judgement if deemed appropriate?
Definition
This domain relates to the risk of bias in the definition and determination of a model’s outcome. Appropriate handling refers to optimal data analysis methods. In terms of the number of participants, the events per variable (EPV) rule of thumb will be applied for different racial groups. This rule recommends that at least 10 individuals develop the outcome for every predictor variable included in the model.[57] An EPV equal to or greater than 20 generally eliminates bias in regression coefficients.[58]
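The EPV calculation is simple enough to sketch directly. The counts below are hypothetical, chosen to show how an overall EPV can look adequate while a racial subgroup's EPV falls below the rule of thumb:

```python
# Sketch of the events-per-variable (EPV) rule of thumb.
# All event counts and the predictor count are hypothetical.

def events_per_variable(n_outcome_events, n_predictors):
    """EPV = participants who developed the outcome per predictor variable."""
    return n_outcome_events / n_predictors

def epv_assessment(epv):
    if epv >= 20:
        return "bias in regression coefficients generally eliminated"
    if epv >= 10:
        return "meets the minimum rule of thumb"
    return "at risk of bias: too few outcome events per predictor"

# A model with 6 predictors, assessed overall and for one racial subgroup:
overall = events_per_variable(120, 6)   # 20.0 -> generally unbiased
subgroup = events_per_variable(45, 6)   # 7.5  -> below the threshold of 10
```

Applying the rule per racial group, as this framework does, catches exactly this failure mode: a subgroup underrepresented in the training data can fall below the threshold even when the pooled EPV passes.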
Justification
By observing a model’s outcomes and the specific choices made to reach them, such as the outcome definition, how missing data was handled, and the general accuracy among groups of participants, we can detect the points with potential for biased results. In most of these studies, accuracy is measured using the Area Under the Curve (AUC), an aggregate measure of a model’s performance based on a curve plotting the true positive rate against the false positive rate.[59] This number represents the probability that the model ranks a random positive example above a random negative one. If the researchers report this measure for each racial group, it is possible to see the effectiveness of the algorithm across races, allowing us to detect any unintended bias. If the algorithm has a specific mechanism for human oversight in the case of biased results, this may alleviate the risk of algorithmic bias presenting as harmful outcomes.
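A per-group AUC check can be sketched from the probabilistic definition above: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as half). All scores, labels, and group names here are hypothetical:

```python
# Minimal sketch: computing AUC separately per (hypothetical) group to
# check whether a model's accuracy is constant across groups.
from itertools import product

def auc(scores, labels):
    """AUC via its definition: P(random positive scored above random negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    pairs = list(product(positives, negatives))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

def auc_by_group(scores, labels, groups):
    """Recompute AUC within each group separately."""
    result = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        result[g] = auc([scores[i] for i in idx], [labels[i] for i in idx])
    return result

# Hypothetical predicted risks, true outcomes, and group membership:
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.6, 0.1]
labels = [1,   1,   0,   0,   1,   1,   0,   0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = auc_by_group(scores, labels, groups)
# per_group -> {"A": 1.0, "B": 0.75}: a gap signals unequal accuracy.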
6. Scale
- Is the algorithm being used in a clinical setting?
- If yes, at what scale?
- Is this appropriate?
Definition
A model’s scale refers to an algorithm’s ability to grow exponentially, and whether it is being used as such. For an algorithm’s scale to be appropriate, its training data, inputs, and outputs need to remain relevant in the additional contexts.
Justification
When an algorithm is used to aid or replace decision-making on a large scale, there are more opportunities for any bias to become exponentially harmful.[60] The risk of bias increases further if the algorithm is used in contexts it was not originally designed for. Models that have been externally validated are more likely to be appropriate for use on a larger scale.
7. Policy
- Are there performance evaluation measures in place and were they used appropriately?
- Are there any existing policies governing the use of the algorithm?
- Who is held accountable for any consequences of the algorithm?
Definition
The policy domain observes any pre-existing U.S. regulations or evaluation frameworks that attempt to regulate the algorithm or mitigate the effects of harmful bias. An appropriate performance evaluation would measure not only the overall accuracy of an algorithm but also its precision, specificity, accuracy among different populations, AUC, and external validity.
Justification
If there were previous attempts at regulating the use of an algorithm or evaluating its performance, the risk of bias is already much lower, as the effects of the bias have been acknowledged and mitigated. The creation of the policy itself is already a channel for bias mitigation, and its presence conveys an awareness of the potential impacts of the algorithm’s outcomes.
I hypothesize four possible answers to my original question: no bias present (0 ‘no’s), low risk of harmful bias (1–5 ‘no’s), medium risk of harmful bias (6–10 ‘no’s), and high risk of harmful bias (11–17 ‘no’s). A ‘probably no’ answer will count as half of a definite ‘no’ (0.5). The ‘probably yes/no’ answers were added once the analysis began, due to the lack of transparency in the research papers. Depending on where the risk of bias introduces itself, I will give policy recommendations for how best to alleviate it within an AI model and for regulating that risk going forward.
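The scoring rule above can be sketched as follows. The answer list is hypothetical, and one detail is my own assumption: since the brackets are defined over whole numbers, a fractional total falling between brackets (e.g. 5.5) is assigned to the higher-risk category out of caution:

```python
# Sketch of the risk-of-bias scoring rule: 'no' counts as 1, 'probably no'
# as 0.5, and the total over the 17 signaling questions maps to a category.

def no_score(answers):
    weights = {"no": 1.0, "probably no": 0.5, "yes": 0.0, "probably yes": 0.0}
    return sum(weights[a] for a in answers)

def risk_category(score):
    if score == 0:
        return "no bias present"
    if score <= 5:          # fractional totals above 5 fall to medium
        return "low risk of harmful bias"
    if score <= 10:         # an assumption; the brackets are defined on integers
        return "medium risk of harmful bias"
    return "high risk of harmful bias"

# A hypothetical set of answers to the 17 signaling questions:
answers = ["yes"] * 9 + ["probably yes"] * 2 + ["no"] * 4 + ["probably no"] * 2
score = no_score(answers)         # 4 * 1.0 + 2 * 0.5 = 5.0
category = risk_category(score)   # "low risk of harmful bias"
```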
The boundaries between high risk, medium risk, and low risk were created by dividing the signaling questions into three equal segments. It is important to note, however, that in reality not every bias mechanism is equal; some types may be more harmful than others. Weighting by harm is outside the scope of this thesis, which intends to observe where the bias is coming from, so each bias domain will be weighted equally.