Can a “goldilocks” Mortality Predictor Perform Consistently Across Time and Equitably Across Populations?

doi:10.21203/rs.3.rs-2109453/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Objective: Advance care planning (ACP) facilitates end-of-life care, yet many patients die without an advance care plan. Timely and accurate mortality prediction may encourage ACP. Therefore, we assessed performance equity and consistency for a novel 5-to-90-day mortality predictor.

Methods: Predictions were made for the first day of included adult inpatient admissions on a retrospective dataset. Performance was assessed across various demographies, geographies, and timeframes.

Results: AUC-PR remained at 29% both pre- and during COVID. Pre-COVID-19 recall and precision were 58% and 25% respectively at the 12.5% cutoff, and 12% and 44% at the 37.5% cutoff. During COVID-19, recall and precision were 59% and 26% at the 12.5% cutoff, and 11% and 43% at the 37.5% cutoff. Pre-COVID, recall dropped at both cutoffs if recent data was not made available to the model; and compared to the overall population, recall was lower at the 12.5% cutoff in the White, non-Hispanic subgroup and at both cutoffs in the rural subgroup. During COVID-19, precision at the 12.5% cutoff was lower than that of the overall population for the non-White and non-White female subgroups. No other statistically significant differences were seen between subgroups and the corresponding overall population.

Conclusions: Overall predictive performance during the pandemic was unchanged from pre-pandemic performance. Although some comparisons (especially precision at the 37.5% cutoff) were underpowered, precision at the 12.5% cutoff was equitable across most demographies, regardless of the pandemic. Mortality prediction to prioritize ACP conversations can be provided consistently and equitably across many studied timeframes, geographies, and demographies.

Mortality

Prognosis

Machine Learning

Health Equity

Advance Care Planning

Advance care planning (ACP, which may also refer to the resulting advance care plan) is a process to discuss and document patients’ preferences for end-of-life care.[1] Patients and clinicians agree that ACP enables each patient to receive their desired life-extending care while avoiding the pain, discomfort, social separation, and cost of end-of-life procedures that the patient does not want.[2], [3] Demonstrated ACP benefits include respecting end-of-life wishes, decreasing the burden on loved ones, stress reduction, improved patient satisfaction, and fewer in-hospital deaths.[3]

Although experts agree on the importance of ACPs, clinicians cite time constraints and poor communication with other providers as barriers to having end-of-life discussions.[3], [4] Reduced access to healthcare in mixed-rurality populations may make ACP even more unlikely.[5] Due to these barriers, many patients do not have documented preferences at the end-of-life and therefore do not achieve what has been termed an “ideal death”.[6]–[8]

Some algorithms predict mortality too early to create urgency or too late for meaningful ACP discussion. For example, the Charlson comorbidity index predicts mortality within the next ten years and may not create a sense of urgency,[9] while the qSOFA score predicts mortality risk during the current inpatient stay[10] when the ability to have meaningful discussion may be compromised (e.g., due to obtundation or mechanical ventilation).[11], [12]

Accordingly, NYU Langone Health developed an algorithm to predict mortality within 60 days after the start of an inpatient admission using data from their three medical centers in New York City. Their aim was to support identification of palliative care candidates. Their model utilized 9614 features and achieved 0.28 area under the precision-recall curve (AUC-PR).[13]

We sought a model to predict post-inpatient mortality to meet a different need: to help prioritize and encourage timely ACP conversations during the inpatient stay. Although our system aims for ACPs with every patient, time constraints and other factors can make this infeasible. Our system serves a mixed-rurality population, and rurality constraints (e.g., gaps in palliative care availability and longer travel distances for care) may further reduce ACP feasibility.[7] Predicting mortality using clinician gestalt alone may have limited accuracy, but combining gestalt with a predictive model may be synergistic.[14] Therefore, to help prioritize ACPs when resources are limited, and to encourage clinicians to have ACPs with those more likely to benefit, we developed a model to predict mortality occurring 5 to 90 days after the start of an inpatient admission. The 5-to-90-day window was chosen to be not too short and not too long (“Goldilocks”), providing at least 4 days to have the discussion (the average length of an inpatient stay[15]) while also creating enough urgency to stimulate the ACP. To achieve adequate performance for our use case in our mixed-rurality population, a new model had to be created. The model appears to be novel because it was trained on a mixed-rurality population, uses a 5-to-90-day prediction window, and requires only 13 input features, which eases implementation and the ability to explain predictions (see Table 1). After development, the model achieved an AUC-PR of 0.30 on a holdout dataset.

Table 1

Available and selected features included in the model
Categories of Features Used for Model Development
● Demographics
  o Patient ID
  o Gender
  o Race
  o Ethnic group
  o Age at the date of the encounter
  o National and state Area Deprivation Index (ADI)
  o Information about the visit
● Summary information about prior visit counts
● Active problem list content such as total number of problems, number of cancers
● Counts of active medications on the medication list
● Counts of classes of prior lab results and changes in those counts over time
● Aggregate values (average, minimum, and maximum) for specific lab results and changes in those values over time.
● Charges for services, excluding charges from the 30 days immediately prior to the admission (as they may not yet have posted at the time of admission).
● Counts of surgeries and procedures on a patient over time, changes in those counts over time.
● Features that apply mathematical functions to medication counts and problem counts as of the time of a visit.

Final Features Selected by the Model
● Age
● Minimum albumin from the time of the encounter to one month prior.
● Count of cancers on the patient’s active problem list
● Average BNP from 12 months to 1 month prior to the encounter.
● Average blood albumin from time of the encounter to one month prior.
● Change in the minimum bilirubin comparing the period of 12 months prior to the encounter to 1 month prior to the encounter to the period of 1 month prior to the encounter to the time of the encounter.
● Range of BNP from the time of the encounter to one month prior.
● Change in the count of abnormal labs per day comparing the period of 12 months prior to the encounter to 1 month prior to the encounter to the period of 1 month prior to the encounter to the time of the encounter.
● Count per day of outpatient visits per day from 12 months to 1 month prior to the encounter.
● Average red blood cell count (RBC) from the time of the encounter to one month prior.
● Range of total bilirubin from the time of the encounter to one month prior.
● Count per day of inpatient visits per day from 12 months to 1 month prior to the encounter.
● Count per day of emergency department (ED) visits per day from 12 months to 1 month prior to the encounter.

However, algorithms can experience performance degradation over time due to “concept drift,”[16] and may perform differently across demographic groups.[17] Performance degradation can lead to mistrust of the model and loss of its benefits, while varying performance across demographic groups can lead to inequities in the provision of healthcare.[18] Therefore, this study assesses whether the model retains its predictive performance over time (especially during a global pandemic) and performs equitably across patient subgroups.

Objective

We sought to retrospectively evaluate the aforementioned 5-to-90-day mortality predictor on inpatient admissions across different timeframes, conditions, and demographic subgroups, and to compare the consistency and equity of its performance in those contexts.

This study was approved with exemption determination by the University of Illinois College of Medicine at Peoria Institutional Review Board.

Model Assessment

We assessed the above-described model on datasets retrospectively extracted from the health system’s enterprise data warehouse (EDW), which contains data from a variety of sources, including the health system’s electronic health record and a separate source of death records.[19] Datasets contained one row per inpatient visit during the selected timeframe. Included visits were those for patients ≥ 18 years of age at the time of admission who had at least one lab test available in the EDW in the 365 days prior to the visit, and whose resuscitation status at the time the model was assessed (a proxy for status on admission) was either “Full Code” or null. No visit used to originally develop or assess the model was utilized in this analysis.
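The inclusion criteria above can be sketched as a simple filter. This is a minimal illustration in Python, not the study's extraction code: the `Visit` record and its field names are hypothetical, and treating a lab result dated on the day of admission as falling within the 365-day lookback is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Visit:
    # Hypothetical record layout; field names are illustrative only.
    birth_date: date
    admit_date: date
    last_lab_date: Optional[date]  # most recent lab result on or before admission
    code_status: Optional[str]     # resuscitation status; None when not recorded


def age_in_years(birth: date, on: date) -> int:
    """Completed years of age on the given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def is_included(v: Visit) -> bool:
    """Apply the three inclusion criteria described in the text."""
    adult = age_in_years(v.birth_date, v.admit_date) >= 18
    has_recent_lab = (
        v.last_lab_date is not None
        and timedelta(0) <= v.admit_date - v.last_lab_date <= timedelta(days=365)
    )
    eligible_code = v.code_status is None or v.code_status == "Full Code"
    return adult and has_recent_lab and eligible_code
```

A visit failing any one criterion is dropped; a missing resuscitation status is kept, matching the "Full Code or null" rule.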

Model performance was assessed by populating datasets with the input features and target variable (5-90-day mortality), generating a prediction using the input variables, and assessing the model’s performance in different timeframes, for different patient subsets, under different implementation conditions, and at different cutoffs (the level of certainty separating “yes” from “no” predictions). All datasets ended at least 6 months prior to analysis to ensure at least 90 days had passed after the visit to populate the target variable plus another 90 days to account for death reporting delays.
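To make the role of the cutoff concrete, the sketch below labels the target variable and computes precision and recall at the two cutoffs used in this study. It is illustrative only: the scores and outcomes are made-up toy values, the inclusive 5-to-90-day bounds are an assumption, and treating a score equal to the cutoff as a "yes" prediction is also an assumption.

```python
from typing import Optional, Sequence, Tuple


def died_5_to_90_days(days_to_death: Optional[int]) -> bool:
    """Target label: death 5 to 90 days after the start of the admission.

    Inclusive bounds are an assumption; None means no recorded death.
    """
    return days_to_death is not None and 5 <= days_to_death <= 90


def precision_recall(
    scores: Sequence[float], labels: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Precision (PPV) and recall (sensitivity) when scores >= cutoff count as 'yes'."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy data (hypothetical): predicted probabilities and days from admission to death.
scores = [0.05, 0.10, 0.15, 0.20, 0.40, 0.50, 0.02, 0.30]
days_to_death = [None, 30, None, 60, 10, None, 3, None]
labels = [died_5_to_90_days(d) for d in days_to_death]

for cutoff in (0.125, 0.375):
    p, r = precision_recall(scores, labels, cutoff)
    print(f"cutoff={cutoff:.3f} precision={p:.2f} recall={r:.2f}")
```

Raising the cutoff flags fewer visits, which typically trades recall for precision, as in the 12.5% versus 37.5% results reported below.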

We assessed performance on various demographic subgroups. Since non-Hispanic, non-White patients represent a small minority of the studied population, some race/ethnicity subgroups were combined to reduce the likelihood of overly small subgroups. We estimated socioeconomic disadvantage using the Area Deprivation Index (ADI).[20] A within-state ADI decile was assigned using each patient’s recorded 5-digit home zip code. Since multiple ADI values could be associated with a single 5-digit zip code, the average of all ADI values for each 5-digit zip code was used. To reduce the likelihood of overly small subgroups, patients were grouped into ADI deciles of ≤ 5 and > 5. Patients were excluded from those subgroups if an ADI could not be assigned (e.g., no matching zip code). Performance by level of rurality was assessed using Rural-Urban Commuting Area (RUCA) codes,[21] mapped using the patient’s home zip code and applying the suggested categorization of codes 1–3 as not rural and codes 4–10 as rural.[22], [23] Patients were excluded from those subgroups if a RUCA code could not be assigned.
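The ADI and RUCA subgroup assignments described above can be sketched as follows. This is a hedged illustration, not the study's code: the function names, group labels, and zip code values are hypothetical, and the sketch assumes integer primary RUCA codes with anything outside 1–10 treated as unassigned.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def mean_adi_by_zip(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average every within-state ADI decile that maps to the same 5-digit zip code."""
    by_zip = defaultdict(list)
    for zip5, adi_decile in rows:
        by_zip[zip5].append(adi_decile)
    return {zip5: mean(values) for zip5, values in by_zip.items()}


def adi_group(zip5: str, adi_by_zip: Dict[str, float]) -> Optional[str]:
    """Return the ADI subgroup, or None when no ADI can be assigned."""
    adi = adi_by_zip.get(zip5)
    if adi is None:
        return None  # patient is excluded from the ADI subgroups
    return "ADI decile <= 5" if adi <= 5 else "ADI decile > 5"


def rurality_group(ruca_code: Optional[int]) -> Optional[str]:
    """Codes 1-3 are not rural and 4-10 are rural; anything else is unassigned."""
    if ruca_code is None:
        return None
    if 1 <= ruca_code <= 3:
        return "non-rural"
    if 4 <= ruca_code <= 10:
        return "rural"
    return None


# Hypothetical lookup rows: (zip code, ADI decile of one area within that zip code).
adi_lookup = mean_adi_by_zip([("00001", 4), ("00001", 7), ("00002", 3)])
print(adi_group("00001", adi_lookup), rurality_group(7))
```

Returning `None` rather than a default group keeps unassignable patients out of the subgroup comparisons, mirroring the exclusions stated in the text.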

Finally, we assessed performance at two levels of the COVID case rate 7-day average in the state of Illinois[24]: a COVID lull period (low rate) and a COVID peak (up to 17x higher rate than the COVID lull).

Statistical Methods

Model performance analysis in the various contexts included calculation of precision (positive predictive value) and recall (sensitivity) at certainty cutoffs of 12.5% (for greater recall) and 37.5% (for greater precision), area under the receiver operating characteristic curve (AUC-ROC), and AUC-PR. Statistical comparisons were performed using R (version 4.2.0). Precision and recall were compared between the total population and the population stratified by demographic variables using two-proportion z-tests with unequal sample sizes and a two-sided alternative hypothesis at 5% significance (alpha = 0.05). A Bonferroni correction for 26 tests for the pre-COVID dataset and 28 tests for the during-COVID dataset (the numbers of population and subset pairings) was used to adjust p-values for multiple comparisons within each performance metric (precision and recall). Post-hoc power analysis was done to determine the sample size required to detect a small Cohen’s h effect size (0.2)[25] for a two-proportion z-test with unequal sample sizes with a power of 0.80. Correlation coefficients were calculated using Pearson r correlations.
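The statistical comparisons above were run in R. The Python sketch below illustrates the same family of calculations under stated assumptions and is not a reproduction of the study's code: it uses a pooled two-sided two-proportion z-test without continuity correction, treats the two groups as independent samples, and approximates power from Cohen's h with a normal approximation. All function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sided two-proportion z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for the difference between two proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


def approx_power(h: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided test for effect size h."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = h * math.sqrt(n1 * n2 / (n1 + n2))
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Hypothetical counts: 60/100 versus 40/100 true positives among flagged visits.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z={z:.2f} raw p={p:.4f} adjusted p={bonferroni(p, 26):.4f}")
print(f"power at h=0.2, n1=n2=392: {approx_power(0.2, 392, 392):.2f}")
```

With unequal group sizes the smaller group dominates the term `n1 * n2 / (n1 + n2)`, which is why comparisons involving small subgroups are the ones most likely to be underpowered.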

In total, the datasets included 76,812 distinct inpatient visits, 47,750 prior to the COVID-19 pandemic and 29,062 during the pandemic.

The AUC-ROC and AUC-PR for the pre-COVID dataset were 82% and 29% respectively, and 81% and 29% for the during-COVID dataset. No significant differences were found in precision or recall at either the 12.5% cutoff or the 37.5% cutoff when comparing predictor performance on the full pre-COVID and during-COVID datasets (Table 2).

Table 2

Predictor validation pre- and during-COVID at selected cutoffs
Metric	Cutoff	pre-COVID Dataset	during-COVID Dataset	p-Value
Precision	12.5%	25%	26%	0.230
Recall	12.5%	58%	59%	0.524
Precision	37.5%	44%	43%	0.840
Recall	37.5%	12%	11%	0.286
All comparisons had > 80% power

Model performance on each demographic subset of the pre-COVID dataset was compared to its overall performance on that dataset (Table 3). The only significant differences in precision or recall between any of the demographic subgroups and the overall population were lower recall in the White non-Hispanic population at the 12.5% cutoff and lower recall in the rural population at both cutoffs. Additionally, the model had significantly lower recall at both the 12.5% and 37.5% cutoffs when it was not provided with the freshest data (i.e., the newest data made available to the model was from at least two days prior to the inpatient admission). While a majority of the comparisons were adequately powered, a substantial minority were underpowered.

Table 3

Predictor performance for subgroups pre-COVID
		12.5% Cutoff				37.5% Cutoff
Population	Prevalence	Precision	Precision p-value	Recall	Recall p-value	Precision	Precision p-value	Recall	Recall p-value
All n = 47750	0.08	0.25	–	0.58	–	0.44	–	0.12	–
Female n = 28265	0.06	0.24	1	0.62	0.058	0.45	1*	0.13	1
Male n = 19485	0.10	0.26	1	0.59	1	0.43	1*	0.12	1
White Non-Hispanic n = 40643	0.09	0.25	1	0.53	0.003	0.45	1	0.11	1
Other Race/Ethnicity n = 6816	0.05	0.22	0.975	0.66	0.131*	0.32	0.27*	0.12	1*
White Non-Hispanic Female n = 23523	0.07	0.24	1	0.56	1	0.47	1*	0.11	1
White Non-Hispanic Male n = 17120	0.11	0.26	1	0.55	0.434	0.44	1*	0.11	1
Other Race/Ethnicity Female n = 4574	0.04	0.23	1	0.64	1*	0.32	1*	0.13	1*
Other Race/Ethnicity Male n = 2242	0.08	0.22	1	0.59	1*	0.31	1*	0.09	1*
High ADI Rank n = 36571	0.08	0.25	1	0.60	1	0.44	1	0.12	1
Low ADI Rank n = 10636	0.07	0.25	1	0.60	1	0.42	1*	0.12	1
Non-rural n = 33663	0.08	0.25	1	0.58	1	0.46	1	0.13	1
Rural n = 4325	0.09	0.24	1	0.52	0.023	0.38	1*	0.08	0.028
2 Day Old Data n = 47750	0.08	0.27	0.167	0.38	< 0.001	0.42	1*	0.07	< 0.001
ADI = Area Deprivation Index -- higher ADI values suggest greater levels of disadvantage; *asterisked items had Power < 80%; A Bonferroni correction for multiple comparisons was applied to the p-values; Prevalence is the fraction of patients in that group that died 5–90 days after the day of admission.

A similar assessment was applied to the during-COVID dataset, except the assessment of performance using less-fresh data was replaced by performance assessment during COVID lull and peak periods (Table 4). Compared to the overall population, the only significant differences among subgroups were lower precision in the Other Race/Ethnicity and the Other Race/Ethnicity female-only subgroups, but again, a substantial minority of comparisons (including all comparisons of precision at the 37.5% cutoff) were underpowered.

Table 4

Predictor performance for subgroups during-COVID
		12.5% Cutoff				37.5% Cutoff
Population	Prevalence	Precision	Precision p-value	Recall	Recall p-value	Precision	Precision p-value	Recall	Recall p-value
All n = 29062	0.09	0.26	–	0.59	–	0.43	–	0.11	–
Female n = 16948	0.08	0.24	1	0.54	0.193	0.41	1*	0.10	1
Male n = 12114	0.12	0.27	1	0.55	0.666	0.45	1*	0.10	1
White Non-Hispanic n = 22230	0.10	0.27	1	0.58	1	0.45	1*	0.11	1
Other Race/Ethnicity n = 6585	0.06	0.21	0.004	0.56	1*	0.33	1*	0.09	1*
White Non-Hispanic Female n = 12695	0.08	0.26	1	0.61	1	0.44	1*	0.11	1
White Non-Hispanic Male n = 9544	0.12	0.28	0.900	0.59	1	0.46	1*	0.11	1
Other Race/Ethnicity Female n = 4120	0.05	0.18	0.001	0.53	1*	0.29	0.694*	0.09	1*
Other Race/Ethnicity Male n = 2473	0.08	0.23	1	0.57	1*	0.40	1*	0.10	1*
High ADI Rank n = 21394	0.10	0.26	1	0.55	0.499	0.44	1*	0.10	1
Low ADI Rank n = 7188	0.08	0.25	1	0.58	1	0.41	1*	0.11	1
Non-rural n = 21249	0.09	0.26	1	0.56	1	0.42	1*	0.10	1
Rural n = 2378	0.11	0.27	1	0.60	1	0.46	1*	0.11	1
COVID Lull n = 4391	0.08	0.23	1	0.58	1*	0.37	1*	0.11	1*
COVID Spike n = 4064	0.10	0.28	1	0.59	1*	0.42	1*	0.11	1*
ADI = Area Deprivation Index -- higher ADI values suggest greater levels of disadvantage; *asterisked items had Power < 80%; A Bonferroni correction for multiple comparisons was applied to the p-values; Prevalence is the fraction of patients in that group that died 5–90 days after the day of admission.

AUC-PR was also calculated for the demographic subsets (Fig. 1).

Outcome variable prevalence is known to affect predictor performance (particularly precision and AUC-PR).[26] Therefore, we compared precision to prevalence of 5-to-90-day mortality across all studied populations and subgroups except the “less-fresh data” analysis (Fig. 2). The relationship of precision to prevalence had a Pearson r of 0.79 (p < 0.001) at the 12.5% cutoff and 0.64 (p < 0.001) at the 37.5% cutoff.
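As an illustration of this correlation, the sketch below computes a Pearson r between prevalence and precision at the 12.5% cutoff using only the rounded values transcribed from Table 4. Because the reported coefficients pooled populations from both datasets and were presumably computed on unrounded values, this subset is not expected to reproduce them; it only shows the form of the calculation.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Rounded values from Table 4 (during-COVID), in row order from "All" to "COVID Spike".
prevalence = [0.09, 0.08, 0.12, 0.10, 0.06, 0.08, 0.12, 0.05,
              0.08, 0.10, 0.08, 0.09, 0.11, 0.08, 0.10]
precision_12_5 = [0.26, 0.24, 0.27, 0.27, 0.21, 0.26, 0.28, 0.18,
                  0.23, 0.26, 0.25, 0.26, 0.27, 0.23, 0.28]

r = pearson_r(prevalence, precision_12_5)
print(f"Pearson r (Table 4 rows only, rounded inputs) = {r:.2f}")
```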

ACP informs end-of-life care to respect patient preferences, ensure quality of life, and avoid costly, unnecessary, and unwanted interventions.[2], [27] Mortality prediction models may help spur ACP conversations. Timely predictions may strike the right balance between sufficient clinical urgency and an adequately long lead time to allow for these often time-consuming discussions.[4], [28] These predictions may be especially useful in mixed-rurality populations due to relatively reduced access to healthcare when compared to urban populations.

This work was inspired by the studies from NYU Langone demonstrating the performance and impact of their 60-day mortality prediction model, which was intended to encourage ACP discussions[14] as well as appropriate patient referrals to supportive and palliative care.[13] NYU Langone’s model performance, with an AUC-PR of 28%, was enough to achieve good rates of physician agreement with the alerts and greater use of ACPs.[14] Therefore, we hoped to achieve a similar level of performance with our model in our mixed-rurality population and to maintain that performance over time despite changing conditions. COVID-19 created significant systemic change in healthcare, and systemic change often leads to performance degradation in machine learning models.[16] Our predictor demonstrated consistent performance and resistance to concept drift, achieving an AUC-PR of 29% on both the pre-COVID and during-COVID datasets.

NYU Langone selected a cutoff designed to achieve a precision of 75% to identify likely appropriate referrals to supportive and palliative care. The tradeoff for high precision was a recall of just 4.6%.[13] Since our intended use was solely to encourage ACP discussions, we evaluated two cutoffs designed to provide higher recall at the cost of reduced precision. On the full pre-COVID dataset at a 12.5% cutoff, our model achieved 58% recall and 25% precision; at a 37.5% cutoff, the model achieved 12% recall and 44% precision. Model performance on the full during-COVID dataset did not significantly differ from that of the full pre-COVID dataset for any of those measures, demonstrating resistance to concept drift and performance degradation.

Previous work suggests that racial differences exist in the relationship between physiologic and socioeconomic parameters and mortality prediction.[29] Many recommend accounting for potentially differing machine learning model performance among demographic groups.[30]–[32] The COVID-19 pandemic has disrupted healthcare, particularly affecting patients with low socioeconomic status.[33], [34] The timing and effectiveness of ACPs can be affected by socioeconomic circumstances, race, and geographic location.[35], [36] Given these considerations, we assessed model performance in different subgroups including rurality, level of socioeconomic disadvantage, gender, ethnicity, and race. We also assessed performance during a lull and a peak in COVID case rates. Finally, we assessed the importance of fresh data to the model’s performance.

Significant performance differences were not seen for most comparisons, with some notable exceptions and caveats. Fresh data seems important for recall, which dropped significantly at both cutoffs when the newest data available to the model was at least two days old, likely because a recent physiologic change cannot be recognized if that data is not available to the model. Recall was significantly lower than that of the overall pre-COVID population for White non-Hispanic patients and patients from rural areas. During COVID, the Other Race/Ethnicity subgroup and the female-only subset of that subgroup had lower precision than the overall population. Conclusions cannot be drawn and further research is warranted for a substantial minority of comparisons that were neither significantly different nor adequately powered. However, for the majority of comparisons, model performance was comparable to that of the overall population.

As expected, precision tended to be lower in subgroups having a lower prevalence of 5–90-day mortality (Fig. 2). In the two instances for which precision was statistically significantly lower than the overall group, prevalence of 5–90-day mortality was among the lowest of any subgroup. Since most precision comparisons were underpowered at the 37.5% cutoff, the 0.64 correlation at that cutoff may be underestimated. This analysis shows that differences among subgroups in predicted risk at a particular cutoff are associated with actual differences in risk.
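The dependence of precision on prevalence follows from a standard identity (Bayes' rule). Writing $\pi$ for the prevalence of 5-to-90-day mortality in a subgroup, and treating recall and the false positive rate (FPR, the fraction of survivors who are flagged) as fixed, precision at a given cutoff is:

```latex
\mathrm{precision}
  = \frac{\mathrm{recall}\cdot\pi}
         {\mathrm{recall}\cdot\pi + \mathrm{FPR}\cdot(1-\pi)}
```

For fixed recall and FPR this expression increases with $\pi$. As a purely hypothetical example (these values are not taken from the study), with recall = 0.60 and FPR = 0.15, a prevalence of 0.10 gives a precision of 0.06 / (0.06 + 0.135) ≈ 0.31, whereas a prevalence of 0.05 gives 0.03 / (0.03 + 0.1425) ≈ 0.17. A subgroup can therefore show lower precision at the same cutoff even when the predictor separates decedents from survivors equally well in that subgroup.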

For subgroups having significant differences in predictor performance, the cutoffs for those subgroups could be adjusted to equalize performance. However, changing the cutoff typically improves either precision or recall at the cost of worsening the other metric, so predictor performance cannot be simultaneously equalized for both metrics across subgroups. One must select a metric to equalize. In our scenario, selecting cutoffs that equalize precision across subgroups would increase the likelihood that all who receive an alert will have a similar risk of near-term death. However, this means that subgroups with a lower prevalence of near-term death (e.g., females in our study populations) will be less likely to receive an alert and therefore may be less likely to have an ACP. Instead, cutoffs could be selected to equalize sensitivity across subgroups so that an equal fraction of patients who actually suffer a near-term death receive an alert. However, subgroups with a lower prevalence of near-term death will be more likely to get an alert when they have a lower risk of death. This may lead to alert fatigue and/or mistrust of the predictor,[18] and the magnitude of variation in cutoffs among demographic groups that would lead to predictor distrust in this context is not known. In addition, if clinician capacity for ACPs is limited, patients with a lower risk of death may get an ACP at the expense of those with greater urgency and need. Alternatively, cutoffs could be selected to equalize the frequency of positive alerts across subgroups, thereby equalizing the predictor’s impact on ACPs across subgroups. As with equalizing on sensitivity, however, this outcome may be lost if the resulting alerts on lower-risk patients lead to alert fatigue and/or mistrust of the predictor. Also, those in greatest need of an ACP may be less likely to get one if clinician bandwidth to have ACPs is constrained. Other approaches may be taken, but all involve tradeoffs.
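The first option, equalizing precision, can be sketched as choosing a separate cutoff per subgroup. The code below is a toy illustration under stated assumptions, not a method used in this study: the data are invented, the candidate cutoffs are arbitrary, and picking the lowest cutoff that meets a target precision is just one possible selection rule (precision is not guaranteed to rise monotonically with the cutoff).

```python
from typing import Dict, List, Optional, Sequence, Tuple


def precision_at(
    scores: Sequence[float], labels: Sequence[bool], cutoff: float
) -> Optional[float]:
    """Precision among visits flagged at the cutoff; None if nothing is flagged."""
    flagged = [label for score, label in zip(scores, labels) if score >= cutoff]
    return sum(flagged) / len(flagged) if flagged else None


def lowest_cutoff_for_precision(
    scores: Sequence[float],
    labels: Sequence[bool],
    target: float,
    candidates: Sequence[float],
) -> Optional[float]:
    """Smallest candidate cutoff whose precision meets the target, else None."""
    for cutoff in sorted(candidates):
        precision = precision_at(scores, labels, cutoff)
        if precision is not None and precision >= target:
            return cutoff
    return None


# Hypothetical subgroups with identical scores but different outcome prevalence.
subgroups: Dict[str, Tuple[List[float], List[bool]]] = {
    "higher prevalence": ([0.1, 0.2, 0.3, 0.4, 0.5], [False, False, True, True, True]),
    "lower prevalence": ([0.1, 0.2, 0.3, 0.4, 0.5], [False, False, False, True, True]),
}
candidates = [0.125, 0.25, 0.375]

chosen = {
    name: lowest_cutoff_for_precision(scores, labels, 0.6, candidates)
    for name, (scores, labels) in subgroups.items()
}
print(chosen)
```

In this toy example the lower-prevalence subgroup needs a higher cutoff to reach the same precision, so fewer of its members are flagged, which is the tradeoff described in the paragraph above.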

Existing literature suggests that the appropriate way to equalize the performance of a Boolean predictor among different subgroups is use-case dependent.[17], [18] For our use case, we suspect that equalizing precision across subgroups may best serve the clinical need by reducing the risk of alert fatigue and mistrust and prioritizing alerts to those with the greatest predicted need. However, since only a few statistically significant performance differences were seen among subgroups, and the statistical significance of those differences was inconsistent across the studied time periods, it may be wisest not to draw firm conclusions about whether or how to adjust cutoffs until the pandemic further stabilizes and the study can be repeated.

Our use of ADI to assess predictive model equity across levels of economic disadvantage along with the assessment of equity across different levels of rurality may be unique. A PubMed search on “ADI prediction equity” or “area deprivation index prediction equity”[37], [38] returned only one relevant result looking at the equity of a prediction model for various levels of ADI, and that study did not assess equity across levels of rurality.[39]

Limitations

Although assessments were designed to avoid “future leakage” (use of data that will not be available at the time of prediction), complete avoidance cannot be guaranteed in this retrospective study. Other confounders related to the retrospective nature of this study may have affected results. This study was performed at one multi-hospital health system serving a predominantly White and Midwestern population, potentially limiting generalizability. Some demographic data may be inaccurate, affecting results. We grouped RUCA codes based on published approaches,[22], [23] but different published groupings might have led to different rurality results.[23], [40] The ADI may not accurately represent the patient’s socioeconomic status, and our use of an average ADI for the five-digit zip code may not represent the actual ADI for the patient’s census tract. Some demographies were aggregated to avoid small group sizes, and the predictor may perform differently across the aggregated demographies. Use of current code status as a proxy for status on admission may have affected results, but we believe patients are more likely to change from null or full code status to something else than the reverse. Finally, our study was limited to model performance analysis, not the resulting impact on clinical care. These limitations represent fruitful areas of future research.

The predictor resisted concept drift and performance degradation before and during the pandemic. Using precision as the equity metric, precision at the 12.5% cutoff was equitable across most demographies regardless of the pandemic, although some precision comparisons (especially at the 37.5% cutoff) were underpowered and warrant further study.

For time-constrained clinicians unable to have ACP discussions with every inpatient, this model may consistently and equitably help prioritize patients likely to benefit in the near-term from these crucial conversations.

Ethics Approval and Consent to Participate

This study was approved by OSF Clinical Research and approved (as Exempt Review) by the University of Illinois College of Medicine at Peoria’s Institutional Review Board (IRB) under IRBNet Package #1872976-2. All methods were performed in accordance with the international and national guidelines and regulations. As the nature of the study is retrospective, the IRB has waived informed consent for this study.

Consent for Publication

Not applicable.

Availability of Data and Materials

The datasets analyzed during the current study are not publicly available, since they were extracted from patients’ electronic health records. Data on patients are protected by medical confidentiality. The IRB approval includes the assurance that individual patient data will not be released. Data requests can be addressed to the corresponding author, who will evaluate the possibility of fulfilling the request considering institutional policies, regulatory requirements, and the patients’ privacy.

Competing Interests

The authors declare that they have no competing interests.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Authors’ Contributions

All authors contributed to development of the methods and writing the manuscript. Jonathan Handler and Jeremy McGarvey contributed to the data analysis.

Acknowledgments

The authors would like to thank Safura Sultana for her support in obtaining IRB approval, project management administration, and manuscript support.

[1] A. Brinkman-Stoppelenburg, J. A. C. Rietjens, and A. van der Heide, “The effects of advance care planning on end-of-life care: a systematic review,” Palliat. Med., vol. 28, no. 8, pp. 1000–1025, Sep. 2014, doi: 10.1177/0269216314526272.
[2] R. S. Martin, B. Hayes, K. Gregorevic, and W. K. Lim, “The Effects of Advance Care Planning Interventions on Nursing Home Residents: A Systematic Review,” J. Am. Med. Dir. Assoc., vol. 17, no. 4, pp. 284–293, Apr. 2016, doi: 10.1016/j.jamda.2015.12.017.
[3] H. D. Lum, R. L. Sudore, and D. B. Bekelman, “Advance care planning in the elderly,” Med. Clin. North Am., vol. 99, no. 2, pp. 391–403, Mar. 2015, doi: 10.1016/j.mcna.2014.11.010.
[4] L. E. Dingfield and J. B. Kayser, “Integrating Advance Care Planning Into Practice,” Chest, vol. 151, no. 6, pp. 1387–1393, Jun. 2017, doi: 10.1016/j.chest.2017.02.024.
[5] K. J. Johnston, H. Wen, and K. E. Joynt Maddox, “Lack Of Access To Specialists Associated With Mortality And Preventable Hospitalizations Of Rural Medicare Beneficiaries,” Health Aff. (Millwood), vol. 38, no. 12, pp. 1993–2002, Dec. 2019, doi: 10.1377/hlthaff.2019.00838.
[6] B. Steffen-Bürgi, “[Ideas about a ‘good death’ in Palliative Care Nursing],” Pflege, vol. 22, no. 5, pp. 371–378, Oct. 2009, doi: 10.1024/1012-5302.22.5.371.
[7] H. Nelson-Brantley, C. Buller, C. Befort, E. Ellerbeck, A. Shifter, and S. Ellis, “Using Implementation Science to Further the Adoption and Implementation of Advance Care Planning in Rural Primary Care,” J. Nurs. Scholarsh. Off. Publ. Sigma Theta Tau Int. Honor Soc. Nurs., vol. 52, no. 1, pp. 55–64, Jan. 2020, doi: 10.1111/jnu.12513.
[8] K. N. Yadav et al., “Approximately One In Three US Adults Completes Any Type Of Advance Directive For End-Of-Life Care,” Health Aff. Proj. Hope, vol. 36, no. 7, Art. no. 7, Jul. 2017, doi: 10.1377/hlthaff.2017.0175.
[9] M. E. Charlson, P. Pompei, K. L. Ales, and C. R. MacKenzie, “A new method of classifying prognostic comorbidity in longitudinal studies: development and validation,” J. Chronic Dis., vol. 40, no. 5, pp. 373–383, 1987, doi: 10.1016/0021-9681(87)90171-8.
[10] J. L. Vincent et al., “The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine,” Intensive Care Med., vol. 22, no. 7, pp. 707–710, Jul. 1996, doi: 10.1007/BF01709751.
[11] B. M. Sorger, B. Rosenfeld, H. Pessin, A. K. Timm, and J. Cimino, “Decision-making capacity in elderly, terminally ill patients with cancer,” Behav. Sci. Law, vol. 25, no. 3, pp. 393–404, 2007, doi: 10.1002/bsl.764.
[12] S. Cohen et al., “Communication of end-of-life decisions in European intensive care units,” Intensive Care Med., vol. 31, no. 9, pp. 1215–1221, Sep. 2005, doi: 10.1007/s00134-005-2742-x.
[13] V. J. Major and Y. Aphinyanaphongs, “Development, implementation, and prospective validation of a model to predict 60-day end-of-life in hospitalized adults upon admission at three sites,” BMC Med. Inform. Decis. Mak., vol. 20, no. 1, p. 214, Sep. 2020, doi: 10.1186/s12911-020-01235-6.
[14] E. Wang et al., “Supporting Acute Advance Care Planning with Precise, Timely Mortality Risk Predictions,” NEJM Catal., vol. 2, no. 3, 2021, doi: 10.1056/CAT.20.0655.
[15] W. Freeman, A. Weiss, and K. Heslin, “Overview of U.S. Hospital Stays in 2016: Variation by Geographic Region,” Agency for Healthcare Research and Quality, Rockville, MD, 246, Feb. 2018. [Online]. Available: www.hcup-us.ahrq.gov/nisoverview.jsp
[16] F. Bayram, B. S. Ahmed, and A. Kassler, “From concept drift to model degradation: An overview on performance-aware drift detectors,” Knowl.-Based Syst., vol. 245, p. 108632, Jun. 2022, doi: 10.1016/j.knosys.2022.108632.
[17] J. K. Paulus and D. M. Kent, “Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities,” Npj Digit. Med., vol. 3, no. 1, p. 99, Dec. 2020, doi: 10.1038/s41746-020-0304-9.
[18] A. Rajkomar, M. Hardt, M. D. Howell, G. Corrado, and M. H. Chin, “Ensuring Fairness in Machine Learning to Advance Health Equity,” Ann. Intern. Med., vol. 169, no. 12, p. 866, Dec. 2018, doi: 10.7326/M18-1990.
[19] “ObituaryData.com,” Jul. 05, 2022. https://www.obituarydata.com/default.asp
[20] A. J. H. Kind and W. R. Buckingham, “Making Neighborhood-Disadvantage Metrics Accessible — The Neighborhood Atlas,” N. Engl. J. Med., vol. 378, no. 26, pp. 2456–2458, Jun. 2018, doi: 10.1056/NEJMp1802313.
[21] “USDA ERS - Rural-Urban Commuting Area Codes.” https://www.ers.usda.gov/data-products/rural-urban-commuting-area-codes.aspx (accessed Aug. 17, 2022).
[22] “Research & Training Center on Disability in Rural Communities.” https://www.umt.edu/rural-institute/rtc/focus-areas/research-methods/defining-rural.php (accessed Aug. 17, 2022).
[23] R. Cunningham et al., “Guidelines for Using Rural-Urban Classification Systems for Community Health Assessment.” Washington State Department of Health, Oct. 27, 2016. Accessed: Aug. 23, 2022. [Online]. Available: https://doh.wa.gov/sites/default/files/legacy/Documents/1500//RUCAGuide.pdf
Illinois Department of Public Health, “COVID-19 Surveillance Case Rate 7 Day Rolling Average.” Accessed: Jul. 21, 2022. [Online]. Available: https://dph.illinois.gov/covid19/data/surveillance.html#caseRateChart
J. Cohen, Statistical power analysis for the behavioral sciences, 2nd ed. Hillsdale, N.J.: L. Erlbaum Associates, 1988.
S. Tenny and M. R. Hoffman, “Prevalence - StatPearls - NCBI Bookshelf,” StatPearls, 2020. https://www.ncbi.nlm.nih.gov/books/NBK430867/#:~:text=Prevalence%20thus%20impacts%20the%20positive,decreases%20while%20the%20NPV%20increases. (accessed Sep. 10, 2020).
K. M. Detering, A. D. Hancock, M. C. Reade, and W. Silvester, “The impact of advance care planning on end of life care in elderly patients: randomised controlled trial.,” BMJ, vol. 340, p. c1345, Mar. 2010, doi: 10.1136/bmj.c1345.
R. L. Sudore et al., “Defining Advance Care Planning for Adults: A Consensus Definition From a Multidisciplinary Delphi Panel.,” J. Pain Symptom Manage., vol. 53, no. 5, pp. 821-832.e1, May 2017, doi: 10.1016/j.jpainsymman.2016.12.331.
“RACIAL DIFFERENCES IN PREDICTING MORTALITY,” The Gerontologist, vol. 56, no. Suppl_3, pp. 506–507, Nov. 2016, doi: 10.1093/geront/gnw162.2043.
J. W. Gichoya et al., “AI recognition of patient race in medical imaging: a modelling study,” Lancet Digit. Health, vol. 4, no. 6, pp. e406–e414, 2022, doi: 10.1016/S2589-7500(22)00063-2.
K. Palmer, “‘It’s not going to work’: Keeping race out of machine learning isn’t enough to avoid bias,” STAT.
M. Tan et al., “Including Social and Behavioral Determinants in Predictive Models: Trends, Challenges, and Opportunities.,” JMIR Med. Inform., vol. 8, no. 9, p. e18084, Sep. 2020, doi: 10.2196/18084.
A. N. Poudel et al., “Impact of Covid-19 on health-related quality of life of patients: A structured review.,” PloS One, vol. 16, no. 10, p. e0259164, 2021, doi: 10.1371/journal.pone.0259164.
J. A. W. Gold et al., “Dispensing of Oral Antiviral Drugs for Treatment of COVID-19 by Zip Code-Level Social Vulnerability - United States, December 23, 2021-May 21, 2022,” MMWR Morb. Mortal. Wkly. Rep., vol. 71, no. 25, pp. 825–829, 2022, doi: 10.15585/mmwr.mm7125e1.
J. L. Tripken, C. Elrod, and S. Bills, “Factors Influencing Advance Care Planning Among Older Adults in Two Socioeconomically Diverse Living Communities.,” Am. J. Hosp. Palliat. Care, vol. 35, no. 1, pp. 69–74, Jan. 2018, doi: 10.1177/1049909116679140.
N. Khosla, A. L. Curl, and K. T. Washington, “Trends in Engagement in Advance Care Planning Behaviors and the Role of Socioeconomic Status.,” Am. J. Hosp. Palliat. Care, vol. 33, no. 7, pp. 651–657, Aug. 2016, doi: 10.1177/1049909115581818.
“prediction equity area deprivation index - Search Results - PubMed,” PubMed. https://pubmed.ncbi.nlm.nih.gov/?term=prediction%20equity%20area%20deprivation%20index (accessed Sep. 13, 2022).
“adi prediction equity - Search Results - PubMed,” PubMed. https://pubmed.ncbi.nlm.nih.gov/?term=prediction%20equity%20adi (accessed Sep. 13, 2022).
G. E. Weissman, S. Teeple, N. D. Eneanya, R. A. Hubbard, and S. Kangovi, “Effects of neighborhood-level data on performance and algorithmic equity of a model that predicts 30-day heart failure readmissions at an urban academic medical center,” J. Card. Fail., vol. 27, no. 9, pp. 965–973, Sep. 2021, doi: 10.1016/j.cardfail.2021.04.021.
“Rural Urban Commuting Area Codes Data,” WWAMI Rural Health Research Center. https://depts.washington.edu/uwruca/ruca-uses.php (accessed Aug. 23, 2022).

No competing interests reported.
