Optimal Triage for COVID-19 Patients Under Limited Intensive Care Unit Capacity: Development of a Parsimonious Machine Learning Prediction Model and Threshold Optimization Using Discrete-Event Simulation


 Background: The coronavirus disease 2019 (COVID-19) pandemic has caused an unprecedented burden on healthcare systems. To effectively triage COVID-19 patients within situations of limited data availability and to explore optimal thresholds to minimize mortality rates while maintaining the healthcare system capacity.Methods: A nationwide sample of 5601 patients confirmed for COVID-19 up until April 2020 was retrospectively reviewed. XGBoost and logistic regression analysis were used to develop prediction models for the patients’ maximum clinical severity during hospitalization, classified according to the WHO Ordinal Scale for Clinical Improvement (OSCI). The recursive feature elimination technique was used to evaluate the extent of the model performance’s maintenance when clinical and laboratory variables are eliminated. Using populations based on hypothetical patient influx scenarios, discrete-event simulation was performed to find the optimal threshold within limited resource environments that minimizes mortality rates.Results: The cross-validated area under the receiver operating characteristics (AUROC) of the baseline XGBoost model that utilized all 37 variables was 0.965 for OSCI ≥ 6. Compared to the baseline model’s performance, the AUROC of the reduced model that utilized 17 variables was maintained at 0.963 with statistical insignificance. Optimal thresholds were found to minimize mortality rates in a hypothetical patient influx scenario. The benefit of utilizing an optimal triage threshold was clear, reducing mortality up to 18.1% compared to the conventional Youden Index.Conclusions: Our adaptive triage model and its threshold optimization capability reveal that COVID-19 management can be integrated using both medical and healthcare management sectors to guarantee maximum treatment efficacy.


Introduction
The high incidences of critical illness and mortality due to coronavirus disease (COVID- 19) have placed unprecedented burdens on healthcare systems. The World Health Organization (WHO) guidelines recommend that all countries should prepare for infection surges in their healthcare facilities, implementing appropriate triage protocols [1]. Unfortunately, these guidelines fail to provide a one-sizets-all approach that works for individual regions, while accounting for their unique outbreak surges. Until further improvements in treatments and epidemiological immunizations have been achieved, implementing an adaptive triage model is imperative to ensure the most effective utilization of local regions' healthcare resources.
Various triage measures are needed during a pandemic for the lack of established intensive care unit (ICU) resources. First, a reliable prediction model that provides accurate prognoses is needed to facilitate preemptive treatments. This model should not only focus on any particular clinical endpoint but also needs to provide accurate predictions for composite disease spectrums. Second, this prediction model should operate using readily available assessment parameters, thereby maintaining its reliability in resource-constrained environments. Third, this model should also consider the availability of ICU at either a facility or national level relative to concurrent patient in ux volumes. COVID-19 is associated with disruptions to most healthcare infrastructures. Therefore, an adjustable risk strati cation model that considers various regions' ICU availability, as well as one that identi es patients who will likely require intensive care, will help to reduce these systems' burdens.
Prognostic models have been developed to ensure effective triage for COVID-19 patients [2][3][4][5][6][7]. These models have a modest predictive accuracy; however, their generalizability has been questioned due to their con nement to single clinical outcome measures and reductions in their performance when using insu cient data. Most importantly, these models' classi cation thresholds, which are crucial for ensuring the effective ICU utilization, have been neglected, thereby limiting their practical implementation capabilities. In sum, these models' shortcomings highlight the need to combine large-scale multiinstitutional data with advanced prediction models, such as those using machine learning and simulation modeling.
Our objectives were three-fold. best-performing prediction model [11]. The models were developed and cross-validated using 5037 (89.9%) patient data and were then revalidated using a 564 (10.1%) hold-out cohort. Performance metrics were calculated using 10-fold cross-validation. Model development was performed using the caret package in R Model.

Variable elimination
RFE was performed for two full models: F + L+ (with laboratory data) and F + L-(excluding laboratory data). Shapley additive explanations (SHAP) was used to rank each variable based on its signi cance to the model [12]. At each RFE iteration, the lowest-ranked feature was eliminated, the model was re tted, and its performance was assessed. The reduced models (F-L + and F-L-) were then selected at a point wherein the number of features was minimized while the differences in the AUROC remained statistically insigni cant. Analysis was performed using caret and SHAPforXGBOOST package in R.

Model interpretation and comparison
To interpret the F + L + model, we used SHAP as it provides visible post-hoc interpretability to black-box machine learning models [12]. Patient-speci c plots were created by aggregating the SHAP score of each variable.
Hyperparameters of the XGBoost algorithm were optimized to maximize its area under the receiver operating characteristics (AUROC) value using a simple grid search. Accuracy, AUROC, sensitivity, positive predicted value (PPV), and negative predicted value (NPV) were calculated at a 90% speci city using the pROC package in R. We conducted DES to nd the optimal threshold that occurs within limited ICU capacity that minimizes mortality rates, as calculated by , using the simmer R package.
A hypothetical patient in ux scenario was created using the SIR model [13]. The total population calculated was xed at 60,000, considering that the largest historical in ux observed in South Korea (58654 cumulative patients: November 13 2020 -February 20 2021). I(0) and R(0) were xed at six and zero, respectively. The recovery rate gamma was set at 0.05 because the average COVID-19 recovery time was 20.1 days [14]. The transmission rate beta ranged between 0.75 and 5 when generating in uxes with different R0 (base reproduction rate) levels. The number of newly con rmed patients per day was obtained from the SIR modeling data (Fig. 1).

Probability generation
Out-of-fold prediction results of the 10-fold cross-validation were aggregated to generate an empirical probability distribution of the disease severity probability. We used the F + L-model because of its high performance and its potential use wherein there are limited diagnostic tools. Inverse transformation sampling was performed on the empirical probability distribution function, which was approximated using the Gaussian kernel density estimation and linear interpolation [15]. The process was performed separately for severe and non-severe patients, with the sampled probabilities being randomly matched with the generated patient in ux rates while maintaining the prevalence of severe patients. The prediction probability distributions of the out-of-fold and generated samples are presented in Fig. 2.

Simulation scenarios
Patients with a severe disease probability above the threshold are directed to the ICU, depending on its current capacity. Rejected patients are then directed to the general ward along with those who have a severe disease probability below the threshold. The probability of severe disease patients dying while in the ICU was 0.507, while it was 0.990 for those outside of the ICU [16]. Non-severe patients were assumed to survive regardless of ICU admission. Patient deaths were categorized into three types: resourceindependent deaths, wherein severe patients expire despite ICU care (Type I); resource-dependent deaths in which severe patients expired due to ICU unavailability (Type II); and threshold-dependent deaths wherein severe patients expire after being incorrectly classi ed as "non-severe" and are subsequently directed to the general ward (Type III).
The maximum capacity of the ICU was established as 504 beds based on the number of isolation beds under negative pressure [17]. To estimate the distribution of length of stay, we used a previously suggested gamma distribution with a shape parameter of 1.5488 and a rate parameter of 0.1331 for those who expired, with a shape parameter of 0.8904 and a rate parameter of 0.0477 for those who survived to approximate the median and interquartile range [16,18]. Simulations were repeated 20 times for each in ux scenario to ensure robustness.

Patient characteristics
Descriptive characteristics of the training and hold-out cohorts are provided in Additional le 1: Table S1. A total of 5330 (95.2%) patients exhibited non-severe symptoms with an OSCI < 6, while 271 (4.8%) exhibited severe symptoms with an OSCI ≥ 6.

Model interpretability
According to SHAP, age and lymphocyte count were the most important risk factors for predicting OSCI ≥ 6 ( Fig. 3a). Patient age, lymphocyte count, platelet count, BMI, hematocrit, and heart rate all exhibited non-linear in uences in predicting disease severity (Fig. 3b). In addition to the overall impact of each feature on the model's output, SHAP provides patient-speci c in uences of each variable on their predicted disease severity (Additional le 1: Fig. S1).

Predictive performance under limited data availability
An AUROC of 0.965 (95% CI: 0.958-0.972) was obtained with the baseline model (F + L+), which included all 37 variables. Notably, a reduction in its performance was found to be insigni cant when 20 variables were eliminated, resulting in the F-L + model (Fig. 4a). The F + L + model achieved both sensitivity and speci city of greater than 90%. The F-L + model achieved a sensitivity of 88% and a PPV of 31%. The F-L + model still outperformed the LR model regarding all performance measures.
The F + L-model obtained an AUROC of 0.946 (95% CI: 0.936-0.956), which included 32 variables (Additional le 1: Table S3). The reduction in performance was found to be insigni cant when 21 variables were eliminated, resulting in the F-L-model (Fig. 4b). The F + L-and F-L-models achieved sensitivities of 84% and 81%, respectively (Table 1). Differences in AUROCs were observed when laboratory variables were excluded, both in the full and reduced models, which implied that the laboratory variables have a solid discriminative power (all p ≤ 0.01). The AUROCs of all hold-out sets were statistically indifferent from the cross-validation results, inferring the model's generalizability to unseen data (Additional le 1: Table S4). Detailed results and the selected variables used at each step of the RFE are presented in Additional le 1: Tables S3 and S5.  We performed simulations using patient ow data that was generated using the SIR model with varying R0s. The overall DES work ow is illustrated in Fig. 5. The DES using hypothetical patient in ux scenarios revealed that the optimal threshold ranges from 0.02 to 0.66, while the respective minimized mortality rates ranged from 0.017 (1.7%) to 0.042 (4.2%) (Fig. 6). The optimal threshold values and minimized mortality rates for each R0 show that a larger R0 value tends to result in increases in both of these variables. The optimal threshold is increased along with the R0 values to increase precision for severe patients while fully utilizing the ICU. The optimized mortality rates were increased due to increased proportion of death outside ICU followed by a larger volume of patient in ux. The bene ts of utilizing an optimal triage threshold are clear when compared with the conventional Youden Index (J-index) as a benchmark value, which was 0.013. Decreased mortality rates (= ) were notably large in a magnitude ranging from 6.1-18.1%.
We observed a convex relationship of mortality rates in accordance with the thresholds (Fig. 7). The mortality rate was minimized at a point where the proportion of type I deaths, which has the lowest P death (50.7%), was maximized. For example, when R0 was 1.5, the proportion of type I deaths was maximized at the optimal threshold, accounting for 66.4% of total deaths. However, a threshold that is too low leads to inadequate capacity exhaustion with misclassi ed non-severe patients. Consequently, the resulting limited capacity for actual severe patients then decreases the proportion of type I deaths and increases those of type II deaths. Conversely, a threshold that is too high would result in unnecessary rejection for severe patients, which then decreases the proportion of type I deaths and increases those of type III deaths.
In situations of excessively high R0 values and increased ICU demands, increasing the triage threshold to reject more patients will still deplete the ICU capacity. Therefore, adjusting the threshold will mostly result in trade-offs between the number of threshold-and capacity-dependent rejections, limiting the in uence of threshold adjustment on minimizing patient mortality. In situations of su ciently low R0 values, the effect of threshold optimization is reduced along with its necessity. Nonetheless, the large reduction in mortality rates among the remaining in uxes outlines the substantial bene ts of optimizing the patient triage threshold under resource constraints.

Discussion
A distinctive feature of our F + L + model is its high discriminative power with an AUROC that exceeded 0.97 in both the cross-validation and hold-out settings. Previous prediction models for determining the clinical deterioration of COVID-19 patients have reported accuracies between 0.77 and 0.91 [2][3][4][5].
Additionally, these models require speci c diagnostic data, including laboratory data, peripheral oxygen saturation, or radiographic ndings, to maintain their accuracies. Moreover, to what extent these models' performance abilities are maintained during the partial absence of data has not been studied. For our F + L + model to be effectively implemented in practice, we con rmed that our reduced models maintained an adequate discriminative power even in the partial absences of data. The advantages of our reduced models include not only their generalizability to unseen data, but also their applicability within scenarios wherein there is limited clinical data. As timely triage is important for COVID-19 patients, our reduced models can be utilized during the early triage stages at a patient's arrival while waiting for the F + L + model to more accurately stratify their disease severity risk. Given the acute exacerbation of pneumonia in COVID-19 patients, our model can also be used to re-evaluate hospitalized patients in the short-term, so that those whose clinical manifestations are likely to exacerbate can be early identi ed [19].
A noteworthy feature of our model is its ability to discriminate between patient-speci c contributing factors for disease exacerbation and their individual contributions using SHAP values. Current COVID-19 treatment guidelines provide recommendations based on the average-risk patient under limited available insights into their disease stage [10]. These recommendations provide a one-size-ts-all approach to all patients, which is problematic for those with more complex or atypical disease presentations. Our model obviates the need for arbitrary patient risk groupings and is, therefore, useful in maximizing their survival odds based on individual risk strati cation. Furthermore, our models can be integrated into electronic medical record systems, which utilize coding algorithms, as a noti cation system that helps in the early identi cation of disease exacerbation risk factors.
The validity of our model is supported by the high consistency between the results of its interpretation using SHAP, and previously reported prognosticators of COVID-19 severity [20][21][22][23][24][25]. We noted that old age, followed by lymphopenia and thrombocytopenia, exhibit the highest Shapley values for disease exacerbation. We presumed older age interacts with relevant features in older adults, including poor functional performances and increased frailty, which are associated with adverse outcomes and increased mortality among patients with respiratory syndromes [26]. Our ndings also add to the literature that lymphopenia plays an important role in COVID-19 exacerbation [22][23][24][25]. Lymphopenia is exerted by the lowering of lymphocytes due to injured alveolar epithelial cells and is commonly observed in COVID-19 patients [27]. Consistent with previous studies, thrombocytopenia was also found to be associated with adverse COVID-19 outcomes [23,28]. It has been suggested that a reduction or morphological alternation of the pulmonary capillary bed exerts pathological platelet defragmentation because the lung is a platelet release site with mature megakaryocytes [29]. Our prediction model supports the notion that early identi cation of COVID-19 exacerbation, before a hematological crisis occurs, is necessary for ensuring a better prognosis.
There is no existing study that examines COVID-19 severity prediction models that provides an explicit solution for the delivery of optimal triage using threshold modi cation that accounts for limited resource availability. We conducted a DES on our F-L + model to examine its discrimination thresholds that are usable in an adaptive manner across various patient in ux scenarios and the related ICU availability. Our simulations reveal that applying the optimal thresholds will minimize the mortality rate of each patient in ux scenario. Our hypothesis is supported by the signi cant differences found in mortality rates between the J-index and our optimized thresholds when applied to the expected in ux volumes. This observation implies the potential or our model to substantially reduce COVID-19 mortality rates through its appropriate adjusting of triage thresholds.
A limitation of our study was its incorporation of a single, national cohort of Asian ethnicity, which impacts our ndings' generalizability. An external validation using a more multi-ethnic population is thus needed to determine if a similar discrimination performance occurs among other ethnic groups. However, to ensure our model's robustness, we implemented 10-fold cross-validation with additional con rmation using the hold-out cohort. Another limitation was that the triage threshold was evaluated using a simulation. Simulations do not yield concrete answers, nor are they able to assess all kinds of potential situations [30].

Conclusions
We developed and validated a robust prediction model, with an explanatory feature, that enhances the e ciency of COVID-19 triage. Our model has the potential for effective use because it can be integrated into electronic medical record systems that utilizes coding algorithms. We further proposed a novel, adaptive triage model that utilizes patient in ux volumes according to concurrent ICU capacity to establish the optimal thresholds for triage strategies. Our study reveals that COVID-19 treatment plans need to integrate both medical and healthcare management expertise to guarantee maximum e cacy.

Declarations
Ethics approval: This study was approved by an institutional ethics committee (2020-0883-001) and the Korea Disease Control and Prevention Agency (KDCA) epidemiological survey and analysis committee (20201120_4a).
Consent to participate: Informed consent was not required because the study did not modify the patient's management and the data were anonymously collected.
Availability of data and material: Data supporting this study's ndings are available from the corresponding author upon reasonable request. A portion of the study data is available within Additional le 1. The original dataset is not publicly available due to the KDCA's con dentiality policy.      Optimized results of the patient triage simulations. Hypothetical in ux simulation results. R0 = basic reproduction rate, decreased mortality rate = (J index mortality rate -optimized mortality rate) / J index mortality rate.