Comparison of Prognostic Models of Spontaneous Intracerebral Hemorrhage: Potential Tools for Personalized Care and Clinical Trial in ICH

Background: Several prognostic models have been developed for spontaneous intracerebral hemorrhage (ICH); however, none of them has been consistently used in routine clinical practice or clinical research. In this study, we systematically compared 27 ICH models with regard to mortality and functional outcome at 1 month, 3 months and 1 year after ICH.
Methods: The validation cohort was derived from the Beijing Registration of Intracerebral Hemorrhage. Poor functional outcome was defined as a modified Rankin Scale (mRS) score ≥ 3 at 1 month, 3 months and 1 year after ICH, respectively. The area under the receiver operating characteristic curve (AUROC) and the Hosmer-Lemeshow goodness-of-fit test were used to assess model discrimination and calibration. Moderate, good, excellent and outstanding discrimination were predefined as AUROC of 0.75-0.79, 0.80-0.84, 0.85-0.89 and above 0.90, respectively.
Results: A total of 1575 patients were included. The mean age was 57.2±14.3 years and 67.2% were male. The median NIHSS score on admission was 11 (IQR: 3-21). For predicting mortality at 3 months after ICH, the AUROC of the 27 ICH models ranged from 0.604 to 0.856. One model showed excellent discrimination, fifteen models demonstrated good discrimination, and seven models demonstrated moderate discrimination. In pairwise comparison, the ICH-FOS (0.856, 95% CI = 0.835-0.878, P < 0.001) showed statistically better discrimination than the other models for mortality at 3 months after ICH (all P < 0.05). For predicting poor functional outcome (mRS ≥ 3) at 3 months after ICH, the AUROC of the 27 ICH models ranged from 0.602 to 0.880. Five models showed excellent discrimination, six models demonstrated good discrimination, and ten models demonstrated moderate discrimination. In pairwise comparison with the other prediction models, the ICH-FOS was superior in predicting poor functional outcome at 3 months after ICH (all P < 0.001). Several risk models were well calibrated (Hosmer-Lemeshow test P > 0.05) for mortality and poor functional outcome at 3 months after ICH; however, the ICH-FOS showed the largest Cox and Snell R-square. Similar results were verified for mortality and poor functional outcome at 1 month and 1 year after ICH.
Conclusion: Several risk models are externally validated to be effective for risk stratification and outcome prediction after ICH, especially the ICH-FOS, and would be useful tools for personalized care and clinical trials in ICH.


Background
Spontaneous intracerebral hemorrhage (ICH) accounts for approximately 15-20% of all strokes and is one of the leading causes of mortality and morbidity worldwide [1,2]. Despite advances in medical knowledge, treatment for ICH remains strictly supportive, with few evidence-based interventions currently available [3,4]. Efforts continue to develop accurate, reliable and practical clinical grading scales and outcome prediction models for ICH [5], which would be useful for data-driven discussion with patients or families, personalized care and clinical trials.
During the past decades, several prognostic models have been proposed for predicting mortality and functional outcome after ICH. In the early stage, ICH models were mainly presented as equations and were not convenient for clinical practice. In 2001, Hemphill et al introduced the original ICH score, one of the first simple and easily assessable clinical grading scales for ICH [6]. Since then, a number of modifications to the original ICH score and other pragmatic ICH scores have been developed [5]. Although some of these ICH risk models have been internally or externally validated, none of them has been universally accepted and consistently used in routine clinical practice and clinical research. In addition, with many ICH grading systems available, it is becoming increasingly difficult for clinicians and researchers to determine which risk models provide optimal predictability and reliability in clinical practice and clinical trials. Therefore, it is necessary to conduct a head-to-head comparison of ICH models in an independent cohort.
In this study, we aimed to systematically compare the discrimination and calibration of ICH risk models with regard to mortality and functional outcome at 1 month, 3 months and 1 year after ICH, following the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) guideline [7,8].

Validation cohort
The validation cohort was derived from the Beijing Registration of Intracerebral Hemorrhage, a multicenter, prospective, observational cohort study. A total of thirteen hospitals in the Beijing area participated in the study.
Trained research coordinators at each hospital reviewed medical records daily to identify, consent and enroll consecutive eligible patients. To be eligible for the study, subjects had to meet the following criteria: (1) age 18 years or older; (2) hospitalized with a primary diagnosis of spontaneous ICH confirmed by brain CT or MRI [9]; (3) direct admission to hospital from a physician's clinic or emergency department; (4) written informed consent from patients or their legal representatives. All patients were diagnosed by a certified vascular neurologist. Demographics, medical history, medications, pre-stroke modified Rankin Scale (mRS) score, neurological deficit on admission, blood pressure, laboratory tests, neuroimaging, treatment, medical complications and follow-up information during the first year after onset were prospectively recorded. The study protocol was approved by the Institutional Review Board (IRB) of the Beijing Tiantan Hospital (KY2014-023-02) and written informed consent was obtained.

Data Collection And Definition Of Variables
A standardized electronic case report form (eCRF) was used for data collection. Participating centers collected data and submitted them online to the coordinating center at Beijing Tiantan Hospital. For this study, the following candidate variables were analyzed: (1) demographics (age and gender); (2) time from onset to hospital; (3) stroke risk factors: hypertension (history of hypertension or anti-hypertensive medication use), diabetes mellitus (history of diabetes mellitus or anti-diabetic medication use), dyslipidemia (history of dyslipidemia or lipid-lowering medication use), atrial fibrillation (history of atrial fibrillation or documentation of atrial fibrillation on admission), history of stroke/TIA, myocardial infarction, heart failure, current smoking, and heavy alcohol consumption (≥ 2 standard alcohol beverages per day); (4) pre-admission antithrombotic medications (anticoagulation and antiplatelet agents); (5) pre-stroke mRS score; (6) admission stroke severity based on the National Institutes of Health Stroke Scale (NIHSS) score and the Glasgow Coma Scale (GCS) score; (7) admission systolic and diastolic blood pressure (mmHg); (8) admission laboratory tests (white blood cell count, blood glucose, and creatinine); (9) neuroimaging variables: intracerebral hemorrhage volume (measured using the ABC/2 method [10]), hematoma location (supratentorial or infratentorial ICH), intraventricular extension (presence or absence) and subarachnoid extension (presence or absence); (10) etiology diagnosis at discharge (primary or secondary ICH); (11) surgical treatment (craniotomy evacuation, minimally invasive surgical therapy or brain ventricle puncture and drainage); (12) withdrawal of medical care; and (13) length of hospital stay (LOS). All images were prospectively viewed by a trained neuroradiologist blinded to clinical data.
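As a brief illustration of the ABC/2 method mentioned among the neuroimaging variables, the sketch below computes an approximate hematoma volume from three measurements. The function name and the example diameters are hypothetical and are not taken from the registry; the formula itself is the standard ellipsoid shortcut (A × B × C / 2).

```python
def abc2_volume_ml(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Approximate hematoma volume in mL using the ABC/2 shortcut.

    a_cm: largest hemorrhage diameter on the axial CT slice with the largest bleed
    b_cm: largest diameter perpendicular to A on the same slice
    c_cm: vertical extent (number of slices showing hemorrhage x slice thickness)
    """
    if min(a_cm, b_cm, c_cm) < 0:
        raise ValueError("diameters must be non-negative")
    # Diameters in cm give cm^3, and 1 cm^3 equals 1 mL.
    return a_cm * b_cm * c_cm / 2.0


# Hypothetical measurements, for illustration only.
example_volume = abc2_volume_ml(4.0, 3.0, 5.0)
print(f"Estimated volume: {example_volume:.1f} mL")
```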

Functional Outcome Assessment
The modified Rankin Scale was used to evaluate functional outcome during the first year after ICH. Central follow-up, blinded to baseline variables, was performed by telephone by trained interviewers using a standardized interview protocol. For patients who could not be reached, the telephone follow-up interview was attempted once a week, up to three times. In this study, poor functional outcome was defined as mRS ≥ 3 at 1 month, 3 months and 1 year after ICH.

Statistical analysis
Continuous variables were summarized with mean and standard deviation (SD) or median and interquartile range (IQR). Categorical variables were summarized as proportions. Chi-square or Fisher exact test was used to compare categorical variables and Mann-Whitney test or independent t-test was employed to compare continuous variables between groups.
The ICH models were validated by assessing model discrimination and calibration [7,8]. Discrimination was assessed by calculating the area under the receiver operating characteristic curve (AUROC). In this study, moderate, good, excellent and outstanding discrimination were predefined as AUROC of 0.75-0.79, 0.80-0.84, 0.85-0.89 and above 0.90, respectively. Because existing ICH models were designed for predicting outcomes at different time points after onset, we compared the discrimination of ICH models with regard to mortality and poor functional outcome at 1 month, 3 months and 1 year after ICH, respectively. Pairwise AUROCs were compared using DeLong's method [37], and sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated at each risk model's maximum Youden Index.
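The discrimination measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function names, the toy scores and the label for AUROC below 0.75 are assumptions, and the code assumes that a higher score predicts the outcome and that the category boundaries are applied as lower bounds (0.75, 0.80, 0.85, 0.90).

```python
from typing import Dict, Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUROC as the share of event/non-event pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def discrimination_label(auc: float) -> str:
    """Map an AUROC onto the predefined categories (lower bounds are an assumption)."""
    if auc >= 0.90:
        return "outstanding"
    if auc >= 0.85:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.75:
        return "moderate"
    return "below moderate"  # label not defined in the text


def best_youden_cutoff(scores: Sequence[float], outcomes: Sequence[int]) -> Dict[str, float]:
    """Scan cut-offs (score >= cutoff predicts the event) and keep the maximum Youden Index."""
    best: Dict[str, float] = {}
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 0)
        fn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        youden = sens + spec - 1.0
        if not best or youden > best["youden"]:
            best = {
                "cutoff": cutoff, "youden": youden,
                "sensitivity": sens, "specificity": spec,
                "ppv": tp / (tp + fp) if tp + fp else float("nan"),
                "npv": tn / (tn + fn) if tn + fn else float("nan"),
            }
    return best


# Toy data (hypothetical): an integer score and a binary outcome for eight patients.
toy_scores = [0, 1, 1, 2, 3, 3, 4, 5]
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
toy_auc = auroc(toy_scores, toy_outcomes)
print(toy_auc, discrimination_label(toy_auc), best_youden_cutoff(toy_scores, toy_outcomes))
```

Applied to the values reported in this study, `discrimination_label(0.856)` falls in the "excellent" band and `discrimination_label(0.604)` falls below the "moderate" band.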
Calibration was assessed by performing the Hosmer-Lemeshow goodness-of-fit test and by plotting observed versus predicted risk according to 10 deciles of the predicted risk. The Cox and Snell R-square and Nagelkerke R-square of the Hosmer-Lemeshow goodness-of-fit test were calculated.
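The calibration measures above can be illustrated with a short sketch, under several stated assumptions: predicted risks lie strictly between 0 and 1, patients are split into equal-sized groups by rank of predicted risk, the test uses groups minus 2 degrees of freedom, and the two R-square values are computed in the conventional way from the log-likelihoods of the fitted and the intercept-only model. The function names are hypothetical and this is not the software used in the study.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(pred: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> Tuple[float, int, float]:
    """Return (statistic, degrees of freedom, p-value) over rank-based risk groups."""
    if len(pred) < groups:
        raise ValueError("need at least one patient per group")
    if any(p <= 0.0 or p >= 1.0 for p in pred):
        raise ValueError("predicted risks must lie strictly between 0 and 1")
    pairs = sorted(zip(pred, outcomes), key=lambda t: t[0])  # stable sort on risk only
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        # (O - E)^2 / (n_g * mean_p * (1 - mean_p)), with mean_p = E / n_g
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / len(chunk)))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


def log_likelihood(pred: Sequence[float], outcomes: Sequence[int]) -> float:
    """Bernoulli log-likelihood of the outcomes under the predicted risks."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for p, y in zip(pred, outcomes))


def pseudo_r2(pred: Sequence[float], outcomes: Sequence[int]) -> Tuple[float, float]:
    """Return (Cox and Snell R-square, Nagelkerke R-square)."""
    n = len(outcomes)
    base_rate = sum(outcomes) / n
    ll_null = log_likelihood([base_rate] * n, outcomes)
    ll_model = log_likelihood(pred, outcomes)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return cox_snell, nagelkerke


# Toy data (hypothetical): 20 patients, half with the outcome.
toy_outcomes = [0, 1] * 10
toy_pred = [0.9 if y == 1 else 0.1 for y in toy_outcomes]
print(hosmer_lemeshow(toy_pred, toy_outcomes), pseudo_r2(toy_pred, toy_outcomes))
```

A Hosmer-Lemeshow P value above 0.05 indicates no detected lack of fit, which is how "well calibrated" is used in this study; a larger R-square indicates that the model explains more of the outcome variation.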

Predictive Performance For Mortality After ICH
Discrimination. Figure 1 shows the discrimination of the 27 ICH models with regard to mortality and poor functional outcome (mRS ≥ 3) at 1 month, 3 months and 1 year after ICH, respectively. For predicting 3-month mortality after ICH, the AUROC of the 27 ICH risk models ranged from 0.604 to 0.856. Among them, one model (the ICH-FOS) showed excellent discrimination with AUROC between 0.85 and 0.89; fifteen risk models demonstrated good discrimination with AUROC between 0.80 and 0.84, including the Essen ICH score, Weimar's equation-II, GP on STAGE score, EDICH score, modified ICH Score, Huang's score, ICHOP score, Lisk's equation, CAA-ICH score, FUNC Score, simplified ICH score, ICH index, IVH score, max-ICH score and Mase's equation; and seven risk models demonstrated moderate discrimination with AUROC between 0.75 and 0.79 (Table 2) (An additional figure shows this in more detail [see Additional Fig. 2]). The sensitivity, specificity, PPV, NPV and maximum Youden Index of the 27 ICH models for predicting 3-month mortality after ICH are shown in Table 2. The ICH-FOS showed the maximum Youden Index. In pairwise comparison, the ICH-FOS (0.856, 95% CI = 0.835-0.878, P < 0.001) showed statistically better discrimination than the other risk models for mortality at 3 months after ICH (all P < 0.05). Similar results were found with regard to mortality at 1 month and 1 year after ICH (Additional tables show this in more detail [see Additional Table 2-3]).
Calibration. The predicted and observed risk according to 10 deciles of the predicted risk of mortality at 3 months after ICH was plotted (An additional figure shows this in more detail [see Additional Fig. 4]). The results of the Hosmer-Lemeshow test are shown in Table 3.
(Table 2) (An additional figure shows this in more detail [see Additional Fig. 3]). The sensitivity, specificity, PPV, NPV and maximum Youden Index of the ICH risk models with regard to poor functional outcome at 3 months after onset are shown in Table 2. The ICH-FOS showed the maximum Youden Index. In pairwise comparison with the other prediction models, the ICH-FOS was superior in predicting poor functional outcome at 3 months after ICH (all P < 0.001). Similar results were found with regard to poor functional outcome at 1 month and 1 year after ICH (Additional tables show this in more detail [see Additional Table 2-3]).
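The pairwise comparisons reported here rest on DeLong's method for two AUROCs estimated on the same patients. The sketch below is a minimal pure-Python illustration of that test, assuming that higher scores predict the outcome; the function names and toy scores are hypothetical and the code is not the software used in the study.

```python
import math
from typing import List, Sequence, Tuple


def _components(scores: Sequence[float], outcomes: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Per-event and per-non-event placement values used by DeLong's variance estimate."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two events and two non-events")

    def psi(x: float, y: float) -> float:
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float],
                outcomes: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (AUROC of A, AUROC of B, z statistic, two-sided P value)."""
    v10_a, v01_a = _components(scores_a, outcomes)
    v10_b, v01_b = _components(scores_b, outcomes)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of the AUROC difference, accounting for the shared patients.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2.0 * _cov(v10_a, v10_b)) / len(v10_a)
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2.0 * _cov(v01_a, v01_b)) / len(v01_a))
    if var <= 0.0:
        raise ValueError("zero variance: the two models rank patients identically")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return auc_a, auc_b, z, p_value


# Toy data (hypothetical): two scores evaluated on the same eight patients.
toy_outcomes = [0, 0, 0, 0, 1, 1, 1, 1]
model_a = [1, 2, 3, 5, 4, 6, 7, 8]
model_b = [3, 1, 4, 2, 2, 5, 1, 6]
print(delong_test(model_a, model_b, toy_outcomes))
```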
Calibration. The predicted and observed risk according to 10 deciles of the predicted risk of poor functional outcome at 3 months after ICH was plotted (An additional figure shows this in more detail [see Additional Fig. 5]). The results of the Hosmer-Lemeshow test are shown in Table 3. Ten risk models had a Hosmer-Lemeshow test significance level greater than 0.05. Among them, the three models with the largest Cox and Snell R-square were the ICH-FOS, max-ICH and Hallevy's score, in that order (Table 3). Similar results were found for poor functional outcome at 1 month and 1 year after ICH (Additional tables show this in more detail [see Additional Table 5-7]).

Discussion
In this study, we systematically compared the discrimination and calibration of 27 ICH risk models with regard to mortality and poor functional outcome at 1 month, 3 months and 1 year after ICH, respectively. For predicting 3-month mortality after ICH, the AUROC of the 27 ICH risk models ranged from 0.604 to 0.856. Among them, one model showed excellent discrimination and fifteen models demonstrated good discrimination with regard to mortality at 3 months after ICH. For predicting poor functional outcome (mRS ≥ 3) at 3 months after ICH, the AUROC of the 27 ICH risk models ranged from 0.602 to 0.880. Five models showed excellent discrimination and six models demonstrated good discrimination for poor functional outcome at 3 months after ICH. In pairwise comparison, the ICH-FOS showed statistically better discrimination than the other ICH models for both mortality and poor functional outcome at 3 months after ICH. Similar results were verified for mortality and poor functional outcome at 1 month and 1 year after ICH.
Much of the literature reports that do-not-resuscitate orders and withdrawal of care contribute to worse outcomes in patients with ICH, and the majority of available ICH risk models are hindered by this self-fulfilling prophecy [38,39]. The ideal way to precisely define prognosis after ICH would be to assess it in a cohort in which patients received full support, irrespective of the perceived probable outcome. The Beijing Registration of Intracerebral Hemorrhage had an overall low rate of withdrawal of care (6.6%), which allowed us to get closer to overcoming the self-fulfilling prophecy bias.
For a clinical risk model to become effective and widely used, it must be accurate and reliable in risk stratification and outcome prediction. Our study indicated that among 27 ICH risk models, sixteen models had good or excellent discrimination (AUROC ≥ 0.80) for predicting mortality at 3 months after ICH; meanwhile, eleven models showed good or excellent discrimination for poor functional outcome (mRS ≥ 3) at 3 months after ICH (Table 2). Similar results were found for predicting mortality and poor functional outcome at 1 month and 1 year after ICH (Supplementary table 2-3). These results indicated that several ICH risk models are effective for predicting mortality and functional outcome after ICH, which would be useful for data-driven discussion with patients or families, personalized care and clinical trials.
With several ICH prognostic models available, it is becoming increasingly difficult for clinicians and researchers to determine which risk models provide optimal predictability and reliability in clinical practice and clinical trials. In pairwise comparison, the ICH-FOS showed statistically better discrimination than the other ICH risk models for both mortality (all P < 0.05) and poor functional outcome (all P < 0.001) at 3 months after ICH. Similar results were verified for predicting mortality and poor functional outcome at 1 month and 1 year after ICH (Supplementary table 2-3). In addition, in subgroup analysis, the ICH-FOS consistently showed the highest AUROC in prespecified subgroups for poor functional outcome at 3 months and 1 year after ICH (Fig. 2). In calibration analysis, the ICH-FOS showed the largest Cox and Snell R-square of the Hosmer-Lemeshow goodness-of-fit test for both mortality and poor functional outcome at 1 month, 3 months and 1 year after ICH. These results were consistent with the original study that developed the ICH-FOS based on the China National Stroke Registry (CNSR) [30]. Together, these results indicated that the ICH-FOS had significantly better discrimination and calibration than the compared models for both mortality and poor functional outcome after ICH. Although this is promising, caution is needed when interpreting the results. First, the study populations used for derivation and validation of these ICH models are different. The baseline characteristics of our cohort differed from those of western cohorts, with younger age at ICH onset, less severe neurological deficit, smaller hematoma volume, less frequent intraventricular extension and a lower rate of withdrawal of care. Second, the intended outcome (functional outcome vs. mortality), timing (1 month, 3 months, 6 months and 1 year after ICH) and assessment tools (GOS vs. BI vs. mRS) differ among existing ICH models. Finally, there might be complex genetic, social and economic factors, as well as regional management philosophies and preferences, that are difficult to account for when risk models are developed or applied to a distinct population. These ICH models need to be further validated in more populations and larger samples in the future.
Despite advances in medical knowledge, treatment for ICH remains strictly supportive, with few evidence-based interventions currently available. Medical and surgical treatments, such as blood pressure control [40,41], hematoma evacuation [42-44], hemostatic therapy [45] and neuroprotection [46], have not shown definite benefit in improving ICH functional outcome. It is time to rethink how clinical trials for improving ICH functional outcome are conducted. We have now entered a new era of precision medicine. According to the Precision Medicine Initiative, precision medicine is a medical model that proposes the customization of healthcare, with medical treatment or prevention being tailored to the individual patient [47]. In precision medicine, patients are stratified by potential risk, and treatment or prevention strategies are developed for patients within a specific risk stratum. On the contrary, in the traditional medicine model, patients are grouped by symptoms and signs, and treatment or prevention strategies are developed for the average person (one size fits all), with less consideration for the differences between individuals. Prior ICH clinical trials selected patients mainly based on hematoma volume, hematoma location and time window from symptom onset [40-46]. In this way, it is inevitable that these clinical trials include patients with an unbalanced, too high, or too low risk of developing poor functional outcome (An additional figure shows this in more detail [see Additional Fig. 6A]). Inspired by the precision medicine model, we could use validated prognostic models to stratify patients by potential risk of developing poor functional outcome after ICH and then test specific interventions in different risk strata (An additional figure shows this in more detail [see Additional Fig. 6B]). ICH trials conducted in this way will allow researchers to clarify more accurately which treatment or prevention strategies work in which risk stratum.
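As a rough illustration of the risk-stratified trial design proposed above, the sketch below groups patients by the predicted risk from a validated prognostic model and then allocates treatment within each stratum. Everything specific in it is an assumption made for illustration: the cut-offs of 0.3 and 0.7, the stratum names, the patient identifiers and the simple shuffled alternating allocation are hypothetical and are not taken from this study or from any trial protocol.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def risk_stratum(predicted_risk: float, cutoffs: Tuple[float, float] = (0.3, 0.7)) -> str:
    """Assign a stratum from a predicted probability of poor outcome (hypothetical cut-offs)."""
    low, high = cutoffs
    if predicted_risk < low:
        return "low"
    if predicted_risk < high:
        return "intermediate"
    return "high"


def stratified_allocation(patients: Dict[str, float], seed: int = 0) -> Dict[str, Tuple[str, str]]:
    """Map each patient id to (stratum, arm), balancing the two arms within every stratum."""
    strata: Dict[str, List[str]] = defaultdict(list)
    for patient_id, risk in patients.items():
        strata[risk_stratum(risk)].append(patient_id)
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    allocation: Dict[str, Tuple[str, str]] = {}
    for stratum, ids in strata.items():
        rng.shuffle(ids)
        for index, patient_id in enumerate(ids):
            arm = "intervention" if index % 2 == 0 else "control"
            allocation[patient_id] = (stratum, arm)
    return allocation


# Hypothetical predicted risks for six patients.
toy_patients = {"p1": 0.10, "p2": 0.20, "p3": 0.40, "p4": 0.50, "p5": 0.80, "p6": 0.90}
for patient_id, (stratum, arm) in sorted(stratified_allocation(toy_patients, seed=1).items()):
    print(patient_id, stratum, arm)
```

Analysing the treatment effect separately within each stratum is what would let a trial distinguish patients whose risk is too low or too high to benefit from those in whom the intervention may change the outcome.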
Our study has strengths and limitations. To the best of our knowledge, we are the first to systematically compare the predictive performance of 27 ICH risk models with regard to mortality and functional outcome after ICH in a large independent cohort. The findings of the study provide important information for selecting prognostic tools for personalized care and clinical trials in ICH. However, our study also has limitations that deserve comment. First, central follow-up, blinded to baseline variables, was performed by telephone by trained interviewers using a standardized interview protocol. For patients who could not be reached, the telephone follow-up interview was attempted once a week, up to three times. Despite these efforts, a relatively high proportion of patients had missing follow-up information during the first year after ICH. We compared the baseline characteristics of included and excluded patients, and they were not statistically different in admission NIHSS score, GCS score, hematoma volume and withdrawal of care. Second, we did not have all the elements required by every ICH model, so several ICH risk models could not be externally validated in the study. Likewise, some risk models developed with machine learning were not included. Third, our study included only hospitalized patients; patients who died in the emergency room or were treated in outpatient clinics were not included. Meanwhile, like most registries, our registry required informed consent, and selection bias was inevitable. Finally, the validation cohort originated from an Asian population, and the ICH models need to be further validated in different populations.

Conclusion
Several risk models are externally validated to be effective for risk stratification and outcome prediction after ICH, especially the ICH-FOS, and would be useful tools for personalized care and clinical trials in ICH.

Predictive performance of ICH models with regard to mortality and poor functional outcome at 1-month, 3-month and 1-year after onset (n=1575)