Derivation and validation of a novel and parsimonious prognosis prediction score for ischemic stroke patients: The S2AFI Score

Background: To develop and validate a novel, integer-based score for predicting unfavorable outcome at 3 months after ischemic stroke, using a dataset from the Chinese population. Methods: We retrospectively included 394 patients who presented to our stroke center within 24 hours of symptom onset. Seventy percent of them were randomly selected as the model derivation dataset and the remaining 30% as the model validation dataset. Step-wise logistic regression was applied to identify strong predictors of unfavorable outcome, which was defined as a modified Rankin Scale (mRS) score >2. Results: We identified 4 strong predictors in the final prediction model: stroke severity, age, serum fibrinogen level and intravenous thrombolytic therapy. Two points were assigned to patients with an NIH Stroke Scale score higher than 6 at admission. One point each was assigned for age older than 71 years and for a serum fibrinogen level higher than 2.98 g/L. One point was assigned to patients who missed the opportunity for intravenous thrombolytic therapy. We named the scoring system the S2AFI score; it ranges from 0 to 5. The optimal cut-point of the S2AFI score according to the Youden index was 2, meaning that patients with 3 points or more tend to have an unfavorable outcome. As a dichotomized predictor, the S2AFI score achieved an area under the curve of 0.72 (95% confidence interval 0.66-0.79) in the derivation dataset, 0.72 (95% confidence interval 0.61-0.83) in the validation dataset and 0.68 (95% confidence interval 0.63-0.74) in all patients. As a continuous predictor, the score achieved an area under the curve of 0.76 (95% confidence interval 0.69-0.83) in the derivation dataset, 0.74 (95% confidence interval 0.62-0.87) in the validation dataset and 0.74 (95% confidence interval 0.68-0.80) in all patients. There was a strong correlation between the S2AFI score and the risk of unfavorable outcome at 3 months (P


Background
Stroke has ranked as the leading cause of death and acquired disability among Chinese people in recent years, and it is associated with 43.7 million disability-adjusted life-years annually worldwide [1][2][3]. The situation is expected to worsen as the population ages. Ischemic stroke accounts for almost 80% of all strokes. As advanced treatments such as intravenous thrombolytic therapy and endovascular thrombectomy are increasingly applied in daily practice, the mortality rate of stroke patients has tended to stabilize over recent decades [3]. Unfortunately, not all ischemic stroke patients are qualified to receive those therapies. In-hospital death or severe disability after ischemic stroke is hard for family members to accept. Answering questions from patients or their family members about prognosis is difficult when stroke physicians rely only on their experience. A considerable number of prognosis prediction models have been developed in recent years to assist neurologists with decision making in clinical practice, but few of them are available for the Chinese population [4][5][6][7][8][9][10][11]. The CoRisk score facilitates its application through an online calculator, but the website development costs are quite high [4]. A prediction tool needs to be parsimonious, like the ABCD2 score, while retaining good predictive ability [12]. In this study, we aimed to develop and validate a novel, integer-based, parsimonious prognosis prediction score using a dataset from the Chinese population to assist stroke physicians in clinical practice.

Study design and population
This prediction score was developed in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines [13]. The study protocol was approved by the Institutional Review Board of Minhang Hospital affiliated to Fudan University. Written informed consent was obtained from all patients or their welfare guardians for data collection and subsequent analysis. The data that support the findings of this study are available from the corresponding author upon reasonable request.
This was a retrospective study using a prospectively collected hospital-based dataset. We included all ischemic stroke patients, confirmed by computed tomography (CT) or magnetic resonance imaging (MRI), who were admitted to our stroke center within 24 hours of symptom onset from January 2018 to July 2019. Seventy percent of them were randomly selected as the model development dataset and the remaining 30% as the model validation dataset. For patients with unknown symptom onset time, we defined the time last known well as the onset time. Patients with hemorrhagic stroke, transient ischemic attack or pre-stroke disability were excluded from this study. Unfavorable outcome at 3 months was defined as a modified Rankin Scale (mRS) score >2 and was evaluated at a routine out-patient visit or by structured telephone interview by experienced neurologists or trained nurses who were blinded to the patients' records. Patients who were lost to follow-up or had severe complications during hospitalization were also excluded.
Ischemic stroke subtype was classified according to the Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria by experienced neurologists at our center [14].
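The 70%/30% random split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the fixed seed and the use of integer patient identifiers are assumptions, and the paper does not report the exact sizes of the two datasets.

```python
import random


def split_derivation_validation(patient_ids, derivation_fraction=0.7, seed=2019):
    """Randomly assign patients to a derivation and a validation dataset.

    The seed is an arbitrary assumption, used only to make the sketch
    reproducible; the paper does not state how the randomization was done.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_derivation = round(len(ids) * derivation_fraction)
    return ids[:n_derivation], ids[n_derivation:]


# 394 patients were included in the study; identifiers 0..393 are placeholders.
derivation, validation = split_derivation_validation(range(394))
print(len(derivation), len(validation))
```

With 394 patients, this particular rounding rule gives 276 and 118 patients; the actual split used in the study may differ.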

Predictors included in the multivariate logistic regression
Stroke severity was assessed on admission with the NIH Stroke Scale (NIHSS). Risk factors such as alcohol consumption and smoking were defined as binary variables (0 = No, 1 = Yes). We assigned 1 (Yes) to patients who had ever had those habits, regardless of whether they had quit. Other risk factors such as hypertension and atrial fibrillation (AF) were defined according to standard clinical criteria and confirmed within 24 hours after admission. Blood samples were measured within 24 hours after admission. All laboratory tests (white blood cell count, low-density lipoprotein, homocysteine, glycated hemoglobin, creatinine, and fibrinogen) were treated as binary variables in our model. Recanalization therapy, including intravenous thrombolytic therapy (IVT) and endovascular thrombectomy (EVT), was also treated as a binary variable (0 = Yes, 1 = No).
Strategies for developing a novel and parsimonious prediction score
To facilitate the use of our prediction model, we transformed it into a scoring system. All continuous predictors were dichotomized into binary variables using the cut-point given by the Youden index. The optimal cut-points were 71 years for age, 6 for stroke severity (NIHSS), 6.83×10^9/L for white blood cell count (WBC), 2.93 mmol/L for low-density lipoprotein (LDL), 13 μmol/L for homocysteine (HCY), 6.2% for glycated hemoglobin (GH), 81 μmol/L for creatinine, and 2.98 g/L for fibrinogen. These dichotomized variables were entered into the multivariate logistic regression. Forward and backward step-wise logistic regression was applied using the likelihood ratio test, with Akaike's information criterion (AIC) as the stopping rule. The P value thresholds for selection were set at <0.1 and <0.2. The model with the smallest AIC value, indicating the best predictive ability, was our target model. The β-coefficient of each predictor in the target model was rounded to the nearest integer to give the score of that predictor. The S2AFI score was generated by summing the scores of the predictors. The optimal threshold of the novel score was determined from receiver operating characteristic (ROC) curve analysis.
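The dichotomization step can be illustrated with a short sketch that searches for the cut-point maximizing the Youden index J = sensitivity + specificity - 1. This is a plain Youden search written for illustration, not the Liu method cited in the Statistical analysis section; the function name and the toy data are hypothetical and are not taken from the study dataset. A patient counts as test-positive when the value is strictly higher than the cut-point, matching the wording "older than 71 years".

```python
def youden_cut_point(values, outcomes):
    """Return (cut_point, J) maximizing J = sensitivity + specificity - 1.

    values   : continuous predictor (for example, age in years)
    outcomes : 1 for unfavorable outcome (mRS > 2), 0 otherwise
    A patient is test-positive when value > cut_point.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, outcomes) if v > cut and y == 1)
        true_neg = sum(1 for v, y in zip(values, outcomes) if v <= cut and y == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical toy data, NOT from the study.
ages = [55, 60, 65, 70, 71, 72, 75, 80, 85, 90]
outcomes = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
cut_point, j = youden_cut_point(ages, outcomes)
print(cut_point, round(j, 3))
```

On this toy data the search selects 71 as the cut-point; the resulting binary variable (age > 71) is what would then enter the logistic regression.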

Statistical analysis
Since we dichotomized all the continuous variables, Fisher's exact test was used to compare differences between groups. The analysis of the variables in continuous form is shown in the online supplementary material.
All predictors included in the multivariate logistic regression analysis were confirmed to have no strong collinearity (variance inflation factors <2). The Youden index of those predictors was calculated using the Liu method [15]. The performance of the novel prediction score was tested with ROC curve analysis. After the prediction score was developed, we validated its performance in two datasets: the 30% validation dataset and the full dataset of all patients included in this study. The area under the curve (AUC) of the score as a continuous predictor and as a dichotomized predictor was calculated in each of these datasets. The calibration of the score was assessed for goodness of fit by plotting the estimated probability on the x-axis against the observed probability on the y-axis and comparing the result with the diagonal line, which represents perfect calibration. A P for trend test was applied to test whether a higher score was related to a higher risk of an unfavorable outcome at 3 months. The statistical analysis was performed with STATA (Version 15.0, Stata Corp, College Station, Texas, USA) and R software (R version 3.5.3, The R Foundation for Statistical Computing). A two-tailed P value less than 0.05 was considered statistically significant.
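To make the distinction between the score as a continuous predictor and as a dichotomized predictor concrete, the sketch below computes the AUC from its rank-based definition: the probability that a randomly chosen patient with an unfavorable outcome has a higher score than a randomly chosen patient with a favorable outcome, counting ties as one half. The function name and the toy data are hypothetical and are not taken from the study; the analysis in the paper was run in STATA and R.

```python
def auc(scores, outcomes):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    # Each positive/negative pair counts 1 if the positive scores higher,
    # 0.5 if the two are tied, and 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study: integer scores from 0 to 5.
scores = [0, 1, 1, 2, 2, 3, 3, 4, 5, 5]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

auc_continuous = auc(scores, outcomes)
# Dichotomize at the cut-point used in the paper: 3 points or more is "high".
auc_dichotomized = auc([1 if s >= 3 else 0 for s in scores], outcomes)
print(auc_continuous, auc_dichotomized)
```

For a binary predictor the AUC reduces to the mean of sensitivity and specificity, which is why dichotomizing a score can only keep or lower its AUC relative to the full integer scale.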

Results
A total of 429 patients arrived at our stroke center within 24 hours of symptom onset between January 2018 and July 2019. Thirty-one patients were excluded because their discharge diagnosis was not ischemic stroke. Patients with cancer (n = 1) and pre-stroke disability (n = 2) were also excluded, as was 1 patient who died during hospitalization. Finally, 394 patients were included in this study. Of them, 21.1% (83) had an unfavorable outcome at 3 months. The mean age of all patients was 68.61 years.
The univariate and multivariate analyses of all predictors in our dataset are shown in the online supplementary material. The β-coefficients of our novel prognosis prediction model are shown in Table 1. Two points were assigned for stroke severity when the NIHSS score was higher than 6 at admission. One point each was assigned for age older than 71 years and for a serum fibrinogen level higher than 2.98 g/L. One point was assigned to patients who missed the opportunity for intravenous thrombolytic therapy.
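The point assignment above translates directly into a small calculator. The sketch below is illustrative only: the function and parameter names are hypothetical, and the high-risk flag uses the threshold of 3 points or more reported for the S2AFI score in this paper.

```python
def s2afi_score(nihss, age, fibrinogen_g_per_l, received_ivt):
    """Compute the S2AFI score (0 to 5) from the four predictors.

    nihss              : NIH Stroke Scale score at admission
    age                : age in years
    fibrinogen_g_per_l : serum fibrinogen level in g/L
    received_ivt       : True if intravenous thrombolytic therapy was given
    """
    score = 0
    if nihss > 6:                  # stroke severity: 2 points
        score += 2
    if age > 71:                   # age: 1 point
        score += 1
    if fibrinogen_g_per_l > 2.98:  # fibrinogen: 1 point
        score += 1
    if not received_ivt:           # missed thrombolysis: 1 point
        score += 1
    return score


def is_high_risk(score):
    """Patients with 3 points or more tend to have an unfavorable outcome."""
    return score >= 3


example = s2afi_score(nihss=9, age=75, fibrinogen_g_per_l=2.5, received_ivt=True)
print(example, is_high_risk(example))
```

All comparisons are strict, so a patient aged exactly 71 or with an NIHSS score of exactly 6 receives no points for that item.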
We named the scoring system the S2AFI score; it ranges from 0 to 5. The performance of the S2AFI score in predicting unfavorable outcome at 3 months is shown in Table 2. The optimal cut-point of the S2AFI score according to the Youden index was 2, meaning that patients with 3 points or more tend to have an unfavorable outcome. As a dichotomized predictor, the S2AFI score achieved an area under the curve of 0.72 (95% confidence interval 0.66-0.79) in the derivation dataset, 0.72 (95% confidence interval 0.61-0.83) in the validation dataset and 0.68 (95% confidence interval 0.63-0.74) in all patients (Figure 1).
As a continuous predictor, the score achieved an area under the curve of 0.76 (95% confidence interval 0.69-0.83) in the derivation dataset, 0.74 (95% confidence interval 0.62-0.87) in the validation dataset and 0.74 (95% confidence interval 0.68-0.80) in all patients (Figure 2). The calibration plots are shown next to the corresponding ROC curves. There was a strong correlation between the S2AFI score and the risk of unfavorable outcome at 3 months, as shown in Figure 3 (P for trend <0.001).
Higher scores were associated with a higher probability of an unfavorable outcome at 3 months.
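A test for trend across the six score levels can be sketched with the Cochran-Armitage statistic. This choice is an assumption: the paper does not name the specific trend test that was used. The function name and the counts per score level are hypothetical and are not the study data.

```python
import math


def cochran_armitage_trend(events, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions.

    events : number of unfavorable outcomes at each score level
    totals : number of patients at each score level
    scores : numeric level labels (defaults to 0, 1, 2, ...)
    Returns (z, two_sided_p) using the normal approximation.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p_bar = sum(events) / n
    # Observed minus expected events, weighted by the level label.
    t_stat = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for score levels 0..5, NOT the study data.
events = [1, 2, 4, 6, 8, 9]
totals = [20, 20, 20, 20, 20, 20]
z, p_value = cochran_armitage_trend(events, totals)
print(round(z, 2), p_value < 0.001)
```

A positive z with a small P value indicates that the proportion of unfavorable outcomes rises with the score, which is the pattern reported in Figure 3.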

Discussion
Our study showed that the novel integer-based S2AFI score, obtainable within 24 hours after admission, performed well in predicting unfavorable outcome at 3 months after acute ischemic stroke. Early identification of patients at high risk of death or severe disability has become increasingly important in recent years. One reason is that patients or their family members may not understand the cause of post-stroke disability after spending a large amount of money, which can aggravate the already tense doctor-patient relationship in China; early communication with patients at high risk is therefore critical. Our 5-point prediction tool has the potential to help explain the risk of an unfavorable outcome to patients or their family members. Another reason is the importance of early rehabilitation during hospitalization for patients at high risk of post-stroke disability. Since the number of young stroke patients has surged in the past decade [16], post-stroke disability may cost them not only a great deal of money but also the hope of a bright future. Decisions about early rehabilitation based only on stroke physicians' experience are sometimes not enough; a lean, easy-to-calculate S2AFI score free of human bias has the potential to assist them.
Many prognosis prediction models for ischemic stroke patients have been published in leading clinical journals, such as the ASTRAL score, the iScore and the CoRisk score [4][5][6]. In 2017, Dr Quinn and his team compared the performance of 8 prognosis scales using a large, independent clinical trials dataset and found that the ASTRAL score showed superior utility compared with the other prediction models [7]. Like our S2AFI score, the ASTRAL score is integer-based; it consists of six items and was developed from a dataset in Switzerland. In 2013, Dr Wang validated the ASTRAL score as a reliable tool for predicting long-term unfavorable outcome after ischemic stroke in the Chinese population [17]. Unlike the ABCD2 score, which is widely used to predict the risk of recurrent stroke, the ASTRAL score has still not been included in stroke guidelines despite this good performance. In clinical practice, it is always debatable whether performance perceived as statistically acceptable is also clinically acceptable. The optimal cut-point of the S2AFI score according to the Youden index was 2, meaning that patients with 3 points or more tend to have an unfavorable outcome at 3 months; it is worth noting, however, that the specificity at this cut-point was insufficient, indicating that almost one of two patients would be wrongly assigned a poor prognosis. This is acceptable when the score is used as a convenient tool to inform patients or their family members of the risk of poor prognosis and the importance of early rehabilitation, but we do not recommend initiating high-cost medication based on the cut-point of the score. Since the score performed better as a continuous predictor than as a dichotomized predictor, its application can be quite flexible.
Our novel and parsimonious S2AFI score, which ranges from 0 to 5, includes four strong predictors that are all easily available within 24 hours after admission. We did not include past medical history as a potential predictor in the score development process because, in clinical practice, many patients do not know whether they have a given condition, and the answers from patients or their family members sometimes contradict the test results. All potential predictors included in the study were confirmed within the first 24 hours after admission. Including thrombolytic therapy in the S2AFI score accounts for the progress made in the last 2 decades and also promotes the concept of reacting fast when stroke occurs. As we are based in the hometown of the Stroke 1-2-0 campaign [18], the score also has the potential to be applied in stroke knowledge promotion. Patients who present late to the stroke center and miss the opportunity for thrombolytic therapy will have a higher S2AFI score than patients who act immediately after symptom onset. Fibrinogen as a serum marker has been found to be associated not only with the development of stroke but also with the outcome after stroke [19][20][21]. Early fibrinogen depletion after reperfusion therapy, including IVT and EVT, has also been related to symptomatic intracranial hemorrhage [22]. We identified 2.98 g/L as the optimal cut-point for predicting unfavorable outcome after ischemic stroke, which is consistent with a study published in 2009 showing that a considerable number of patients with a serum fibrinogen level lower than 3.0 g/L had an unfavorable outcome [20]. Age and stroke severity are two strong predictors that have been included in many prognosis prediction models; these two predictors alone also constitute the leanest prediction models [10,23].

Limitations
This study has several limitations. It was a single-center retrospective study, and the number of patients included in the development of the S2AFI score was quite small. The small number of patients who had died by 3 months also prevented us from developing a model to predict the risk of death after ischemic stroke. Although we developed the prediction model using four clinical variables that are commonly available at the early stage of ischemic stroke, the inclusion of neuroimaging predictors such as the site of occlusion and the infarct size might improve the performance of our model. To obtain a parsimonious prognosis prediction score, we dichotomized the continuous predictors into binary predictors, which may sacrifice part of their predictive ability. The lack of external validation of the S2AFI score in a large multicenter dataset is also a flaw of this study, so multicenter collaboration will be crucial in the future. An externally validated prediction model could also be used to evaluate stroke care across different stroke centers in our country. Finally, further prospective studies assessing the prognostic accuracy of the S2AFI score are warranted.

Conclusion
Our novel and parsimonious S2AFI score performed well in predicting unfavorable outcome at 3 months after ischemic stroke. The score has the potential to be used in clinical practice as a convenient tool to inform patients or their family members of the risk of poor prognosis and the importance of early rehabilitation after ischemic stroke.

The study protocol was approved by the Institutional Review Board of Minhang Hospital affiliated to Fudan University. Written informed consent was obtained from all patients or their welfare guardians for data collection and subsequent analysis.

Consent for publication
Not applicable.

Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Tables
Table 1 The β-coefficients of our novel prognosis prediction model

Figure 1
The discriminative ability of the S2AFI score as dichotomized predictor. A : the derivation dataset B : the validation dataset C : the whole dataset The calibration plot was shown next to their ROC curve analysis.

Figure 2
The discriminative ability of the S2AFI score as continuous predictor. A : the derivation dataset B : the validation dataset C : the whole dataset The calibration plot was shown next to their ROC curve analysis.