This single-center study aimed to investigate the ability of three commonly used scoring systems to predict mortality in critically ill patients. We found that, although all three had comparable accuracy in predicting in-hospital mortality, APACHE III or SAPS II should be recommended as the first-choice tool. The usefulness of these systems for predicting 12-month outcome in ICU survivors proved to be limited.
We found that the in-hospital ICU mortality rate was 35.6%, which was relatively high compared with international data but lower than the value observed in the Silesia region (43.7%) (22). The higher mortality in Polish ICUs compared with other European countries (23) has been debated in recent years and is attributed mainly to differences in patient populations, indications for ICU admission, availability of ICU beds and organization of end-of-life care in Poland. It also reflects the skeptical attitude of some practitioners toward guidelines on futile therapy (24,25) and official ICU admission criteria (26). Patients admitted to Polish ICUs are more often at high risk of death than patients in other countries, but ICU mortality observed in the Silesian Registry of Intensive Care Units was lower than that predicted by the APACHE II scoring system (27).
In our study, the APACHE II, APACHE III and SAPS II scores and the corresponding predicted ICU mortality were 19 points (28.1%), 67 points (18.5%) and 44 points (34.8%), respectively. For all three scales, predicted mortality was lower than observed mortality. The cause of this phenomenon appears to be complex; it may result from substantial differences between the patient population in our unit (mixed admissions, including post-operative patients admitted as first-priority cases) and the target populations for which these prognostic models were developed. Medical patients had higher mortality than surgical patients, which is in line with previous research on this issue (28).
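The gap between observed and predicted mortality is usually summarized as an observed-to-predicted (standardized) mortality ratio, where a value above 1 means more deaths occurred than the model expected. The short sketch below simply divides the observed in-hospital mortality reported above (35.6%) by each system's predicted mortality; it works on the cohort-level percentages quoted in this section, which is an assumption about how the ratios were obtained rather than a description of the authors' exact procedure.

```python
# Cohort-level values quoted in the text (percent).
OBSERVED = 35.6
PREDICTED = {"APACHE II": 28.1, "APACHE III": 18.5, "SAPS II": 34.8}


def observed_to_predicted(observed: float, predicted: float) -> float:
    """Observed-to-predicted mortality ratio; > 1 means the model underestimates deaths."""
    return observed / predicted


for score, pred in PREDICTED.items():
    print(f"{score}: O/P = {observed_to_predicted(OBSERVED, pred):.2f}")
# APACHE II: O/P = 1.27
# APACHE III: O/P = 1.92
# SAPS II: O/P = 1.02
```

All three ratios exceed 1, which is the numerical form of the statement that expected mortality was lower than observed; the closer a ratio is to 1, the better the system's calibration-in-the-large for this cohort.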
The reliability of the collected data is important because poor source data quality and the number and type of missing physiological variables can influence mortality predictions. In the original APACHE II study, variables were missing in 13% of cases (29). In our data series, 14% of variables were missing in total across all three scoring systems, which should be taken into account when interpreting the data. Data collection carries a high risk of bias. For the APACHE II score, the main causes of data errors were found to be inconsistent choice between highest and lowest values and problems with determining the GCS score in sedated patients (29). In sedated patients we used the pre-sedation GCS when available, and all data were verified independently by two members of the study team.
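The "highest versus lowest value" error arises because a physiological variable is scored by its worst reading in the first 24 h, and the worst reading is the one that earns the most points, which is not always the highest number. The sketch below illustrates that rule for heart rate. The point bands are our reading of the APACHE II heart-rate table and are an assumption to be checked against the original publication; the function names and patient readings are hypothetical. Missing readings are kept distinct from normal ones so that they can be counted as missing data.

```python
from typing import Callable, Optional, Sequence


def heart_rate_points(hr: float) -> int:
    """Points for one heart-rate reading (beats/min).

    ASSUMPTION: bands follow our reading of the APACHE II heart-rate table
    (>=180 or <=39: 4; 140-179 or 40-54: 3; 110-139 or 55-69: 2; 70-109: 0).
    Verify against the original publication before any real use.
    """
    if hr >= 180 or hr <= 39:
        return 4
    if hr >= 140 or hr <= 54:
        return 3
    if hr >= 110 or hr <= 69:
        return 2
    return 0


def worst_points(
    readings: Sequence[Optional[float]],
    points: Callable[[float], int],
) -> Optional[int]:
    """Score a variable by its worst (highest-scoring) reading in the first 24 h.

    None marks a missing measurement. If nothing was recorded, None is
    returned so the variable is counted as missing rather than silently
    scored as normal.
    """
    available = [r for r in readings if r is not None]
    if not available:
        return None
    return max(points(r) for r in available)


# Hypothetical 24-h heart-rate readings for one patient.
readings = [88, 132, 51, None]
print(worst_points(readings, heart_rate_points))  # 3: lowest (51) outscores highest (132)
```

Taking only the highest reading here would assign 2 points instead of 3, which is exactly the kind of inconsistency that double, independent data verification is meant to catch.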
Two main objective criteria are used to evaluate the performance of prognostic scales: calibration and discrimination. Calibration refers to how closely the estimated probabilities of mortality agree with observed mortality and is of great importance for clinical trials and for comparing care between ICUs. Discrimination refers to the ability of a prognostic score to classify patients as survivors or non-survivors and is assessed with receiver operating characteristic (ROC) curves, i.e. the area under the curve (AUC). In our study, all three systems predicted in-hospital mortality with good diagnostic accuracy (AUC ~0.8), with no statistically significant differences between them. Our observations are consistent with previous studies showing high accuracy of these scoring systems (28,30–32). The best discrimination was achieved by APACHE III (AUC = 0.793) together with SAPS II (AUC = 0.792). SAPS II seems to perform better in our clinical setting, as its observed-to-predicted mortality ratio was 1.02, compared with 1.27 and 1.92 for APACHE II and APACHE III, respectively. Beck et al., who validated the same prognostic models in 16,646 adult ICU patients in southern UK, reported similarly good discrimination for all three scales but imperfect calibration (28). In a study by Gilani et al., APACHE II was more reliable than SAPS II and APACHE III in ICU patients (31). Similar findings come from a study by Khwannimit et al. comparing SAPS II and APACHE II: APACHE II performed better in Thai ICU patients, although calibration of both scores was again poor. In contrast, Sungurtekin et al. reported better prognostic accuracy for SAPS II than for APACHE II in ICU patients with organophosphate poisoning (33). Godinjak et al. demonstrated comparably high diagnostic accuracy of APACHE II and SAPS II (32). Differences in the performance of scoring systems may result from variation in case mix, standards, structure and organization of medical care, lifestyles and genetic differences between populations (7). Therefore, despite the numerous studies performed so far on this subject, there is still a need to validate these prognostic models using independent samples from ICUs in different countries or even regions.
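Discrimination and calibration answer different questions. The AUC of a ROC curve equals the probability that a randomly chosen non-survivor has a higher score than a randomly chosen survivor (ties counted as one half), so it depends only on how patients are ranked. The minimal sketch below computes the AUC directly from that pairwise definition; the function name and the six-patient data set are hypothetical and serve only to show the mechanics, not to reproduce the study's analysis.

```python
from itertools import product
from typing import Sequence


def auc_concordance(scores: Sequence[float], died: Sequence[bool]) -> float:
    """AUC as the fraction of (non-survivor, survivor) pairs ranked correctly.

    A pair is concordant when the non-survivor has the higher score;
    tied scores contribute one half.
    """
    non_survivors = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not non_survivors or not survivors:
        raise ValueError("need at least one survivor and one non-survivor")
    concordant = 0.0
    for ns, sv in product(non_survivors, survivors):
        if ns > sv:
            concordant += 1.0
        elif ns == sv:
            concordant += 0.5
    return concordant / (len(non_survivors) * len(survivors))


# Hypothetical admission scores and in-hospital outcomes for six patients.
scores = [12, 25, 27, 30, 8, 25]
died = [False, True, False, True, False, False]
print(auc_concordance(scores, died))  # 0.8125
```

Because the AUC is unchanged by any transformation that preserves the ranking, a system can discriminate well while systematically underestimating risk, which is the pattern described above for APACHE III (AUC = 0.793 but an observed-to-predicted ratio of 1.92).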
In the present study, APACHE III was the most powerful predictor of early mortality among surgical ICU patients. Surgical patients have a better survival prognosis than medical ICU patients (6,30). A likely explanation is that in surgical patients the reason for ICU admission is mostly an unstable condition resulting from a long, extensive surgical procedure, rather than poor general condition before surgery or comorbidities.
All three systems predicted 12-month post-discharge mortality with statistical significance, but their diagnostic accuracy was much lower (AUC ~0.7). In a study by Angus et al. (19), the APACHE II score was also predictive of 1-year mortality (AUC 0.671) in patients undergoing liver transplantation. In contrast, Lee et al. reported no relationship between scores calculated on admission and post-discharge mortality (34). The lower accuracy in predicting long-term mortality may have several causes. The scores are calculated from the worst values recorded during the first 24 h after admission. The treatment given during the ICU stay, any complications, and the quality of follow-up care and rehabilitation all influence the patient's outcome and can change the prognosis suggested by the scoring systems. Lee et al. found that the APACHE II score calculated at discharge was a good predictor of post-ICU mortality and readmission (34). It would therefore be more reasonable to base long-term prognosis on scores calculated at ICU discharge. Because the currently available tools were not designed for such use, further studies are needed to develop scores for long-term prediction. In this context, proper screening and accurate identification of patients who remain at risk after successful discharge from the ICU may be of great importance in preventing ICU readmissions, further deterioration in quality of life and higher post-discharge mortality.
The present study has some limitations. As a single-center study with a heterogeneous population and a relatively small sample size, it may be subject to bias. The final scores may be affected by confounding from the data selection process and from the calculation of Glasgow Coma Scale results. The follow-up period was limited to 12 months after the date of ICU admission. Finally, we did not include the SOFA score in our analysis; however, that scoring system was primarily created for prognostication among septic patients and therefore seems less universal in a mixed ICU setting than APACHE or SAPS (35).