Establishment of a Highly Predictive Survival Nomogram For Patients With Sepsis: A Retrospective Cohort Study

Background: Sepsis is a critical illness common in intensive care units (ICUs) and emergency rooms worldwide and is associated with high morbidity and mortality. However, existing methods to predict mortality from sepsis, such as the Sequential Organ Failure Assessment (SOFA) score, are insufficient. This paper aimed to construct a nomogram for predicting the 30-, 60-, and 90-day mortality risks of patients with sepsis that is more accurate than the SOFA score alone. Methods: Data on sepsis patients were obtained from the Medical Information Mart for Intensive Care (MIMIC) database and analysed retrospectively. The included patients were randomly divided into a training cohort and a validation cohort. Variables were selected in the training cohort using backward stepwise selection with Cox regression and were then used to construct a prognostic nomogram. In the validation cohort, we compared our prognostic nomogram with the existing SOFA score using the area under the time-dependent receiver operating characteristic curve (AUC), net reclassification improvement (NRI), integrated discrimination improvement (IDI), calibration plotting, and decision-curve analysis (DCA). Results: We included a total of 5240 patients in the study, who were divided into the training (n=3667) and validation (n=1573) cohorts. Patient age and the following clinical parameters obtained on the first day of ICU admission were included in the nomogram: SOFA score, metastatic cancer, SpO2, lactate, body temperature, albumin, and red blood cell distribution width (RDW). The AUCs for the 30-, 60-, and 90-day mortality risks were better for our nomogram than for the SOFA score, with values of 0.766, 0.771, and 0.772, respectively, in the training cohort and 0.759, 0.770, and 0.760, respectively, in the validation cohort.
The nomogram also showed good calibration: the predicted survival risks at 30, 60, and 90 days were in good agreement with the observed values in both the training and validation sets. In addition, IDI and NRI improved


Background
According to the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3, 2016), sepsis is an infection with organ failure characterized by a dysregulated host response to the infection, including inflammation and non-immune responses involving multiple systems. It is common in intensive care units (ICUs) worldwide and has high morbidity and mortality [1-2]. More than 19 million sepsis cases occur worldwide annually [1], and the fatality rate is reportedly as high as 26.10-42.86% [2].
In the clinic, organ dysfunction can be indicated by an increase in the Sequential Organ Failure Assessment (SOFA) score of 2 points or more, which is associated with an in-hospital mortality rate of more than 10% [3]. The SOFA score is a scoring system proposed by the European Society of Critical Care Medicine to express organ dysfunction. It evaluates the function of six organ systems (respiratory, coagulation, hepatic, cardiovascular, central nervous, and renal), each scored from 0 (no organ dysfunction) to 4 (severe organ dysfunction). The six organ scores are added together to give a final total score that reflects the function of multiple organs throughout the body [4]. Randomised controlled trials have confirmed that the SOFA score is associated with mortality in ICU patients with suspected infection and has a predictive effect on in-hospital mortality [5][6]. If a patient has a definite site of infection and the SOFA score is greater than or equal to 2, the patient can be diagnosed with sepsis. Septic shock, considered a subtype of sepsis, is characterized by particularly serious circulatory, cellular, and metabolic abnormalities associated with a greater risk of mortality than sepsis alone [7]. Patients with septic shock can be identified by the need for an antihypotensive agent to maintain a mean arterial pressure of 65 mmHg or greater and a serum lactate level greater than 2 mmol/L (>18 mg/dL) in the absence of hypovolemia [8]. This combination is associated with a hospital mortality rate greater than 40% [32]. The INSEP study found that mortality in the ICU and in hospital was 37.3% and 43.3%, respectively, when using the old sepsis definition of septic shock, and 44.3% and 50.9%, respectively, when using the Sepsis-3 definition [9].
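The SOFA total described above is simple arithmetic, and a minimal sketch may make it concrete. The component names, the function names, and the default baseline of 0 are illustrative assumptions, not part of the paper or of any official SOFA implementation.

```python
# Hypothetical sketch: total SOFA score from six organ sub-scores (each 0-4).
SOFA_COMPONENTS = (
    "respiratory", "coagulation", "hepatic",
    "cardiovascular", "cns", "renal",
)

def total_sofa(subscores):
    """Sum the six organ sub-scores after checking each lies in 0..4."""
    for name in SOFA_COMPONENTS:
        value = subscores[name]
        if not 0 <= value <= 4:
            raise ValueError(f"{name} sub-score must be between 0 and 4, got {value}")
    return sum(subscores[name] for name in SOFA_COMPONENTS)

def has_organ_dysfunction(total, baseline=0):
    """Organ dysfunction criterion: a rise of 2 or more points over baseline.

    The baseline of 0 is an assumption for patients with no known
    pre-existing organ dysfunction.
    """
    return total - baseline >= 2

example = {
    "respiratory": 2, "coagulation": 1, "hepatic": 0,
    "cardiovascular": 3, "cns": 0, "renal": 0,
}
example_total = total_sofa(example)
```

With these example sub-scores the total is 6, which satisfies the criterion of an increase of at least 2 points.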
Early prognostic assessment of sepsis is as critical as early diagnosis: it can lead to greater vigilance among medical workers and to timely and appropriate treatment for the patient. Studies have shown that the SOFA and quick SOFA (qSOFA) scores can accurately predict the prognosis of sepsis patients, but they nevertheless have certain limitations [10][11]. For example, improving the SOFA score does not necessarily reduce mortality, and SOFA scores do not necessarily improve when mortality is reduced. Mortality rates can be significantly affected by factors not included in the SOFA score [4]. The SOFA score was developed more than 20 years ago, and the working group that developed the third definition of sepsis also noted that a number of new markers have emerged that may be superior to existing SOFA indicators but need to be further validated [6].
A nomogram is a graphical tool based on a statistical prediction model [12] that is used to determine the probability that a given clinical event will occur in an individual patient. It combines several risk factors into a single prediction and is widely used for clinical prediction, for example in oncology and sepsis [13][14]. However, so far there is no accurate predictor of the prognosis of sepsis at 30, 60, and 90 days.
Here, we aimed to develop a nomogram that can predict the prognosis of patients with sepsis using age, SOFA score, metastatic cancer, SpO2, lactate, body temperature, albumin, and red blood cell distribution width (RDW). Notably, the developed nomogram is better than the SOFA score at predicting sepsis prognosis.
This nomogram can potentially be used to inform important treatment decisions in order to decrease the risk of death in patients with sepsis.

The MIMIC-III database provides a large amount of real data that can be utilized in clinical research and comprises information related to patients admitted to critical care units at large tertiary care hospitals. Data include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. All data can be extracted using SQL for further analysis. The personnel involved in this research participated in a series of courses provided by the NIH and obtained authorization to access the MIMIC-III database after passing the required assessment (certificate number 38601114). The patient information in the database is anonymous, so informed consent is not required.

Study population
We used ICD-9 codes 99591, 99592, and 78552 to extract data from 7770 patients diagnosed with sepsis, severe sepsis, and septic shock from the MIMIC-III database. Under the new definition of sepsis, septic shock already covers what the old definition called severe sepsis. However, because the data in version 1.4 of the database were collected from 2001 to 2012, the old definition may still have been in use when the diagnoses were coded, so we retained the diagnoses of sepsis, severe sepsis, and septic shock when extracting the data. The inclusion criteria were as follows: (1) patients who were 18 years of age or older; (2) patients with more than a 24-hour stay in the ICU to ensure sufficient data for evaluation; and (3) patients diagnosed with sepsis according to the Third International Consensus Definitions of Sepsis and Septic Shock (Sepsis-3), including infection and organ failure (SOFA score ≥2). For patients admitted to the ICU twice or more, we used only the information from their first ICU admission.
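The inclusion criteria above can be expressed as a simple filter. The sketch below is an illustration only: the `IcuStay` record and its field names are hypothetical and are not MIMIC-III column names, and infection is assumed to have been established already by the ICD-9 code selection.

```python
from dataclasses import dataclass

@dataclass
class IcuStay:
    """Hypothetical per-stay record; field names are illustrative only."""
    subject_id: int
    age_years: float
    icu_hours: float
    sofa: int
    icu_admission_rank: int  # 1 for the patient's first ICU admission

def is_eligible(stay):
    """Apply the stated criteria: adult, ICU stay > 24 h, SOFA >= 2, first admission."""
    return (
        stay.age_years >= 18
        and stay.icu_hours > 24
        and stay.sofa >= 2
        and stay.icu_admission_rank == 1
    )

def select_cohort(stays):
    """Keep only the stays that satisfy every inclusion criterion."""
    return [stay for stay in stays if is_eligible(stay)]

stays = [
    IcuStay(1, 67.0, 72.0, 6, 1),   # eligible
    IcuStay(2, 17.0, 48.0, 5, 1),   # excluded: under 18
    IcuStay(3, 54.0, 20.0, 4, 1),   # excluded: stay of 24 h or less
    IcuStay(4, 80.0, 96.0, 1, 1),   # excluded: SOFA below 2
    IcuStay(1, 67.0, 50.0, 7, 2),   # excluded: repeat ICU admission
]
cohort = select_cohort(stays)
```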

Data extraction
All data were extracted using SQL queries for further analysis. The hadm_id variable for each included patient was used to extract the following information from the MIMIC-III database: sex; age; SOFA score; continuous renal replacement therapy (CRRT) use; first care unit (SICU, TSICU, MICU, CCU, CSRU); comorbidities, namely congestive heart failure, cardiac arrhythmia, renal failure, liver disease, metastatic cancer (MC), diabetes, coagulopathy, fluid and electrolyte disorders, and blood loss anaemia; laboratory tests, namely white blood cell count (WBC), neutrophil percentage (NET), red blood cell distribution width (RDW), haematocrit (HCT), sodium, potassium, albumin, lactate, and blood pH; and vital signs, namely heart rate, respiratory rate, body temperature, and SpO2. All of the above information and data were extracted from the first 24 h of ICU stay.

Statistical analysis and nomogram construction
Continuous variables are expressed as the mean and standard deviation, while categorical variables are expressed as percentages. Multivariate Cox regression was used to select variables for plotting the 30-, 60-, and 90-day survival curves of the patients. The survival-probability nomogram was constructed using Cox regression. The included patients were divided into a training cohort and a validation cohort. The training cohort data were subjected to multifactor Cox regression analysis to control for confounding factors. The analysis presumed that the effects of the predictor variables were constant over time and that there was a linear relationship between the endpoint and predictor variables. Predictor variables that had a highly skewed distribution were subjected to logarithmic transformation to reduce the effect of extreme values, in which case the value of log(variable) was entered as the predictor variable. SPSS (version 24.0, Chicago, Illinois, USA) and R (version 4.0.2; https://www.r-project.org/) were used for data analysis. A P value <0.05 in a two-sided test was considered statistically significant.
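For reference, the two modelling assumptions stated above correspond to the standard Cox proportional hazards form. The notation below (baseline hazard h_0, coefficients β_j, predictors x_j, baseline survival S_0) is introduced here for illustration and does not appear in the original text.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
S(t \mid x) = S_0(t)^{\exp\left(\beta_1 x_1 + \dots + \beta_p x_p\right)}
```

The coefficients β_j do not depend on t, which is the assumption that predictor effects are constant over time, and the log-hazard is linear in the predictors. Under this form, a survival nomogram can be read as assigning points in proportion to each β_j x_j and mapping the total to S(t | x) at t = 30, 60, and 90 days.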

Nomogram validation and performance evaluation
The validity of the nomogram was assessed based on its discrimination performance and by constructing both internal (with the training cohort) and external (with the validation cohort) calibration curves. The performance of the two models was compared using receiver operating characteristic (ROC) curve analysis and the area under the ROC curve (AUC). The improvement in the predictive accuracy of the model was determined by calculating the integrated discrimination improvement (IDI) and the net reclassification improvement (NRI) [17]. NRI compares the classification ability of two indicators at a given threshold, that is, whether one indicator is more accurate than the other. If NRI > 0, there is a positive improvement (improved prediction ability); if NRI < 0, there is a negative improvement (decreased prediction ability); if NRI = 0, there is no improvement. IDI reflects the overall improvement of the model across different thresholds. As with NRI, IDI > 0 is a positive improvement, indicating that the prediction ability of the new model is improved compared with the old model; IDI < 0 is a negative improvement, indicating that the prediction ability of the new model is decreased; and IDI = 0 indicates that the new model is not improved. Finally, the net clinical benefit of the predictive model developed in the present study was assessed using decision-curve analysis (DCA) [18].
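To make the sign conventions for NRI and IDI concrete, here is a minimal sketch for a binary outcome at a single time horizon with one risk threshold. It is a simplification: it ignores censoring, so it is not the survival-data version that a Cox-based analysis would use, and the function names and toy data are illustrative assumptions.

```python
def nri_two_category(old_risk, new_risk, events, threshold):
    """Two-category NRI: net correct reclassification among events plus non-events."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = sum(events)
    n_nonevent = len(events) - n_event
    for p_old, p_new, event in zip(old_risk, new_risk, events):
        was_high = p_old >= threshold
        is_high = p_new >= threshold
        if is_high and not was_high:        # moved up to the high-risk category
            if event:
                up_event += 1
            else:
                up_nonevent += 1
        elif was_high and not is_high:      # moved down to the low-risk category
            if event:
                down_event += 1
            else:
                down_nonevent += 1
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)

def idi(old_risk, new_risk, events):
    """IDI: gain in mean predicted risk for events minus the gain for non-events."""
    def mean(values):
        return sum(values) / len(values)
    gain_events = mean([n - o for o, n, e in zip(old_risk, new_risk, events) if e])
    gain_nonevents = mean([n - o for o, n, e in zip(old_risk, new_risk, events) if not e])
    return gain_events - gain_nonevents

# Toy data (invented): two deaths (1) and two survivors (0).
old = [0.2, 0.6, 0.4, 0.7]
new = [0.6, 0.8, 0.2, 0.3]
died = [1, 1, 0, 0]
toy_nri = nri_two_category(old, new, died, threshold=0.5)
toy_idi = idi(old, new, died)
```

In this toy example both quantities are positive, so the new model would be judged an improvement over the old one.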

Characteristics of the study patients
A total of 7770 patients diagnosed with sepsis, severe sepsis, and septic shock between 2001 and 2012 were collected from the database. After excluding duplicate patients, 5240 patients were included in the study. For the laboratory examination results, indexes with >20% missing values were omitted, and the remaining missing data were filled in using multiple imputation. The outcomes were known for all 5240 sepsis patients included from the MIMIC-III database. These patients were randomly assigned to the training (70%, n=3667) and validation (30%, n=1573) cohorts for constructing and validating the nomogram, respectively. The median SOFA score of the entire patient cohort was 6, indicating XXX. The median ages in the training and validation cohorts were 68 and 67 years, respectively; most patients were male (55.7% and 54.9%); the medical ICU (MICU) was the most common first care unit (69.2% and 69.9%); and fluid and electrolyte disorders were the most common comorbidity (54.8% and 53.6%). Moreover, most patients in both groups had a WBC count between 10 and 40 k/µl (61.5% and 61.7%); the NET was mainly >70% (78.6% and 78.2%); the median SpO2 values were 97.14% and 97.19%, respectively; and the median albumin levels were 2.90 g/dl and 2.80 g/dl, respectively. In both cohorts, the median lactate level was 2.20 mmol/l, the median RDW was 15.40%, the most common sodium level was between 130 and 149 mmol/l, the most common potassium level was between 3.5 and 5.6 mmol/l, the most common PaCO2 value was between 36 and 45 mmHg, the most common heart rate was between 60 and 100 beats/min, the most common respiratory rate was between 20 and 30 breaths/min, and the most common body temperature was between 36 and 37.2°C. The distributions of all variables are presented in Table 1.
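The random assignment into training and validation cohorts can be sketched as follows. This is an assumption-laden illustration: the paper does not state how the split was implemented, and the function name, the seed, and the use of integer patient identifiers are hypothetical. The cohort sizes are passed explicitly so that they match the reported counts.

```python
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Shuffle patient identifiers reproducibly and split them into two cohorts."""
    ids = list(patient_ids)
    if not 0 <= n_train <= len(ids):
        raise ValueError("n_train must be between 0 and the number of patients")
    random.Random(seed).shuffle(ids)   # seeded generator keeps the split reproducible
    return ids[:n_train], ids[n_train:]

# 5240 placeholder identifiers, split into the reported cohort sizes.
training_ids, validation_ids = split_cohort(range(5240), n_train=3667)
```

Fixing the seed means the same patients land in the same cohort on every run, which matters when the nomogram is refitted during analysis.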

Multivariate Cox regression analysis and construction of a predictive nomogram with the training cohort
Multifactor Cox regression analysis was performed with the data from the training cohort to control for confounding factors. The results of the Cox regression analysis are listed in Table 2. After performing a comprehensive evaluation of the variables and then applying Occam's razor [18] (the simplest explanation is preferable to one that is more complex), the variables included in the nomogram for predicting the 30-,

Validation of the performance of the nomogram
As described above, a nomogram for predicting the outcome of sepsis patients was established by including age, SOFA score, MC, SpO2, lactate, body temperature, albumin, and RDW. In the calibration graphs of the training and validation cohorts, the calibration curves of the present nomogram closely followed the standard curves, indicating that the predicted values of the survival risks at 30, 60, and 90 days were in good agreement with the observed values (Figure 3).

Clinical use
Mathematical models incorporate various data sources and innovative computational methods to portray real-world disease transmission and translate the basic science of infectious diseases into decision-support tools for public health. DCA was used to verify the clinical value of the model and its impact on actual decision-making. The net benefit of the nomogram was greater than that of the SOFA score at any predicted probability, in both the training set and the validation set, so the nomogram is the preferred model. The resulting plots show XXX, indicating that it has a substantial net benefit in predicting 30-, 60-, and 90-day survival rates (Figure 4).
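The net benefit plotted in a decision curve follows a simple formula: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below assumes a binary outcome at a fixed horizon and ignores censoring, so it is a simplification of what a survival analysis would compute. The function names and toy data are illustrative.

```python
def net_benefit(risks, events, threshold):
    """Net benefit of treating patients whose predicted risk is at or above threshold."""
    n = len(events)
    true_pos = sum(1 for p, e in zip(risks, events) if p >= threshold and e)
    false_pos = sum(1 for p, e in zip(risks, events) if p >= threshold and not e)
    odds = threshold / (1.0 - threshold)   # harm of a false positive relative to a true positive
    return true_pos / n - (false_pos / n) * odds

def net_benefit_treat_all(events, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    return net_benefit([1.0] * len(events), events, threshold)

# The "treat none" reference strategy has a net benefit of 0 at every threshold.

# Toy data (invented): predicted risks and observed outcomes (1 = died).
risks = [0.9, 0.8, 0.3, 0.1]
died = [1, 0, 1, 0]
nb_model = net_benefit(risks, died, threshold=0.2)
nb_all = net_benefit_treat_all(died, threshold=0.2)
```

A model is preferred at a given threshold when its net benefit exceeds both reference strategies, which is the comparison a decision curve displays across the range of thresholds.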

Discussion
In this retrospective study using data from the MIMIC-III database, we integrated age, SOFA score, MC, SpO2, lactate, body temperature, albumin, and RDW to generate prediction models using Cox regression analysis, and the best-fitting model was visualized as a nomogram. The new model has better predictive performance than the SOFA score alone. In the calibration graphs of the training and validation cohorts, the calibration curves of the present nomogram closely followed the standard curves.
Many patients with various complex diseases present to the emergency room and ICU, and accurate evaluations of the severity of critical illnesses, reasonable judgement of prognoses, and selection of appropriate interventions are essential clinical skills for emergency room physicians. Several scores are widely used for such clinical situations, including the SOFA score, Logistic Organ Dysfunction Score (LODS), Acute Physiology And Chronic Health Evaluation II (APACHE II) score, and Simplified Acute Physiology Score (SAPS) [19][20]. The SOFA score in particular has been demonstrated to be a valuable tool for predicting short-term mortality in patients with sepsis [21][22]. The present nomogram represents an improvement over the SOFA score.
The AUCs of the nomogram and of the SOFA score were compared to assess the performance of the nomogram prediction model, demonstrating that the model discriminates better than the SOFA score alone. The calibration curves of the new nomogram matched the standard curve (that is, the identity curve) very well in both the training and validation cohorts; in other words, the predicted 30-, 60-, and 90-day survival probabilities for sepsis patients obtained from the model were in good agreement with the observed values. Moreover, the 30-, 60-, and 90-day DCA curves for both the training and validation cohorts demonstrated that the nomogram produces a net benefit. Together, these results show that patients can benefit from the clinical use of the nomogram based on our prediction model.
In our nomogram, age had a relatively strong influence. Studies have shown that age is an independent risk factor for death in patients with sepsis and that the fatality rate increases linearly with age [23]. Our research revealed the same trend; the nomogram score increased with age, which might be due to elderly patients being more susceptible to infection with gram-negative bacteria [23], having more comorbidities such as cancer [24], and having weaker immune function [25]. The contribution of MC to the nomogram score is also related to immune function, and the inclusion of albumin in the nomogram reflects the tendency of older patients to have a worse nutritional status than their younger counterparts. XXX [26][27] The haemodynamic performance of patients with septic shock is exceptionally complicated. In sepsis, endothelial cell dysfunction, the interaction of leukocytes/platelets and endothelial cells, coagulation activation, inflammation, abnormal haemorheology, and functional shunting together lead to microcirculatory disorders, insufficient tissue perfusion, and hypoxia, ultimately leading to multiple-organ dysfunction or even septic shock [28-29]. SpO2 can reflect the body's real-time oxygen supply state and the degree of hypoxia and can therefore serve as a factor associated with sepsis. Additionally, it can be measured more conveniently and rapidly than the arterial blood gas level [19][20], making it an attractive parameter for use in prediction models and justifying its inclusion in our nomogram.
Patients with sepsis or septic shock will experience anaerobic metabolism due to microcirculatory disturbances and lactate accumulation. The serum lactate level has traditionally been interpreted as a sign of tissue hypoxia and is often used clinically to indicate the severity and prognosis of sepsis/septic shock [30][31] . Lactate also has a high predictive value in our nomogram, with the score increasing with the lactate level.
The indicator with the highest weight in our nomogram was RDW, which has been used as a prognostic biomarker for mortality in cardiovascular disease, stroke, and metabolic syndrome [32]. Another retrospective study based on the MIMIC-III database found that the RDW value was useful in predicting the long-term all-cause mortality of severe sepsis patients [33]. A multicentre observational study showed that RDW is a powerful predictor of the risk of all-cause death and blood infection in critically ill patients [34]. Moreover, an elevated RDW at discharge from the ICU is a powerful predictor of subsequent all-cause mortality [35]. These observations together show the predictive value of RDW. Although this index is commonly measured in clinical workups, it is often ignored; given its importance in the above studies and in our results, it deserves greater attention.
The limitations of this study include its retrospective design based on the MIMIC-III database. We also validated our nomogram only internally, using data from a single database. This limitation could be addressed in future studies by using a separate database or data from a separate group or study. XXX Additionally, for septic microbial infections, it is necessary to include the time the cultures were initiated and for antibiotics to be administered within a specific period. This information is difficult to extract from the database and should be addressed in future research on sepsis outcome prediction.
Sepsis is a complex disease, and while the guidelines on sepsis are constantly being updated, the mechanism underlying the pathophysiological changes it induces remains difficult to characterize [36-37].
Our understanding of sepsis could be strengthened by summarizing patient data to provide timely and effective treatment when attempting to control the disease. The nomogram that we constructed here has considerable strengths in providing accurate predictions of mortality risk of patients with sepsis and is easy to use and, therefore, highly suitable for clinical application.

Conclusions
In the information era, data reusability and data sharing strategies are receiving increasing attention worldwide 38-37 . Nomograms are a significant component of modern medical decision-making. The nomogram that we established here better predicts the prognosis of patients with sepsis than the SOFA score. However, this nomogram still needs to be externally validated in another population from a different country. In addition, a larger population will yield better results.

Declarations
Ethics approval and consent to participate
This study was approved by the research ethics committee of Jinan University.

Consent for publication
Not applicable.

Competing interests
The authors declare that there is no conflict of interest.

Authors' contributions
Haiyan Yin and Jun Lyu: conceptualization of the study and revision of the article; Hui Liu: data analysis, drafting the article, revising the article, and final approval.

Figure 1 Nomogram predicting 30-, 60-, and 90-day survival. This is the nomogram we created to evaluate the 30-day, 60-day, and 90-day survival of patients with sepsis. The first column lists the eight parameters (from age to red blood cell distribution width). For each parameter, a vertical line drawn from the patient's value to the points scale gives that parameter's score. The eight scores are added together to give a total score, and a vertical line drawn from the total score to the survival scales below gives the individualized 30-day, 60-day, and 90-day survival probabilities.

Decision-curve analysis of the training set (A, C, E) and validation set (B, D, F) for 30-, 60-, and 90-day survival. In the figure, the abscissa is the threshold probability, and the ordinate is the net benefit rate. The horizontal solid line indicates where all samples are negative and no patient is treated, with a net benefit