Non-survivors and discharged patients with SARS-CoV-2 pneumonia differed significantly in thirty-eight laboratory findings. Using a machine learning method, we established a prediction model involving seven laboratory features. The model was highly accurate in distinguishing non-survivors from discharged patients. The seven features selected by artificial intelligence also indicated that dysfunction of multiple organs or systems correlated with the prognosis of SARS-CoV-2 pneumonia.
SARS-CoV-2 spreads through and invades the respiratory mucosa, triggers a series of immune responses, and induces a cytokine storm in vivo, resulting in changes in immune components.18,5 A dysregulated immune response results in excessive inflammation and can even cause death.19,7 We found that leukocyte and neutrophil counts were significantly higher in non-survivors than in survivors.
Excessive neutrophils may contribute to acute lung damage and are associated with fatality.20 The absolute lymphocyte count was reduced in SARS-CoV-2 non-survivors, suggesting depletion of lymphocytes caused by a strong innate inflammatory immune response. Higher serum levels of pro-inflammatory cytokines (IL-2r and IL-6) and C-reactive protein were found in non-survivors, also indicating an excessive immune response. In addition, a high leukocyte count in SARS-CoV-2 patients may also be due to secondary bacterial infection.21,5 Elevated procalcitonin was seen in fatal cases, representing more prominent inflammation.22 All of the laboratory parameters mentioned above may be associated with the prognosis of SARS-CoV-2 pneumonia.
Lung lesions have been considered the major damage caused by SARS-CoV-2 infection. Severe cases may develop acute respiratory distress syndrome (ARDS) and respiratory failure. However, liver injury has also been reported to occur during the course of the disease,23,24 and is associated with disease severity. Abnormal transaminase levels accompanied by decreased serum albumin and increased serum bilirubin levels were observed in fatal cases. Levels of liver-function-associated markers were significantly higher in non-survivors than in survivors. Acute kidney injury could have been related to direct effects of the virus, hypoxia, or shock.25,26 Blood urea and creatinine levels continued to increase until death occurred. Non-survivors had lower eGFR and higher blood urea than survivors. Myocardial injury was seen in non-survivors, as suggested by elevated levels of myoglobin, high-sensitivity cardiac troponin I, or the MB isoenzyme of creatine kinase. The pathologic mechanisms of multiple organ dysfunction or failure may be associated with the death of patients with SARS-CoV-2 pneumonia.
Some patients with SARS-CoV-2 infection progressed rapidly to septic shock, which is well established as one of the most common causes of disseminated intravascular coagulation (DIC).27 Conventional coagulation parameters during the disease course may also be associated with the prognosis of SARS-CoV-2 pneumonia. The non-survivors in our cohort had significantly longer prothrombin time and APTT than survivors. At the late stages of SARS-CoV-2 infection, levels of fibrin-related markers (D-dimer and FgDP) were markedly elevated in most cases, suggesting secondary hyperfibrinolysis in these patients.
A number of laboratory features were compared between non-survivors and discharged patients with SARS-CoV-2 pneumonia. The two groups differed significantly in as many as thirty-eight features.
However, none of the features alone provided adequate accuracy in predicting the outcome of SARS-CoV-2 pneumonia. Thus, a novel, accurate prediction model involving multiple features was established in this study. Using machine learning methods previously applied in radiomics, a prediction model combining seven of the thirty-eight laboratory features was highly accurate in predicting the outcome of SARS-CoV-2 pneumonia in both the training cohort and the validation cohort.
The mRMR algorithm was used to identify significant features while avoiding redundancy between them. The features were ranked according to their relevance-redundancy scores. The mRMR score of a feature is defined as the mutual information between the status of the patients and this feature, minus the average mutual information between this feature and the previously selected features.28,29,17 The top fifteen features by mRMR score were selected for the next step of modeling. A least absolute shrinkage and selection operator (LASSO) logistic regression model was then applied to the features selected by the mRMR algorithm. LASSO is a regression method that performs variable selection and regularization together, which improves model prediction accuracy and interpretability.30 The coefficients of some candidate features were shrunk to zero, and the remaining variables with non-zero coefficients were selected. After LASSO, a new signature could be calculated from the selected features and their coefficients. The signature used to predict the outcome can be a positive or a negative number, corresponding to poor and good prognosis, respectively.
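The mRMR ranking described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names (`mutual_information`, `binarize`, `mrmr_rank`), the median split used to discretize each feature, and the toy values are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def binarize(values):
    """Assumption: discretize a continuous feature with a simple median split."""
    ordered = sorted(values)
    cut = ordered[len(ordered) // 2]
    return [int(v >= cut) for v in values]


def mrmr_rank(features, outcome, top_k):
    """Greedy mRMR: at each step pick the feature with the highest
    relevance (MI with outcome) minus mean redundancy (MI with chosen features)."""
    binned = {name: binarize(vals) for name, vals in features.items()}
    relevance = {name: mutual_information(b, outcome) for name, b in binned.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(binned[name], binned[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(top_k, len(binned)):
        candidates = [name for name in binned if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Synthetic toy data (not patient data): 1 = non-survivor, 0 = discharged.
outcome = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
features = {
    "feat_a": [1, 2, 3, 4, 5, 7, 6, 8, 9, 10, 11, 12],
    "feat_a_like": [7, 1, 2, 3, 4, 8, 5, 9, 10, 11, 12, 6],  # overlaps with feat_a
    "feat_b": [7, 8, 1, 2, 3, 4, 9, 10, 11, 12, 5, 6],       # independent of feat_a
}
ranking = mrmr_rank(features, outcome, top_k=3)
print(ranking)
```

In this toy example, `feat_a_like` and `feat_b` are equally relevant to the outcome on their own, but `feat_b` is ranked ahead because it shares no information with the already selected `feat_a`. That is the redundancy penalty at work.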
Our results showed that the signature discriminates survivors from non-survivors with excellent efficiency. The sensitivity and specificity were both excellent. The AUC of the signature was 10% to 40% higher than the AUC of a single laboratory feature.
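To make the comparison between the signature and single features concrete, the sketch below computes a linear signature and its AUC on synthetic data. It is an illustration only: the coefficients, the intercept, the feature names `x1` and `x2`, and the values are hypothetical and are not the fitted model from this study. The AUC is computed as the probability that a randomly chosen non-survivor scores higher than a randomly chosen survivor.

```python
def signature(row, coefficients, intercept=0.0):
    """Linear signature: intercept plus the coefficient-weighted sum of features."""
    return intercept + sum(coefficients[name] * row[name] for name in coefficients)


def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical non-zero coefficients, as LASSO might leave after shrinkage.
coefficients = {"x1": 1.0, "x2": 1.0}
intercept = -1.0

# Synthetic standardized values (not patient data): 1 = non-survivor, 0 = survivor.
rows = [
    {"x1": 2.0, "x2": 0.0},
    {"x1": 0.0, "x2": 2.0},
    {"x1": 1.5, "x2": 1.0},
    {"x1": 1.0, "x2": -1.0},
    {"x1": -1.0, "x2": 1.0},
    {"x1": 0.0, "x2": 0.0},
]
labels = [1, 1, 1, 0, 0, 0]

scores = [signature(row, coefficients, intercept) for row in rows]
auc_signature = auc(scores, labels)
auc_x1 = auc([row["x1"] for row in rows], labels)
auc_x2 = auc([row["x2"] for row in rows], labels)
print(auc_signature, auc_x1, auc_x2)
```

Here the combined signature is positive for every synthetic non-survivor and negative for every synthetic survivor, while each feature taken alone misorders some pairs, which is the pattern a multi-feature signature is meant to exploit.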
Because this prediction model was established by artificial intelligence, our only intervention was to match the age and gender of discharged patients and non-survivors before providing the laboratory findings to the computer. Although the modeling process is a black box to us, the choice of features seems reasonable. PTA reflects coagulation function more accurately than prothrombin time, and can also reflect the degree of liver injury. Urea is a good index of the degree of renal function damage. WBC not only reflects immune status, but can also be used to evaluate secondary infection. IL-2r is an indicator of inflammation and immune response.20 IB is related to both liver function and possible hemolysis.
Myoglobin reflects the degree of myocardial injury. The increase of FgDP is related to coagulation disorders, including DIC. Thus, the current model involves multiple important systems closely related to the prognosis. Based on the high accuracy of the prediction model, it seems that we can draw the following conclusion: liver, kidney, and myocardial damage, coagulation disorders, and an excessive immune response all contribute to the outcome of SARS-CoV-2 pneumonia.
One limitation of this model is that it did not cover all laboratory tests. Some important laboratory tests, such as lymphocyte count, albumin, and creatinine, were not included. Fortunately, there are moderate to high correlations between the unselected and selected features, as confirmed by our statistical analysis. Furthermore, models involving too many features are not easy for clinicians to use. Another limitation of the model is that it did not involve clinical variables, because we focused on maximizing the predictive value of objective laboratory variables.
Our study has some limitations. First, this is a single-center retrospective study with a relatively small sample size. There were only 88 patients in the training cohort and 22 patients in the validation cohort. Multi-center, large-sample studies are required to validate our prediction model. Second, because instruments differ among centers, the same patient may have different values for the same laboratory test in different hospitals. Our model, which is based on laboratory data from the authors’ center, may not be directly applicable in other centers. However, other centers could establish a prediction model from their own data using the same machine learning method. Third, age and gender were matched for discharged patients and non-survivors in the current study. It is well established that age and gender influence the results of laboratory tests. Because we eliminated the interference of age and gender, the differences in laboratory features can be attributed to disease severity. This study focused on the real predictive value of laboratory tests and aimed to improve prediction accuracy by combining multiple laboratory findings. However, a more complex model combining laboratory features and clinical variables should be constructed in future studies. Fourth, it is difficult for general clinicians to understand the methods of artificial intelligence. As artificial intelligence is used more and more in medical diagnosis, this kind of prediction model will receive more attention.
In conclusion, it is feasible to establish an accurate prediction model for the outcome of SARS-CoV-2 pneumonia using a machine learning method. Injury of the liver, kidney, and myocardium, coagulation disorders, and an excessive immune response all correlate with the outcome of SARS-CoV-2 pneumonia.