Study Design and Participants
This retrospective cohort study included 733 patients diagnosed with COVID-19 who were admitted to Huangpi Hospital of Traditional Chinese Medicine (Wuhan, China) from January to March 2020 and treated by the Guangxi Medical Team that joined the battle against COVID-19. The methods for laboratory confirmation of SARS-CoV-2 infection have been described elsewhere [17], [18]. Briefly, next-generation sequencing, real-time reverse-transcriptase polymerase chain reaction (RT-PCR), or Immunoglobulin M (IgM) and Immunoglobulin G (IgG) antibody testing can be used to diagnose COVID-19 [18]. Throat-swab specimens were obtained from all patients and re-examined every other day during treatment.
This study was approved by the Ethics Committee of the First Affiliated Hospital of Guangxi Medical University, and the requirement for informed consent was waived.
Data Collection
The data were extracted from electronic medical records. For each patient, three types of factors were extracted: demographic, clinical and laboratory results. The demographic factors included medical history and census information, such as gender, age, presence or absence of comorbidities, time from onset to admission, time from admission to ICU care and death, and main symptoms at admission. The clinical and laboratory examinations included chest radiographs or CT scans, treatment measures, and daily routine tests recorded in detail (12 factors such as pulse, respiration rate, blood pressure, body temperature, oxygen saturation, heart rate, etc.). The presenting symptoms referred to the first symptoms related to the main complaint, such as fever, cough, fatigue, diarrhea, etc. In total, 909 features were indexed for each patient, giving a comprehensive characterization of disease progression. All data were handled by computer professionals and checked by two physicians (HW and JZ).
Laboratory Procedures
Routine blood examinations included complete blood count, coagulation profile, serum biochemical tests (including liver function (twelve items), renal function and electrolytes (twelve items), blood lipids and blood glucose (three items), procalcitonin detection and fluorescence, glucose determination (various enzymatic methods), a six-item coagulation panel, and a five-category complete blood count plus CRP), a nine-item respiratory tract infection pathogen IgM panel, and influenza A/B virus antigen detection. In total, 173 examination indicators were collected from the inpatients.
Study Definitions
Fever was defined as an axillary temperature of at least 37.3℃. The illness severity of COVID-19 was defined according to the Chinese management guideline for COVID-19 (version 7.0) [4]. Critical patients were those who should be admitted to the ICU. The criteria for ICU admission were 1) respiratory failure requiring mechanical ventilation, 2) shock, and 3) combination with other organ failure. Because medical resources were limited, it could not be guaranteed that every patient who met these criteria was admitted to the ICU. Critical patients who should have been admitted to the ICU but were not, owing to the lack of ICU beds, are referred to herein as Missing ICU patients. All patients in the ICU met the aforementioned criteria or were in an even more serious condition. The mortality of patients admitted to the ICU was termed ICU-mortality. Hepatorenal insufficiency indicated liver or kidney dysfunction, such as cirrhosis, hepatic carcinoma, renal cyst, etc. Bilateral lung infection on CT scan indicated abnormal CT manifestations, such as ground-glass opacity, consolidation, reversed halo sign, fibrosis, septal thickening, etc.
Continuous variables were summarized by six statistical measurements: median, mean, maximum, minimum, standard deviation, and interquartile range (IQR) [10]. These six measurements are sufficiently comprehensive for variables following a normal distribution. Categorical variables were coded as 0 or 1. All 909 features were extracted from the demographic, clinical and laboratory results for modeling, analysis and forecasting. Of the underlying factors, 143 were continuous variables (143 × 6 = 858 features) and 51 were categorical variables (51 features).
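The feature construction described above can be sketched as follows. This is a minimal illustration rather than the study's code: it assumes that each continuous factor is a series of repeated measurements for one patient, that the standard deviation is the sample standard deviation, and that the quartiles use the inclusive method. The function names and the temperature values are hypothetical.

```python
import statistics


def summarize_continuous(values):
    """Return the six summary statistics for one continuous factor."""
    # Assumption: quartiles computed with the "inclusive" method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
        "std": statistics.stdev(values),  # assumption: sample standard deviation
        "iqr": q3 - q1,
    }


def feature_count(n_continuous, n_categorical, stats_per_continuous=6):
    """Each continuous factor yields six features; each categorical factor yields one."""
    return n_continuous * stats_per_continuous + n_categorical


# Hypothetical body-temperature readings for one patient (not study data).
temps = [36.5, 37.0, 37.3, 38.1, 36.8]
features = summarize_continuous(temps)

# Factor counts reported in the text: 143 continuous and 51 categorical.
total = feature_count(143, 51)
print(features)
print(total)
```

With the factor counts reported in the text, the arithmetic reproduces the 909 features: 143 × 6 + 51 = 909.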
For each factor, the patients were dichotomized into two subgroups by a threshold. For each threshold, we calculated the true positive rate (TPR) and false positive rate (FPR), and from these we drew the receiver operating characteristic (ROC) curve. The area under the curve (AUC) was calculated to measure the prognostic power of each factor. The closer the AUC is to 1, the better the prognostic power. The ten factors with the largest AUCs were selected to build a prognostic classification model.
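The single-factor screening can be illustrated with a short sketch. It sweeps every observed value of a factor as a threshold, collects the (FPR, TPR) pairs, and integrates the ROC curve with the trapezoidal rule. Two points are assumptions rather than statements from the text: a patient is predicted positive when the factor value is at or above the threshold, and a factor whose low values indicate ICU need is scored by the flipped AUC (1 − AUC). The factor names and values are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by using each observed value as a threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Assumption: predict positive when the factor value is >= threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def factor_auc(scores, labels):
    """AUC of one factor, independent of the direction of the association."""
    a = auc(roc_points(scores, labels))
    return max(a, 1.0 - a)  # assumption: low values may also indicate risk


def top_k_factors(table, labels, k=10):
    """Names of the k factors with the largest individual AUC."""
    ranked = sorted(table, key=lambda name: factor_auc(table[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: 1 = ICU-care, 0 = Non-ICU-care.
labels = [1, 1, 0, 0, 0]
table = {
    "factor_a": [0.9, 0.8, 0.3, 0.2, 0.1],  # separates the groups perfectly
    "factor_b": [0.5, 0.1, 0.4, 0.3, 0.2],  # no better than chance
}
best = top_k_factors(table, labels, k=1)
print(best, factor_auc(table["factor_a"], labels), factor_auc(table["factor_b"], labels))
```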
Statistical Analysis
The Mann-Whitney U test, t-test, χ2 test, or Fisher’s exact test was used, as appropriate, to compare differences between the two identified subgroups. The comparison covered the ten factors with the largest AUC values. Boxplots were drawn to illustrate the statistical differences.
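As an illustration of one of these tests, the sketch below computes a two-sided Fisher’s exact test p-value for a 2 × 2 table of a binary factor against the two subgroups, using only the hypergeometric distribution. It is a minimal sketch, not the software used in the study; the function name and the counts are hypothetical, and the two-sided p-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (not study data): rows are the two subgroups,
# columns are factor present / factor absent.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```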
Estimating the MI-mortality for the patients who may survive
This study aimed to estimate the mortality of critical patients who should have received early ICU intervention but did not, for various reasons. To this end, we first built a prognostic model to identify critical patients, i.e., those who need ICU care. The study flowchart is shown in Fig. 1.
Building a prognostic model for identifying the critical in-patients who need ICU care. We included the patients who were first admitted to hospital and then received ICU care; these patients were labeled “ICU-care”. In-hospital patients who were not admitted to the ICU before discharge were labeled “Non-ICU-care”. For both types of patients, the clinical measures collected during hospitalization were extracted. The whole sample was randomly divided into two datasets: one was used to build the classifiers and the other was used to test their prognostic performance. The training and testing datasets consisted of 586 patients (20 ICU-care, 566 Non-ICU-care) and 147 patients (5 ICU-care, 142 Non-ICU-care), respectively. We treated the prediction of whether a patient needs ICU care as a supervised learning problem. We first selected the ten factors with the largest AUC when their prognostic power was evaluated individually. These ten factors were then used to build a composite classification model with the benchmark support vector machine (SVM) [19]. Because the dataset was severely class-imbalanced, we employed balanced sampling with an ensemble learning strategy [20]. We divided the 566 Non-ICU-care training samples into 29 groups and combined each group with the 20 ICU-care samples. The resulting 29 balanced training subsets were used to train 29 SVM classifiers, so that 29 classifiers were obtained via the bootstrap sampling scheme. The 29 classifiers were then applied to the test samples, and the label of each test sample was predicted by majority voting.
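The balanced-sampling ensemble can be sketched as below. To keep the example dependency-free, the base learner is a nearest-centroid classifier, which is a stand-in and not the SVM used in the study; any classifier exposing the same `fit` and `predict_one` methods could be substituted. The partition of the majority class into disjoint groups is one reading of the description, and the synthetic two-dimensional data, the seeds, and all names are hypothetical.

```python
import random


class CentroidClassifier:
    """Stand-in base learner (NOT an SVM): predicts the class with the nearer mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def dist2(center):
            return sum((a - b) ** 2 for a, b in zip(x, center))
        return min(self.centroids, key=lambda label: dist2(self.centroids[label]))


def split_into_groups(items, n_groups, seed=0):
    """Shuffle the majority class and deal it into n_groups disjoint groups."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def train_balanced_ensemble(X_major, X_minor, n_groups,
                            make_model=CentroidClassifier, seed=0):
    """Train one model per group: a majority-class group plus all minority samples."""
    models = []
    for group in split_into_groups(X_major, n_groups, seed):
        X = group + list(X_minor)
        y = [0] * len(group) + [1] * len(X_minor)
        models.append(make_model().fit(X, y))
    return models


def predict_majority(models, x):
    """Label 1 (ICU-care) if more than half of the models vote for it."""
    votes = sum(m.predict_one(x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Synthetic data with the class sizes reported for the training set.
rng = random.Random(1)
X_major = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(566)]  # Non-ICU-care
X_minor = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(20)]   # ICU-care
models = train_balanced_ensemble(X_major, X_minor, n_groups=29)
print(len(models), predict_majority(models, [5.0, 5.0]), predict_majority(models, [0.0, 0.0]))
```

With 566 majority samples and 29 groups, each group holds 19 or 20 Non-ICU-care samples, so each training subset is roughly balanced against the 20 ICU-care samples. An odd number of classifiers also avoids ties in the vote.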
Estimating the MI-mortality for the patients who may survive. COVID-19 has had an average mortality of 6.9% worldwide. When ICU beds are in critically short supply, very tough decisions must be made to give high priority to the patients who can be saved. However, the mortality of patients who should be treated in the ICU, as predicted in the first step, but were not admitted for various reasons remains unknown. Given the high sensitivity and specificity (1.0000 and 0.8239, respectively; Table 2) of the first-step classification model in predicting whether a patient should be admitted to the ICU, we reasoned that the predicted positive patients do need ICU care. Consequently, we examined the patients who died and who had been classified as needing ICU care but did not receive it. We defined Missing-ICU-mortality (MI-mortality) as the ratio of the number of such patients to the total number of deaths. MI-mortality measures the necessity of the ICU for selected patients in critical condition. It also measures the reliability of the model built in the first step. Furthermore, the mortality of patients admitted to the ICU was estimated in order to compare MI-mortality with ICU-mortality. This difference can help us not only to understand differences in mortality between countries, but also to plan ICU resources rationally in emergencies.
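The two mortality measures can be written out as a short sketch. MI-mortality follows the definition above (deaths among patients predicted to need ICU care but not admitted, divided by all deaths). ICU-mortality is assumed here to be the proportion of ICU-admitted patients who died, which the text does not state as a formula. The record fields and the patient records are hypothetical and are not study data.

```python
def missing_icu_mortality(patients):
    """Deaths predicted to need ICU care but never admitted, over all deaths."""
    dead = [p for p in patients if p["died"]]
    if not dead:
        raise ValueError("MI-mortality is undefined when there are no deaths")
    missing = [p for p in dead if p["predicted_icu"] and not p["admitted_icu"]]
    return len(missing) / len(dead)


def icu_mortality(patients):
    """Assumption: deaths among ICU-admitted patients over all ICU-admitted patients."""
    admitted = [p for p in patients if p["admitted_icu"]]
    if not admitted:
        raise ValueError("ICU-mortality is undefined when nobody was admitted")
    return sum(1 for p in admitted if p["died"]) / len(admitted)


# Hypothetical records (field names are illustrative only).
patients = [
    {"died": True,  "predicted_icu": True,  "admitted_icu": False},
    {"died": True,  "predicted_icu": True,  "admitted_icu": False},
    {"died": True,  "predicted_icu": True,  "admitted_icu": True},
    {"died": True,  "predicted_icu": False, "admitted_icu": False},
    {"died": False, "predicted_icu": True,  "admitted_icu": True},
    {"died": False, "predicted_icu": True,  "admitted_icu": True},
    {"died": False, "predicted_icu": True,  "admitted_icu": True},
    {"died": False, "predicted_icu": False, "admitted_icu": False},
]
mi = missing_icu_mortality(patients)
icu = icu_mortality(patients)
print(mi, icu)
```

Note that the two quantities have different denominators (all deaths versus all ICU admissions), which should be kept in mind when they are compared.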
Table 2
Results of predicting whether patients need ICU care. Each patient was characterized by 909 clinical features. Predictions were based either on all features or on the top ten features selected by their AUC value.
| | All features (909) | | | ROC (10) | | |
| | Train | Test | Whole | Train | Test | Whole |
| Sensitivity | 0.9966 | 0.6000 | 0.9200 | 0.8465 | 1.0000 | 0.9200 |
| Specificity | 1.0000 | 0.7676 | 0.8164 | 0.9417 | 0.8239 | 0.8489 |
| Accuracy | 0.9983 | 0.7619 | 0.8199 | 0.8935 | 0.8299 | 0.8513 |
| AUC | 0.9983 | 0.6838 | 0.8682 | 0.8941 | 0.9120 | 0.8844 |
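For reference, sensitivity, specificity and accuracy follow directly from the confusion-matrix counts, as in the sketch below. The counts used here are not reported in the source: they are inferred as one set of counts consistent with the test-set column for the top ten features, given 5 ICU-care and 142 Non-ICU-care test patients, and should be read as an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),                # true positive rate
        "specificity": tn / (tn + fp),                # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts (inferred, not reported): all 5 ICU-care test patients detected,
# and 117 of the 142 Non-ICU-care test patients correctly classified.
metrics = classification_metrics(tp=5, fn=0, tn=117, fp=25)
print({name: round(value, 4) for name, value in metrics.items()})
```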