Our observation is that ML methods are adopted reluctantly in medical research, which itself is firmly grounded in classical statistics. Several authors have described the divide between these “two cultures”. The main difference between the two approaches is that classical statistics assumes the data are generated by a given stochastic data model, whereas ML uses algorithmic models and treats the data mechanism as unknown (29).
Though classical statistics can be seen as the foundation of all scientific medicine, ML is used only in selected areas, such as intensive care medicine. Some projects facilitate diagnosis, whereas others aim to create early warning systems from a variety of data, increasing the effectiveness of treatment. An ML approach to predicting ICU readmission has been shown, in internal validation, to be significantly more accurate than previously published algorithms (30). Many ML systems used in medical settings are artificial neural networks, mainly in image recognition (e.g., radiology, histology), where they are used, for example, to differentiate between malignant and benign tumors (31, 32).
The problem with neural networks is that they constitute so-called black boxes: their decisions cannot be readily explained (33), which is a significant challenge in a medical setting.
Here, we tried to show that both approaches can coexist and complement one another. Interestingly, the tree-based ML methods were developed largely by statisticians in the 1970s (34). We see them as very well equipped to bridge the gap between the “two cultures” because of their firm grounding in classical statistics and their convenient availability as mature packages in the R ecosystem. As we have shown in this paper, tree-based methods are readily comprehensible and can provide new insights, even for data sets that have already been analyzed with more traditional methods.
The general idea of the OneR algorithm is to go through each attribute and evaluate how well it functions as a predictor of the dependent variable. The algorithm creates a frequency table for each attribute, counting the occurrences of each level of that attribute against each class of the dependent variable. For each frequency table (i.e., each attribute), a total error is calculated by summing, over all levels, the counts of the non-majority classes. The attribute with the smallest total error is chosen as the best predictor. The generated rules take every level of this predictor and match it with the most frequent class of the dependent variable (17).
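The core of OneR fits in a few lines. The sketch below is illustrative Python, not the R package “OneR” used in this paper; the function name `one_r` and the list-of-dicts input format are our own choices for the example:

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Pick the single attribute whose one-level rules misclassify the
    fewest training examples (the OneR idea). `rows` is a list of dicts;
    `target` names the dependent variable."""
    best_attr, best_rules, best_error = None, None, float("inf")
    attributes = [k for k in rows[0] if k != target]
    for attr in attributes:
        # Frequency table: class counts per level of this attribute.
        table = defaultdict(Counter)
        for row in rows:
            table[row[attr]][row[target]] += 1
        # One rule per level: predict the most frequent class there;
        # the errors are the remaining (non-majority) counts.
        rules = {lvl: cnt.most_common(1)[0][0] for lvl, cnt in table.items()}
        error = sum(sum(cnt.values()) - max(cnt.values())
                    for cnt in table.values())
        if error < best_error:
            best_attr, best_rules, best_error = attr, rules, error
    return best_attr, best_rules, best_error
```

On a toy data set in which one attribute perfectly separates the classes, `one_r` returns that attribute with a total error of zero.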
Numeric attributes have to be discretized before they can be used by the OneR algorithm. The implementation used in this paper (package “OneR”) offers several discretization methods. A significant enhancement over the original OneR algorithm is that cut points can be aligned optimally with respect to the dependent variable (function “optbin”). The method “infogain” used here is an entropy-based method taken from information theory, which calculates cut points based on “information gain”. The idea is to minimize uncertainty by making the resulting categories as pure as possible. This is also the standard method of many decision tree algorithms (18).
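The “infogain” idea can be illustrated with a minimal single-cut example: choose the cut point on a numeric attribute that maximizes information gain, i.e., the reduction in class entropy after the split. This is an illustrative Python sketch with our own helper names (`entropy`, `best_cut`); it finds only one cut point, whereas the actual “optbin” function in the R package supports multiple bins:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (bits) of a class distribution.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return the cut point on a numeric attribute that maximizes
    information gain: entropy before the split minus the weighted
    entropy of the two resulting bins."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_point = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_gain = gain
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point, best_gain
```

For two well-separated groups, the cut lands midway between them and the gain equals the full initial entropy, because both bins become pure.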
Decision trees are a natural generalization of the OneR algorithm. Whereas OneR uses only one attribute for its predictions, decision trees are not bound by this restriction, often resulting in better accuracy but worse interpretability (a trade-off well known in the ML area) (33). Further generalization is achieved with random forests, which will not be covered in this paper (35).
There are several different decision tree generation algorithms (e.g., ID3, C4.5, and C5.0); we used CART in its rpart implementation (20). Unlike linear methods, such as Pearson correlation or linear regression, decision trees map non-linear relationships well (36). Interestingly, the opening example of Breiman’s seminal work was a medical example in the area of cardiology (19). In our population, we used decision trees to create a model predicting mortality or remaining CKD or KRT.
For CART, trees are constructed by repeated splits of subsets of the population into two descendant subsets (19). Whereas OneR can split an attribute into several subsets, the splits in CART are only binary. For numeric data, cutoff values are determined. The splitting is conducted recursively, and the same attribute can be used several times at different levels of the resulting tree. Unlike the tree methods mentioned above and OneR, the splitting criterion is based not on entropy but on Gini impurity. In practice, both measures often lead to similar results (23).
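The two splitting criteria are easy to compare directly. A minimal Python sketch (helper names our own): both measures are zero for a pure node and maximal for an even class mix, which is one reason they tend to select similar splits in practice:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: the probability that two examples drawn at random
    (with replacement) from the node carry different class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy (bits) of the node's class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

For a pure node both measures are 0; for a 50/50 two-class node, Gini impurity is 0.5 and entropy is 1 bit. CART minimizes the weighted impurity of the child nodes at each split, regardless of which measure is used.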
An important consideration is the depth of the tree. The deeper a tree, the better it represents the training data, but the less interpretable it becomes. An additional problem is overfitting, another well-known problem in the ML literature (37): a fully grown tree could end up with only one example per leaf, a result that would be next to useless in practice because the tree would model the noise in the data and fail to generalize. CART prunes the tree to an optimal level according to a cost function (38).
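CART’s cost-complexity pruning makes this trade-off explicit: the criterion R_alpha(T) = R(T) + alpha * |leaves(T)| penalizes training error plus tree size, and pruning keeps the subtree that minimizes it for a given alpha. A toy Python illustration with hypothetical numbers (the function and its arguments are our own naming, not the rpart API):

```python
def cost_complexity(misclassified, n_samples, n_leaves, alpha):
    """Cost-complexity criterion R_alpha(T) = R(T) + alpha * |leaves(T)|:
    training misclassification rate plus a penalty proportional to the
    number of leaves. Smaller is better."""
    return misclassified / n_samples + alpha * n_leaves

# Hypothetical comparison on 100 training samples with alpha = 0.01:
# a fully grown tree (0 errors, 20 leaves) versus a pruned tree
# (3 errors, 4 leaves). The pruned tree wins despite its higher
# training error, because the complexity penalty dominates.
full = cost_complexity(0, 100, 20, 0.01)    # 0.00 + 0.20 = 0.20
pruned = cost_complexity(3, 100, 4, 0.01)   # 0.03 + 0.04 = 0.07
```

As alpha increases, ever smaller subtrees become optimal; alpha = 0 keeps the fully grown tree.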
We also performed ANOVA and a standard t-test with the collected parameters. Proteinuria has been commonly observed during SARS-CoV-2 infection and is reported in 7 to 63% of cases (25, 39). Proteinuria is mostly reported as unselective due to tubular injury, but in some cases there is selective proteinuria as an indication of glomerular damage (40). A direct link between proteinuria and mortality in COVID-19 patients has not been shown, though previous data from patients critically ill due to other causes strongly suggest such a link (28). Gross et al. hypothesized as early as May that the occurrence of proteinuria could be an early marker of AKI onset or of a severe course (41). Our single-center observations included too few patients to confirm this hypothesis for COVID-19.
We found a relationship between higher age and mortality, as shown previously in various retrospective studies (5, 42, 43). In our study population, the average age of surviving critically ill patients was 60 years, compared with 69 years among deceased patients. This is similar to prior results, in which age > 65 years was shown to be a risk factor for higher mortality (2, 10).
In the ANOVA, we found no significant relationship between selective or non-selective proteinuria and the development of AKI, permanent CKD, increased mortality, or protracted disease progression. Prior data suggest that > 40% of cases present with abnormal proteinuria at hospital admission and that 20–40% of critically ill patients develop AKI (13, 39). In our center, only 9 patients (24.3%) did not experience acute kidney failure, and 20 of those affected developed AKI 3 (54.1%). This may be due, among other reasons, to the fact that serum creatinine measured in an already critically ill state does not reflect baseline creatinine. Pei et al. found that 75.4% of 333 patients had abnormal urine dipstick tests or AKI; 50% of them developed AKI 3. Among the 35 patients who developed AKI in that study, 45.7% experienced complete recovery of kidney function (26). We were able to reproduce these results in our patient cohort.
Nine patients died (24.3%); all of them experienced AKI 3 with a need for KRT. In this group, six patients had no or only low-grade CKD, and three were admitted with CKD 3b or 4. Other data reported similar mortality rates among ICU patients (44), especially for patients requiring mechanical ventilation (45). One meta-analysis showed that the presence of AKI is associated with a 13-fold increased risk of mortality, whereas the incidence of AKI is up to 20% in critically ill patients. Higher age, diabetes, hypertension, and baseline serum creatinine levels were associated with increased AKI incidence (46).
Several studies have reported higher BMI as a significant risk factor (47, 48). A meta-analysis by Hussain et al. demonstrated significantly higher mortality in patients with BMI > 25 kg/m2 and identified obesity (BMI > 30 kg/m2) as a significant factor for critical illness during COVID-19 (49). In our study population, the BMI among deceased patients was 33 kg/m2, compared with 28 kg/m2 among surviving patients; however, this difference was not significant in our cohort.
A meta-analysis of a multinational database (50) reported an incidence of AKI of 22% in mechanically ventilated patients, slightly higher than among general inpatients (51). We found an incidence of AKI 1–3 of 75.7%; in 24 of the 37 cases (64.9%), it was AKI ≥ 2. A Chinese meta-analysis reported that the incidence of AKI in hospitalized Chinese adults was up to 50% for those in the ICU and that the presence of AKI was associated with a higher severity of infection (52). The high incidence of AKI at our center may be due to patient selection, as our center provides ECMO therapy (53, 54).