We obtained data for 46,665,942 patients from the DPC database during the study period and randomly divided them at a 95:5 ratio into a derivation cohort (n = 44,334,477) and a validation cohort (n = 2,331,465). In accordance with the exclusion criteria, we excluded patients in the validation cohort who died or were discharged within one day of admission, leaving 2,277,968 patients in the validation cohort (Fig. 1). The characteristics of the derivation and validation cohorts are shown in Table 1. The mean length of stay was 14.2 days in the derivation cohort and 14.5 days in the validation cohort, and in-hospital mortality was 4.3% and 3.7%, respectively. Patients in the validation cohort were slightly older and had more comorbidities than those in the derivation cohort.
Table 1
Characteristics of the patients in the derivation and validation cohorts.

| | Derivation cohort (n = 44,334,477) | Validation cohort (n = 2,277,968) | p value |
|---|---|---|---|
| Death, n (%) | 1,905,286 (4.3) | 83,292 (3.7) | < 0.001 |
| Length of hospital stay (days), mean (SD) | 14.2 (24.1) | 14.5 (24.2) | < 0.001 |
| Age (years), mean (SD) | 60.1 (24.4) | 60.4 (24.2) | < 0.001 |
| Sex (male), n (%) | 23,480,628 (53.0) | 1,207,886 (53.0) | 0.066 |
| History of hospitalization within 180 days, n (%) | 12,282,386 (27.7) | 632,362 (27.8) | 0.066 |
| Charlson comorbidity index, n (%) | | | < 0.001 |
| 0–1 | 28,734,890 (64.8) | 1,465,779 (64.3) | |
| 2–3 | 11,432,403 (25.8) | 594,500 (26.1) | |
| ≥ 4 | 4,165,579 (9.4) | 217,605 (9.6) | |
Table 2
A: Structure of the main model.

| Layer | Input | Output | Number of weights |
|---|---|---|---|
| 1: Input | 49,297 | 1,000 | 49,297,000 |
| 2: Drop-out | | | |
| 3: Hidden 1 | 1,001 | 1,000 | 1,001,000 |
| 4: Drop-out | | | |
| 5: Hidden 2 | 1,001 | 1,000 | 1,001,000 |
| 6: Drop-out | | | |
| 7: Hidden 3 | 1,001 | 1,000 | 1,001,000 |
| 8: Drop-out | | | |
| 9: Output | 1,001 | 2 | 2,002 |
| Sum of weights | | | 52,302,002 |
B: Summary of the main and disease-specific models.

| Model | Input nodes | Total number of weights |
|---|---|---|
| Main model | 49,297 | 52,302,002 |
| Acute myocardial infarction model | 9 | 3,014,002 |
| Stroke model | 54 | 3,059,002 |
| Heart failure model | 9 | 3,014,002 |
| Pneumonia model | 9 | 3,014,002 |
The structure of the main model is shown in Table 2A. There were 49,297 predictor variables: 3 demographic variables (age, sex, and history of hospitalization in the 180 days before admission), 19,930 diagnoses at admission, and 29,364 procedures (drugs, examinations, and surgical and non-surgical treatments). We inserted a dropout layer between successive layers to avoid overfitting. Overall, 52,302,002 weights (= 49,297 × 1,000 + 1,001 × 1,000 + 1,001 × 1,000 + 1,001 × 1,000 + 1,001 × 2) linking the layers were optimized in the derivation. The script for the deep learning model, including the model weights, is available on our website (https://researchmap.jp/ptmatsui).
An overview of the main and disease-specific models used in this study is given in Table 2B. Total number of weights = the number of input nodes × 1000 + 1001 × 1000 + 1001 × 1000 + 1001 × 1000 + 1001 × 2.
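The weight formula above can be checked directly from the layer sizes in Table 2A; each fully connected layer contributes (inputs × outputs) weights, with the hidden and output layers taking 1,001 inputs (1,000 units plus a bias node). A minimal sketch (the function name is illustrative, not from the authors' script):

```python
def total_weights(n_input_nodes: int) -> int:
    """Total weights for the shared topology reported in Table 2:
    input -> 1,000 units, three 1,001 -> 1,000 hidden layers,
    and a 1,001 -> 2 output layer."""
    return (n_input_nodes * 1000   # input layer
            + 3 * (1001 * 1000)    # three hidden layers
            + 1001 * 2)            # output layer

# Input node counts from Table 2B
models = {
    "main": 49_297,
    "acute myocardial infarction": 9,
    "stroke": 54,
    "heart failure": 9,
    "pneumonia": 9,
}

for name, n_in in models.items():
    print(f"{name}: {total_weights(n_in):,}")
```

Running this reproduces the totals in Table 2B, e.g. 52,302,002 for the main model and 3,059,002 for the stroke model.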
The AUC of the main model in the validation cohort was 0.954 (95% CI 0.9537–0.9547).
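The AUC reported here can be read as the probability that a randomly chosen patient who died received a higher predicted mortality than a randomly chosen survivor. A minimal rank-based computation of this statistic (a sketch for intuition only; it does not reproduce the authors' confidence intervals, and the toy scores are invented):

```python
def auc(scores_pos, scores_neg):
    """AUC as the Mann-Whitney statistic: the fraction of
    (died, survived) pairs in which the patient who died received
    the higher predicted mortality; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy example: predicted mortality for 3 deaths vs. 4 survivors
died = [0.90, 0.60, 0.45]
survived = [0.50, 0.30, 0.20, 0.10]
print(auc(died, survived))  # 11 of 12 pairs are ranked correctly
```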
The calibration curves of the observed and estimated mortality in the validation cohort are shown in Fig. 2. Observed and estimated mortality were strongly correlated, although the estimated mortality was slightly lower than the observed mortality.
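A calibration curve of this kind compares the mean predicted mortality with the observed death rate within bins of predicted risk. A minimal sketch of how such points can be computed (bin count, function name, and toy data are illustrative, not taken from the study):

```python
def calibration_points(y_true, y_pred, n_bins=10):
    """Bin patients by predicted mortality and return, per non-empty
    bin, (mean predicted risk, observed death rate) -- the points
    plotted on a calibration curve."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_pred):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[i].append((y, p))
    points = []
    for b in bins:
        if b:
            obs = sum(y for y, _ in b) / len(b)
            pred = sum(p for _, p in b) / len(b)
            points.append((pred, obs))
    return points

# Toy example: 8 patients, deaths concentrated among high predicted risk
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0.05, 0.10, 0.08, 0.85, 0.90, 0.95, 0.80, 0.12]
for pred, obs in calibration_points(y_true, y_pred):
    print(f"predicted {pred:.2f} vs observed {obs:.2f}")
```

Perfect calibration puts every point on the diagonal; points above it correspond to the underestimation described above.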
The AUCs of the main and disease-specific models are shown in Table 3. The AUCs of the main model for the AMI, HF, stroke, and pneumonia subgroups were 0.944, 0.831, 0.921, and 0.918, respectively. The AUCs of the disease-specific models for the same subgroups were 0.876, 0.745, 0.894, and 0.863, respectively. The main model showed significantly higher discriminative ability than the disease-specific models in all four subgroups.
Table 3
Performances of the main and disease-specific models.

| Population | n | Main model AUC (95% CI) | Disease-specific model AUC (95% CI) |
|---|---|---|---|
| Acute myocardial infarction | 14,213 | 0.944 (0.938–0.950) | 0.876 (0.866–0.887) |
| Heart failure | 43,792 | 0.831 (0.825–0.837) | 0.745 (0.738–0.753) |
| Stroke | 82,454 | 0.921 (0.918–0.925) | 0.894 (0.890–0.898) |
| Pneumonia | 87,775 | 0.918 (0.915–0.920) | 0.863 (0.859–0.867) |

AUC, area under the receiver operating characteristic curve; CI, confidence interval.
The calibration curves of the main and disease-specific models for the subgroups are shown in Fig. 3. The correlations between observed and estimated mortality were better with the main model than with the disease-specific models for the AMI, HF, and stroke subgroups (Fig. 3A–C). For the pneumonia subgroup, the correlations were similar between the main and disease-specific models when the predicted mortality was ≤ 0.8; however, the disease-specific model failed to estimate mortality well when the predicted mortality was ≥ 0.8 (Fig. 3D).