3.1 Clinical Data
A total of 114 patients with disorders of consciousness due to brain damage underwent 18F-FDG PET/CT examinations at the Department of Nuclear Medicine, Xijing Hospital. After applying stringent exclusion criteria, the study included 87 patients (58 males, 66.7%; mean age ± SD, 64.4 ± 8.04 years). At the time of the PET scan, 32 patients (36.8%) were diagnosed with unresponsive wakefulness syndrome (UWS) and 55 (63.2%) with a minimally conscious state (MCS) according to the CRS-R. Of these, 45 patients (51.7%) had traumatic brain injury, 33 (37.9%) had intracerebral hemorrhage or ischemic stroke, and 9 (10.3%) had toxic encephalopathy. The mean time from the event to the PET scan was 56.42 ± 42.5 days (range, 28–210 days). CRS-R scores at the PET scan were significantly higher in the consciousness recovery group (P < 0.001). Patients with non-traumatic brain injury showed worse consciousness recovery than those with traumatic brain injury (P = 0.026). Age, sex, and time from the event to the PET scan did not differ significantly between the recovery and non-recovery groups (P > 0.05). Detailed clinical characteristics are presented in Table 1.
Table 1
Detailed clinical characteristics in DOC patients
| Characteristic | Consciousness recovery (n = 52) | Consciousness non-recovery (n = 35) | P |
| --- | --- | --- | --- |
| Gender, n (%): Male / Female | 35 (67.3%) / 17 (32.7%) | 23 (65.7%) / 12 (34.3%) | 0.877 |
| Age (years, mean ± SD) | 46.0 ± 17.2 | 50.74 ± 16.1 | 0.432 |
| Diagnosis at PET scan, n (%): UWS / MCS | 5 (9.6%) / 47 (90.4%) | 27 (77.1%) / 8 (22.9%) | 0.001 |
| CRS-R score at PET scan (mean ± SD) | 4.75 ± 0.94 | 2.00 ± 0.00 | 0.001 |
| Time from event to PET scan (days, mean ± SD) | 60.3 ± 53.1 | 54.6 ± 35.6 | 0.326 |
| Etiology, n (%): Traumatic / Non-traumatic | 32 (61.5%) / 20 (38.5%) | 13 (37.1%) / 22 (62.9%) | 0.026 |
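The group comparisons of the categorical variables in Table 1 can be reproduced from the reported counts. The paper does not state which tests were used; a Pearson chi-square test without continuity correction is assumed here for illustration, and on these counts it matches the P values reported for gender and etiology.

```python
# Pearson chi-square tests on the 2x2 contingency tables from Table 1.
# Assumption: no Yates continuity correction (the test actually used
# in the study is not stated in this excerpt).
import numpy as np
from scipy.stats import chi2_contingency

#                     recovery  non-recovery
gender = np.array([[35, 23],     # male
                   [17, 12]])    # female
etiology = np.array([[32, 13],   # traumatic
                     [20, 22]])  # non-traumatic

chi2_g, p_g, _, _ = chi2_contingency(gender, correction=False)
chi2_e, p_e, _, _ = chi2_contingency(etiology, correction=False)
print(f"gender:   P = {p_g:.3f}")   # ~0.877, as in Table 1
print(f"etiology: P = {p_e:.3f}")   # ~0.026, as in Table 1
```

Continuous variables such as age would instead require the raw per-patient values, which are not available from the summary table.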
3.2 Performance of the image-based deep learning classifiers
We assessed the ability to predict consciousness recovery in all patients across five tasks: MIBH + CT, SUVR + CT, MIBH only, SUVR only, and CT only.
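One plausible way to assemble these task inputs is to stack the co-registered modalities as image channels; the actual preprocessing and network architecture are not described in this excerpt, so the shapes and values below are purely illustrative assumptions.

```python
# Sketch (illustrative only): forming single-modality and multimodal
# inputs by stacking co-registered slices as channels.
import numpy as np

def make_input(*modalities):
    """Stack the modalities a given task uses into a (C, H, W) array."""
    return np.stack(modalities, axis=0)

rng = np.random.default_rng(0)
mibh = rng.random((128, 128))  # hypothetical MIBH slice
ct = rng.random((128, 128))    # hypothetical co-registered CT slice

x_both = make_input(mibh, ct)  # MIBH + CT task: 2 input channels
x_mibh = make_input(mibh)      # MIBH-only task: 1 input channel
print(x_both.shape, x_mibh.shape)
```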
The MIBH-only task outperformed the SUVR-only task in predicting consciousness recovery, with AUCs of 0.764 ± 0.028 vs. 0.667 ± 0.039, 0.686 ± 0.170 vs. 0.670 ± 0.091, and 0.751 ± 0.093 vs. 0.412 ± 0.104 in the training, validation, and independent test sets, respectively. Accuracies for the MIBH and SUVR tasks in these datasets were 0.670 ± 0.059 vs. 0.610 ± 0.027, 0.771 ± 0.123 vs. 0.605 ± 0.061, and 0.629 ± 0.070 vs. 0.500 ± 0.000. Sensitivity was higher for the SUVR task than for the MIBH task in the training, validation, and independent test sets (0.852 ± 0.035 vs. 0.788 ± 0.100, 0.834 ± 0.034 vs. 0.796 ± 0.103, and 0.830 ± 0.040 vs. 0.707 ± 0.096, respectively), whereas specificity was higher for the MIBH task (0.629 ± 0.153 vs. 0.500 ± 0.006, 0.695 ± 0.133 vs. 0.522 ± 0.061, and 0.796 ± 0.123 vs. 0.532 ± 0.052, respectively). Grad-CAM was used to visualize the MIBH heatmap for a single patient, as shown in Fig. 2.
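The Grad-CAM heatmaps behind Fig. 2 follow a standard recipe: the gradients of the target score with respect to the last convolutional feature maps are spatially averaged to give per-channel weights, and the ReLU of the weighted feature-map sum is the localization map. The sketch below uses simulated feature maps and gradients; in practice both come from a forward and backward pass through the trained network.

```python
# Minimal Grad-CAM computation (sketch; inputs are simulated here).
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: (C, H, W) arrays from the last conv layer."""
    weights = gradients.mean(axis=(1, 2))              # alpha_k: GAP of gradients
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0)                           # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalize to [0, 1]
    return cam

rng = np.random.default_rng(42)
fmap = rng.random((8, 7, 7))           # hypothetical feature maps
grads = rng.standard_normal((8, 7, 7)) # hypothetical gradients
heatmap = grad_cam(fmap, grads)        # up-sampled onto the MIBH image in practice
print(heatmap.shape)
```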
The MIBH + CT task demonstrated superior performance in predicting consciousness recovery, achieving AUCs of 0.803 ± 0.024, 0.804 ± 0.059, and 0.784 ± 0.073 in the training, validation, and independent test sets, respectively. In these datasets, sensitivity was 0.813 ± 0.041, 0.845 ± 0.069, and 0.794 ± 0.055, while specificity was 0.807 ± 0.043, 0.806 ± 0.076, and 0.807 ± 0.062, respectively. The SUVR + CT task performed worse than MIBH + CT, with AUCs of 0.791 ± 0.036, 0.632 ± 0.067, and 0.612 ± 0.192 in the training, validation, and independent test sets, respectively, and the MIBH + CT task also outperformed SUVR + CT in both sensitivity and specificity. Figure 3 displays the average AUC comparisons for the MIBH + CT and SUVR + CT classifiers.
The CT-only task yielded suboptimal results in predicting consciousness recovery, with AUCs of 0.632 ± 0.080, 0.563 ± 0.155, and 0.624 ± 0.160 in the training, validation, and independent test sets, respectively. In these datasets, sensitivity was 0.570 ± 0.140, 0.585 ± 0.153, and 0.580 ± 0.160, while specificity was 0.870 ± 0.260, 0.820 ± 0.186, and 0.820 ± 0.136, respectively. The average performance results are summarized in Table 2. Figure 4 illustrates the boxplots of accuracy, AUC, sensitivity, and specificity on the test dataset for all five tasks after 5-fold cross-validation.
Table 2
Multimodal classification performance
| Modality | Dataset | Accuracy (mean ± SD) | AUC (mean ± SD) | Sensitivity (mean ± SD) | Specificity (mean ± SD) |
| --- | --- | --- | --- | --- | --- |
| MIBH + CT | Training | 0.816 ± 0.025 | 0.803 ± 0.024 | 0.813 ± 0.041 | 0.807 ± 0.043 |
| MIBH + CT | Validation | 0.808 ± 0.084 | 0.804 ± 0.059 | 0.845 ± 0.069 | 0.806 ± 0.076 |
| MIBH + CT | Test | 0.723 ± 0.054 | 0.784 ± 0.073 | 0.794 ± 0.055 | 0.807 ± 0.062 |
| SUVR + CT | Training | 0.685 ± 0.020 | 0.785 ± 0.034 | 0.722 ± 0.124 | 0.752 ± 0.272 |
| SUVR + CT | Validation | 0.630 ± 0.024 | 0.629 ± 0.060 | 0.692 ± 0.090 | 0.733 ± 0.075 |
| SUVR + CT | Test | 0.636 ± 0.049 | 0.623 ± 0.173 | 0.687 ± 0.070 | 0.700 ± 0.288 |
| MIBH | Training | 0.670 ± 0.059 | 0.764 ± 0.028 | 0.788 ± 0.100 | 0.629 ± 0.153 |
| MIBH | Validation | 0.771 ± 0.123 | 0.686 ± 0.170 | 0.796 ± 0.103 | 0.695 ± 0.133 |
| MIBH | Test | 0.629 ± 0.070 | 0.751 ± 0.093 | 0.707 ± 0.096 | 0.796 ± 0.123 |
| SUVR | Training | 0.610 ± 0.027 | 0.667 ± 0.039 | 0.852 ± 0.035 | 0.500 ± 0.006 |
| SUVR | Validation | 0.605 ± 0.061 | 0.670 ± 0.091 | 0.834 ± 0.034 | 0.522 ± 0.061 |
| SUVR | Test | 0.500 ± 0.000 | 0.412 ± 0.104 | 0.830 ± 0.040 | 0.532 ± 0.052 |
| CT | Training | 0.517 ± 0.034 | 0.632 ± 0.080 | 0.570 ± 0.140 | 0.870 ± 0.260 |
| CT | Validation | 0.535 ± 0.055 | 0.563 ± 0.155 | 0.585 ± 0.153 | 0.820 ± 0.186 |
| CT | Test | 0.500 ± 0.000 | 0.624 ± 0.160 | 0.580 ± 0.160 | 0.820 ± 0.136 |
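The per-fold metrics summarized in Table 2 can be computed as sketched below with scikit-learn. Labels and classifier scores here are simulated, a 0.5 decision threshold is assumed, and 1 denotes consciousness recovery; none of this is taken from the study's actual data.

```python
# Sketch: accuracy, AUC, sensitivity, and specificity for one fold
# (simulated labels/scores; 1 = consciousness recovery).
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, 100)                                 # simulated fold labels
y_score = np.clip(0.35 * y_true + 0.65 * rng.random(100), 0, 1)  # simulated outputs
y_pred = (y_score >= 0.5).astype(int)                            # assumed 0.5 threshold

auc = roc_auc_score(y_true, y_score)
acc = accuracy_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # recovered patients correctly identified
specificity = tn / (tn + fp)  # non-recovered patients correctly identified
print(f"AUC={auc:.3f}, accuracy={acc:.3f}, "
      f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```

Repeating this over the 5 cross-validation folds and averaging yields the mean ± SD entries of the table.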
3.3 Performance of the tabular-based deep learning classifier based on MIBH + CT deep image features and CRS-R scores
Compared with the image-based deep learning classifiers, the tabular-based classifier combining MIBH + CT deep features with clinical CRS-R scores achieved the best classification performance, with accuracies of 88.5% and 82.2%, AUCs of 0.950 and 0.933, sensitivities of 0.93 and 0.80, and specificities of 0.83 and 0.85 on the training and test sets, respectively. Figure 5 displays the average AUC comparison between the tabular-based classifier (CRS-R scores plus deep features) and the MIBH + CT image-based classifier. A visualization of the tabular-based classifier, combining clinical CRS-R scores and PET/CT deep features, is presented in Fig. 6. The t-SNE plot showed clear separation between the MCS and UWS clusters, reflecting the feasibility and advantage of combining deep image features with clinical behavioral scores for classification.
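A t-SNE embedding of the kind shown in Fig. 6 can be sketched as follows: the deep image features are concatenated with the clinical CRS-R score into one tabular matrix and projected to two dimensions. The feature dimensionality, score values, and t-SNE settings below are illustrative assumptions, not the study's actual configuration.

```python
# Sketch: 2-D t-SNE of concatenated deep image features + CRS-R score
# (all values simulated; 87 patients as in the study cohort).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_patients, n_deep = 87, 64
deep_features = rng.standard_normal((n_patients, n_deep))   # hypothetical MIBH + CT features
crs_r = rng.integers(0, 24, (n_patients, 1)).astype(float)  # hypothetical CRS-R totals (0-23)

X = np.hstack([deep_features, crs_r])  # tabular input: deep features + clinical score
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # one 2-D point per patient, colored by diagnosis in a real plot
```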