General descriptors of the variables in the data set are given in Table 1. Of the 31,663 patients, 1.1% were 19 years old or younger, 7.0% were aged 20-44 years, 42.3% were aged 45-64 years, and 49.6% were 65 years or older. Overall, 88.8% of the patients were White, 5.8% were Black, and 5.3% belonged to other races; 58.4% were male and 41.6% were female. Table 1 also summarizes the primary site, laterality, surgery status, grouped tumor size, vital status, and follow-up time of the patients.
Table 1
Description of the variables in the data for patients with glioblastoma
| Variable | Category | Value |
|---|---|---|
| Age, n (%) | ≤19 years | 343 (1.1) |
| | 20-44 years | 2208 (7.0) |
| | 45-64 years | 13403 (42.3) |
| | ≥65 years | 15709 (49.6) |
| Race, n (%) | White | 28127 (88.8) |
| | Black | 1849 (5.8) |
| | Other | 1687 (5.3) |
| Gender, n (%) | Male | 18479 (58.4) |
| | Female | 13184 (41.6) |
| Primary Site, n (%) | Frontal Lobe | 10113 (31.9) |
| | Temporal Lobe | 8936 (28.2) |
| | Parietal Lobe | 5490 (17.3) |
| | Occipital Lobe | 1461 (4.6) |
| | Ventricle | 154 (0.5) |
| | Cerebellum | 273 (0.9) |
| | Brain Stem | 201 (0.6) |
| | Overlapping Lesion of Brain | 5696 (19.8) |
| Laterality, n (%) | Unilateral | 31023 (98.0) |
| | Bilateral | 640 (2.0) |
| Surgery, n (%) | Not Performed | 6414 (20.3) |
| | Performed | 25249 (79.7) |
| Tumor Size, n (%) | Less than 1 cm | 170 (0.6) |
| | Between 1 cm and 2 cm | 1291 (4.7) |
| | Between 2 cm and 3 cm | 3329 (12.2) |
| | Between 3 cm and 4 cm | 5117 (18.8) |
| | Between 4 cm and 5 cm | 7336 (27.0) |
| | Greater than 5 cm | 9976 (36.7) |
| Follow-up Time (months) | Mean±SD | 13.21±17.14 |
| | Median (Min.-Max.) | 8.00 (0.00-143.00) |
| Vital Status, n (%) | Alive | 4409 (13.9) |
| | Dead | 27254 (86.1) |

SD: Standard Deviation, Min.: Minimum, Max.: Maximum
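The n (%) entries in Table 1 can be recomputed from the counts. The minimal sketch below uses the age and tumor-size counts copied from Table 1. The tumor-size percentages are reproduced only when the denominator is the 27,219 patients in the six size groups rather than all 31,663 patients. This suggests, as an assumption not stated in the table, that tumor size was missing for the remaining patients.

```python
# Counts copied from Table 1; each percentage is n / (group total) * 100.
age = {"<=19": 343, "20-44": 2208, "45-64": 13403, ">=65": 15709}
tumor_size = {"<1 cm": 170, "1-2 cm": 1291, "2-3 cm": 3329,
              "3-4 cm": 5117, "4-5 cm": 7336, ">5 cm": 9976}


def percentages(counts):
    """Return {category: percentage rounded to one decimal}, using the sum of counts as denominator."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


print(sum(age.values()), percentages(age))                # all 31663 patients
print(sum(tumor_size.values()), percentages(tumor_size))  # only patients with a size group
```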
Table 2 shows the survival analysis results of the patients. The median overall survival (OS) was 9.00±0.09 months. All variables in the table except gender were significantly associated with survival (P<0.001 for each; P=0.544 for gender). The median survival time was 16.00±0.93 months for patients aged 19 years or younger, 22.00±0.58 months for those aged 20-44 years, 14.00±0.14 months for those aged 45-64 years, and 5.00±0.07 months for those aged 65 years or older. By race, the median survival time was 9.00±0.10 months for White patients, 10.00±0.39 months for Black patients, and 12.00±0.47 months for patients of other races. The median survival time of women was equal to that of men (10.00 months).
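The survival rates and medians in Table 2 are Kaplan-Meier (product-limit) estimates. The minimal sketch below assumes that each patient contributes a follow-up time in months and an event flag (1 = death, 0 = censored, i.e. alive at last follow-up). The toy data are hypothetical. The standard errors reported in Table 2 require a variance estimate that this sketch omits.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  -- follow-up times (months)
    events -- 1 if death was observed, 0 if censored
    Returns a list of (t, S(t)) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Everyone with follow-up exactly t is still at risk at t (censored included)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """Smallest time at which S(t) <= 0.5, or None if the curve never drops that far."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy cohort of eight patients
curve = kaplan_meier([3, 5, 5, 8, 9, 12, 14, 20], [1, 1, 0, 1, 1, 0, 1, 0])
print(curve)
print(median_survival(curve))
```

The median is read off the step curve as the first time at which estimated survival falls to 50% or below. In this toy cohort that happens at 9 months.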
Table 2
Kaplan-Meier results of the study

| Variable | Category | 1-year survival (%) | 3-year survival (%) | 5-year survival (%) | Survival time, Mean±SE | Survival time, Median±SE | P value |
|---|---|---|---|---|---|---|---|
| Overall | | 40.5 | 10.2 | 5.2 | 17.03±0.17 | 9.00±0.09 | - |
| Age | ≤19 years | 56.9 | 22.8 | 14.7 | 33.99±2.75 | 16.00±0.93 | <0.001 |
| | 20-44 years | 72.7 | 32.6 | 20.2 | 39.50±1.11 | 22.00±0.58 | |
| | 45-64 years | 53.5 | 13.0 | 6.5 | 21.03±0.27 | 14.00±0.14 | |
| | ≥65 years | 24.4 | 4.5 | 1.8 | 10.09±0.14 | 5.00±0.07 | |
| Race | White | 39.8 | 9.9 | 5.1 | 16.76±0.18 | 9.00±0.10 | <0.001 |
| | Black | 42.9 | 11.9 | 6.2 | 18.26±0.71 | 10.00±0.39 | |
| | Other | 48.9 | 14.6 | 6.8 | 19.96±0.76 | 12.00±0.47 | |
| Gender | Male | 40.8 | 9.8 | 4.7 | 16.60±0.21 | 10.00±0.12 | 0.544 |
| | Female | 42.0 | 10.8 | 5.9 | 17.64±0.28 | 10.00±0.15 | |
| Primary Site | Frontal Lobe | 39.9 | 11.3 | 5.9 | 17.87±0.32 | 9.00±0.16 | <0.001 |
| | Temporal Lobe | 45.4 | 10.6 | 5.0 | 17.69±0.30 | 11.00±0.17 | |
| | Parietal Lobe | 40.7 | 9.7 | 5.1 | 17.01±0.40 | 9.00±0.22 | |
| | Occipital Lobe | 43.2 | 9.9 | 5.0 | 16.92±0.70 | 10.00±0.40 | |
| | Ventricle | 34.5 | 11.7 | 6.1 | 18.20±2.74 | 6.00±1.05 | |
| | Cerebellum | 37.8 | 10.3 | 5.4 | 16.52±1.79 | 6.00±0.78 | |
| | Brain Stem | 35.7 | 10.3 | 6.7 | 16.60±2.01 | 8.00±0.84 | |
| | Overlapping Lesion of Brain | 32.4 | 8.2 | 4.2 | 14.06±0.37 | 6.00±0.20 | |
| Laterality | Unilateral | 40.8 | 10.3 | 5.2 | 17.11±0.17 | 9.00±0.09 | <0.001 |
| | Bilateral | 26.1 | 7.9 | 4.2 | 12.74±1.03 | 5.00±0.43 | |
| Tumor Size | Less than 1 cm | 50.2 | 15.3 | 6.6 | 19.85±2.35 | 12.00±0.97 | <0.001 |
| | Between 1 cm and 2 cm | 48.7 | 14.8 | 6.3 | 19.11±0.83 | 12.00±0.41 | |
| | Between 2 cm and 3 cm | 46.4 | 12.3 | 5.4 | 18.85±0.52 | 11.00±0.30 | |
| | Between 3 cm and 4 cm | 42.1 | 10.3 | 5.3 | 17.52±0.42 | 10.00±0.22 | |
| | Between 4 cm and 5 cm | 41.9 | 9.8 | 5.0 | 17.06±0.33 | 10.00±0.20 | |
| | Greater than 5 cm | 36.5 | 9.6 | 4.7 | 16.10±0.30 | 8.00±0.15 | |
| Surgery | Not Performed | 14.4 | 3.0 | 1.3 | 7.16±0.21 | 3.00±0.05 | <0.001 |
| | Performed | 47.0 | 12.1 | 6.2 | 19.53±0.20 | 11.00±0.10 | |

SE: Standard Error. Survival times are given in months.
By primary site, the shortest median survival time (6 months) was observed for tumors of the ventricle, the cerebellum, and overlapping lesions of the brain. These were followed by the brain stem (8 months), the parietal and frontal lobes (9 months each), the occipital lobe (10 months), and the temporal lobe (11 months). Survival statistics for laterality, tumor size, and surgery are also given in Table 2.
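The section does not name the test behind the P values in Table 2. The log-rank test is the usual choice for comparing Kaplan-Meier curves, and the sketch below assumes it. This is a minimal two-group version (chi-square with 1 degree of freedom). Variables with more than two categories (age, race, primary site, tumor size) would need the k-group generalization with k-1 degrees of freedom, which is not shown. The toy data are hypothetical.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test.

    times_*  -- follow-up times (e.g. months)
    events_* -- 1 if death was observed, 0 if censored
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti, _, _ in pooled if ti >= t)
        at_risk_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)
        deaths = sum(1 for ti, e, _ in pooled if ti == t and e == 1)
        deaths_a = sum(1 for ti, e, g in pooled if ti == t and e == 1 and g == 0)
        share_a = at_risk_a / at_risk
        o_minus_e += deaths_a - deaths * share_a
        if at_risk > 1:
            # Hypergeometric variance of the number of deaths in group A at time t
            variance += deaths * share_a * (1 - share_a) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical groups: A dies early, B dies late
print(logrank_two_groups([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5))
```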
The Gain Ratio Attribute Evaluation and Information Gain Attribute Evaluation attribute selection methods in WEKA were used to examine the importance of each variable, that is, its contribution to predicting vital status, for the 2-year data set (2017-2018). Eight variables were used from the data set: seven independent variables (surgery, age, laterality, primary site, tumor size, race, and gender) and one dependent variable (vital status). The percentage importance of each variable with respect to vital status is given in Figure 1A. The same eight variables were used for the 1-year data set, and the corresponding percentages are given in Figure 1B.
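The two attribute-evaluation criteria can be stated compactly for a categorical attribute. Information gain is H(vital status) minus H(vital status | attribute). Gain ratio divides that gain by the attribute's own entropy (its split information), which penalizes attributes with many categories. The minimal sketch below follows these textbook definitions, which WEKA's InfoGainAttributeEval and GainRatioAttributeEval are based on. It is not WEKA code: WEKA also discretizes numeric attributes and handles missing values, and the section does not say how the scores were converted to the percentages in Figure 1. The toy columns are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """H(labels) - H(labels | feature) for a categorical feature."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the split information H(feature)."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy columns
surgery = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
vital = ["alive", "alive", "dead", "dead", "dead", "dead", "alive", "dead"]
print(round(info_gain(surgery, vital), 4), round(gain_ratio(surgery, vital), 4))
```

Because the toy surgery column splits the patients 4/4, its split information is exactly 1 bit, so the two scores coincide here. They diverge for unevenly split or many-valued attributes such as primary site.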
The performance criteria of the ML methods for the 2-year survival prediction model are given in Table 3. The Hybrid Model gave the best results on accuracy, F-measure, and MCC, which are widely used performance criteria in the literature. On these three criteria, the Hybrid Model was followed by J48, Naïve Bayes, Logistic Regression, Bagging, and Multilayer Perceptron, respectively, with Naïve Bayes and Logistic Regression tied on F-measure and MCC. With an accuracy of 74.1%, the Hybrid Model correctly classified the vital status (alive/dead) of approximately 74 out of every 100 patients.
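The Alive, Dead, and Overall rows in Tables 3 and 4 can be related through a confusion matrix. The minimal sketch below makes two assumptions. First, the per-class "Accuracy" values are class-specific recall (true positive rate). Second, the Overall row is the average weighted by class size, as in WEKA's detailed accuracy report. Under these assumptions, the weighted recall equals overall accuracy, which is consistent with the Overall values in both tables. MCC is symmetric in the two classes, which matches the identical Alive/Dead MCC values in the tables. The confusion matrix below is hypothetical.

```python
import math

CLASSES = ["alive", "dead"]


def per_class_metrics(tp, fp, fn):
    """Recall (TP rate), precision and F-measure for one class treated as positive."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def summarize(confusion):
    """confusion[actual][predicted] -> (per-class report, weighted averages, accuracy)."""
    total = sum(confusion[a][p] for a in CLASSES for p in CLASSES)
    report = {}
    for c in CLASSES:
        other = CLASSES[1 - CLASSES.index(c)]
        tp, fn = confusion[c][c], confusion[c][other]
        fp, tn = confusion[other][c], confusion[other][other]
        report[c] = per_class_metrics(tp, fp, fn) + (mcc(tp, fp, fn, tn),)
    # Weight each class's (recall, precision, F, MCC) by its number of actual cases
    support = {c: sum(confusion[c].values()) for c in CLASSES}
    weighted = tuple(sum(report[c][i] * support[c] for c in CLASSES) / total for i in range(4))
    accuracy = sum(confusion[c][c] for c in CLASSES) / total
    return report, weighted, accuracy


# Hypothetical confusion matrix: rows = actual class, inner keys = predicted class
confusion = {"alive": {"alive": 70, "dead": 30}, "dead": {"alive": 20, "dead": 80}}
report, weighted, accuracy = summarize(confusion)
print(report, weighted, accuracy)
```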
Table 3
Performance results of Machine Learning methods for 2-year survival
| Method | Class | Accuracy | F-measure | MCC | PRC Area | ROC Area |
|---|---|---|---|---|---|---|
| Logistic Regression | Alive | 0.589 | 0.613 | 0.272 | 0.648 | 0.681 |
| | Dead | 0.682 | 0.657 | 0.272 | 0.688 | 0.681 |
| | Overall | 0.636 | 0.636 | 0.272 | 0.668 | 0.681 |
| Naive Bayes | Alive | 0.591 | 0.614 | 0.272 | 0.648 | 0.682 |
| | Dead | 0.680 | 0.657 | 0.272 | 0.689 | 0.682 |
| | Overall | 0.637 | 0.636 | 0.272 | 0.669 | 0.682 |
| Multilayer Perceptron | Alive | 0.648 | 0.618 | 0.218 | 0.622 | 0.653 |
| | Dead | 0.570 | 0.598 | 0.218 | 0.660 | 0.653 |
| | Overall | 0.608 | 0.608 | 0.218 | 0.641 | 0.653 |
| Bagging | Alive | 0.601 | 0.611 | 0.250 | 0.639 | 0.668 |
| | Dead | 0.649 | 0.639 | 0.250 | 0.676 | 0.668 |
| | Overall | 0.626 | 0.625 | 0.250 | 0.658 | 0.668 |
| J48 | Alive | 0.568 | 0.607 | 0.279 | 0.629 | 0.664 |
| | Dead | 0.708 | 0.668 | 0.279 | 0.647 | 0.664 |
| | Overall | 0.640 | 0.638 | 0.279 | 0.638 | 0.664 |
| Hybrid Model | Alive | 0.698 | 0.725 | 0.481 | 0.714 | 0.764 |
| | Dead | 0.781 | 0.755 | 0.481 | 0.793 | 0.764 |
| | Overall | 0.741 | 0.740 | 0.481 | 0.754 | 0.764 |

MCC: Matthews correlation coefficient, PRC: Precision Recall Curve, ROC: Receiver Operating Characteristic
The performance criteria of the ML methods for the 1-year survival prediction model are given in Table 4. The Hybrid Model again gave the best results on accuracy, F-measure, and MCC. The remaining methods performed similarly (accuracy 0.700-0.721, F-measure 0.675-0.685, MCC 0.257-0.301), and their ranking depended on the criterion. J48 ranked second on accuracy and MCC, Naïve Bayes ranked second on F-measure, and Multilayer Perceptron ranked last on all three criteria. With an accuracy of 84.9%, the Hybrid Model correctly classified the vital status (alive/dead) of approximately 85 out of every 100 patients.
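Because the non-hybrid methods in Table 4 differ only in the second or third decimal, their order can change from one criterion to another. The short check below sorts the Overall rows by each criterion. The values are copied from Table 4. Python's `sorted` is stable, so ties keep the table order; for example, Logistic Regression and Naive Bayes both have an MCC of 0.297.

```python
# Overall rows of Table 4 (1-year survival): (accuracy, F-measure, MCC)
TABLE4_OVERALL = {
    "Logistic Regression": (0.719, 0.682, 0.297),
    "Naive Bayes": (0.718, 0.685, 0.297),
    "Multilayer Perceptron": (0.700, 0.675, 0.257),
    "Bagging": (0.716, 0.683, 0.292),
    "J48": (0.721, 0.680, 0.301),
    "Hybrid Model": (0.849, 0.843, 0.647),
}
CRITERIA = ("Accuracy", "F-measure", "MCC")


def rank(criterion_index):
    """Method names from best to worst on one criterion (ties keep table order)."""
    return sorted(TABLE4_OVERALL,
                  key=lambda method: TABLE4_OVERALL[method][criterion_index],
                  reverse=True)


for i, name in enumerate(CRITERIA):
    print(name, rank(i))
```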
Table 4
Performance results of Machine Learning methods for 1-year survival
| Method | Class | Accuracy | F-measure | MCC | PRC Area | ROC Area |
|---|---|---|---|---|---|---|
| Logistic Regression | Alive | 0.927 | 0.816 | 0.297 | 0.814 | 0.704 |
| | Dead | 0.295 | 0.409 | 0.297 | 0.548 | 0.704 |
| | Overall | 0.719 | 0.682 | 0.297 | 0.726 | 0.704 |
| Naive Bayes | Alive | 0.918 | 0.814 | 0.297 | 0.815 | 0.704 |
| | Dead | 0.312 | 0.422 | 0.297 | 0.543 | 0.704 |
| | Overall | 0.718 | 0.685 | 0.297 | 0.725 | 0.704 |
| Multilayer Perceptron | Alive | 0.877 | 0.796 | 0.257 | 0.776 | 0.665 |
| | Dead | 0.340 | 0.427 | 0.257 | 0.506 | 0.665 |
| | Overall | 0.700 | 0.675 | 0.257 | 0.687 | 0.665 |
| Bagging | Alive | 0.914 | 0.812 | 0.292 | 0.810 | 0.704 |
| | Dead | 0.313 | 0.421 | 0.292 | 0.540 | 0.704 |
| | Overall | 0.716 | 0.683 | 0.292 | 0.721 | 0.704 |
| J48 | Alive | 0.938 | 0.818 | 0.301 | 0.722 | 0.609 |
| | Dead | 0.281 | 0.399 | 0.301 | 0.468 | 0.609 |
| | Overall | 0.721 | 0.680 | 0.301 | 0.638 | 0.609 |
| Hybrid Model | Alive | 0.941 | 0.893 | 0.647 | 0.958 | 0.856 |
| | Dead | 0.661 | 0.742 | 0.647 | 0.698 | 0.856 |
| | Overall | 0.849 | 0.843 | 0.647 | 0.872 | 0.856 |

MCC: Matthews correlation coefficient, PRC: Precision Recall Curve, ROC: Receiver Operating Characteristic