This section presents the results of the performance analysis of the proposed RE-E-NNS algorithm. The algorithm is validated on twelve datasets taken from the UCI and KEEL repositories. A detailed description of the datasets is given in Table 5. All experiments are carried out in Matlab in a Windows environment, and ten-fold cross validation is used to validate the experimental results.
5.1 Results and Comparisons
The proposed RE-E-NNS algorithm is compared with four significant rule extraction algorithms: Recursive Rule Extraction (Re-RX) [19], Reverse Engineering Recursive Rule Extraction (RE-Re-RX) [6], Eclectic Rule Extraction from Neural Network Recursively (ERENNR) [7], and Eclectic Rule Extraction from Neural Network with Multi-Hidden Layer (ERENN_MHL) [32].
Table 7 compares the RE-E-NNS algorithm with the four algorithms on the average testing accuracy over 10-fold cross validation. The bold values represent the highest accuracy for each dataset. The results show that the proposed RE-E-NNS algorithm performs better than the four algorithms on all datasets except Echocardiogram (where it ties with ERENN_MHL), Eye, and Wine. The largest gains in accuracy are observed on the German, Pima Indians Diabetes, and Car Evaluation datasets. Figure 6 depicts the comparison between the algorithms graphically.
The rules are also evaluated on fidelity. Table 8 compares the RE-E-NNS, Re-RX, RE-Re-RX, ERENNR, and ERENN_MHL algorithms on the average testing fidelity over 10-fold cross validation. The bold values represent the highest fidelity for each dataset. The results show that, in five of the twelve datasets, the fidelity of the rule-sets generated by the RE-E-NNS algorithm is better than that of all four other algorithms. Compared individually, the fidelity of the rules constructed by RE-E-NNS is better than Re-RX in all the datasets, better than RE-Re-RX in all the datasets except Pima Indians Diabetes, better than ERENNR in nine datasets, and better than ERENN_MHL in five datasets and equal in the Echocardiogram dataset.
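As a minimal sketch of how the two measures reported in Tables 7 and 8 are commonly computed: accuracy compares the rule-set predictions with the true class labels, whereas fidelity compares them with the predictions of the network (or ensemble) from which the rules were extracted. The function names and the toy label vectors below are illustrative assumptions and are not taken from the authors' Matlab implementation.

```python
from typing import Sequence


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Percentage of positions at which two label sequences agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def accuracy(rule_pred: Sequence[int], y_true: Sequence[int]) -> float:
    """Agreement between rule-set predictions and the true labels."""
    return agreement(rule_pred, y_true)


def fidelity(rule_pred: Sequence[int], net_pred: Sequence[int]) -> float:
    """Agreement between rule-set predictions and the network's predictions."""
    return agreement(rule_pred, net_pred)


# Hypothetical toy data for one test fold (not from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]     # true labels
net_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # predictions of the trained network
rule_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # predictions of the extracted rules

fold_accuracy = accuracy(rule_pred, y_true)    # 5 of 8 labels match
fold_fidelity = fidelity(rule_pred, net_pred)  # 7 of 8 predictions match
```

In a 10-fold setting, the per-fold values would be averaged to obtain figures comparable to those in the tables. The toy data also illustrates that a rule-set can mimic the network closely (high fidelity) while both are wrong on the same patterns.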
Table 7
Comparison of accuracies with the average of 10-fold cross validation results (in %)
| Datasets | Re-RX | RE-Re-RX | ERENNR | ERENN_MHL | RE-E-NNS |
|---|---|---|---|---|---|
| Credit Approval | 81.39 | 84.31 | 86.62 | 86.77 | 87.39 |
| Australian Credit Approval | 73.48 | 74.49 | 85.51 | 85.65 | 86.52 |
| Echocardiogram | 93.33 | 96.67 | 96.67 | 98.33 | 98.33 |
| Statlog (Heart) | 72.59 | 75.56 | 80.37 | 82.59 | 83.70 |
| Breast Cancer | 90.59 | 94.85 | 96.47 | 96.76 | 97.06 |
| German | 71.4 | 72.4 | 72.9 | 74.6 | 79.2 |
| Eye | 53.98 | 55.15 | 55.23 | 55.36 | 55.13 |
| Pima Indians Diabetes | 68.57 | 74.68 | 76.88 | 76.23 | 79.22 |
| Census Income | 82.49 | 83.59 | 84.29 | 84.39 | 85.74 |
| Thyroid | 94.07 | 94.48 | 94.87 | 95.08 | 95.67 |
| Wine | 63.70 | 81.85 | 98.15 | 98.89 | 97.78 |
| Car Evaluation | 88.87 | 88.87 | 89.86 | 90.12 | 94.82 |
Accuracy alone is not enough to conclude that a classification model is good, so the algorithms are also compared on several other performance measures, as shown in Table 9. The bold values represent the best value of each measure for each dataset.
• All the measures for RE-E-NNS are better than those of Re-RX, RE-Re-RX, ERENNR, and ERENN_MHL in the Thyroid and Car Evaluation datasets.
• In some of the datasets, RE-E-NNS produces a higher average number of False Positives (FP) than one or more of the other algorithms. As a consequence, a few measures for RE-E-NNS are worse in those datasets. For the Credit Approval and Australian Credit Approval datasets, RE-E-NNS produces rules with a higher average FPR and a lower average specificity only when compared to the Re-RX algorithm. For the Echocardiogram dataset, RE-E-NNS constructs rules with lower average precision, lower average specificity, and higher average FPR than the RE-Re-RX and ERENNR algorithms. For the Census Income dataset, RE-E-NNS constructs rules with lower average precision, lower average specificity, and higher average FPR than all the other algorithms. For the Pima Indians Diabetes dataset, RE-E-NNS likewise constructs rules with lower average precision (except compared to RE-Re-RX), lower average specificity (except compared to ERENN_MHL), and higher average FPR (except compared to ERENN_MHL) than the others. However, most of the measures are still better for RE-E-NNS in the above-mentioned datasets.
Table 8
Comparison of fidelity with the average of 10-fold cross validation results (in %)
| Datasets | Re-RX | RE-Re-RX | ERENNR | ERENN_MHL | RE-E-NNS |
|---|---|---|---|---|---|
| Credit Approval | 86.92 | 91.08 | 96.62 | 96.92 | 96.77 |
| Australian Credit Approval | 70.58 | 74.49 | 93.19 | 95.07 | 96.52 |
| Echocardiogram | 91.67 | 85 | 88.33 | 95 | 95 |
| Statlog (Heart) | 68.52 | 70.74 | 85.19 | 86.29 | 83.70 |
| Breast Cancer | 90.74 | 95 | 96.91 | 96.47 | 98.38 |
| German | 64.89 | 65.22 | 69.56 | 70 | 68.89 |
| Eye | 81.98 | 81.72 | 77.90 | 98.33 | 100 |
| Pima Indians Diabetes | 77.92 | 84.03 | 86.75 | 81.69 | 78.31 |
| Census Income | 93.46 | 94.23 | 94.42 | 99.12 | 96.27 |
| Thyroid | 90.14 | 90.4 | 90.82 | 92.56 | 96.81 |
| Wine | 46.67 | 73.33 | 87.78 | 95 | 91.67 |
| Car Evaluation | 49.13 | 49.13 | 59.94 | 54.39 | 62.95 |
• In some of the datasets, the proposed RE-E-NNS classifies patterns with a higher average number of False Negatives (FN), due to which a few measures for RE-E-NNS are worse in those datasets. For the Credit Approval and Australian Credit Approval datasets, RE-E-NNS produces lower average recall than ERENNR and ERENN_MHL. For the Statlog (Heart) dataset, RE-E-NNS constructs rule-sets with lower average recall and lower average f-measure than ERENNR. For the Breast Cancer dataset, RE-E-NNS produces lower average recall than Re-RX and ERENNR. For the German dataset, RE-E-NNS constructs rule-sets with lower average recall than RE-Re-RX and ERENN_MHL. However, most of the measures are better for RE-E-NNS than for Re-RX, RE-Re-RX, ERENNR, and ERENN_MHL in the above-mentioned datasets.
• For the Eye dataset, RE-E-NNS produces rules with better average recall and f-measure than the four algorithms. Compared individually, RE-E-NNS performs better in 5, 4, 4, and 2 measures than the Re-RX, RE-Re-RX, ERENNR, and ERENN_MHL algorithms, respectively. For the Wine dataset, RE-E-NNS performs better than two algorithms (Re-RX and RE-Re-RX) on all the measures.
Taken together, the results show that RE-E-NNS performs better than the four algorithms on most of the measures in most of the datasets. It can therefore be concluded that the average performance of RE-E-NNS is better than that of the other algorithms.
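The measures discussed above can all be derived from the counts of a confusion matrix. The following is a minimal sketch for the binary case using the standard textbook definitions; the function name and the example counts are hypothetical, and how the authors aggregate the measures for multi-class datasets is not stated here, so that part is left out.

```python
import math


def binary_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive Table 9 style measures from binary confusion-matrix counts.

    Ratios are returned as fractions in [0, 1]; MCC lies in [-1, 1].
    Assumes every denominator is non-zero (no empty class or prediction).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate (sensitivity)
    specificity = tn / (tn + fp)     # true negative rate
    fpr = fp / (fp + tn)             # false positive rate = 1 - specificity
    f_measure = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "fpr": fpr,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for one test fold (not taken from the paper).
m = binary_measures(tp=40, fp=10, tn=45, fn=5)
```

These definitions are consistent with the relationships visible in Table 9: for example, for Re-RX on Credit Approval the reported BA of 80.75 equals the mean of recall (78.62) and specificity (82.88), and FPR plus specificity sums to 100. This is also why a higher FP count lowers precision and specificity together, and why a higher FN count lowers recall and f-measure.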
Table 9
Comparison of performance measures with the average of 10-fold cross validation results (all in % except MCC)
| Datasets | Algorithms | Precision | Recall | F-measure | FPR | Specificity | BA | MCC |
|---|---|---|---|---|---|---|---|---|
| Credit Approval | Re-RX | 79.07 | 78.62 | 78.59 | 17.12 | 82.88 | 80.75 | 0.6177 |
|  | RE-Re-RX | 79.42 | 86.87 | 82.81 | 18.33 | 81.67 | 84.27 | 0.6843 |
|  | ERENNR | 79.51 | 94.4 | 86.17 | 20.1 | 79.89 | 87.15 | 0.7429 |
|  | ERENN_MHL | 80.05 | 93.76 | 86.2 | 19.23 | 80.77 | 87.26 | 0.7436 |
|  | RE-E-NNS | 81.62 | 92.83 | 86.73 | 17.31 | 82.68 | 87.76 | 0.7518 |
| Australian Credit Approval | Re-RX | 78.02 | 57.69 | 65.65 | 14.19 | 85.8 | 71.75 | 0.4658 |
|  | RE-Re-RX | 73.53 | 66.45 | 69.54 | 19.39 | 80.61 | 73.53 | 0.4793 |
|  | ERENNR | 79.13 | 92.48 | 85.07 | 19.93 | 80.07 | 86.28 | 0.7235 |
|  | ERENN_MHL | 79.19 | 92.79 | 85.23 | 19.93 | 80.07 | 86.43 | 0.7267 |
|  | RE-E-NNS | 82.57 | 90.54 | 85.97 | 16.28 | 83.72 | 87.13 | 0.7404 |
| Echocardiogram | Re-RX | 86.667 | 100 | 92 | 10 | 90 | 95 | 0.8828 |
|  | RE-Re-RX | 100 | 96 | 97.778 | 0 | 100 | 98 | 0.9264 |
|  | ERENNR | 100 | 96 | 97.778 | 0 | 100 | 98 | 0.9264 |
|  | ERENN_MHL | 96.67 | 100 | 98 | 2.5 | 97.5 | 98.75 | 0.9707 |
|  | RE-E-NNS | 96.67 | 100 | 98 | 2.5 | 97.5 | 98.75 | 0.9707 |
| Statlog (Heart) | Re-RX | 74.59 | 67.97 | 67.84 | 21.50 | 78.49 | 73.23 | 0.4857 |
|  | RE-Re-RX | 78.05 | 69.45 | 70.54 | 17.68 | 82.32 | 75.89 | 0.5374 |
|  | ERENNR | 77.92 | 91.64 | 83.74 | 32.78 | 67.22 | 79.43 | 0.6162 |
|  | ERENN_MHL | 81.72 | 80.12 | 79.21 | 14.93 | 85.07 | 82.59 | 0.6633 |
|  | RE-E-NNS | 82.42 | 80.29 | 80.95 | 14.04 | 85.96 | 83.13 | 0.6686 |
| Breast Cancer | Re-RX | 80.84 | 95.44 | 86.93 | 11.96 | 88.04 | 91.74 | 0.8016 |
|  | RE-Re-RX | 90.95 | 95.26 | 92.85 | 5.62 | 94.38 | 94.82 | 0.8848 |
|  | ERENNR | 92.46 | 98.28 | 95.27 | 5.3 | 94.69 | 96.49 | 0.9216 |
|  | ERENN_MHL | 97.12 | 94.96 | 95.93 | 2.97 | 97.03 | 95.99 | 0.9306 |
|  | RE-E-NNS | 97.57 | 95.33 | 96.37 | 1.98 | 98.02 | 96.67 | 0.9383 |
| German | Re-RX | 74.00 | 91.34 | 81.63 | 75.00 | 24.99 | 58.17 | 0.2163 |
|  | RE-Re-RX | 74.42 | 92.31 | 82.29 | 74.07 | 25.93 | 59.12 | 0.2469 |
|  | ERENNR | 75.95 | 90.54 | 82.18 | 71.09 | 28.90 | 59.72 | 0.2407 |
|  | ERENN_MHL | 76.09 | 92.88 | 83.51 | 68.44 | 31.56 | 62.22 | 0.3231 |
|  | RE-E-NNS | 81.44 | 91.35 | 85.95 | 48.04 | 51.96 | 71.66 | 0.4765 |
| Eye | Re-RX | 37.29 | 3.74 | 6.79 | 5.12 | 94.89 | 49.31 | -0.033 |
|  | RE-Re-RX | 51.39 | 0.75 | 1.47 | 0.56 | 99.44 | 50.09 | 0.011 |
|  | ERENNR | 52.11 | 3.27 | 6.14 | 2.46 | 97.54 | 50.40 | 0.0244 |
|  | ERENN_MHL | 86.55 | 1.03 | 1.99 | 0.441 | 99.56 | 50.30 | 0.0439 |
|  | RE-E-NNS | 55.53 | 20 | 71.39 | 19.98 | 80.02 | 50.01 | 0.0299 |
| Pima Indians Diabetes | Re-RX | 89 | 13.26 | 22.22 | 1.38 | 98.62 | 55.94 | 0.2505 |
|  | RE-Re-RX | 70.29 | 48.03 | 56.29 | 10.98 | 89.02 | 68.53 | 0.4134 |
|  | ERENNR | 81.75 | 45.30 | 57.21 | 6.12 | 93.89 | 69.59 | 0.4736 |
|  | ERENN_MHL | 74.30 | 52.22 | 58.72 | 12.39 | 87.61 | 69.91 | 0.4523 |
|  | RE-E-NNS | 73.79 | 63.39 | 67.85 | 11.97 | 88.04 | 75.71 | 0.5369 |
| Census Income | Re-RX | 75 | 0.04 | 0.18 | 0.90 | 99.09 | 50.02 | 0.0257 |
|  | RE-Re-RX | 87.89 | 1.35 | 3.09 | 0.053 | 99.947 | 50.64 | 0.0958 |
|  | ERENNR | 100 | 1.79 | 4.10 | 0.00 | 100 | 50.89 | 0.1265 |
|  | ERENN_MHL | 90.45 | 2.63 | 5.10 | 0.049 | 99.951 | 51.28 | 0.1319 |
|  | RE-E-NNS | 62.73 | 18.91 | 29.49 | 1.34 | 98.66 | 59.03 | 0.3332 |
| Thyroid | Re-RX | 91.09 | 91.09 | 91.09 | 4.45 | 95.55 | 93.32 | 0.8665 |
|  | RE-Re-RX | 91.72 | 91.72 | 91.72 | 4.14 | 95.86 | 93.79 | 0.8758 |
|  | ERENNR | 92.31 | 92.31 | 92.31 | 3.85 | 96.15 | 94.23 | 0.8846 |
|  | ERENN_MHL | 92.63 | 92.63 | 92.63 | 3.69 | 96.31 | 94.47 | 0.8894 |
|  | RE-E-NNS | 93.51 | 93.51 | 93.51 | 3.24 | 96.76 | 95.14 | 0.9027 |
| Wine | Re-RX | 45.56 | 45.56 | 45.56 | 27.22 | 72.78 | 59.17 | 0.1833 |
|  | RE-Re-RX | 72.78 | 72.78 | 72.78 | 13.61 | 86.39 | 79.58 | 0.5917 |
|  | ERENNR | 97.22 | 97.22 | 97.22 | 1.39 | 98.61 | 97.92 | 0.9583 |
|  | ERENN_MHL | 98.33 | 98.33 | 98.33 | 0.83 | 99.17 | 98.75 | 0.975 |
|  | RE-E-NNS | 96.67 | 96.67 | 96.67 | 1.67 | 98.33 | 97.5 | 0.95 |
| Car Evaluation | Re-RX | 77.75 | 77.75 | 77.75 | 7.42 | 92.58 | 85.16 | 0.7033 |
|  | RE-Re-RX | 77.75 | 77.75 | 77.75 | 7.42 | 92.58 | 85.16 | 0.7033 |
|  | ERENNR | 79.71 | 79.71 | 79.71 | 6.76 | 93.24 | 86.47 | 0.7295 |
|  | ERENN_MHL | 80.23 | 80.23 | 80.23 | 6.59 | 93.41 | 86.82 | 0.7364 |
|  | RE-E-NNS | 89.65 | 89.65 | 89.65 | 3.45 | 96.55 | 93.10 | 0.8620 |
The algorithm is also validated with the Friedman non-parametric test and the Least Significant Difference (LSD) post-hoc test. The null hypothesis for the Friedman test is “There is no significant difference between Re-RX, RE-Re-RX, ERENNR, ERENN_MHL, and RE-E-NNS”. The ten-fold cross validation results are used for the tests. Table 10 shows the results of the Friedman test: the mean rank of each algorithm, the Friedman statistic (chi-square value), and the p-value. The null hypothesis is rejected at a highly significant level (p < 0.01) for all datasets except Echocardiogram.
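As a worked illustration of how a Friedman statistic maps to a p-value, the sketch below assumes the usual chi-square approximation with k - 1 degrees of freedom, where k = 5 is the number of compared algorithms. Whether the authors' Matlab routine uses exactly this approximation is an assumption; the function name is hypothetical, and the two statistics are taken from Table 10 (Echocardiogram and Australian Credit Approval).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    half = x / 2.0
    term = 1.0   # current summand (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


K_ALGORITHMS = 5        # Re-RX, RE-Re-RX, ERENNR, ERENN_MHL, RE-E-NNS
DF = K_ALGORITHMS - 1   # degrees of freedom of the approximation

p_echo = chi2_sf_even_df(1.92, DF)      # Echocardiogram statistic
p_aus = chi2_sf_even_df(39.8191, DF)    # Australian Credit Approval statistic
```

Under this assumption the Echocardiogram statistic of 1.92 gives a p-value of about 0.7505 and the Australian Credit Approval statistic of 39.8191 gives about 4.7178e-8, which agrees with the corresponding entries of Table 10. The first is far above any of the significance levels used here, which is why the null hypothesis is not rejected for Echocardiogram.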
The Friedman test indicates that there is a significant difference between the Re-RX, RE-Re-RX, ERENNR, ERENN_MHL, and RE-E-NNS algorithms. However, the test does not show which pairs of algorithms differ significantly. So, an LSD post-hoc analysis is performed on the results of the Friedman test. LSD performs all possible pairwise comparisons of the group mean ranks obtained from the Friedman test. Table 11 shows the results of the LSD test; only the results for the pairs of interest are shown in the table. The p-value for the pair (Re-RX, RE-E-NNS) is significant at p < 0.01 in all datasets except Echocardiogram. The p-value for the pair (RE-Re-RX, RE-E-NNS) is significant at p < 0.01 or p < 0.05 in all datasets except Echocardiogram, Eye, and Wine. The p-value for the pair (ERENNR, RE-E-NNS) is significant at p < 0.01 or p < 0.05 in all datasets except Echocardiogram, Eye, and Wine. The p-value for the pair (ERENN_MHL, RE-E-NNS) is significant at p < 0.01, p < 0.05, or p < 0.1 in the Credit Approval, German, Eye, Pima Indians Diabetes, and Car Evaluation datasets. So, after analyzing all the mean ranks, pairwise comparisons, and p-values in Tables 10 and 11, it can be concluded that in most of the cases the results for the RE-E-NNS algorithm are significant enough to reject the null hypothesis. Overall, the results for the RE-E-NNS algorithm are statistically significant compared to the Re-RX, RE-Re-RX, ERENNR, and ERENN_MHL algorithms.
Table 10
Multiple comparisons using Friedman statistical test
| Datasets | Algorithms | Mean Ranks | Friedman statistic | p-value |
|---|---|---|---|---|
| Credit Approval | Re-RX | 1.5 | 36.0412 | 0.000020277*** |
|  | RE-Re-RX | 2.05 |  |  |
|  | ERENNR | 3.3 |  |  |
|  | ERENN_MHL | 3.45 |  |  |
|  | RE-E-NNS | 4.7 |  |  |
| Australian Credit Approval | Re-RX | 1.05 | 39.8191 | 0.000000047178*** |
|  | RE-Re-RX | 1.95 |  |  |
|  | ERENNR | 3 |  |  |
|  | ERENN_MHL | 4 |  |  |
|  | RE-E-NNS | 5 |  |  |
| Echocardiogram | Re-RX | 2.7 | 1.92 | 0.7505 |
|  | RE-Re-RX | 3 |  |  |
|  | ERENNR | 2.9 |  |  |
|  | ERENN_MHL | 3.2 |  |  |
|  | RE-E-NNS | 3.2 |  |  |
| Statlog (Heart) | Re-RX | 1.35 | 32.5155 | 0.0000015009*** |
|  | RE-Re-RX | 2.05 |  |  |
|  | ERENNR | 2.9 |  |  |
|  | ERENN_MHL | 3.8 |  |  |
|  | RE-E-NNS | 4.9 |  |  |
| Breast Cancer | Re-RX | 1.15 | 38.7208 | 0.000000079555*** |
|  | RE-Re-RX | 1.95 |  |  |
|  | ERENNR | 2.9 |  |  |
|  | ERENN_MHL | 4 |  |  |
|  | RE-E-NNS | 5 |  |  |
| German | Re-RX | 1.45 | 29.1224 | 0.0000073822*** |
|  | RE-Re-RX | 2.3 |  |  |
|  | ERENNR | 2.8 |  |  |
|  | ERENN_MHL | 3.45 |  |  |
|  | RE-E-NNS | 5 |  |  |
| Eye | Re-RX | 1 | 26.2769 | 0.000027825*** |
|  | RE-Re-RX | 2.95 |  |  |
|  | ERENNR | 3.25 |  |  |
|  | ERENN_MHL | 4.5 |  |  |
|  | RE-E-NNS | 3.3 |  |  |
| Pima Indians Diabetes | Re-RX | 1.1 | 29.0103 | 0.0000077798*** |
|  | RE-Re-RX | 2.75 |  |  |
|  | ERENNR | 3.35 |  |  |
|  | ERENN_MHL | 3 |  |  |
|  | RE-E-NNS | 4.8 |  |  |
| Census Income | Re-RX | 1.45 | 29.8523 | 0.0000052451*** |
|  | RE-Re-RX | 2.3 |  |  |
|  | ERENNR | 3.05 |  |  |
|  | ERENN_MHL | 3.75 |  |  |
|  | RE-E-NNS | 4.45 |  |  |
| Thyroid | Re-RX | 1.05 | 39.0553 | 0.000000067857*** |
|  | RE-Re-RX | 2.05 |  |  |
|  | ERENNR | 2.9 |  |  |
|  | ERENN_MHL | 4 |  |  |
|  | RE-E-NNS | 5 |  |  |
| Wine | Re-RX | 1.1 | 32.2553 | 0.0000016965*** |
|  | RE-Re-RX | 2 |  |  |
|  | ERENNR | 3.8 |  |  |
|  | ERENN_MHL | 4.3 |  |  |
|  | RE-E-NNS | 3.8 |  |  |
| Car Evaluation | Re-RX | 1.7 | 33.93684 | 0.00000076775*** |
|  | RE-Re-RX | 1.7 |  |  |
|  | ERENNR | 2.8 |  |  |
|  | ERENN_MHL | 3.8 |  |  |
|  | RE-E-NNS | 5 |  |  |
*** means (P < 0.01), ** means (P < 0.05), * means (P < 0.1). Bold values in the Mean Ranks column represent the highest mean ranks.
Table 11
Pairwise comparisons using Least Significant Difference (LSD) post-hoc test
| Datasets | Pairwise comparison | p-value |
|---|---|---|
| Credit Approval | Re-RX vs. RE-E-NNS | 0.0000030462*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.0001109*** |
|  | ERENNR vs. RE-E-NNS | 0.04114** |
|  | ERENN_MHL vs. RE-E-NNS | 0.06825609* |
| Australian Credit Approval | Re-RX vs. RE-E-NNS | 0.000000021415*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.00001531*** |
|  | ERENNR vs. RE-E-NNS | 0.0046*** |
|  | ERENN_MHL vs. RE-E-NNS | 0.1563 |
| Echocardiogram | Re-RX vs. RE-E-NNS | 0.2482 |
|  | RE-Re-RX vs. RE-E-NNS | 0.6442 |
|  | ERENNR vs. RE-E-NNS | 0.4884 |
|  | ERENN_MHL vs. RE-E-NNS | 1 |
| Statlog (Heart) | Re-RX vs. RE-E-NNS | 0.00000034416*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.0000427*** |
|  | ERENNR vs. RE-E-NNS | 0.0041*** |
|  | ERENN_MHL vs. RE-E-NNS | 0.1142 |
| Breast Cancer | Re-RX vs. RE-E-NNS | 0.000000041108*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.00001386*** |
|  | ERENNR vs. RE-E-NNS | 0.0028*** |
|  | ERENN_MHL vs. RE-E-NNS | 0.1542 |
| German | Re-RX vs. RE-E-NNS | 0.00000039484*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.00011472*** |
|  | ERENNR vs. RE-E-NNS | 0.0017*** |
|  | ERENN_MHL vs. RE-E-NNS | 0.0268** |
| Eye | Re-RX vs. RE-E-NNS | 0.00098727*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.6162 |
|  | ERENNR vs. RE-E-NNS | 0.9429 |
|  | ERENN_MHL vs. RE-E-NNS | 0.0857* |
| Pima Indians Diabetes | Re-RX vs. RE-E-NNS | 0.0000001079*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.0032*** |
|  | ERENNR vs. RE-E-NNS | 0.0373** |
|  | ERENN_MHL vs. RE-E-NNS | 0.0097*** |
| Census Income | Re-RX vs. RE-E-NNS | 0.00000088605*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.00042718*** |
|  | ERENNR vs. RE-E-NNS | 0.0218** |
|  | ERENN_MHL vs. RE-E-NNS | 0.2514 |
| Thyroid | Re-RX vs. RE-E-NNS | 0.000000021415*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.000028845*** |
|  | ERENNR vs. RE-E-NNS | 0.0029*** |
|  | ERENN_MHL vs. RE-E-NNS | 0.1563 |
| Wine | Re-RX vs. RE-E-NNS | 0.000082042*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.4658 |
|  | ERENNR vs. RE-E-NNS | 1 |
|  | ERENN_MHL vs. RE-E-NNS | 0.4658 |
| Car Evaluation | Re-RX vs. RE-E-NNS | 0.0000016833*** |
|  | RE-Re-RX vs. RE-E-NNS | 0.0000016833*** |
|  | ERENNR vs. RE-E-NNS | 0.0014*** |
|  | ERENN_MHL vs. RE-E-NNS | 0.0817* |
*** means (P < 0.01), ** means (P < 0.05), * means (P < 0.1)
5.3 Discussion
The proposed RE-E-NNS aims to demonstrate the high performance of NNEs through rule extraction, and the results presented in the preceding subsection show that the algorithm can generate productive rules from NNEs. The RE-E-NNS algorithm is evaluated on twelve datasets collected from the UCI and KEEL repositories. The results (Table 7) show that the rule-sets generated by the proposed RE-E-NNS algorithm are more accurate than those of the Re-RX, RE-Re-RX, ERENNR, and ERENN_MHL algorithms in most of the datasets. The average fidelity of the algorithm is better than that of the Re-RX, RE-Re-RX, and ERENNR algorithms and almost similar to that of ERENN_MHL (Table 8). The performance of the algorithm is also validated with precision, recall, f-measure, False Positive Rate (FPR), specificity, Balanced Accuracy (BA), and Matthews Correlation Coefficient (MCC) (Table 9). The statistical significance of the results is demonstrated with two well-known statistical tests: the Friedman test and the LSD test (Table 10 and Table 11). The proposed algorithm is also compared with a popular rule extraction algorithm named Three-MLP Ensemble Re-RX, which also uses a decision tree for extracting rules from NNEs, and the results in Tables 12, 13, and 14 show the superiority of RE-E-NNS over the Three-MLP Ensemble Re-RX algorithm.
If comprehensibility is considered, RE-E-NNS may generate more rules than the other rule extraction algorithms, because it uses a decision tree to extract rules from an NNE and also merges the rule-sets extracted from many NNEs to obtain the final rule-set. RE-E-NNS may also construct rules with many conditions because, unlike the others, it does not use network pruning and attribute pruning techniques to discard irrelevant attributes. Table 15 compares the algorithms on the average comprehensibility over ten folds for a mixed mode dataset. The results show that the comprehensibility of the rule-sets generated by RE-E-NNS is not the best among the six algorithms, but it is not the worst either. For this dataset, the comprehensibility of RE-E-NNS is better than that of the Re-RX, RE-Re-RX, and Three-MLP Ensemble Re-RX algorithms.
Table 15
Comparison of comprehensibility of rules for a mixed mode dataset
| Dataset | Comprehensibility | Re-RX | RE-Re-RX | ERENNR | ERENN_MHL | Three-MLP Ensemble Re-RX | RE-E-NNS |
|---|---|---|---|---|---|---|---|
| Australian Credit Approval | Global | 11.8 | 11.8 | 3.4 | 3.7 | 12.6 | 4.1 |
|  | Local | 34.2 | 33.6 | 2.4 | 5.5 | 36.8 | 11.9 |
Though the comprehensibility of the rule-sets for RE-E-NNS is not the best, the rule-sets are still adequate for explaining the knowledge sealed in NNEs with very high accuracy. Moreover, accuracy and comprehensibility have an inverse relationship. The rule-sets can classify unknown patterns with better accuracy than the NNs and NNEs (as Table 6 and Table 7 show). Overall, the algorithm is competent enough to mimic the high performance of NNEs through the extraction of high-performance rules.