In this section, we examine the influence of four data transformation methods and a wrapper feature selection approach on the performance of ML algorithms, as well as their effect on computational times. Additionally, we present benchmark results for ten ML algorithms on two publicly available credit risk datasets. The average outcomes of each model on each credit risk dataset over five separate training and testing partitions are reported in Tables 3–7. In the tables, each evaluation criterion belonging to the top model is typed in bold to indicate its superiority. It should also be noted that the programs were run in Python 3.6.3 on a PC with an Intel 12700H CPU (14 cores) and 16 GB of RAM. The findings of the research are presented under the following five subsections.
Influence of transformations on prediction power
We consider the Natural Logarithm, Box-Cox, Min-Max, and Standard data transformation methods. The application of these methods is divided into two cases: in the first, the transformation is applied only to the continuous variables; in the second, it is applied to both the continuous and the categorical variables, denoted as (all).
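A minimal sketch of how these two cases can be set up with scikit-learn is given below. The column names are illustrative assumptions, the categorical attributes are assumed to be already integer-encoded (as in the numeric versions of these datasets), and the use of ColumnTransformer is one possible implementation rather than the exact pipeline of the study.

```python
# Sketch: applying one of the four transformations either to the continuous
# columns only (case 1) or to all columns (case 2, "all").
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import (FunctionTransformer, MinMaxScaler,
                                   PowerTransformer, StandardScaler)

continuous_cols = ["duration", "credit_amount", "age"]   # assumed names
categorical_cols = ["purpose", "housing", "job"]         # assumed names, integer-encoded

scalers = {
    "Min-Max": MinMaxScaler(),
    "Standard": StandardScaler(),
    "Box-Cox": PowerTransformer(method="box-cox"),       # requires strictly positive values
    "Natural log": FunctionTransformer(np.log1p),        # log(1 + x) to tolerate zeros
}

def make_preprocessor(scaler, scale_all=False):
    """Case 1: scale continuous columns only; case 2 ('all'): scale every column."""
    if scale_all:
        return ColumnTransformer([("scale", scaler, continuous_cols + categorical_cols)])
    return ColumnTransformer(
        [("scale", scaler, continuous_cols)],
        remainder="passthrough",                          # categorical columns left untouched
    )
```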
Table 3
Rankings for Australian dataset on data transformations.
SVM | Rank | LR | Rank | KNN | Rank | MLP | Rank | RF | Rank | GNB | Rank |
Min-Max | 2.33 | Standard | 2.00 | Min-Max (all) | 2.33 | Box-Cox | 2.00 | Box-Cox | 1.33 | Min-Max (all) | 1.78 |
Standard (all) | 2.56 | Min-Max (all) | 2.89 | Standard (all) | 2.67 | Standard (all) | 3.00 | No Scale | 1.33 | Min-Max | 1.89 |
Min-Max (all) | 2.67 | Natural logarithm | 3.00 | Min-Max | 3.00 | Min-Max (all) | 3.11 | Min-Max | 1.67 | Natural logarithm | 1.89 |
Standard | 3.78 | Box-Cox | 3.44 | Standard | 4.22 | Natural logarithm | 3.44 | Standard | 1.67 | Box-Cox | 1.89 |
Natural logarithm | 4.22 | Min-Max | 3.44 | Natural logarithm | 4.67 | Min-Max | 3.67 | Min-Max (all) | 1.89 | Standard (all) | 2.00 |
Box-Cox | 4.56 | Standard (all) | 3.78 | No Scale | 4.67 | Standard | 3.89 | Standard (all) | 1.89 | Standard | 2.22 |
No Scale | 4.67 | No Scale | 3.89 | Box-Cox | 5.00 | No Scale | 5.78 | Natural logarithm | 2.00 | No Scale | 2.78 |
Table 4
Rankings for German dataset on data transformations.
SVM | Rank | LR | Rank | KNN | Rank | MLP | Rank | RF | Rank | GNB | Rank |
Min-Max | 2.22 | Standard (all) | 1.89 | Min-Max | 2.11 | Standard | 1.44 | Natural logarithm | 2.00 | Standard | 2.11 |
Standard (all) | 2.89 | Standard | 2.11 | Standard | 2.56 | No Scale | 2.22 | Box-Cox | 2.22 | No Scale | 2.44 |
Standard | 3.22 | Box-Cox | 3.00 | Box-Cox | 3.44 | Standard (all) | 3.56 | No Scale | 2.22 | Standard (all) | 2.56 |
Natural logarithm | 3.44 | No Scale | 3.44 | Standard (all) | 4.11 | Box-Cox | 3.67 | Min-Max | 2.44 | Min-Max (all) | 2.67 |
No Scale | 3.78 | Min-Max (all) | 3.44 | Natural logarithm | 4.11 | Min-Max (all) | 4.56 | Standard | 2.78 | Min-Max | 3.11 |
Min-Max (all) | 4.22 | Natural logarithm | 3.44 | No Scale | 4.67 | Min-Max | 4.56 | Standard (all) | 2.78 | Box-Cox | 3.22 |
Box-Cox | 4.44 | Min-Max | 4.67 | Min-Max (all) | 4.89 | Natural logarithm | 4.67 | Min-Max (all) | 3.44 | Natural logarithm | 3.44 |
To show the effect of the transformations, we average the rankings obtained under the three methods (default, GS, and WFS) over the three criteria: accuracy, sensitivity, and specificity.
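The averaged-ranking scheme behind Tables 3 and 4 can be illustrated with the short pandas sketch below; the transformation names and the placeholder scores are assumptions for demonstration only.

```python
# Sketch: rank the transformations on each (method, criterion) score column,
# then average the ranks per transformation (lower mean rank = better).
import pandas as pd

# Rows: transformations; columns: one score per method/criterion combination,
# e.g. default/GS/WFS x accuracy/sensitivity/specificity (placeholder values).
scores = pd.DataFrame(
    {"default_acc": [0.78, 0.80, 0.79],
     "GS_acc":      [0.80, 0.82, 0.81],
     "WFS_acc":     [0.79, 0.81, 0.80]},
    index=["No Scale", "Min-Max", "Standard"],
)

# Rank within each column (highest score gets rank 1), then average across columns.
mean_ranks = scores.rank(ascending=False).mean(axis=1).sort_values()
print(mean_ranks)  # analogous to the mean ranks reported in Tables 3-4
```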
Table 3 indicates that any kind of scaling is beneficial for the SVM algorithm; in particular, Min-Max delivers the best ranking, and similar results hold for the German dataset. For the LR algorithm, the results suggest that Standard scaling is the most efficient approach on both datasets. Another ML algorithm for which scaling is valuable is KNN, for which Min-Max and Standard are the most suitable transformations. For the MLP, all transformation approaches improve performance on the Australian dataset, and the best choice of scaling is Box-Cox; however, we observe that transformation is ineffective, and even has an adverse effect, on the other dataset (Table 4). A similar picture holds for the GNB algorithm, where any scaling method enhances the prediction power on the Australian dataset, whereas there is no benefit in employing these techniques for the German dataset; the table shows that the best outcome on the Australian Credit dataset is obtained with Min-Max scaling. Data scaling is generally reported to be ineffective for RF in the literature; however, our findings show that the transformations influence this algorithm slightly.
Influence of transformations on the computational cost
To enrich the research, we also report the cost-effectiveness of the applied analyses. We measure the computational time under the two costly methods: hyper-parameter optimization (GS) and WFS. For SVM, Tables 5 and 6 indicate a dramatic decrease in almost all cases; in particular, the Natural Logarithm and Min-Max scaling methods deliver consistent savings on both datasets. A similar picture is seen for the LR algorithm, and the cases in which Min-Max and Standard scaling are applied to both continuous and ordinal variables show stable results. KNN also benefits from Natural Logarithm scaling on both datasets, and scaling reduces its computation time in most cases. Additionally, GNB benefits from the Standard and Min-Max transformations. However, for the MLP and RF algorithms, the undertaken methods do not provide a meaningful cost reduction.
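One way such wall-clock times can be measured is sketched below; the SVM parameter grid, the synthetic data, and the choice of pipeline steps are placeholders rather than the exact search spaces and splits used in the study.

```python
# Sketch: timing a grid search with and without a scaling step in the pipeline.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in for a prepared training split of a credit dataset.
X_train, y_train = make_classification(n_samples=600, n_features=14, random_state=0)

param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.1]}  # placeholder grid

for label, steps in {
    "Without": [("svm", SVC())],
    "Min-Max": [("scale", MinMaxScaler()), ("svm", SVC())],
}.items():
    search = GridSearchCV(Pipeline(steps), param_grid, cv=5, n_jobs=-1)
    start = time.perf_counter()
    search.fit(X_train, y_train)
    print(label, round(time.perf_counter() - start, 2), "seconds")
```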
Table 5
Impact of data scaling in terms of computation time on German dataset.
Method | Scale | RF | GNB | SVM | MLP | KNN | LR |
Grid Search | Without | 01:12 | 00:08 | 05:40:49 | 01:07 | 00:09 | 00:35 |
Grid Search | Standard | 01:14 | 00:01 | 09:38 | 01:11 | 00:02 | 00:23 |
Grid Search | Min-Max | 01:23 | 00:01 | 07:09 | 01:11 | 00:02 | 00:23 |
Grid Search | Box-Cox | 01:13 | 00:01 | 08:24 | 01:10 | 00:02 | 00:38 |
Grid Search | Natural log | 01:12 | 00:01 | 06:45 | 01:11 | 00:02 | 00:27 |
Grid Search | Min-Max (All At.) | 01:13 | 00:01 | 08:04 | 01:17 | 00:02 | 00:11 |
Grid Search | Standard (All At.) | 01:13 | 00:01 | 11:23 | 01:03 | 00:02 | 00:07 |
Wrapper Feature Selection | Without | 02:36 | 00:28 | 08:37 | 25:33 | 00:42 | 03:04 |
Wrapper Feature Selection | Standard | 02:39 | 00:27 | 06:15 | 41:48 | 00:43 | 02:18 |
Wrapper Feature Selection | Min-Max | 02:29 | 00:27 | 03:50 | 37:21 | 00:43 | 02:45 |
Wrapper Feature Selection | Box-Cox | 02:42 | 00:29 | 06:42 | 34:06 | 00:43 | 03:02 |
Wrapper Feature Selection | Natural log | 02:30 | 00:27 | 04:23 | 30:28 | 00:43 | 02:33 |
Wrapper Feature Selection | Min-Max (All At.) | 02:56 | 00:27 | 01:31 | 40:19 | 00:42 | 01:23 |
Wrapper Feature Selection | Standard (All At.) | 02:33 | 00:27 | 01:41 | 42:16 | 00:44 | 00:55 |
Table 6
Impact of data scaling in terms of computation time on Australian dataset.
Method | Scale | RF | GNB | SVM | MLP | KNN | LR |
Grid Search | Without | 00:38 | 00:01 | 01:26 | 00:11 | 00:43 | 00:06 |
Grid Search | Standard | 00:44 | 00:01 | 01:26 | 00:22 | 01:03 | 00:08 |
Grid Search | Min-Max | 00:36 | 00:01 | 01:25 | 00:20 | 00:56 | 00:01 |
Grid Search | Box-Cox | 00:36 | 00:01 | 01:22 | 00:19 | 00:49 | 00:03 |
Grid Search | Natural log | 00:38 | 00:01 | 00:13 | 00:19 | 00:40 | 00:03 |
Grid Search | Min-Max (All At.) | 00:37 | 00:01 | 01:37 | 00:21 | 01:04 | 00:01 |
Grid Search | Standard (All At.) | 00:36 | 00:01 | 01:20 | 00:22 | 01:07 | 00:01 |
Wrapper Feature Selection | Without | 01:06 | 00:16 | 01:47 | 11:39 | 00:25 | 00:34 |
Wrapper Feature Selection | Standard | 01:09 | 00:13 | 01:37 | 10:53 | 00:30 | 00:22 |
Wrapper Feature Selection | Min-Max | 01:10 | 00:15 | 01:49 | 16:59 | 00:29 | 00:29 |
Wrapper Feature Selection | Box-Cox | 01:09 | 00:17 | 00:45 | 15:31 | 00:29 | 00:33 |
Wrapper Feature Selection | Natural log | 01:01 | 00:18 | 01:38 | 12:08 | 00:25 | 00:35 |
Wrapper Feature Selection | Min-Max (All At.) | 01:12 | 00:15 | 00:58 | 13:37 | 00:29 | 00:21 |
Wrapper Feature Selection | Standard (All At.) | 01:07 | 00:16 | 01:41 | 13:35 | 00:27 | 00:19 |
Impact of wrapper feature selection method
To illustrate the effectiveness of the wrapper approach, we compare the results with the untouched ML models, i.e., with no parameter optimization and no data transformation applied. The predictive comparisons are based on Table 7. As mentioned in Section 2, the ML algorithm itself is used as the underlying algorithm of the wrapper approach. We also report which variables are selected during the analysis, in two categories. The first category contains the variables that deliver the highest accuracy ratio, and the second gives an overall view of the most frequently chosen features, listing the features that appear more than twice in the selection stage. We cannot report the selected features for the Australian dataset since the feature names are not given in the original file.
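A minimal sketch of a wrapper-style forward selection consistent with this description is given below. It uses scikit-learn's SequentialFeatureSelector (which requires a newer scikit-learn than the Python 3.6 environment mentioned earlier), synthetic stand-in data, and LR as the wrapped estimator; it is one possible implementation, not necessarily the exact procedure used.

```python
# Sketch: wrapper feature selection where the ML algorithm itself scores
# candidate feature subsets via cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Stand-in for the twenty-attribute German credit data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

estimator = LogisticRegression(max_iter=1000)
wfs = SequentialFeatureSelector(
    estimator,
    n_features_to_select="auto",
    tol=1e-3,                      # keep adding features while CV accuracy improves by at least tol
    direction="forward",
    scoring="accuracy",
    cv=5,
)
wfs.fit(X, y)
print("selected feature indices:", wfs.get_support(indices=True))
```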
The credit default risk detection performances of the default ML algorithms are compared with those of the WFS-applied versions (Table 7). The findings suggest that almost all of the algorithms are positively influenced by the WFS method. Except for LGBM on the Australian dataset (88.90% to 88.35%) and XGB on the German dataset (79.67% to 78.67%), we observe an improvement in prediction accuracy in all other cases. In particular, LR (78.10% to 86.67%), KNN (72.90% to 80.95%), MLP (74.30% to 84.76%), GNB (78.60% to 88.10%), and SVM (54.30% to 85.71%) gain a dramatic enhancement on the Australian dataset. On the other dataset, DT (66.33% to 72.67%), LR (74.67% to 77.00%), and LGBM (72.40% to 76.40%) show notable gains in default prediction. In terms of the number of attributes, almost all algorithms achieve this improvement with considerably fewer features; in particular, DT (five out of twenty), KNN (eight out of twenty), LGBM (nine out of twenty), and LR (eleven out of twenty) perform better with far fewer attributes (Table 8).
Performance against minority and majority classes
We also investigate the default risk detection capability of the algorithms on the minority (default) and majority (non-default) classes to give a deeper view. As mentioned in Section 4.1, the German dataset has a 30–70% class distribution, which is less balanced than that of the Australian dataset (45–55%). To determine the more suitable algorithms in this respect, we consider the balanced accuracy rate, calculated by giving equal weights to sensitivity and specificity. In this way, we can see how well the algorithms attend to both classes.
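For reference, the short sketch below shows the balanced accuracy computation described above, both from its definition and via scikit-learn; the toy labels and the encoding of default as 1 are assumptions for illustration.

```python
# Sketch: balanced accuracy as the unweighted mean of sensitivity (recall on the
# default class) and specificity (recall on the non-default class).
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = [1, 1, 0, 0, 0, 1, 0, 0]   # 1 = default (minority), 0 = non-default (assumed encoding)
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]

sensitivity = recall_score(y_true, y_pred, pos_label=1)
specificity = recall_score(y_true, y_pred, pos_label=0)
print((sensitivity + specificity) / 2)           # definition: equal weights to both classes
print(balanced_accuracy_score(y_true, y_pred))   # same value from scikit-learn
```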
For this comparison we consider the cases in which parameter optimization is applied to each algorithm. The outcomes of the analyses (Fig. 1) indicate that LGBM (90.46%), CB (89.39%), and XGB (88.95%) are the top three algorithms, whereas SVM (73.94%), KNN (77.70%), and MLP (79.35%) are the least successful at paying equal attention to both classes on the Australian dataset. On the other dataset, however, MLP (73.09%) and SVM (70.95%) take their place in the top three together with XGB (73.89%), while the three worst-performing algorithms are KNN (61.98%), DT (67.38%), and GNB (68.42%). Overall, we can summarize the findings as follows: the XGB algorithm is the best choice for this credit risk detection task, since it places in the top three on both datasets, whereas KNN is not a suitable algorithm for an unbalanced dataset, as it places in the bottom three on both datasets.
Benchmark validation
Finally, we present the ultimate performance validation, highlighting the approaches that deliver the best results based on the rankings. On the Australian dataset, the top approach is the combination of parameter optimization and the LGBM algorithm (90.60% accuracy), followed by CB (89.50%) and XGB (89.10%), both with GS. The bottom three approaches are SVM (54.30% accuracy), SVM with GS (72.90%), and MLP (74.30%). For the German dataset, the most suitable method is XGB with GS, which yields 81.67% accuracy, and the rest of the top three are MLP with GS (81.00%) and MLP with WFS (79.67%). The least successful three methods are, in order, DT (66.33% accuracy), SVM (71.00%), and KNN (71.33%) in Table 7.
Table 7
Benchmark comparison of ML algorithms on German and Australian credit datasets
Model (German) | Accuracy | Sensitivity | Specificity | Mean Ranks | Model (Australian) | Accuracy | Sensitivity | Specificity | Mean Ranks |
XGB-GS | 81.67 | 54.44 | 93.34 | 2.33 | LGBM-GS | 90.60 | 92.20 | 88.70 | 3.33 |
MLP-GS | 81.00 | 53.33 | 92.86 | 3.67 | CB-GS | 89.50 | 90.40 | 88.40 | 5.00 |
MLP-WFS | 79.67 | 51.11 | 91.91 | 5.00 | XGB-GS | 89.10 | 88.00 | 89.90 | 5.33 |
XGB | 79.67 | 48.89 | 92.86 | 6.00 | LGBM | 88.90 | 89.40 | 88.40 | 6.67 |
SVM-GS | 79.33 | 50.00 | 91.90 | 6.33 | CB-WFS | 88.92 | 90.10 | 87.40 | 7.00 |
RF-GS | 79.00 | 44.44 | 93.81 | 7.00 | GNB-WFS | 88.10 | 82.90 | 92.30 | 8.00 |
MLP | 78.67 | 51.11 | 90.48 | 7.00 | CB | 88.10 | 88.30 | 87.70 | 8.00 |
XGB-WFS | 78.67 | 48.89 | 91.43 | 8.00 | XGB-WFS | 87.14 | 83.80 | 89.80 | 9.00 |
LR-GS | 78.00 | 46.67 | 91.43 | 9.00 | DT-WFS | 87.14 | 83.80 | 89.80 | 9.00 |
DT-GS | 77.00 | 43.33 | 91.43 | 10.33 | LGBM-WFS | 88.35 | 90.60 | 85.50 | 9.00 |
LR-WFS | 77.00 | 45.56 | 90.47 | 10.33 | DT-GS | 87.10 | 79.50 | 93.30 | 9.33 |
LGBM-GS | 76.83 | 51.67 | 87.62 | 10.33 | LR-WFS | 86.67 | 82.10 | 90.40 | 10.00 |
CB-WFS | 76.90 | 49.00 | 88.86 | 10.67 | RF-GS | 86.70 | 87.20 | 86.30 | 11.33 |
CB-GS | 76.90 | 50.00 | 88.43 | 10.67 | SVM-WFS | 85.71 | 81.20 | 89.30 | 12.33 |
GNB-GS | 76.67 | 47.78 | 89.05 | 11.67 | LR-GS | 86.20 | 85.50 | 86.80 | 12.33 |
GNB-WFS | 76.00 | 53.33 | 85.72 | 12.00 | DT | 86.20 | 86.30 | 86.10 | 12.67 |
LGBM-WFS | 76.40 | 50.67 | 87.43 | 12.33 | XGB | 85.70 | 84.60 | 86.60 | 13.67 |
RF-WFS | 75.33 | 55.56 | 83.80 | 12.33 | RF-WFS | 85.71 | 85.50 | 85.90 | 14.00 |
GNB | 75.33 | 61.11 | 81.42 | 12.33 | MLP-WFS | 84.76 | 81.20 | 87.60 | 14.33 |
RF | 74.67 | 48.89 | 85.72 | 14.67 | GNB-GS | 80.50 | 91.50 | 71.70 | 14.67 |
LR | 74.67 | 41.11 | 89.05 | 15.33 | RF | 83.80 | 85.50 | 82.50 | 15.67 |
SVM | 71.00 | 7.78 | 98.09 | 16.00 | GNB | 78.60 | 91.50 | 68.20 | 16.00 |
LGBM | 72.40 | 24.00 | 93.14 | 16.00 | KNN-WFS | 80.95 | 84.60 | 78.00 | 16.67 |
CB | 74.50 | 42.00 | 88.43 | 16.00 | KNN-GS | 76.20 | 91.50 | 64.00 | 17.33 |
SVM-WFS | 72.67 | 28.89 | 91.43 | 16.67 | SVM | 54.30 | 95.70 | 21.10 | 18.33 |
KNN-GS | 73.00 | 34.44 | 89.53 | 17.00 | MLP-GS | 79.10 | 82.10 | 76.60 | 18.67 |
DT-WFS | 72.67 | 32.22 | 90.01 | 17.33 | LR | 78.10 | 86.30 | 71.50 | 18.67 |
KNN-WFS | 73.33 | 40.00 | 87.61 | 17.67 | MLP | 74.30 | 75.20 | 73.60 | 21.33 |
KNN | 71.33 | 38.89 | 85.23 | 20.33 | KNN | 72.90 | 83.80 | 64.10 | 21.33 |
DT | 66.33 | 42.22 | 76.66 | 20.67 | SVM-GS | 72.90 | 83.80 | 64.10 | 21.33 |
Table 8
Selected features for the German credit dataset using ML methods
Algorithm | Features chosen more than twice | Features yielding the best accuracy |
DT | 1,4,8,9,12 | 1,4,8,9,12 |
KNN | 4,8,9,10,12,14 | 3,4,8,9,10,12,13,14 |
LGBM | 2,5,16 | 1,2,3,4,5,6,12,16,19 |
SVM | 4,6,8,9 | 1,2,3,4,6,7,8,9,10,13 |
GNB | 1,2,4,5,6,8,9,11,12,13 | 1,2,4,5,6,8,9,11,12,13 |
XGB | 1,4,5,7,8,9,10,11,12,13 | 1,4,5,7,8,9,10,11,12,13 |
LR | 4,5,6,8,10,14 | 1,2,3,4,5,6,8,9,10,12,14 |
MLP | 1,2,3,4,5,6,7,8,9,11,12 | 1,2,3,4,5,6,7,8,9,11,12,13,14 |
RF | 1,2,3,4,5,6,8,9,10,11,12,13,14 | 1,2,3,4,5,6,8,9,10,11,12,13,14 |
CB | 1,2,3,4,5,6,7,8,12,15,16,19 | 1,2,3,4,5,6,7,8,9,11,12,14,15,16,19,20 |