The purpose of this study was to develop SVM models able to determine the corrosion status of different grades of stainless steel exposed to biogas environments. The experimental data set was obtained from electrochemical tests, performed under conditions simulating those involved in biogas production, together with microscopic analysis.
The results obtained for the different configurations proposed for the SVM models are shown below. The influence of two kernel functions was analysed: polynomial (SVM-POL) and radial basis function (SVM-RBF). Furthermore, the influence of the regularization parameter, C, was considered for both kernels in order to determine the optimal configuration of the proposed SVM model for this application.
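As a minimal illustration of the two kernel families compared here, the sketch below writes them out in plain Python. The exact parameterisation used in the study is not stated in this section, so the standard textbook forms are assumed: K(x, y) = (x·y + coef0)^degree for the polynomial kernel and K(x, y) = exp(-γ‖x − y‖²) for the RBF kernel. The function names and the `coef0` parameter are illustrative assumptions.

```python
import math

def poly_kernel(x, y, degree=1, coef0=0.0):
    """Polynomial kernel (assumed form): (x . y + coef0) ** degree.

    With degree=1 and coef0=0 this reduces to the linear kernel.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel (assumed form): exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical feature vectors, for illustration only.
u, v = [1.0, 2.0], [3.0, 4.0]
linear_value = poly_kernel(u, v)               # degree 1: plain dot product
quadratic_value = poly_kernel(u, v, degree=2)  # degree 2
rbf_value = rbf_kernel(u, v, gamma=0.5)        # decays with squared distance
```

Larger γ makes the RBF kernel decay faster with distance, so each training pattern influences a smaller neighbourhood, while C controls how strongly misclassified training patterns are penalised.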
Figure 5 shows the precision and accuracy results obtained for the SVM-POL models. According to the figure, these models behaved similarly for both indices. The optimal models for localised corrosion modelling of stainless steel in biogas environments were the SVM models with a linear kernel function (a polynomial of degree 1). For these models, the regularization parameter had no great influence on classification performance. The highest precision and accuracy values provided by the SVM-POL model were 0.94 and 0.93, respectively. These values reflect the capacity of the proposed SVM-POL model to predict accurately the corrosion status of stainless steel after electrochemical tests in biogas environments.
To identify the optimal configurations for these models, a statistical procedure for multiple group comparison, based on the precision and accuracy values, was applied following the scheme shown in Fig. 4. First, the optimal values of C for each degree of the polynomial function considered in the SVM-POL models were identified; these models are shown in red in Fig. 6. Second, the optimal degrees of the polynomial function for each value of C were identified; these models are shown in blue in Fig. 6. The models identified as optimal in both steps are shown in grey. Finally, these models were subjected to a multiple comparison test in order to define the globally optimal equivalent configurations for the SVM-POL models presented in this study; they are shown in black in Fig. 6. The configurations marked in black were statistically equivalent to the model that provided the highest precision and accuracy values, 0.94 and 0.93, respectively, which is marked with a cross in each graph. According to Fig. 6, the optimal configurations for the SVM-POL model were the linear kernel function with C = 2^0, 2^1 and 2^2.
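The two-step selection followed by an intersection can be sketched as below. This is only an outline under stated assumptions: the statistical multiple comparison test of Fig. 4 is not specified in this section, so a simple tolerance `tol` on the score is swapped in as a stand-in for "statistically equivalent", and the `scores` dictionary holds hypothetical values, not the study's results.

```python
from collections import defaultdict

def best_per_group(scores, group_index, tol=0.0):
    """Keep, within each group, the configurations within `tol` of the group's best.

    `scores` maps (degree, C) -> score. group_index=0 groups by degree
    (best C per degree); group_index=1 groups by C (best degree per C).
    `tol` is a hypothetical stand-in for the statistical equivalence test.
    """
    groups = defaultdict(list)
    for cfg, score in scores.items():
        groups[cfg[group_index]].append((cfg, score))
    keep = set()
    for members in groups.values():
        top = max(score for _, score in members)
        keep |= {cfg for cfg, score in members if top - score <= tol}
    return keep

# Hypothetical scores for (degree, C) pairs, for illustration only.
scores = {
    (1, 1): 0.93, (1, 2): 0.93, (1, 4): 0.92,
    (2, 1): 0.90, (2, 2): 0.91, (2, 4): 0.89,
    (3, 1): 0.85, (3, 2): 0.86, (3, 4): 0.88,
}

step1 = best_per_group(scores, group_index=0)  # optimal C for each degree
step2 = best_per_group(scores, group_index=1)  # optimal degree for each C
candidates = step1 & step2                     # optimal in both steps

# Final step: keep the candidates equivalent to the best candidate overall.
top = max(scores[cfg] for cfg in candidates)
optimal = {cfg for cfg in candidates if top - scores[cfg] <= 0.0}
```

With these made-up scores, only the degree-1 configurations survive both steps, which mirrors the pattern reported for the linear kernel without reproducing the actual test.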
Similarly, the influence of the parameters involved in the SVM-RBF model was analysed. For this model, the precision and accuracy results are represented in Fig. 7, where the influence of the C and γ values can be examined. According to Fig. 7, the highest precision value reached by this model was 1. Higher values of γ provided better precision, whereas the regularization parameter C showed the opposite trend: as C increased, the classification performance decreased. The behaviour was different when the performance of the model was analysed in terms of accuracy. In that case, the model provided the best classification performance when γ took intermediate values from the considered range, whereas the highest values of C provided the maximum accuracy, equal to 0.95.
To determine the optimal equivalent configurations that provided the best results for the SVM-RBF model, the procedure represented in Fig. 4 was applied. The results of the multiple comparison tests are collected in Fig. 8.
According to the results collected in Fig. 8, the equivalent configurations of the SVM-RBF model that provided the optimal precision value (precision = 1, see Fig. 8a) were the following: (γ = 2^4; C = 2^0), (γ = 2^5; C = 2^0, 2^1, 2^2, 2^5), (γ = 2^6; C = 2^0, 2^1, 2^2, 2^3, 2^4, 2^7), (γ = 2^7; C = 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6) and (γ = 2^8; C = 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, 2^8). Regarding accuracy (see Fig. 8b), the equivalent configurations of the SVM-RBF model providing the optimal behaviour (accuracy = 0.952) were the following pairs of values: (γ = 2^2; C = 2^3, 2^4, 2^5, 2^6, 2^7, 2^8), (γ = 2^3; C = 2^4, 2^6, 2^7), (γ = 2^4; C = 2^5, 2^6) and (γ = 2^5; C = 2^7).
Regarding the results provided by the SVM-RBF model, several configurations reached the highest precision value. However, these configurations may differ from those identified as optimal when the accuracy results were analysed. This behaviour can be explained by Eq. (7). According to this equation, when a model provides a precision equal to 1, the false positive term is null. This means that the model has a high capacity to identify the patterns that will not suffer localised corrosion. However, the model may not have a similar capacity to detect all the patterns that will suffer this attack, since the precision term includes no information about false negative patterns. For the optimal configurations of the SVM-RBF model in terms of accuracy, the maximum value reached was 0.952, lower than the value of 1 obtained for precision.
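The gap between precision and accuracy can be made concrete with a small numerical sketch. It assumes the usual confusion-matrix definitions, precision = TP / (TP + FP) and accuracy = (TP + TN) / (TP + TN + FP + FN), with corrosion as the positive class; the counts below are hypothetical and chosen only to show the effect of false negatives.

```python
def precision(tp, fp):
    """Precision: fraction of predicted-positive patterns that are truly positive."""
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    """Accuracy: fraction of all patterns classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Sensitivity (true positive rate): fraction of positives detected."""
    return tp / (tp + fn)

# Hypothetical counts: no false positives, but many missed corrosion patterns.
tp, tn, fp, fn = 30, 50, 0, 20

p = precision(tp, fp)         # 1.0, because FP = 0
a = accuracy(tp, tn, fp, fn)  # 0.8, pulled down by the 20 false negatives
s = sensitivity(tp, fn)       # 0.6, only 30 of 50 corrosion patterns detected
```

A precision of 1 is therefore compatible with a model that misses a large share of the corrosion patterns, which is why precision alone is not used to select the final configuration.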
To determine the optimal configurations for the proposed SVM models, SVM-POL and SVM-RBF, while seeking a balance between the capacity to detect the patterns that will suffer corrosion and those that will not in biogas environments, the ROC space was used.
The ROC space is created by plotting the true positive rate (TPR), defined by Eq. (9), against the false positive rate (FPR), defined by Eq. (10), for each configuration. As introduced previously, these measures can be computed from the confusion matrix of each classification model. TPR corresponds to sensitivity, whereas FPR is equivalent to 1 − specificity. The ROC space is a useful tool for comparing the classification performance of different models, since in these classification problems the goal is to identify the models that provide acceptable discriminability between the existing classes: corrosion and no corrosion patterns. It is a two-dimensional graph that depicts the trade-off between benefits (TP) and costs (FP). Each model is represented by a single point in ROC space, where the point (0,1) represents perfect classification. Thus, one model is better than another if its point lies to the northwest of the other (Fawcett 2006).
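A minimal sketch of how a model is placed in ROC space, and of the "northwest" comparison, is given below. It assumes the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN); the confusion-matrix counts and function names are hypothetical and serve only as an illustration.

```python
def roc_point(tp, fn, fp, tn):
    """Return the (FPR, TPR) coordinates of a classifier in ROC space."""
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    return fpr, tpr

def is_northwest_of(a, b):
    """True if point a has FPR no higher and TPR no lower than b, and differs from b."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

# Hypothetical confusion-matrix counts for two classifiers.
model_a = roc_point(tp=47, fn=3, fp=2, tn=48)   # about (0.04, 0.94)
model_b = roc_point(tp=40, fn=10, fp=5, tn=45)  # about (0.10, 0.80)

a_better = is_northwest_of(model_a, model_b)
b_better = is_northwest_of(model_b, model_a)
```

When neither point lies to the northwest of the other, this comparison is inconclusive and an additional criterion, such as closeness to the point (0,1), is needed.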
The following figure represents the optimal configurations identified for the SVM models, together with models developed using traditional classification techniques: a classification tree (CT) and k-nearest neighbours (kNN) with three different values of k: 1, 3 and 5.
According to Fig. 9, CT provided better results than the kNN models. However, CT performed worse than the SVM models. Therefore, the proposed SVM models are an efficient alternative to traditional techniques for this application. In particular, some configurations of the SVM models provided excellent specificity results (100%). These models lie on the Y-axis (false positive rate = 0). However, these configurations were not considered optimal because the number of FN patterns they produced was high, so they did not detect all the corrosion patterns correctly. This is why they are located far from the upper left corner. For the application considered in this study, with the aim of balancing the correct classification of patterns that will suffer corrosion and those that will not, the best configuration of the SVM model can be defined as the one located nearest the upper left corner. In this case, the optimal configuration was SVM-RBF (C = 2^6 and γ = 2^3), with sensitivity and specificity values of 94.0% and 96.6%, respectively. These values reflect the high capacity of the proposed model to predict the corrosion status of different grades of stainless steel in biogas environments without the need for microscopic analysis of the material surface, thus avoiding subjectivity in the results.
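The "nearest the upper left corner" criterion can be illustrated with the reported sensitivity and specificity. The text does not state which distance measure was used, so Euclidean distance to the point (0,1) is assumed here; the second model, with 100% specificity and 80% sensitivity, is hypothetical and stands in for the configurations lying on the Y-axis.

```python
import math

def distance_to_ideal(sensitivity, specificity):
    """Euclidean distance (assumed metric) from a model's ROC point to (0, 1)."""
    fpr = 1.0 - specificity         # horizontal offset from the ideal point
    miss_rate = 1.0 - sensitivity   # vertical offset from the ideal point
    return math.hypot(fpr, miss_rate)

# Reported optimal SVM-RBF configuration: sensitivity 94.0%, specificity 96.6%.
reported = distance_to_ideal(0.940, 0.966)

# Hypothetical model on the Y-axis: perfect specificity, lower sensitivity.
on_y_axis = distance_to_ideal(0.800, 1.000)

reported_is_nearer = reported < on_y_axis
```

Under this assumption the reported configuration lies at a distance of roughly 0.07 from the ideal point, closer than the hypothetical zero-false-positive model, whose missed corrosion patterns move it away from the corner.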