In the following, we show how to evaluate a classification model through an algorithm that predicts whether or not there is a fire at a given location. Triggering a fire alarm when there is no fire is less serious than failing to trigger an alarm when a house is burning. The confusion matrix [8], which evaluates the effectiveness of a classification system, was the instrument employed in this study. Each row represents a real class, while each column represents an estimated class.

## 3.1 Confusion matrix (fire prediction)

Consider the problem of fire prediction. A model may predict a fire when there is none, or fail to predict one when a fire is actually burning. To analyze these outcomes, we use the confusion matrix:

Table 1. Confusion matrix of fire prediction

|                        | Predicted: fire (pos) | Predicted: no fire (neg) |
|------------------------|-----------------------|--------------------------|
| Actual: fire (pos)     | True Positive (TP)    | False Negative (FN)      |
| Actual: no fire (neg)  | False Positive (FP)   | True Negative (TN)       |

In this instance, "positive" refers to the class that a fire falls under, and "negative" to the other. If we forecast a fire and one actually occurs, that is a true positive prediction. On the other hand, if this forecast turns out to be false, it is a false positive, and so on. False positives are also known as "Type I errors", and false negatives as "Type II errors".

To obtain a confusion matrix with scikit-learn [9], also known as sklearn, a free machine learning library for the Python programming language, we simply use the `confusion_matrix` function [10].
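As a minimal sketch of this call (the labels below are made up for illustration, not real fire data):

```python
from sklearn.metrics import confusion_matrix

# 1 = fire ("pos"), 0 = no fire ("neg"); illustrative labels only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[1, 0], the first row/column corresponds to the positive class:
# rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # [[TP FN], [FP TN]]
```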

## 3.2 Confusion matrix results and evaluation indicators

The confusion matrix we use materializes the confrontation between the observed and predicted classes [9]. Interpretable indicators (metrics) are derived with the help of the confusion-matrix command. A third parameter makes it possible to designate the target class, which is necessary for the calculation of certain indicators. In our case, we seek above all to identify actual fires (fire = pos).

Below is an excerpt from the confusion-matrix command code written to predict a fire.
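The excerpt itself is not reproduced here; as a hedged reconstruction, the following sketch recomputes the indicators quoted in this section from per-cell counts. TP = 471, FN = 72 and FP = 37 are the figures given in the text; TN = 800 is an assumption, inferred so that the reported 92.1% accuracy holds:

```python
# Per-cell counts: TP, FN, FP from the text; TN = 800 inferred (assumption)
tp, fn, fp, tn = 471, 72, 37, 800

accuracy = (tp + tn) / (tp + fn + fp + tn)  # overall success rate, ~0.921
sensitivity = tp / (tp + fn)                # recall on "fire = pos", ~0.8674
precision = tp / (tp + fp)                  # ~0.9272
print(accuracy, sensitivity, precision)
```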

The matrix is transposed compared to the usual presentation: the predicted classes are in rows and the observed classes in columns. The success rate (accuracy, not to be confused with precision) is 92.1%. The 95% confidence interval is also provided, which is rare enough to be worth reporting.

We have other indicators, in particular the sensitivity, which, associated with the positive class "fire = pos", is equal to 471 / (471 + 72) = 86.74%.

The "confusion matrix" object has a series of properties. To access the global indicators, we use the `$overall` field, a vector of named values.

To access the "Accuracy" cell, we will do:

Access to the indicators by class goes through the `$byclass` field:
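The `$overall` and `$byclass` accessors above appear to come from R's caret package (`confusionMatrix`). For readers working in Python, scikit-learn exposes the same indicators through separate functions; an illustrative equivalent on made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative labels: 1 = fire ("pos"), 0 = no fire ("neg")
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)              # global indicator ($overall)
sens = recall_score(y_true, y_pred, pos_label=1)  # per-class sensitivity ($byclass)
prec = precision_score(y_true, y_pred, pos_label=1)
print(acc, sens, prec)
```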

After running all of the confusion-matrix commands, we obtained a fire detection precision equal to 471 / (471 + 37) = 92.71%, which is a very good precision.

## 3.3 Criteria to optimize and derived values

We can derive many performance criteria from the previously constructed confusion matrix. When we do not know how many points are in the test set [11], it is generally preferable to report a proportion of errors rather than the raw number of errors (an error rate of 5% is more informative than "10 errors").

Sensitivity gauges how effective our model is at identifying events in the positive class [12]. Given that fire detection is the positive class, it measures the proportion of real fire events that are accurately predicted, i.e. the fraction of true positives among all actual positives. This is our model's capacity to identify every fire.

Thus, we will use the abbreviations TP (True Positives), FN (False Negatives), FP (False Positives) and TN (True Negatives), the four main terminologies of the confusion matrix, to calculate the "Sensitivity" [13] and the "Precision".

One can easily obtain a very good recall by systematically predicting "positive": we will not miss any fire, but such a model is not of much use.

Therefore, we shall concentrate on precision, the percentage of correctly predicted points among positively predicted points. It is the capacity of our model to trigger an alert only in the case of an actual fire.

We can fairly easily obtain very good precision [4] by predicting very few positives (we are then less likely to be wrong) [14].

We can compute the "F-measure", which is their harmonic mean, to assess a trade-off between recall and precision.
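Using the counts quoted earlier in this section (TP = 471, FP = 37, FN = 72), a sketch of the standard F1 computation:

```python
tp, fp, fn = 471, 37, 72

precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # equivalently 2*TP / (2*TP + FP + FN)
```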

In summary, it can be said that confusion matrices have the benefit of being easy to read and comprehend. They allow you to quickly visualize data and statistics, analyze model performance, and identify trends that may help tune settings. A confusion matrix can also be employed for classification problems with three or more classes by adding rows and columns.

## 3.4 Error measures

When we discussed evaluating classification models, we started by counting the number of prediction errors that the model makes [15]. This is not appropriate for a regression problem.

Indeed, when predicting a number, how can we decide whether a prediction is accurate or not? In practice, we generally prefer a model that is globally "closer to the true values" over a model that is almost exact for only a few points.

The figure below compares the predictions of two candidate models with the true values to predict for the fire data [6].

The red line makes almost exact predictions for two of the points, but the orange line is globally closer to the true values and better represents the data.

Let us formalize this notion of "closer to the true values". For each point xi of the test set, we compute the squared difference between its label and the predicted value, and sum these terms. The result is the residual sum of squares, or RSS (Residual Sum of Squares).

The problem with RSS is that it grows with the amount of data. For this reason, we normalize it by the number n of points in the test set. This gives the mean squared error, or MSE (Mean Squared Error) [16].

To get back to the units of y, we can take the square root of the MSE. This gives the RMSE, or Root Mean Squared Error.
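The three quantities can be sketched on made-up regression data (the values below are illustrative, not fire measurements):

```python
import math

y_true = [3.0, 5.0, 7.5, 10.0]  # illustrative labels
y_pred = [2.5, 5.5, 7.0, 11.0]  # illustrative predictions

n = len(y_true)
rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # residual sum of squares
mse = rss / n                                            # normalized by n
rmse = math.sqrt(mse)                                    # back in the units of y
print(rss, mse, rmse)
```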

But the RMSE does not behave well when labels can take values spread over several orders of magnitude [17]. Imagine making an error of 100 units on a label that is 4: the corresponding term in the RMSE is 100² = 10,000. It is exactly the same as if you make an error of 100 units on a label that is 8,000. However, a prediction of 104 instead of 4, an error of two orders of magnitude, seems a much larger mistake than a prediction of 8,100 instead of 8,000.

To take this into account, one can pass the predicted values and the true values to the log before calculating the RMSE. We thus obtain the RMSLE (Root Mean Squared Log Error) [18]:
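The formula itself is not reproduced here, but the figures in the worked example that follows are consistent with a base-10 logarithm applied after adding 1 to each value. A hedged reconstruction of the per-point term (the helper name is ours):

```python
import math

def rmsle_term(y_true, y_pred):
    """Squared log-error contribution of a single point,
    assuming log10 with a +1 offset (reconstruction)."""
    return (math.log10(1 + y_pred) - math.log10(1 + y_true)) ** 2

print(rmsle_term(4, 104))      # large relative error -> large term
print(rmsle_term(8000, 8100))  # small relative error -> tiny term
```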

Let us revisit our original example. For the prediction of 104 fire detections instead of 4, the corresponding term in the RMSLE is now about 1.75, while the term corresponding to the prediction of 8,100 instead of 8,000 is about 3·10⁻⁵. The trick of passing to the log has worked!

But the RMSE does not convey relative quality; that is why we choose the coefficient of determination, which indicates precisely how well the model explains the phenomenon.
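As a sketch of the coefficient of determination (R² = 1 − RSS/TSS, comparing the model's residuals against those of a constant mean predictor) on the same made-up data:

```python
y_true = [3.0, 5.0, 7.5, 10.0]  # illustrative labels
y_pred = [2.5, 5.5, 7.0, 11.0]  # illustrative predictions

mean_y = sum(y_true) / len(y_true)
rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # model residuals
tss = sum((y - mean_y) ** 2 for y in y_true)             # residuals of the mean
r2 = 1 - rss / tss  # 1.0 = perfect fit, 0.0 = no better than the mean
print(r2)
```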