In the following, we show how to evaluate a classification model through an algorithm that predicts whether or not there is a fire at a given location. Triggering a fire alarm when there is no fire is less serious than failing to trigger an alarm when a house is burning. The confusion matrix [8], which evaluates the effectiveness of a classification system, was the instrument employed in this study. Each row represents a real class, while each column represents an estimated class.

## 3.1 Confusion matrix (fire prediction)

Consider the problem of fire prediction. A model may predict a fire when there is none, or fail to predict one when a fire is actually burning. To analyze these outcomes, we use the confusion matrix:

Table 1. Confusion matrix of fire prediction

|                        | Predicted: fire (pos) | Predicted: no fire (neg) |
|------------------------|-----------------------|--------------------------|
| Actual: fire (pos)     | True Positive (TP)    | False Negative (FN)      |
| Actual: no fire (neg)  | False Positive (FP)   | True Negative (TN)       |

In this instance, "positive" refers to the class that a fire falls under, and "negative" to the other. If we forecast a fire and one actually occurs, that is a true positive prediction. On the other hand, if this forecast turns out to be false, it is a false positive, and so on. False positives are also known as "Type I errors", and false negatives as "Type II errors".

To obtain a confusion matrix with scikit-learn [9], also known as sklearn, a free machine learning library for the Python programming language, we simply use the `confusion_matrix` function [10].
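As a minimal sketch of this call (the labels below are made up for illustration, not real fire data):

```python
from sklearn.metrics import confusion_matrix

# 1 = fire ("pos"), 0 = no fire ("neg"); illustrative labels only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[1, 0], the first row/column corresponds to the positive class:
# rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # [[TP FN], [FP TN]]
```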

## 3.2 Confusion matrix results and evaluation indicators

The confusion matrix we use materializes the confrontation between the observed and predicted classes [9]. Interpretable indicators (metrics) are derived with the help of the confusion-matrix command. A third parameter makes it possible to designate the target class, which is necessary for the calculation of certain indicators. In our case, we seek above all to identify actual fires (fire = pos).

Below is an excerpt from the confusion-matrix command code written to predict a fire.
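The excerpt itself is not reproduced here; as a hedged reconstruction, the following sketch recomputes the indicators quoted in this section from per-cell counts. TP = 471, FN = 72 and FP = 37 are the figures given in the text; TN = 800 is an assumption, inferred so that the reported 92.1% accuracy holds:

```python
# Per-cell counts: TP, FN, FP from the text; TN = 800 inferred (assumption)
tp, fn, fp, tn = 471, 72, 37, 800

accuracy = (tp + tn) / (tp + fn + fp + tn)  # overall success rate, ~0.921
sensitivity = tp / (tp + fn)                # recall on "fire = pos", ~0.8674
precision = tp / (tp + fp)                  # ~0.9272
print(accuracy, sensitivity, precision)
```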

The matrix is transposed compared to the usual presentation: the predicted classes are in rows and the observed classes in columns. The success rate (accuracy, not to be confused with precision) is 92.1%. The 95% confidence interval is also provided, which is rare enough to be worth reporting.

We have other indicators, in particular the sensitivity, which, associated with the positive class "fire = pos", is equal to 471 / (471 + 72) = 86.74%.

The "confusion matrix" object has a series of properties. To access the global indicators, we use the `$overall` field, a vector of named values.

To access the "Accuracy" cell, we will do:

Access to the indicators by class goes through the `$byclass` field:
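The `$overall` and `$byclass` accessors above appear to come from R's caret package (`confusionMatrix`). For readers working in Python, scikit-learn exposes the same indicators through separate functions; an illustrative equivalent on made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative labels: 1 = fire ("pos"), 0 = no fire ("neg")
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)              # global indicator ($overall)
sens = recall_score(y_true, y_pred, pos_label=1)  # per-class sensitivity ($byclass)
prec = precision_score(y_true, y_pred, pos_label=1)
print(acc, sens, prec)
```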

After running all of the confusion-matrix commands, we obtained a fire detection precision equal to 471 / (471 + 37) = 92.71%, which is a very good precision.

## 3.3 Criteria to optimize and derived values

We can derive many performance criteria from the previously constructed confusion matrix. When we do not know how many points are in the test set [11], it is generally preferable to report a proportion of errors rather than the raw number of errors (an error rate of 5% is more informative than "10 errors").

Sensitivity gauges how effective our model is at identifying events in the positive class [12]. Given that fire detection is the positive class, it measures the proportion of real fire events that are accurately predicted, i.e. the fraction of true positives among all actual positives. This is our model's capacity to identify every fire.

Thus, we will use the abbreviations TP (True Positives), FN (False Negatives), FP (False Positives) and TN (True Negatives), the four main terminologies of the confusion matrix, to calculate the "Sensitivity" [13] and the "Precision".

One can easily obtain a very good recall by systematically predicting "positive": we will not miss any fire, but such a model is not of much use.

Therefore, we shall concentrate on precision, the percentage of correctly predicted points among positively predicted points. It is the capacity of our model to trigger an alert only in the case of an actual fire.

We can fairly easily obtain very good precision [4] by predicting very few positives (we are then less likely to be wrong) [14].

We can compute the "F-measure", which is their harmonic mean, to assess a trade-off between recall and precision.
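Using the counts quoted earlier in this section (TP = 471, FP = 37, FN = 72), a sketch of the standard F1 computation:

```python
tp, fp, fn = 471, 37, 72

precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # equivalently 2*TP / (2*TP + FP + FN)
```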

In summary, it can be said that confusion matrices have the benefit of being easy to read and comprehend. They allow you to quickly visualize data and statistics, analyze model performance, and identify trends that may help tune settings. A confusion matrix can also be employed for classification problems with three or more classes by adding rows and columns.

## 3.4 Error measures

When we discussed evaluating classification models, we started by counting the number of prediction errors that the model makes [15]. This is not appropriate for a regression problem.

Indeed, when predicting a number, how can we decide whether a prediction is accurate or not? In practice, we generally prefer a model that is globally "closer to the true values" over a model that is almost exact for only a few points.

The figure below compares the predictions of two candidate models with the true values to predict for the fire data [6].

The red line makes almost exact predictions for two of the points, but the orange line is globally closer to the true values and better represents the data.

Let us formalize this notion of "closer to the true values". For each point xi of the test set, we compute the squared difference between its label and the predicted value, and sum these terms. The result is the residual sum of squares, or RSS (Residual Sum of Squares).

The problem with RSS is that it grows with the amount of data. For this reason, we normalize it by the number n of points in the test set. This gives the mean squared error, or MSE (Mean Squared Error) [16].

To get back to the units of y, we can take the square root of the MSE. This gives the RMSE, or Root Mean Squared Error.
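The three quantities can be sketched on made-up regression data (the values below are illustrative, not fire measurements):

```python
import math

y_true = [3.0, 5.0, 7.5, 10.0]  # illustrative labels
y_pred = [2.5, 5.5, 7.0, 11.0]  # illustrative predictions

n = len(y_true)
rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # residual sum of squares
mse = rss / n                                            # normalized by n
rmse = math.sqrt(mse)                                    # back in the units of y
print(rss, mse, rmse)
```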

But the RMSE does not behave well when labels can take values spread over several orders of magnitude [17]. Imagine making an error of 100 units on a label that is 4: the corresponding term in the RMSE is 100² = 10,000. It is exactly the same as if you make an error of 100 units on a label that is 8,000. However, a prediction of 104 instead of 4, an error of two orders of magnitude, seems a much larger mistake than a prediction of 8,100 instead of 8,000.

To take this into account, one can pass the predicted values and the true values to the log before calculating the RMSE. We thus obtain the RMSLE (Root Mean Squared Log Error) [18]:
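The formula itself is not reproduced here, but the figures in the worked example that follows are consistent with a base-10 logarithm applied after adding 1 to each value. A hedged reconstruction of the per-point term (the helper name is ours):

```python
import math

def rmsle_term(y_true, y_pred):
    """Squared log-error contribution of a single point,
    assuming log10 with a +1 offset (reconstruction)."""
    return (math.log10(1 + y_pred) - math.log10(1 + y_true)) ** 2

print(rmsle_term(4, 104))      # large relative error -> large term
print(rmsle_term(8000, 8100))  # small relative error -> tiny term
```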

Let us revisit our original example. For the prediction of 104 fire detections instead of 4, the corresponding term in the RMSLE is now about 1.75, while the term corresponding to the prediction of 8,100 instead of 8,000 is about 3·10⁻⁵. The trick of passing to the log has worked!

But the RMSE does not convey relative quality; that is why we choose the coefficient of determination, which indicates precisely how well the model explains the phenomenon.
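As a sketch of the coefficient of determination (R² = 1 − RSS/TSS, comparing the model's residuals against those of a constant mean predictor) on the same made-up data:

```python
y_true = [3.0, 5.0, 7.5, 10.0]  # illustrative labels
y_pred = [2.5, 5.5, 7.0, 11.0]  # illustrative predictions

mean_y = sum(y_true) / len(y_true)
rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # model residuals
tss = sum((y - mean_y) ** 2 for y in y_true)             # residuals of the mean
r2 = 1 - rss / tss  # 1.0 = perfect fit, 0.0 = no better than the mean
print(r2)
```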