A novel approach for detecting outliers by using Isolation Forest with reducing under fitting issue 

DOI: https://doi.org/10.21203/rs.3.rs-2376758/v1

Abstract

The effectiveness of machine learning on a given task depends on many factors. The representation and quality of the instance data come first and foremost. Knowledge extraction during the training phase is more difficult when the data contain a lot of redundant, irrelevant, or incomplete information. It is well known that data preparation and filtering steps account for a significant share of the running time of ML tasks. Data cleaning is therefore essential for improving the accuracy of any model; without sufficient cleaning, no predictive model can be expected to be accurate. This cleaning is carried out as part of exploratory data analysis (EDA). In this study, we discuss outlier identification, one of the many EDA steps needed to obtain clean, complete data. We use the isolation forest approach to calculate an outlier factor and then build what we call an outlier finding model. The outlier detection problem is thereby cast as a set of related supervised binary classification problems. We carry out in-depth tests on various datasets and compare our outlier finding technique with the conventional approach. Our approach yields superior results in terms of accuracy, precision, recall, and F1 score. Additionally, we reduced the underfitting problem of the machine learning algorithms.
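To make the idea of an isolation-forest outlier score concrete, the following is a minimal sketch and not the authors' implementation. It assumes scikit-learn's IsolationForest, synthetic two-dimensional data, and a contamination rate of 0.05; the data, the parameter values, and the variable names are illustrative assumptions that do not come from the paper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 200 inliers around the origin, 10 distant outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X = np.vstack([inliers, outliers])

# The contamination rate is an assumed value, not one reported in the paper.
forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
forest.fit(X)

# score_samples returns lower values for more abnormal points,
# so negate it to get a score where higher means more anomalous.
scores = -forest.score_samples(X)

# predict returns +1 for inliers and -1 for outliers.
labels = forest.predict(X)
is_outlier = labels == -1

# Drop the flagged rows before training a downstream model.
X_clean = X[~is_outlier]
print(f"flagged {is_outlier.sum()} of {len(X)} points as outliers")
```

The binary labels produced this way could serve as a target for a downstream classifier, which is one possible reading of the supervised binary classification step described above.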