4.1 Random Forest classifier
Random Forest is a simple and versatile method for solving classification problems. Here the term forest refers to an ensemble of decision trees, usually trained using the bagging method, as shown in Fig. 4. Bagging combines several learning models to obtain better accuracy than any single model. The output of the classifier is decided by majority voting over the class labels predicted by the individual trees.
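The bagging-and-voting scheme described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: the function names (`fit_forest`, `predict_forest`), the Gini split criterion, the per-node random feature subset, and the default parameters are all hypothetical choices made for brevity.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) among `features` with the lowest weighted
    Gini impurity, or None if no split improves on the parent node."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, n_feats, rng):
    """Grow a tree; each node considers only a random subset of n_feats features.
    Leaves are class labels (assumed not to be tuples); internal nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    split = best_split(X, y, rng.sample(range(len(X[0])), n_feats))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_feats, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_feats, rng))


def predict_tree(node, x):
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=5, n_feats=1, seed=0):
    """Bagging: every tree is trained on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_feats, rng))
    return forest


def predict_forest(forest, x):
    """Final label = majority vote over the labels predicted by the individual trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

For example, `fit_forest(X, y)` on a handful of two-feature vectors labelled `"healthy"` or `"unhealthy"` returns a list of trees, and `predict_forest(forest, x)` returns the majority label. Production libraries add refinements (out-of-bag estimates, feature importances) that this sketch omits.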
Advantages: 1) It is easy to measure the relative importance of each feature for prediction.
Disadvantages: 1) A large number of decision trees can make the algorithm slow.
Basavaiah et al. [53] introduced a model for tomato leaf disease classification using a random forest classifier. The dataset consists of 500 images, each resized to 500x500 pixels. Colour histograms, local binary patterns, and Hu moments are extracted as features. Of the 500 images, 300 are used for training and 200 for testing. Classification is performed using both a decision tree classifier and a random forest classifier, which achieved accuracies of 90% and 94%, respectively.
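Two of the features named above, colour histograms and local binary patterns (LBP), can be sketched as follows. The bin count, the neighbour ordering, and the normalisation are assumptions, since the cited work does not state its settings here; Hu moments are omitted. Images are represented as plain Python lists (hypothetical input format) so the sketch has no dependencies.

```python
def colour_histogram(pixels, bins=8):
    """Concatenated per-channel histogram of (R, G, B) pixels with values 0-255,
    normalised so that each channel's bins sum to 1."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for c, value in enumerate(pixel):
            hist[c * bins + value * bins // 256] += 1
    n = len(pixels)
    return [h / n for h in hist]


def lbp_histogram(gray):
    """Basic 8-neighbour local binary pattern histogram (256 bins) over the
    interior pixels of a 2-D grayscale image given as a list of rows."""
    # Neighbours are visited clockwise from the top-left; each one sets one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for i in range(1, len(gray) - 1):
        for j in range(1, len(gray[0]) - 1):
            centre = gray[i][j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if gray[i + di][j + dj] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

Concatenating such vectors gives one fixed-length feature vector per image, which is the form a decision tree or random forest classifier expects.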
Chaudhary et al. [54] introduced a modified random forest classifier for multi-class groundnut leaf disease classification. The modified classifier combines a random forest classifier with an attribute evaluator method and an instance filter method. To evaluate the proposed model, the authors compared it with existing machine learning algorithms such as SVM, neural networks, and logistic regression to determine which classifier is best suited to their dataset. The proposed model achieved an accuracy of 97.80% on five benchmark datasets from the UCI Machine Learning Repository.
4.2 Naïve Bayes classifier
A Naïve Bayes classifier [58] is a probabilistic machine learning model for classification tasks that is based on Bayes' theorem, as shown in Fig. 5.
The fundamental Naïve Bayes assumption is that the features are conditionally independent given the class, so each feature contributes to the outcome independently of the others.
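A minimal Gaussian Naïve Bayes sketch shows where the independence assumption enters: the class score is the log prior plus a plain sum of per-feature log-likelihoods. Modelling each feature with a Gaussian is an assumption (the source does not say which likelihood [55] used), and the function names and the variance-smoothing constant `var_eps` are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_eps=1e-9):
    """Estimate the class prior and a per-feature Gaussian mean/variance per class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # var_eps keeps the variance strictly positive for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + var_eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class maximising log P(c) + sum_i log p(x_i | c).
    Summing per-feature terms is exactly the conditional-independence assumption."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.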
Advantages:
- It is fast and predicts classes easily
- It handles multi-class prediction problems
Disadvantages:
- In practice, features are rarely fully independent, so the independence assumption is often violated
Padao et al. [55] presented accurate plant recognition and classification using a Naïve Bayes classifier. Texture and shape features are extracted for classification, and the classifier is trained on a dataset of 30 different species. The area under the ROC curve is 0.981, which indicates that the classifier performs well.
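Assuming the reported 0.981 is the area under the ROC curve (AUC), the metric can be computed directly from classifier scores using its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the 0/1 label convention are assumptions; this quadratic version is meant for illustration rather than for large datasets.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.
    `labels` are 1 (positive) or 0 (negative); higher scores mean 'more positive'."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every positive is ranked above every negative, while 0.5 corresponds to random ranking.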
4.3 Feedforward neural networks
A feedforward network is a form of artificial neural network [57] that is inspired by biological neurons. Information flows in only one direction, forward from the inputs to the outputs, and never loops back. The simplest form of feedforward network is the single-layer perceptron; another form is the multilayer perceptron. The single-layer perceptron has a single layer of output nodes, as shown in Fig. 6, and the inputs are fed directly to the outputs through a series of weights.
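A single-layer perceptron with one output node can be sketched with the classic perceptron learning rule: the weighted sum of the inputs passes through a step function, and the weights move in proportion to the error. The step convention (output 1 when the sum is exactly 0), the learning rate, the epoch count, and the AND-gate example are assumptions chosen for illustration.

```python
def step(z):
    """Heaviside step activation (this sketch maps z == 0 to 1)."""
    return 1 if z >= 0 else 0


def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron learning rule: inputs feed the single output node through weights w and bias b."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            out = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = target - out                      # -1, 0 or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def perceptron_predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

Trained on the logical AND function, which is linearly separable, the rule converges to weights that reproduce the truth table. A single-layer perceptron cannot represent non-separable functions such as XOR, which is one motivation for the multilayer perceptron.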
A multilayer perceptron (MLP) [47] consists of multiple layers of computational units (perceptrons) connected in a feedforward manner, from the input layer through one or more hidden layers to the output layer, as shown in Fig. 7. It is trained using backpropagation.
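The forward pass and backpropagation can be made concrete with a small MLP sketch: one sigmoid hidden layer, one sigmoid output unit, squared-error loss, and per-sample gradient descent. The architecture, the loss, the learning rate, the `MLP` class name, and the XOR toy problem are all assumptions for illustration, not the configuration used in [47] or [48].

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Information flows strictly forward: input -> hidden -> output.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        out = sigmoid(sum(w * hi for w, hi in zip(self.W2, h)) + self.b2)
        return h, out

    def loss(self, X, y):
        """Mean of 0.5 * (output - target)^2 over the dataset."""
        return sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / (2 * len(X))

    def train_step(self, x, t, lr):
        """One backpropagation step on the squared error of a single sample."""
        h, out = self.forward(x)
        delta_out = (out - t) * out * (1 - out)            # dE/dz at the output unit
        delta_h = [delta_out * w * hi * (1 - hi)           # error propagated backwards
                   for w, hi in zip(self.W2, h)]           # (uses W2 before its update)
        self.W2 = [w - lr * delta_out * hi for w, hi in zip(self.W2, h)]
        self.b2 -= lr * delta_out
        for j, d in enumerate(delta_h):
            self.W1[j] = [w - lr * d * xi for w, xi in zip(self.W1[j], x)]
            self.b1[j] -= lr * d

    def fit(self, X, y, lr=0.5, epochs=2000):
        for _ in range(epochs):
            for x, t in zip(X, y):
                self.train_step(x, t, lr)
```

On XOR, `MLP(2, 4).fit(X, y)` reduces the training loss, which a single-layer perceptron cannot do for this problem because XOR is not linearly separable.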
MLPs can solve complex, non-linear problems efficiently, and they have many applications in speech recognition, image recognition, and classification [48].
Advantages:
- It can solve complex, non-linear problems
- Adaptive learning allows the network to extract patterns from imprecise data
Disadvantages:
- Training on a large dataset can take a long time
These advantages have led to the use of the multilayer perceptron in leaf disease classification. Shak et al. [48] used an MLP to classify leaves as healthy or unhealthy. With a training sample size of 90, the classifier achieved an accuracy of 97.15%. The accuracy decreases as the training sample shrinks, because the test set then becomes larger relative to the training set. MLPs have also been applied to watermelon leaf disease classification: Kutty et al. [49] extracted color features and fed them to an MLP, achieving an accuracy of 75.9% on 200 leaf samples.
Although MLPs are used extensively in disease classification, the datasets used in these studies were simple: the leaf images had a white or black background, which makes feature extraction easy and helps the classifier perform well. In this paper, the cotton dataset has a complex background, and the performance of the classifiers is compared on it.
4.4 Adaptive Boosting (AdaBoost) classifier
AdaBoost [14, 59] was proposed by Yoav Freund and Robert Schapire in 1996. It is an iterative ensemble method, as shown in Fig. 8, that combines multiple weak (poorly performing) classifiers into a single, more accurate classifier. The basic idea behind AdaBoost is to set the weights of the classifiers and of the training samples in each iteration so that unusual observations are predicted correctly.
AdaBoost should meet two conditions:
- The classifier should be trained iteratively on differently weighted training examples.
- In each iteration, it aims to provide an excellent fit for these examples by minimizing the training error.
Initially, the method selects a subset of the training data at random. It then trains the AdaBoost model iteratively, choosing the training set based on the accuracy of the predictions from the previous round. It assigns higher weights to incorrectly classified observations so that these observations have a higher probability of being classified correctly in the next iteration. In each iteration, it also assigns a weight to the trained classifier according to its accuracy, with more accurate classifiers receiving higher weights.
This process iterates until the entire training set is fitted without error or until the specified maximum number of estimators is reached. To classify a new sample, a weighted vote is taken across all of the trained classifiers.
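The procedure above can be sketched as discrete AdaBoost with one-feature decision stumps as the weak learners. Note that this sketch reweights all training samples in every round, the common formulation, rather than drawing a random subset as described in the text; the function names, the stump learner, and the +1/-1 label convention are assumptions.

```python
import math


def fit_stump(X, y, w):
    """Weighted decision stump: return (error, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] <= t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def stump_predict(f, t, polarity, x):
    return polarity if x[f] <= t else -polarity


def fit_adaboost(X, y, n_estimators=10):
    """Discrete AdaBoost; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n                                 # start with uniform sample weights
    ensemble = []
    for _ in range(n_estimators):
        err, f, t, pol = fit_stump(X, y, w)
        err = max(err, 1e-10)                         # avoid log/division issues for a perfect stump
        if err >= 0.5:                                # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)       # more accurate stump -> larger vote weight
        ensemble.append((alpha, f, t, pol))
        # Raise the weights of misclassified samples, lower the rest, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(f, t, pol, x))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:                              # training data fitted without error
            break
    return ensemble


def predict_adaboost(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(f, t, pol, x) for alpha, f, t, pol in ensemble)
    return 1 if score >= 0 else -1
```

On a one-dimensional set where the positive class sits at both ends and the negative class in the middle, no single stump is sufficient, but three reweighted stumps already classify every training point correctly.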
Advantages:
- It is less prone to overfitting
Disadvantages:
- It is sensitive to noisy data and outliers
Subasi et al. [35] proposed an ensemble AdaBoost classifier for recognizing human activities from wearable sensor data. The authors evaluated their model on different physical activities and showed that it outperformed the other models they compared.
4.5 Support vector machine (SVM) classifier
SVM [60] is a supervised machine learning algorithm that can be used for both classification and regression. It is formally defined by a separating hyperplane, as shown in Fig. 9. A hyperplane is a decision boundary (a line in two dimensions) that separates the data points. A support vector machine [17, 18] constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space that can be used for classification, regression, or other tasks such as outlier detection. Many hyperplanes may separate the data; SVM selects the one with the largest distance to the nearest training data point of any class (the so-called functional margin), because, in general, the larger the margin, the lower the generalization error of the classifier. SVM has applications in text classification, bioinformatics, handwriting recognition, and image classification.
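A linear soft-margin SVM can be sketched by minimising the regularised hinge loss with full-batch subgradient descent: points inside the margin or on the wrong side of the hyperplane push the weights, while the regulariser keeps the norm of w small, which is equivalent to keeping the margin wide. This covers only the linear case; kernel SVMs for high- or infinite-dimensional feature spaces are normally trained in the dual with solvers such as SMO, which are not shown. The function names and hyperparameters (`lam`, `lr`, `epochs`) are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))
    by full-batch subgradient descent. Labels must be +1 / -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]          # gradient of the regulariser
        grad_b = 0.0                             # the bias is not regularised
        for x, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:                       # inside the margin or misclassified
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, x)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Side of the hyperplane w . x + b = 0 on which x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

A smaller `lam` approaches the hard-margin solution on separable data, while a larger one tolerates more margin violations, which is how the soft-margin formulation trades margin width against training errors.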
Advantages:
- High classification accuracy
- Works well on smaller datasets
Disadvantages:
- Training on a large dataset takes a long time
- Sensitive to noise
Priya et al. [51] proposed a leaf recognition algorithm using a support vector machine (SVM). Twelve features were extracted and used by the classifier. The method was evaluated on the Flavia dataset and on a real-world dataset. The authors compared the SVM classifier with a KNN classifier and showed that the SVM achieves higher accuracy and requires less training time.
Alehegn et al. [52] worked on an Ethiopian maize leaf disease dataset, and the author states that this problem had not been addressed by previous research. In pre-processing, RGB-to-grayscale conversion and image enhancement are performed to improve image quality. Texture, color, and morphological features are then extracted and fed to the classifier, which achieves an accuracy of 95.63%.
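The two pre-processing steps mentioned above can be sketched as follows. The cited work does not specify its conversion weights or enhancement method, so this sketch assumes the ITU-R BT.601 luma coefficients for grayscale conversion and a simple min-max contrast stretch as the enhancement; both function names are hypothetical.

```python
def rgb_to_gray(image):
    """Weighted sum of R, G and B using ITU-R BT.601 luma coefficients.
    `image` is a list of rows of (r, g, b) tuples with values 0-255."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def contrast_stretch(gray):
    """Linearly rescale intensities so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:                                   # flat image: nothing to stretch
        return [[0.0 for _ in row] for row in gray]
    return [[(v - lo) * 255.0 / (hi - lo) for v in row] for row in gray]
```

Contrast stretching spreads a narrow intensity range over the full scale, which can make texture differences on the leaf surface easier to capture in later feature extraction.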
4.6 K-NN classifier
K-NN is one of the simplest supervised classification algorithms. The K-NN [61] algorithm stores all available data and classifies a new data point based on its similarity to the stored data. This means that, as new data arrives, it can easily be assigned to a well-suited category using the K-NN algorithm [1]. It can be used for both classification and regression. It is often referred to as a lazy learner because it does not learn a model from the training set; instead, it stores the dataset and performs its computations on it at classification time.
During training, the K-NN algorithm only stores the dataset; when it receives new data, it assigns the data to the category that is closest to it, as shown in Fig. 10.
K-NN first requires choosing a value of K. The Euclidean distance from the query point to every training point is then computed, and the K closest points are taken as its neighbors. The query point is assigned to the category to which the majority of these K neighbors belong.
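The three steps above (choose K, compute Euclidean distances, take a majority vote) fit in a few lines of Python. The function name, the default `k=3`, and the tie handling (ties in the vote go to the label encountered first among the nearest neighbors) are assumptions of this sketch.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points
    under Euclidean distance. Nothing is learned in advance (lazy learning)."""
    # Distance from the query to every stored training point.
    distances = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]
```

Because every prediction scans the whole training set, the cost per query grows with the dataset size, which is the computational disadvantage noted below.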
Advantages:
- It is very simple to implement.
- It can perform well when the training data is large.
- It requires no training time.
Disadvantages: The computation cost at prediction time is high, because the distance to every training point must be computed.
Hossain et al. [45] proposed leaf disease classification using a KNN classifier. The Arkansas plant disease database and the Reddit plant leaf disease dataset were used in their research. The input image is converted from the RGB to the L*a*b* color model so that color segmentation can be performed. Color features are then extracted from the segmented image and fed into the KNN classifier, which achieves an accuracy of 76.63%.
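The RGB to L*a*b* conversion mentioned above follows standard CIE formulas. The sketch below assumes 8-bit sRGB input and the D65 reference white; the cited work does not state these settings, and the function name is hypothetical. The conversion first undoes the sRGB gamma, then maps linear RGB to CIE XYZ, and finally applies the non-linear L*a*b* transform.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 reference white)."""
    def to_linear(c):
        # Undo the sRGB transfer curve.
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883            # D65 reference white

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    L = 116 * fy - 16                              # lightness, 0 (black) to 100 (white)
    a = 500 * (fx - fy)                            # green (-) to red (+)
    b_star = 200 * (fy - fz)                       # blue (-) to yellow (+)
    return L, a, b_star
```

Separating lightness (L*) from the two chromatic axes (a*, b*) is what makes this space convenient for color segmentation: diseased and healthy tissue often differ mainly along a* and b*, largely independently of illumination.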
Krithika N. et al. [29] presented individual grape leaf disease identification using a KNN classifier. The authors proposed tangential-direction image segmentation. Color and GLCM (gray-level co-occurrence matrix) features are extracted and fed to the KNN classifier to improve accuracy.
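GLCM features can be illustrated with a small pure-Python sketch. The GLCM counts how often a pixel with grey level i appears next to a pixel with grey level j at a fixed offset; texture descriptors are then summary statistics of that matrix. The offset, the symmetric counting, the number of grey levels, and the particular descriptors (contrast, angular second moment, homogeneity with a 1/(1 + (i - j)^2) weighting) are assumptions, since [29] does not list its settings here.

```python
def glcm(gray, levels, dx=1, dy=0):
    """Normalised, symmetric grey-level co-occurrence matrix for offset (dx, dy).
    `gray` is a list of rows of integer grey levels in the range [0, levels)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    for i in range(rows):
        for j in range(cols):
            ni, nj = i + dy, j + dx
            if 0 <= ni < rows and 0 <= nj < cols:
                a, b = gray[i][j], gray[ni][nj]
                m[a][b] += 1
                m[b][a] += 1                      # count both directions (symmetric GLCM)
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m]


def glcm_features(p):
    """Haralick-style texture descriptors of a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)          # local intensity variation
    asm = sum(v * v for _, _, v in cells)                          # angular second moment (uniformity)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells) # closeness to the diagonal
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}
```

A smooth region concentrates the matrix on its diagonal (low contrast, high homogeneity), whereas lesions and spots spread it off the diagonal; those differences are what a KNN classifier sees in the feature vector.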
For cotton leaf disease classification, the leaf must be segmented from a complex background, and removing the background is a challenging task. We treat background removal as a segmentation problem and address it with a modified factorization-based active contour method, which isolates the leaf region in the image. Texture and color features are then extracted and fed to the classifier. The literature offers many supervised classification algorithms, such as artificial neural networks, support vector machines, K-NN, AdaBoost, Naïve Bayes, and random forest classifiers. In this work, we compare the performance of these classifiers based on the selected features, namely color features and texture features. The analysis examines whether texture features, color features, or a combination of both are sufficient to achieve good classification accuracy.
This paper focuses on classifying leaf images as healthy or unhealthy. For this binary classification, color features alone are sufficient; if the work is extended to disease classification, however, color features will not be sufficient.
4.7 Weka tool
The Waikato Environment for Knowledge Analysis (Weka), developed at the University of Waikato, New Zealand, is free software licensed under the GNU General Public License. It provides tools for applying and analyzing machine learning algorithms [56, 57]. It is written in Java and runs on any platform that supports Java.
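Weka's native input format is ARFF (Attribute-Relation File Format), so extracted feature vectors are typically written to an ARFF file before being loaded into Weka. The sketch below serialises numeric features and a nominal class attribute. The relation name, attribute names, and class values in the usage example are hypothetical, and the function assumes names contain no spaces or special characters (such names would need quoting in ARFF).

```python
def to_arff(relation, feature_names, rows, labels, classes):
    """Serialise numeric feature vectors and nominal class labels as ARFF text.
    Assumes names and labels contain no spaces, commas or quotes."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in feature_names]
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        # One comma-separated instance per line; the class value comes last.
        lines.append(",".join(str(v) for v in row) + "," + label)
    return "\n".join(lines) + "\n"
```

For example, `to_arff("cotton_leaves", ["mean_r", "mean_g"], [[0.5, 0.25]], ["healthy"], ["healthy", "unhealthy"])` produces text that can be saved with a `.arff` extension and opened in the Weka Explorer to compare classifiers.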