Recently, machine learning methods have attained high performance in many computer vision tasks [1, 2, 3]. However, traditional machine learning methods rely on manually chosen features and employ statistical learning algorithms to explore patterns in the training subset and then perform prediction and classification on the test subset, whereas deep neural networks provide an end-to-end pipeline for automatically extracting robust features, significantly improving the practicality of leaf identification.
The identification of leaf diseases is essential for controlling disease spread and supporting the healthy development of the pear industry [4, 5, 6]. However, plant disease prediction is a difficult, multifaceted task that requires a range of technical skills and domain experts. Traditional approaches depend on a specialist who manually analyzes the foliar surface and then makes a diagnosis, but these approaches are inefficient in terms of complexity and cost [7, 8]. Therefore, technology for automatically recognizing and classifying plant diseases from the appearance of the first symptoms has become necessary.
Machine learning and deep learning approaches have recently received considerable attention for object identification and classification [9]. Several plant leaf disease approaches based on deep neural networks have recently been proposed [10, 11, 12]. These approaches showed that deep learning enhances classification accuracy in comparison with traditional machine learning approaches. This is due to the large number of configurable parameters in the Deep Convolutional Neural Network (DCNN), which, in turn, needs a substantial amount of labelled data to train the model and improve its generalization capabilities.
CNN models require a sufficient number of training images to enhance their generalization capabilities. However, agricultural data is scarce, particularly for identifying leaf diseases, because collecting large disease datasets is costly in terms of labor and time and requires extensive knowledge of plant diseases. In addition, manual labelling is highly subjective, and ensuring the quality of the labelled data is difficult [13].
As a result, the lack of training samples is the primary constraint on further improving the accuracy of leaf disease detection. Hence, the problem of training a deep learning model using a small amount of labelled data is worth examining. This problem has been addressed using traditional data augmentation methods [14].
Recently, the adoption of data augmentation approaches in computer vision tasks has enhanced the quality of deep learning classification by producing larger datasets, so that better deep learning models can be trained on the improved datasets [15]. Because of the nature of image data, new training samples can be derived from an original image using simple geometric transformations, including rotation, scaling, translation, cropping, noise addition, and other data augmentation techniques. These strategies, however, provide only limited extra information, and hence only limited improvement in the accuracy of classifying objects in a given domain. Several approaches have considered data augmentation to enhance the classification accuracy, as presented in [10, 16, 17, 18, 19], where the most popular approach for training a deep CNN model with a small amount of data is to augment the input data with synthetic images.
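The simple transformations listed above can be illustrated with a minimal sketch. It is not the augmentation pipeline used in this work: it assumes a grayscale image stored as a list of pixel rows with values in 0 to 255, and all function names and parameter values are hypothetical choices made for illustration. A practical pipeline would more likely rely on an image-processing library.

```python
import random


def rotate90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def scale(img, k):
    """Upscale by an integer factor k using nearest-neighbour repetition."""
    return [[p for p in row for _ in range(k)] for row in img for _ in range(k)]


def translate(img, dx, fill=0):
    """Shift every row dx pixels to the right (left if dx < 0), padding with fill."""
    w = len(img[0])
    if dx >= 0:
        return [[fill] * min(dx, w) + row[: max(w - dx, 0)] for row in img]
    return [row[-dx:] + [fill] * min(-dx, w) for row in img]


def crop(img, top, left, height, width):
    """Cut out a height x width window whose upper-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def add_noise(img, sigma, rng):
    """Add Gaussian noise to each pixel and clamp the result to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def augment(img, rng):
    """Return one variant per transformation (parameter values are arbitrary)."""
    return [
        rotate90(img),
        scale(img, 2),
        translate(img, 1),
        crop(img, 0, 0, len(img) - 1, len(img[0]) - 1),
        add_noise(img, 5.0, rng),
    ]


if __name__ == "__main__":
    toy = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    for variant in augment(toy, random.Random(0)):
        print(variant)
```

Each variant is only a rearranged or perturbed copy of the pixels of the original image, which is why such transformations add limited new information compared with fully synthetic images.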
In this research work, we employ a Generative Adversarial Network (GAN) to obtain new synthetic images that augment the training dataset. A GAN is a machine learning-based approach that produces additional samples with features similar to those of the original samples, so that they can be used in the training process. A GAN aims to generate entirely synthetic images that increase the diversity of the dataset. Recently, GAN approaches have become a standard method for dealing with dataset constraints. In addition, to mitigate the bias induced by class imbalance, the authors of [17] proposed an approach for augmenting synthetic samples named Activation Reconstruction (AR).
Unlike previous works [10, 16, 17, 20], which use a balanced dataset of 9 classes, our work addresses the same problem on an imbalanced dataset. Therefore, our main target is to increase the number of training samples and balance the number of records in each class, in order to minimize model over-fitting and hence achieve high classification accuracy.
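As a minimal sketch of what balancing the number of records in each class implies, the snippet below computes how many synthetic images each class would need in order to match a target size. The function name, the class names, and the counts are hypothetical placeholders and are not taken from the dataset used in this work; the default target (the size of the largest class) is likewise an assumption.

```python
def synthetic_needed(class_counts, target=None):
    """Return the number of synthetic samples required per class.

    class_counts maps a class label to its number of real training images.
    target is the desired size of every class; by default it is the size
    of the largest class (an assumption made for this sketch).
    """
    if target is None:
        target = max(class_counts.values())
    # Classes already at or above the target need no synthetic images.
    return {label: max(target - n, 0) for label, n in class_counts.items()}


if __name__ == "__main__":
    # Hypothetical, invented counts used only to illustrate the computation.
    counts = {"healthy": 500, "disease_a": 120, "disease_b": 310}
    print(synthetic_needed(counts))
```

Under this scheme, the generator would be asked for more images of the under-represented classes and none for the largest class.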
Therefore, in this work, we develop an efficient deep vision classification approach for pear plant disease classification that employs CycleGAN to increase the number of samples in the selected dataset by producing new images, which in turn enhances the classification accuracy. The main contributions of this work are as follows:
- Investigate the variety of pear leaf disease classification approaches based on deep neural network models.
- Develop a new efficient approach that integrates classification with the CycleGAN approach in order to resolve the problem of a limited training set.
- Validate the proposed approach through several experiments to assess the performance of the developed system in terms of classification accuracy.
The rest of this paper is organized as follows: Section 2 discusses recently developed pear plant disease classification approaches, whereas Section 3 presents and discusses the proposed classification system. Section 4 shows the experimental testbeds and results, and Section 5 discusses the results obtained from the experiments conducted to assess the efficiency of the developed system. Finally, Section 6 concludes the paper and outlines future work.