Image-based disease classification in grape leaves using convolutional capsule network

Plant diseases are a prime hindrance to food security: they degrade the overall quality and quantity of agricultural products. Grape is an important fruit and a major source of vitamin C. Automatic decision-making systems play a paramount role in agricultural informatics. This paper detects diseases in grape leaves using a convolutional capsule network. The capsule network is a promising deep learning architecture that groups neurons into capsules and effectively represents the spatial information of features. The novelty of the proposed work lies in the addition of convolutional layers before the primary caps layer, which indirectly decreases the number of capsules and speeds up the dynamic routing process. The proposed method is evaluated on augmented and non-augmented datasets and detects grape leaf diseases with an accuracy of 99.12%. Its performance compares favorably with state-of-the-art deep learning methods.


Introduction
In agriculture, plant diseases are a vital concern because they constantly reduce crop quality and yield. Their effects range from minor symptoms to severe damage and can thereby reduce the agricultural economy drastically (Savary et al. 2012). Different methods have been developed to detect diseases and avoid losses. Molecular-biology-based processes identify disease agents correctly, but these methods are not directly available to farmers: they require substantial money and domain knowledge. For these reasons, much research has been carried out to derive methodologies that are accurate enough and accessible to farmers. Decision-making technologies are used to solve the problems of precision agriculture (Arsenovic et al. 2019).
In India, grapes (scientific name Vitis vinifera) are one of the common fruits. India holds the 18th position in grape production, with Tamil Nadu, Andhra Pradesh, Karnataka, and Maharashtra being the major producing states. The grape plant grows well at temperatures of 15–40 °C. Leaves are a paramount part of any plant: they last longer than buds and flowers and reflect the overall condition of the plant. Grape leaves are commonly affected by black rot, esca, and leaf blight. These diseases reduce grape yield and cause losses to cultivators (Andrushia et al. 2019, 2020). Early detection and identification of diseases help farmers reduce their losses; hence, it is essential to find diseases at early stages through automatic systems. The latest technologies are adopted to enhance the decision-making process of precision agriculture (Gebbers and Adamchuk, 2010). A considerable amount of data is collected in real time, and various artificial intelligence techniques are used to make optimal decisions, leading to cost reduction. Still, the field of decision-making systems in agricultural informatics is in its infancy and open to improvement. Various machine learning algorithms have been used for this purpose, including decision trees, logistic regression, k-nearest neighbors (KNN), support vector machines (SVM), and extreme learning machines (ELM). The easy accessibility of cameras and the tremendous growth of internet facilities make automatic detection systems viable, and the cheap availability of gadgets makes automated detection a less complex task.
The recent boom in deep learning methods has also been adopted for automation in agriculture. Advances in computer memory and hardware enable different solutions, and deep learning methods can solve complex tasks in a reasonable time. This paper proposes a convolutional capsule network for disease detection and classification in grape leaves. It improves detection accuracy over other deep learning models, and the influence of capsule dimensions on detection accuracy is explored. By adding convolution layers before the primary caps layer, the proposed model provides accurate disease detection.
The other parts of the paper are organized as follows: Sect. 2 elaborates on the literature survey on plant disease detection. Section 3 briefly introduces the convolutional capsule network. Section 4 details the experimental implementation and result analysis. The final section concludes the findings and limitations of the research work.

Motivation and related work
Timely diagnosis of plant diseases is the prime factor in controlling plant loss. Performed by humans, diagnosis is time-consuming, error-prone, and costly. In recent years, researchers have worked on automatic techniques to detect and classify diseases in plants; automatic equipment and methods are excellent for monitoring crop fields (Ampatzidis et al. 2017). This section elaborates on computer vision methods for detecting disease in plant leaves. Qin et al. (2016) investigated leaf disease detection with a supervised classification algorithm. Initially, k-median and fuzzy C-means clustering are used to obtain lesion images. After lesion segmentation, texture, color, and shape features are extracted, and the optimal features are selected from a set of 129 features. These features are used to train logistic regression, linear discriminant analysis, naive Bayes, and regression tree classifiers, which detect the related diseases.
An automatic leaf disease detection system using machine learning techniques is discussed by Jaisakthi et al. (2019). Graph cut segmentation separates the foreground from the background, and color and texture features are extracted from the foreground. Random forest, AdaBoost, and SVM classifiers are used to classify the diseases of grape leaves; the SVM classifier with global thresholding gives 93.03% testing accuracy. The detection accuracy of the method depends on the adopted features, and it consumes much time. Waghmare et al. (2016) investigated a grape leaf disease detection system. Initially, the background of the leaf images is removed, the diseased part of the leaves is segmented, and fractal-based texture features are extracted. The extracted features are fed into a multiclass SVM (MSVM), which classifies the grape diseases downy mildew, black rot, and powdery mildew with 89% accuracy. This accuracy varies as the training and testing samples are increased: the dataset is small, and the performance metrics change with its size.
The need for automatic disease detection methods is seriously evolving due to the accomplishments of machine learning techniques. Traditional machine learning algorithms extract features from images manually and feed those features into a detection algorithm; this manual feature extraction is time-consuming, less robust, and makes the methods less versatile. Deep learning models, in contrast, consist of many processing layers and fetch features directly from the image.
Different techniques are adopted to build deep learning-based algorithms. Recent advancements in artificial neural network architecture laid the foundation for hybrid deep learning models. Autoencoders, sparse coding, restricted Boltzmann machines, and customized convolutional neural networks (CNN) are some widely used architectures. Among these, CNN is mainly used for computer vision applications (Guo et al. 2016) and has had outstanding success in agriculture as well. LeNet, AlexNet, ResNet, GoogleNet, and Visual Geometry Group (VGG) are examples of CNN-based models. A few studies reported CNN-based plant disease classification (Chen et al. 2019; Liu et al. 2017; Wang et al. 2017; Ma et al. 2018; Polder et al. 2019; Fuentes et al. 2018) and yielded promising results. Table 1 highlights plant disease classification methods for different plants. Kerkech et al. (2018) investigated disease detection in grapes using a CNN model: color spaces and vegetation indices are combined with LeNet-5 to classify the diseases, obtaining 95.8% detection accuracy. The number of expert-labeled inputs limits this method. Gandhi et al. (2018) explored CNN for plant disease detection: the deep architectures Inception V3 and MobileNet detect plant diseases with 88.3% and 92% accuracy. Initially, pre-processing steps remove noise, and a deep convolutional generative adversarial network (DCGAN) augments the number of images. The model is deployed in a mobile application, but even with augmented images, its detection accuracy is low.
Arnal Barbedo (2019) presented plant disease identification for ten diseases. The authors increased the data samples of each disease category by gathering 60% of images under controlled conditions and 40% under actual field conditions. Initially, individual lesions and spots are segmented; after removing backgrounds, the pretrained GoogleNet CNN architecture is applied to classify the diseases. Detection accuracies vary with plant species, and the use of spot segmentation and background removal significantly impacts them. Zhu et al. (2021) investigated a YOLOv3-SPP-based deep learning method for detecting black rot in grape leaves. Super-resolution image enhancement and convolutional neural network techniques are leveraged: the input image is first up-sampled using bilinear interpolation, and the enhanced inputs are then given to the YOLOv3 spatial pyramid pooling model, achieving 95.79% detection accuracy. Under field conditions, however, this method produces only 86.69% precision. Yuan et al. (2022) presented a deep learning method for accurately detecting black rot spots on grape leaves. A DeepLab V3+ model with a ResNet-101 backbone combines feature maps of different levels, and test results show that the improved DeepLab V3+ outperforms the conventional method. Both Zhu et al. (2021) and Yuan et al. (2022) researched only black rot spot disease; other common grape leaf diseases are not included in these papers.
The CNN-based Visual Geometry Group (VGG) network is used to find diseases of grape leaves by Paymode and Malode (2022). The plant village dataset and real-time field images are collected for this study, and deep learning-based data augmentation techniques increase the dataset size. Different trials of hyper-parameters are observed, and a detection accuracy of 98.40% is achieved for grape leaf diseases. The processing time per input image is not recorded in this work.
Five grapevine leaf species are classified using the CNN-SVM-based approach of Koklu et al. (2022). The MobileNetV2 CNN model is leveraged to classify the leaf type: features are extracted from the pre-trained MobileNetV2 logits layer, and classification is performed with SVMs using different kernels. Feature selection with the Chi-square method achieves 97.60% classification accuracy; the feature selection procedure increases the classification accuracy. Grape black measles disease detection is investigated by Ji and Wu (2022). Disease severity is detected with a ResNet-50-based DeepLabV3+ segmentation model and fuzzy logic: region-of-interest features and the percentage of infection are obtained from the input image, and a fuzzy rule-based inference system grades the disease. The healthy, mild, medium, and severe grades concern only the measles disease.
From the literature, it is understood that deep learning models effectively detect diseases in plant leaves. CNNs are widely used networks with complex layers that extract features under rotation, translation, and scale (Sezer and Sezer, 2019). The prime requirement of any CNN-based architecture is massive training data, which is not feasible for individual plant-based image analysis. Another major shortcoming is the loss of local details such as position and posture during image analysis. Hence, CNNs are not a good choice for extracting diverse information such as orientation, scale, and semantic class required for image reconstruction. Recently, Hinton's team (Sabour et al. 2017) introduced a new deep learning architecture, the capsule network, which overcomes these disadvantages of traditional deep learning techniques. It is a novel building block that represents spatial relationships of features effectively and is mainly applied in image recognition and classification tasks. Although the capsule network has been applied in different research fields such as tumor classification (Afshar et al. 2018), disease classification (Sezer and Sezer, 2019; Afshar et al. 2020; Verma et al. 2020), drug detection, object detection (Kumar, 2018), and hyperspectral image classification (Deng et al. 2018; Yin et al. 2019; Paoletti et al. 2018), it is still in its infancy. A capsule network uses a set of neurons as a capsule. The capsules are vectors denoting internal properties that can be utilized to learn the relationships between various features, so the model can effectively infer potential variants with fewer training inputs.
Few studies have used the capsule network in plant disease analysis because it is one of the recent techniques in computer vision. The limitations in the field of plant disease detection are as follows: the reduction of trainable parameters to design an efficient network, and the development of an efficient model to detect plant diseases for a large dataset. These limitations are addressed in the proposed method. The novelty of the work is as follows: the proposed method combines a CNN with the capsule network to obtain superior results in comparison with state-of-the-art techniques. Convolutional layers are added before the primary caps layer, which indirectly decreases the number of capsules and speeds up the dynamic routing process.
The influence factors of the proposed model, the capsule dimensions and the routing number, are analyzed with respect to classification results. This can guide subsequent research on plant disease classification using capsule networks.

Materials and methods
The complete process of proposed grape leaf disease detection using convolutional capsule networks is highlighted in Fig. 1. The key steps involved in the process are image collection, image augmentation, dataset partitioning, convolutional capsule network, diseased classification, and performance analysis.

Dataset
In this study, the grape leaf dataset is collected from farms cultivating grape plants. The images are captured in grape fields located in Madampatti, Coimbatore district, Tamil Nadu, India. In addition, images from the plant village dataset (Hughes and Salathé, 2015) are used for the experiment. Both healthy and diseased grape leaf images are taken; the considered disease categories are grape black rot, grape esca, and grape leaf blight. Figure 2 shows images of each disease category.

Capsule networks
A capsule network is a deep learning architecture that handles affine transformations well (Deng et al. 2018). Its capsules are designed to represent outputs as vectors, and each capsule consists of a set of neurons that can learn entities of an image such as pose, size, and orientation. A dynamic routing mechanism routes data from one layer to the next: lower-level capsules predict the responses of higher-level capsules, and a higher-level capsule is triggered only when the lower-level predictions agree. Consider a lower-level capsule i with output l_i. The prediction for higher-level capsule j is

$$\hat{l}_{j|i} = W_{ij}\, l_i, \tag{1}$$

where W_{ij} is a weight matrix learnt by backpropagation. Every capsule tries to predict the response of the higher-level capsules; if a capsule's prediction imitates the actual response of a higher-level capsule, the coupling coefficient between the capsules increases. The coupling coefficient is calculated by

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})}, \tag{2}$$

where b_{ij} is a log prior probability, initialized to zero, that indicates whether lower-level capsule i is coupled with higher-level capsule j. Using Eq. (3), the input vector of higher-level capsule j is calculated:

$$S_j = \sum_{i} c_{ij}\,\hat{l}_{j|i}. \tag{3}$$
The length of the output vector shows the probability of existence. The nonlinear squash function is used as the activation function: it shrinks long vectors to lengths close to one and chokes short vectors to almost zero. It is given by

$$V_j = \frac{\lVert S_j\rVert^2}{1+\lVert S_j\rVert^2}\,\frac{S_j}{\lVert S_j\rVert}, \tag{4}$$

where S_j and V_j are the input and output vectors of the jth capsule. Equations (1)–(4) are used in the routing procedure to find V_j, and the routing number decides the number of iterations. The output lengths of the lower capsules encode the existence probabilities of their entities (Zhang et al. 2019), while the vector directions encode properties of the entities such as orientation, size, and posture. The capsules therefore learn the spatial relationships between entities within the input image. The margin loss detects whether the entities of a particular class are present and is obtained by Eq. (5), where k indexes the capsules of the last layer and l_k is the loss:

$$l_k = T_k\,\max(0,\,m^{+}-\lVert V_k\rVert)^2 + \lambda\,(1-T_k)\,\max(0,\,\lVert V_k\rVert-m^{-})^2. \tag{5}$$
If T_k is one, then class k is present. At the start of training, the parameters m^{+}, m^{-}, and λ are initialized. The total loss is the sum of the losses of all capsules in the output layer.
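To make the routing procedure concrete, the following is a minimal NumPy sketch of the squash function (Eq. 4) and the dynamic routing loop (Eqs. 2 and 3). It assumes the predictions of Eq. (1) have already been computed into one array; the shapes, the agreement update, and the `eps` guard are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Eq. (4): shrink long vectors toward length 1 and short vectors toward 0."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(l_hat, num_iters=2):
    """Route predictions l_hat[i, j, :] from lower capsule i to higher capsule j.

    l_hat has shape (k, m, d2): k lower capsules, m higher capsules,
    d2-dimensional predictions. Returns the output vectors v, shape (m, d2).
    """
    k, m, _ = l_hat.shape
    b = np.zeros((k, m))                                       # log priors b_ij, zero at start
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # Eq. (2): softmax over j
        s = np.einsum('ij,ijd->jd', c, l_hat)                  # Eq. (3): weighted sum
        v = squash(s)                                          # Eq. (4)
        b = b + np.einsum('ijd,jd->ij', l_hat, v)              # agreement update
    return v
```

Because the squash output length is ‖S‖²/(1 + ‖S‖²), every routed vector has length strictly below one, so it can be read directly as an existence probability.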

Convolutional capsule network
The proposed method uses a convolutional capsule network for the disease classification of grape leaves; its architecture is shown in Fig. 3. The input image size is 128 × 128. The architecture consists of five convolution layers (the fifth serving as the primary caps layer), one digit caps layer, and three fully connected layers. More convolutional layers are added in order to generate an effective feature map. The initial layer consists of sixteen 5 × 5 convolutional kernels with stride 1, followed by 2 × 2 max-pooling with stride 2. The second layer consists of thirty-two 5 × 5 convolutional kernels with stride 1, and the third layer of sixty-four 5 × 5 convolutional kernels with stride 1; 2 × 2 max-pooling with stride 2 is performed after the second and third layers. The fourth layer consists of one hundred twenty-eight 9 × 9 convolutional kernels with stride 1. The primary capsule layer is the fifth layer and consists of 32 capsules, each applying a 9 × 9 convolutional kernel with stride 1. The digit caps layer consists of 16-dimensional capsules for the four classes.
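The spatial sizes implied by this layer stack can be checked with the standard convolution output-size formula. The paper does not state the padding scheme, so 'same' padding for the convolutional layers and no padding for the primary caps kernel are assumptions in this sketch:

```python
def conv_out(n, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

# Tracing a 128 x 128 input through the stack described above
# (padding values are assumptions, not stated in the paper):
n = 128
n = conv_out(n, 5, 1, 2)   # conv1, 16 kernels, 'same'   -> 128
n = conv_out(n, 2, 2)      # 2 x 2 max-pool, stride 2    -> 64
n = conv_out(n, 5, 1, 2)   # conv2, 32 kernels, 'same'   -> 64
n = conv_out(n, 2, 2)      # max-pool                    -> 32
n = conv_out(n, 5, 1, 2)   # conv3, 64 kernels, 'same'   -> 32
n = conv_out(n, 2, 2)      # max-pool                    -> 16
n = conv_out(n, 9, 1, 4)   # conv4, 128 kernels, 'same'  -> 16
n = conv_out(n, 9, 1, 0)   # primary caps, 9 x 9, valid  -> 8
```

Under these assumptions the primary caps layer produces an 8 × 8 grid per capsule type, which is how the convolutional front end keeps the capsule count small before routing.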

Pre-processing and data augmentation
The input size of the proposed model is 128 × 128. Because the images in the input dataset are not all the same size, all images are resized to 128 × 128. Inadequate input causes overfitting in deep neural networks, so the input images are augmented. Rotation, scaling transformation, gamma correction, flipping, and color augmentation techniques are used to generate the augmented dataset. The augmented images are randomly shuffled and grouped into training, testing, and validation datasets. Figure 4 shows examples of augmented images, and Table 2 lists the number of grape leaf images in each category.
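A minimal sketch of such an augmentation step is shown below. It covers rotation (restricted to 90-degree steps here for simplicity), flipping, and gamma correction on a normalized image array; the scaling and color augmentations the paper also uses are omitted, and the parameter ranges are illustrative assumptions:

```python
import numpy as np

def augment(img, rng):
    """Return a randomly augmented copy of an H x W x 3 image with values in [0, 1]."""
    out = np.rot90(img, k=int(rng.integers(0, 4)))  # rotation in 90-degree steps
    if rng.random() < 0.5:
        out = out[:, ::-1]                          # horizontal flip
    gamma = rng.uniform(0.7, 1.3)                   # gamma correction exponent
    return np.clip(out, 0.0, 1.0) ** gamma

rng = np.random.default_rng(42)
batch = [augment(np.full((128, 128, 3), 0.5), rng) for _ in range(5)]
```

Each call yields a differently transformed copy, so one source image can contribute many training samples.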

Optimization of hyper-parameter
The hyper-parameter optimization of the convolutional capsule network depends on the number of kernels in the convolutional layers and the dimensions of the primary caps and digit caps layers. In addition, the routing number is a vital parameter for finding the coupling coefficients; it is tested from 1 to 3 in increments of 1. The performance of the architecture is tested for each parameter setting using tenfold cross-validation on the training dataset. The set of parameters achieving the highest accuracy is then chosen and applied to the test dataset. To reduce overfitting, early stopping is adopted: whenever the error on the validation set fails to decrease relative to the previous iteration, training is stopped and the optimal hyper-parameters are derived.
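The early stopping rule can be sketched as a small helper that tracks the best validation error seen so far. The `patience` parameter (how many non-improving epochs to tolerate) is an assumption, since the paper does not state one:

```python
class EarlyStopping:
    """Stop training when the validation error stops decreasing."""

    def __init__(self, patience=1):
        self.best = float('inf')   # lowest validation error seen so far
        self.patience = patience
        self.bad_epochs = 0

    def step(self, val_error):
        """Record one epoch's validation error; return True if training should stop."""
        if val_error < self.best:
            self.best = val_error
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs > self.patience
```

In a training loop one would call `step` after each validation pass and break out as soon as it returns `True`, keeping the weights from the epoch that produced `best`.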

Training of convolutional capsule network
During the training process, the weights of the network are randomly initialized from a normal distribution with standard deviation 0.01. The rectified linear unit activation function is used in all layers of the network. Batch normalization reduces internal covariate shift by normalizing the input of each layer to a standard Gaussian distribution (Ioffe and Szegedy, 2015). The Adam variant of stochastic gradient descent is used as the optimizer: it handles sparse data well and fine-tunes the learning rate for every parameter.
In comparison with non-adaptive methods, Adam has a high convergence rate. Let c1, c2, c3, c4, and p5 be the parameters of the four convolutional layers and the primary caps layer, with convolutional operations denoted conv. The outputs of the convolutional layers and the primary caps layer are denoted O_conv1, O_conv2, O_conv3, O_conv4, and O_prim. Feature extraction is achieved through the convolutional layers and the primary caps layer, dynamic routing generates the digit capsules, and stochastic gradients are used to find the network parameters. Figure 5 shows the flow diagram of the routing process, and the algorithm for training the convolutional capsule network is given as follows:
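The per-parameter update performed by Adam can be sketched as follows. This is a generic NumPy implementation of the standard Adam rule with its usual default constants, not the paper's exact training code; the toy objective ‖θ‖² is only there to exercise the update:

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; state holds (m, v, t) for this parameter tensor."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy usage: minimize ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(100):
    theta, state = adam_step(theta, 2 * theta, state, lr=0.1)
```

Because the step is scaled by the running second-moment estimate, each parameter effectively receives its own learning rate, which is the adaptivity the section refers to.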

Performance assessment
K-fold cross-validation is an efficient validation method for larger datasets. Tenfold cross-validation is used to evaluate the performance of the proposed method: the dataset is randomly separated into ten subsets, of which nine are used for training and the remaining one for testing. The procedure is repeated ten times, with each of the ten subsets used exactly once as the validation set, and the performance is obtained by averaging over all ten runs. The performance of the proposed method is assessed through precision, recall, F-measure, and accuracy. Precision quantifies the ratio of true positive observations to all positive predictions; recall quantifies the number of true positives found out of all positive samples; the F1-measure (F1 score) is the harmonic mean of precision and recall; and accuracy reflects the fraction of observations correctly classified by the network. The equations of the performance metrics are as follows:

$$\text{Precision} = \frac{TP}{TP+FP}, \qquad \text{Recall} = \frac{TP}{TP+FN},$$

$$\text{F1-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}.$$


Experimental results and discussion

The proposed convolutional capsule network for grape leaf disease detection and classification is implemented on a system with a 2.90 GHz processor, 12 GB RAM, and a 4 GB NVIDIA GTX 1050 GPU; all calculations are carried out on this system. Epochs are varied in the range 30–150 with respect to the testing accuracy, and the learning rate is varied from 0.01 to 0.0001. Different batch sizes, from 16 to 64, are tried in order to fit the network in GPU memory; to suit the available memory, the batch size is fixed at 64. Different volumes of training parameters are considered with respect to the dataset: 85% of the dataset is used for training, and 15% for testing. The optimal set of hyper-parameters is listed in Table 3.
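The four performance metrics defined above reduce to simple ratios of confusion-matrix counts. As a quick reference sketch (the counts are illustrative, not from the paper's experiments):

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical binary example: 90 true positives, 5 false positives,
# 10 false negatives, 95 true negatives.
p, r, f1, acc = metrics(tp=90, fp=5, fn=10, tn=95)
```

For the multi-class grape leaf task, the same ratios are computed per class and averaged across the four classes.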
The detailed quantitative and qualitative analysis is carried out for augmented and non-augmented datasets.

Performance analysis
All images in the dataset are resized to 128 × 128. Since collecting an image dataset of diseased grape leaves is time-consuming, image augmentation is used to provide sufficient images for training and testing the model. The proposed convolutional capsule network for grape disease detection and classification is evaluated on two datasets (with and without augmentation), trained with the optimal hyper-parameters, and assessed with tenfold cross-validation. In the training process, the number of epochs plays a vital role in the performance of the model. The proposed method is tried with different epoch counts starting from 30; the model converges after 150 epochs, so training accuracy and loss are recorded up to 150 epochs. The performance metrics of the proposed method for the augmented and non-augmented datasets are highlighted in Tables 4 and 5, which show the detailed tenfold cross-validation results. The experimental results highlight that classification accuracy is higher for the augmented dataset: the convolutional capsule network obtains 99.12% classification accuracy on it. Recall, an important metric for minimizing false negatives, reaches 99.06%. Adding convolutional layers to the capsule network, together with the parameters employed, results in accurate detection of grape leaf diseases. Figure 6 shows the performance metrics during the training and validation phases. The proposed model achieves 99.12% validation accuracy with a loss of 0.091 on the augmented dataset; on the non-augmented dataset, the validation accuracy is 92.13% with a loss of 0.132.
Around the 120th epoch, high accuracy is obtained but is not yet consistent; the model converges after 150 epochs. The proposed model is trained on both the augmented and non-augmented datasets, with the augmented dataset giving higher performance.

Capsule dimensions
The capsules are an important component of the convolutional capsule network; each consists of many neurons. The capsules in the primary caps layer are lower-level capsules: they learn from the convolution layers and represent small entities of the input image. The capsules in the digit caps layer have higher dimensions and represent complex entities of the input. The dimensions of the capsules therefore play a major role in classification. If the capsule dimensions are too low, the representation ability of the capsules is poor; if they are too high, redundant information occurs. Both cases hurt classification. The capsule-dimension pairs (4,8), (6,12), (8,16), and (10,20) are used to evaluate the performance of the model, with the other parameters as given in Table 3. The effect of capsule dimensions on the two datasets is shown in Fig. 7, where D1 is the augmented dataset and D2 the non-augmented dataset. Classification accuracy is highest for the capsule dimensions (8,16): performance gradually increases as the dimensions increase, but when the dimensionality is too high, it causes a dimensional disaster and performance degrades.

Routing number
Routing is an iterative process that acts as an orientation filter: inputs from lower capsules are sent to the higher capsules that agree with them. The routing number is the prime factor in whether a capsule network can obtain accurate coupling coefficients. Increasing the routing number can increase accuracy, but too many routing iterations make the model overfit, so it is necessary to determine the optimal routing number. In this experiment, the routing number is checked for 1, 2, and 3, with the other parameters as listed in Table 3. In general, routing helps the smaller capsules determine the optimal route that agrees with their input; the coupling coefficient increases with more routing, which overfits the input to a specific larger capsule. Figure 8 shows the effect of the routing number on classification accuracy. Initially, accuracy increases slightly; afterward, it decreases. Expanding the routing number does not improve classification accuracy and instead degrades performance: too small a routing number gives insufficient training, too large a value misses the optimal fit, and a high routing number also increases training time. Hence, the optimal routing number of 2 is chosen in this experiment.

Time complexity
In the proposed work, a convolutional capsule network is used to detect grape leaf diseases. The network places convolutional layers before the primary caps layer, which decreases the number of primary capsules and speeds up the dynamic routing process. Because the network uses dynamic routing over multiple iterations, the number of operations per iteration is higher, but the network as a whole converges quickly, so its design complexity is also reduced. The time complexity is based on the required number of multiplications in the capsule layers and fully connected layers. Let the lower-layer capsules i have dimension d₁ and the higher-layer capsules j have dimension d₂. With k capsules in the lower layer and m capsules in the higher layer, the total number of prediction operations is m × d₂ × k × d₁, since each prediction involves d₁ × d₂ multiplications. According to Eq. (3), each capsule output is a weighted average over predictions; weighting each prediction (l̂_{j|i}) by its coupling coefficient (c_{ij}) involves d₂ multiplications, so each routing pass includes a further d₂ × k × m multiplications. Capsule networks have fewer layers than convolutional neural networks. The proposed method takes 1.82 s to process each input on the NVIDIA GTX 1050 system.
(Fig. 6: training and loss graphs of the non-augmented dataset (left) and augmented dataset (right). Fig. 7: effect of capsule dimensions on accuracy.)
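The multiplication counts above can be wrapped in a small helper. The counting follows the analysis in this section; the concrete values of k and d₁ below are assumptions chosen only to illustrate the formula, since the paper does not enumerate them:

```python
def routing_multiplications(k, d1, m, d2):
    """Multiplications per routing pass between a layer of k capsules of
    dimension d1 and a layer of m capsules of dimension d2."""
    predictions = m * d2 * k * d1   # each of the k*m predictions costs d1*d2
    weighting = d2 * k * m          # weighting each prediction by c_ij costs d2
    return predictions + weighting

# Hypothetical illustration with the digit caps settings of the proposed
# network (16-dimensional capsules, four classes); k=2048 and d1=8 for the
# primary caps layer are assumed values for the example only.
ops = routing_multiplications(k=2048, d1=8, m=4, d2=16)
```

The prediction term dominates, which is why reducing the number of primary capsules via the convolutional front end directly shortens the routing step.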

Comparative analysis
The performance of the proposed method is compared with five state-of-the-art deep learning methods, all of which adopt deep learning networks for grape leaf disease detection; only recent research works on grape leaf disease detection were selected for comparison. Table 6 shows the compared deep learning methods. CNN-based architectures are mainly used to detect diseases in grape leaves. Kerkech et al. (2018) investigated disease detection using a CNN model: color spaces and vegetation indices were combined with LeNet-5 to classify the diseases, obtaining 95.8% detection accuracy on real-time datasets. Xie et al. (2020) experimented with grape leaf disease detection using a faster R-CNN model: pre-trained Inception v1 and Inception-ResNet v2 blocks are used to derive the Faster DR-IACNN model, which produces 81.1% average precision across the four common grape leaf diseases on the real-time grape leaf disease dataset (GLDD). An improved convolutional neural network (DICNN) is used by Liu et al. (2020) to detect common grape leaf diseases, obtaining 97.22% detection accuracy, higher than GoogleNet and ResNet-34. Rao et al. (2021) explored disease detection for mango and grape leaves using a pre-trained AlexNet model on the plant village and real-time datasets, obtaining 99.03% detection accuracy. Huang et al. (2020) explored transfer learning-based disease detection for grape leaves on a real-time self-acquired dataset: among pre-trained VGG16, MobileNet, and AlexNet models, MobileNet and an improved AlexNet produced 97% classification accuracy. Ji et al. (2020) used multiple CNNs to extract complementary features from the input.
Inception V3 and ResNet50-based CNN architecture are used to detect the diseases. The model obtains 98.57% testing accuracy. The grapevine yellow disease is analyzed through various CNN models (Cruz et al. 2019), in which ResNet-50 produced better performance.

Discussions
Automated systems based on deep learning techniques produce reliable results in most pattern recognition problems. Among these techniques, researchers broadly use CNNs for image classification tasks. Even though CNNs are used extensively, they have a few disadvantages. A CNN fails to capture spatial relationships among features, such as pose, size, and orientation. The prime reason is that the input is processed through subsampling and max-pooling, which discard precise positional information.
The first capsule network was proposed by Hinton's group (Sabour et al. 2017). This network received attention from researchers because of its performance, which rests on the dynamic routing algorithm. A capsule is a set of neurons in the form of a vector; these vectors represent the input information in terms of spatial orientation, magnitude, etc. The capsules are routed from one layer to another via dynamic routing, so the network captures and retains more information than conventional deep learning models, which often neglect small changes in the components of the input vector. By adopting vector neurons, capsule networks can perform better than other deep neural models.
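To make the routing mechanism concrete, the following is a minimal NumPy sketch of the squash nonlinearity and the dynamic routing algorithm of Sabour et al. (2017). The dimensions (32 primary capsules, 4 output classes, 8-D capsule vectors) are illustrative assumptions, not the configuration reported in this work.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash nonlinearity: shrinks vector length into (0, 1) while keeping direction.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # Route prediction vectors u_hat of shape (in_caps, out_caps, dim)
    # to output capsules by iterative routing-by-agreement.
    in_caps, out_caps, dim = u_hat.shape
    b = np.zeros((in_caps, out_caps))                          # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)                 # weighted sum -> (out_caps, dim)
        v = squash(s)                                          # output capsule vectors
        b += (u_hat * v[None]).sum(axis=-1)                    # agreement update
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(32, 4, 8))    # hypothetical: 32 primary caps, 4 classes, 8-D
v = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=-1)   # capsule lengths act as class probabilities
```

The length of each output capsule vector lies in (0, 1) and is interpreted as the probability that the corresponding disease class is present, which is why small changes in the input vector components are preserved rather than pooled away.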
The proposed method combines CNN and the capsule network to obtain superior results compared to state-of-the-art techniques. The novelty of the proposed work lies in the additional convolutional layers added before the primary caps layer; these layers indirectly decrease the number of capsules and speed up the dynamic routing process. The influence factors of the proposed model, namely the dimension of the capsules and the number of routing iterations, are analyzed with respect to the classification results. In this work, the convolutional capsule network is used to detect the diseases of grape leaves. Unlike CNN-based models, the diseases are detected with a small number of layers: four convolutional layers and a primary capsule layer are used, whereas CNN models employ several convolutional layers for the same task (Geetharamani et al. 2019; Ghoury et al. 2019). Fewer layers make the network less complex. To increase the reliability of the automatic system, a larger number of images is used in comparison with previous studies. Decreasing the size of the input image loses information; nevertheless, even though the input image is given at a size of 128 × 128, reliable classification is achieved, while CNN-based models use input images larger than 128 × 128 (Xie et al. 2020).
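The effect of the added convolutional layers on the capsule count can be sketched with simple output-size arithmetic. The kernel sizes, strides, and channel counts below are hypothetical assumptions for illustration, not the exact configuration of the proposed network; the point is only that stacking strided convolutions before the primary caps layer shrinks the spatial grid, and hence the number of capsules that dynamic routing must process.

```python
def conv_out(size, kernel, stride, pad=0):
    # Spatial output size of a convolutional layer (valid-style arithmetic).
    return (size + 2 * pad - kernel) // stride + 1

# Hypothetical setup: 128x128 input, four 3x3 conv layers with stride 2,
# then a primary-caps layer forming capsules from 32 channel groups.
size = 128
for _ in range(4):
    size = conv_out(size, kernel=3, stride=2)

n_capsules_with_convs = size * size * 32

# For comparison: primary caps applied almost directly to the input
# (single 9x9, stride-2 conv, as in the original CapsNet front end)
# would leave a far larger grid and far more capsules to route.
n_capsules_direct = conv_out(128, kernel=9, stride=2) ** 2 * 32
```

Under these assumed parameters, the four strided layers reduce the grid from 128 × 128 to 7 × 7, so routing operates on 1568 capsules instead of over a hundred thousand, which is the mechanism behind the reported speed-up.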
The significant limitations of the research work are as follows: when processing large images, the convolutional capsule network requires substantial hardware resources; and there is no standard rule for determining the number of primary caps, which is instead fixed empirically through the configuration of the convolutional layers.

Conclusions
The automatic plant disease detection system is a universal detector that identifies abnormalities caused by fungal and bacterial infections. In this work, a convolutional capsule network is used to detect the diseases on grape leaves. CNNs are widely used in several agricultural applications, and the capsule network addresses their significant shortcomings. In the proposed convolutional capsule network, convolutional layers are added before the primary caps layer, which indirectly decreases the number of capsules and speeds up the dynamic routing process. The influence factors of the proposed model, namely the dimension of the capsules and the number of routing iterations during training, are analyzed with respect to the classification results. The experimental results are evaluated on augmented and non-augmented datasets, and the proposed model yields a classification accuracy of 99.12%. The performance of the proposed method is analyzed through the metrics of precision, recall, F1 score, and accuracy.
The prime focus of this research is to offer improvements in the agricultural field and thereby increase food production. Future research can address images captured under different environmental conditions, which would better support the development of an automatic plant disease detection system, and the work can be extended by adding further disease classes of grape leaves. An automated drone that assesses the overall health of fields and takes immediate action could also be developed; it would empower farmers and decrease field losses. Another direction is investigating deep learning architecture optimization with different hybrid models.
Funding This study did not receive funding from any organization.
Data availability The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.