Maize Disease Recognition Based On Image Enhancement And OSCRNet

The proposed method demonstrates strong robustness for maize disease images collected in the natural environment, providing a reference for the intelligent diagnosis of other plant leaf diseases.

An enhancement framework for maize leaves was designed, and a multi-scale image enhancement algorithm with color restoration was established to enhance the characteristics of maize leaves in complex environments and to solve the problems of high noise and blur in maize images. Subsequently, an OSCRNet maize leaf recognition network model based on the traditional ResNet backbone architecture was designed. In the OSCRNet model, an octave convolution that accelerates network training was adopted, reducing unnecessary redundant spatial information in the maize leaf images. Additionally, a self-calibrated convolution with multi-scale features was employed to realize interactions among different feature information in the maize leaf images, enhance feature extraction, and address the problems of similar maize disease features and the resulting learning disorders. Concurrently, batch normalization was employed to prevent network overfitting and enhance the robustness of the model. The experiment was conducted on a maize leaf image dataset. The highest identification accuracies for rust, grey leaf spot, northern leaf blight, and healthy maize were 94.67%, 92.34%, 89.31% and 96.63%, respectively.

The aforementioned methods were beneficial in solving the problems of slow efficiency and low accuracy in image recognition training, and also outperformed other comparison models. Deep learning methods have been applied to modern agriculture, and significant results have been reported. Presently, with the development and achievements of deep learning [7] and image processing [8][9], the convolutional neural network (CNN) [10] structure has been extensively employed in the identification of agricultural pests and diseases in terms of feature extraction and feature recognition. However, traditional convolutional neural networks, such as the ResNet network, are complex, and the training times are long, requiring a substantial amount of data for training. Additionally, due to large noise, plant diseases, and insect pests, maize leaf images are often unclear and difficult to obtain. With the aforementioned traditional convolutional neural networks, including ResNet, there is substantial difficulty in achieving significant results with less data, and training is slow.

Fig. 1. The image recognition process of maize leaf diseases and insect pests
The image recognition process of maize leaf diseases and pests is articulated in Fig. 1:
1) Firstly, the dataset of maize leaf pests and diseases was constructed.
2) Secondly, MSRCR maize leaf image preprocessing was conducted to obtain enhanced images of maize leaf diseases and insect pests. Meanwhile, to obtain as many dataset samples as possible, the dataset was expanded.
3) Thirdly, utilizing the enhanced maize image dataset, OSCRNet was trained.
4) Finally, the trained model was applied to the identification of maize leaf diseases and insect pests.
Obviously, the quality of the dataset is crucial in the experimental process of deep learning, and the recognition accuracy of neural networks is seriously affected by it. In the present paper, images of maize leaf diseases and pests from different sources were collected, several of which originated from the maize part of the National Data Sharing Center for Agricultural Sciences and the 2018 AI Challenger Crop Disease Detection Competition [19]. The image dataset of maize leaf pests and diseases included maize rust, maize grey leaf spot, maize northern leaf blight, and healthy maize samples. Simultaneously, under natural light, the vertical distance between the lens and the maize leaves varied from 5 cm to 8 cm, and the image data obtained from different angles and directions were classified by professionals.

Fig. 2. Partial images of maize leaf diseases and insect pests
As shown in Table 1, the distribution of the obtained image data of maize leaf diseases and insect pests was considerably uneven. In image recognition, sample heterogeneity will seriously affect the accuracy of model recognition [20]. As such, it was necessary to conduct image denoising on the maize leaf image data [28] to separate and eliminate duplicate and useless data, to enhance and expand the maize leaf image data, and to perform horizontal, vertical, random and reverse flipping on the sample images. At the same time, the intensity of random translation was set to 0.2 and the amplitude of random image scaling was set to 0.2, and then batch normalization (BN) [21] was performed. In addition to increasing the convergence rate of the network, normalization can also effectively alleviate the problem of gradient disappearance. The displacement of the corresponding point in the transformation was limited to 10% of the image side length, so as to avoid image distortion. Using this method, the sample data size was increased two-fold, and the resulting image data ratio is shown in Table 2. Among the datasets, the training set accounted for 80% and the verification set for 20%, with the distribution proportion shown in Table 3.
Through the maize leaf image preprocessing operation, the dataset gained the following advantages:
1) The feature information of the maize leaf images was enhanced.
2) The maize leaf dataset was expanded as far as possible to provide sufficient training samples for the network.
3) Irrelevant information in the maize leaf images was eliminated to the maximum extent to reduce the impact of unnecessary information on the network.

MSRCR maize leaf image enhancement
In image enhancement, maize leaf disease and insect pest images are characterized by loud noise, fuzzy appearance, and similar characteristics across disease types. The traditional Retinex algorithm [29] decomposes the received image into an incident image and a reflected image:

I(x, y) = L(x, y) · R(x, y)

where I(x, y) represents the received image signal of the maize leaves, L(x, y) represents the light irradiation component of the external environment, and R(x, y) represents the reflection component of the target maize leaves. Taking the logarithm of both sides gives:

log R(x, y) = log I(x, y) − log L(x, y)

From this formula, the illumination L(x, y) can be estimated by convolving I(x, y) with a Gaussian filter, from which the reflectance is obtained. Significant image polarization will occur because the images processed by Retinex are highly dynamic and the data distribution is considerably wide, such that certain characteristic information in the maize leaf images will be lost. For certain original image hue graphs, if the traditional algorithm is used, color bias can easily occur in the processed maize leaf images. Thus, the output data were processed by Retinex first, and then mapped to each channel according to the original RGB proportions. On the basis of retaining the original color distribution, the maize leaf images were enhanced. Compared with the traditional Retinex algorithm, the images produced by the MSRCR maize leaf image enhancement algorithm were brighter. Further, due to its multi-scale characteristic, the plant diseases and insect pests in the maize leaf images were not only clearer, but the color was also brighter. As the MSRCR parameters did not need to be complicated, the efficiency was improved.
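The MSRCR procedure above can be sketched as follows: a single-scale Retinex term log I − log(Gaussian blur of I) is averaged over several Gaussian scales, a color restoration factor re-weights each channel by its share of the RGB sum, and the result is stretched back to the display range. The scales and constants below are commonly cited defaults, assumed for illustration rather than taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0):
    """Simplified multi-scale Retinex with color restoration (MSRCR).

    img: float array of shape (H, W, 3) with values in [0, 255].
    sigmas/alpha/beta are assumed common defaults, not the paper's values.
    """
    img = img.astype(np.float64) + 1.0  # avoid log(0)

    # Multi-scale Retinex: average the single-scale log-ratios
    msr = np.zeros_like(img)
    for sigma in sigmas:
        blur = np.stack(
            [gaussian_filter(img[..., c], sigma) for c in range(3)], axis=-1
        )
        msr += np.log(img) - np.log(blur)
    msr /= len(sigmas)

    # Color restoration: weight each channel by its share of the RGB sum,
    # preserving the original color distribution
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=-1, keepdims=True)))
    out = msr * crf

    # Stretch the highly dynamic Retinex output back to [0, 255]
    out = (out - out.min()) / (out.max() - out.min() + 1e-8) * 255.0
    return out
```

The final stretch step is where the polarization problem discussed above is handled: without it, the wide dynamic range of the raw Retinex output would clip away detail.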
At the same time, the quality was improved to a certain extent, effectively removing the noise in the images of maize leaf diseases and insect pests, reducing the ambiguity of the maize leaf images, and retaining the necessary details. The enhanced effect on a maize leaf image is shown in Fig. 3.
Maize leaf images collected in the natural environment are complex, with a large amount of noise. To achieve rapid recognition with high precision, ResNet was introduced as the neural network framework, so as to improve the learning effect of the neural network, reduce the complexity, and improve the ability of feature extraction. Octave convolution and self-calibrated convolution were also introduced in combination. The PRelu activation function was selected and configured in OSCRNet [31]. Additionally, batch normalization was introduced, which can effectively improve the robustness and identification accuracy of the network model, improve the convergence rate, and prevent overfitting. The OSCRNet maize leaf image recognition network structure consists of an octave convolution module, four convolution layers, a self-calibrated convolution module, and three fully connected output layers. In the present study, the octave convolution in the first layer is relied on to obtain more characteristic information, reduce redundant spatial information, reduce the computational overhead of the network, and accelerate its computational efficiency. Self-calibrated convolution is used to improve the attention and generalization ability of the neural network, adaptively consider the information of the surrounding environment, improve the accuracy of the network, and accelerate the efficiency of network calculation. In the last convolution layer, the extracted features are all merged, and classification is performed by the three fully connected layers. Fig. 4 presents the architecture of the OSCRNet maize leaf image recognition network.
Batch normalization is used after each convolution layer to improve the robustness and identification accuracy of the model and avoid overfitting. The octave convolution module is the first convolution layer in the network, which can obtain more characteristic information and reduce redundant information. To improve the accuracy and computational efficiency of the network, the multi-scale convolution module is located before the last convolution layer. The model definition of OSCRNet is shown in Fig. 4.
2) The second layer is composed of convolution module 1. Conv1 has a kernel size of 7x7 and a depth of 64. The image feature extraction ability is enhanced by using octave convolution, and the network convergence is accelerated by the APRelu operation followed by BN layer processing.
3) The third layer is composed of two convolution modules. Conv2 has a kernel size of 3x3 and a depth of 64. After the APRelu operation, BN layer processing is performed following maximum pooling.
4) The fourth layer is composed of three convolution modules. Conv3 has a kernel size of 3x3 and a depth of 128. After the APRelu operation, BN layer processing is performed following maximum pooling.
5) The fifth layer is composed of four convolution modules. Conv4 has a kernel size of 3x3 and a depth of 256. BN layer processing is performed after the APRelu operation.
6) The sixth layer is composed of the self-calibrated convolution module. First, the results from the previous layer are split into X1 and X2. The second step processes the self-calibrated scale space: X1 is downsampled 4 times by average pooling, convolved, and then upsampled; the original value X1 is added, and the sigmoid activation function is then used to multiply the features of Conv5_2 to obtain the output feature Y1. The third step processes the original-scale feature space: after Conv_3 convolution of feature X2, feature Y2 is obtained.
In the fourth step, the output features Y1 and Y2 of the two scale spaces are concatenated to obtain the final output feature Y.
7) The seventh layer is composed of five convolution modules. Conv6 has a kernel size of 3x3 and a depth of 512. BN layer processing is performed after the APRelu operation.
In the residual structure, the stacked layers learn the residual F(x) = H(x) − x, where F(x) is the network mapping before the summation and H(x) is the network mapping after the summation, because learning the residual is easier than learning the original features. When the residual approaches 0, only identity mapping is performed for the accumulation layer, ensuring that the network performance will not decline. At the same time, the accumulation layer will learn new features based on the input features, so as to achieve better performance. The specific network structure is shown in Fig. 6.
The deep residual network can indeed stabilize network performance while increasing network depth, and improve the accuracy of the model to a certain extent. However, the large number of training parameters and residual structures increases the difficulty and duration of training. Compared with other models, more training time is consumed, and the deep network structure also requires a larger dataset. In complex natural environments, large numbers of original images of maize leaf diseases and insect pests are difficult to obtain. A new OSCRNet maize leaf recognition network model was therefore designed in the present study, which can effectively overcome the problems of small sample size and of maize leaf image characteristics being too complex, leading to learning difficulties or even the inability to learn.
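The residual relation F(x) = H(x) − x described above corresponds to the standard ResNet building block, which can be sketched in PyTorch as follows. This is a generic basic block for illustration, not OSCRNet's exact layer definition.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet-style block: the stacked layers learn the residual
    F(x) = H(x) - x, so the block outputs H(x) = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # F(x)
        return self.act(out + x)          # H(x) = F(x) + x
```

When the learned residual F(x) is near zero, the block degenerates to the identity mapping, which is why stacking such blocks does not degrade performance as depth grows.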

Octave convolution
Due to reflected light in the external environment, sand, and other influencing factors, interference with the photographing of maize leaf images can easily occur, resulting in more redundant spatial information in images of maize leaf pests and diseases and thereby significantly reducing the efficiency of network training. To solve these problems, octave convolution, with its characteristic of accelerated convolution operation, was adopted in the present study. "Octave" originally refers to the octave scale in music, i.e., the halving of sound frequency. The aim of octave convolution is to halve the resolution of the low-frequency information in the data, so as to accelerate the convolution operation. In ordinary convolution, all input and output feature maps have the same spatial resolution, but in natural images, information is carried at different frequencies. As shown in Fig. 7, higher frequencies usually encode fine details, while lower frequencies usually encode the global structure. The output feature map of a convolution can likewise be seen as a mixture of information at different frequencies. Natural images can be decomposed into low-frequency signals that capture the global layout and coarse structure, and high-frequency signals that capture fine details. Similarly, the features of the convolution output should also have a subset of mappings that capture low spatial-frequency variations and contain spatially redundant information. To reduce such spatial redundancy, the octave feature representation was introduced. Scale space theory [26] provides a principled approach to creating spatial resolution scale spaces in such a way that low-frequency and high-frequency spaces can be defined; that is, the spatial resolution of the low-frequency feature map is reduced by one octave.
In octave convolution, the mixed feature map is decomposed according to its frequency, and the octave convolution operation stores and processes the feature maps with low spatial resolution and slow spatial change at reduced resolution, thereby lowering memory and computing costs. Different from existing multi-scale methods, octave convolution is expressed as a single, universal convolution unit, and the specific network structure is shown in Fig. 8.
In the present study, octave convolution was adopted to expand the receptive field by two times, further facilitating the capture of more information by each layer. By reducing unnecessary redundant spatial information in the maize leaf images, the problem of excess redundant spatial information in images of maize leaf diseases and insect pests was solved, and the learning efficiency of the network was significantly improved.
The use of octave convolution provides the network with the following advantages:
1) The spatial resolution of the low-frequency feature map of the maize leaf can be reduced to half of the original, reducing the redundant spatial information in the maize leaf images and the network computing overhead, so as to accelerate network computing efficiency and solve the problem of slow network learning.
2) Compressing the low-frequency resolution of the maize leaf features effectively expands the receptive field by two times, further facilitating the capture of more information by each layer.
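The single-unit formulation described above can be sketched as a PyTorch module: a fraction alpha of the channels lives at half spatial resolution, and four convolution paths (high-to-high, high-to-low, low-to-high, low-to-low) exchange information between the two frequency groups. This is a simplified sketch of the general octave convolution idea, not OSCRNet's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Simplified octave convolution: a fraction `alpha` of the channels
    is stored at half spatial resolution (the low-frequency group),
    reducing redundant spatial information and computation."""

    def __init__(self, in_ch, out_ch, kernel_size=3, alpha=0.5):
        super().__init__()
        in_lo = int(alpha * in_ch)
        in_hi = in_ch - in_lo
        out_lo = int(alpha * out_ch)
        out_hi = out_ch - out_lo
        pad = kernel_size // 2
        self.hh = nn.Conv2d(in_hi, out_hi, kernel_size, padding=pad)  # H -> H
        self.hl = nn.Conv2d(in_hi, out_lo, kernel_size, padding=pad)  # H -> L
        self.lh = nn.Conv2d(in_lo, out_hi, kernel_size, padding=pad)  # L -> H
        self.ll = nn.Conv2d(in_lo, out_lo, kernel_size, padding=pad)  # L -> L

    def forward(self, x_hi, x_lo):
        # High-frequency output: H->H plus upsampled L->H
        y_hi = self.hh(x_hi) + F.interpolate(
            self.lh(x_lo), scale_factor=2, mode='nearest')
        # Low-frequency output: L->L plus downsampled H->L
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo
```

Because the low-frequency group is processed at half resolution, its convolutions touch a quarter of the spatial positions, which is the source of the computational savings and the doubled effective receptive field noted above.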

Self-calibrated convolutions
Owing to the increase of maize disease resistance and the differentiation of disease characteristics, the characteristics of different maize diseases are similar, increasing the complexity of identification and leaving the network prone to learning disorders. Under such circumstances, deeper and more detailed interaction of feature information is required in the network. A self-calibrated convolution with multi-scale characteristics was adopted in the present study to solve these problems. At present, the latest progress in deep learning mainly focuses on designing more complex network structures to enhance feature learning and expression ability. By expanding the receptive field of each convolution layer, self-calibrated convolution can enrich the output features. Meanwhile, self-calibrated convolution differs from standard convolution, which uses small convolution kernels to fuse spatial and channel-wise information. Self-calibrated convolution can adaptively establish long-distance spatial and inter-channel dependencies through calibration operations around each spatial position, and can thus help a CNN generate more discriminative feature information by explicitly merging richer information. The essence of self-calibrated convolution is a grouped convolution used for multi-scale [17] feature extraction, divided into two groups along the channel dimension. To increase the receptive field of the network, one path is used for conventional convolutional feature extraction, and the other path applies a down-sampling operation. As a result, each spatial position can be self-calibrated by fusing information from two different spatial scales.

Fig. 9. Principle diagram of self-calibrated convolution
Self-calibrated convolution facilitates the network in learning different characteristics and improves the attention and generalization ability of the neural network. It separates the channels of the convolution, but instead of each part of the channels being treated equally, each part is responsible for a different function. The self-calibrated convolutional neural network performs feature transformation at two different scales: one operates on the original spatial mapping, and the other on a down-sampled latent spatial mapping. As such, the design of the structure is more conducive to improving the attention of the network and helps the network focus on discriminative characteristics. The specific network structure is shown in Fig. 9.
First, given input X1, average pooling with filter size r×r and stride r is adopted:

T1 = AvgPool_r(X1)

Secondly, K2 is used to conduct a feature transformation on T1, and a bilinear interpolation operator Up(·) maps the intermediate reference from the small-scale space back to the original feature space:

X1' = Up(F2(T1)) = Up(T1 * K2)

The calibration operation is then:

Y1' = F3(X1) · σ(X1 + X1')

where F3(X1) = X1 * K3, σ is the sigmoid function, and "·" denotes element-wise multiplication. As the equation shows, using X1' as the residual to form the weight for calibration is beneficial. The final output after calibration can be written as:

Y1 = F4(Y1') = Y1' * K4

The self-calibrated convolution is a multi-scale feature extraction module that separates the channels of the convolution but, unlike traditional convolution, makes each channel group responsible for a specific function. As aforementioned, the self-calibrated convolutional neural network performs feature transformation at two different scales: one on the original spatial mapping, the other on the down-sampled latent spatial mapping.
The interference of unrelated areas in the global information is avoided, and the characteristic representation of maize leaves generated by self-calibrated convolution is more discernible. The self-calibrated convolution module allows each spatial location not only to adaptively treat the surrounding context from the low-resolution latent space as an embedding complementing the response from the original scale space, but also to model the inter-channel dependencies through the calibration operation. Through the aforementioned methods, the receptive field of the self-calibrated convolution layers can be effectively expanded, which is beneficial in improving the attention and differentiation ability of the network, solving the problem of network learning disorder, and significantly improving the accuracy of the network.
The specific advantages of self-calibrated convolution include:
1) The self-calibrated network can locate multi-scale information in maize leaf images.

Attention mechanism
The visual attention mechanism [30] is a brain signal processing mechanism unique to human vision. Human vision quickly scans the global image to obtain the target region that needs to be focused on, also known as the focus of attention, and then invests more attention resources in this region to obtain more details of the target, while suppressing other useless information. According to the specific task goal, the attention mechanism can adjust the attention direction and the weighting model. In a neural network, the weights of the attention mechanism can be adjusted to weaken or forget content that does not conform to the attention model. Limited attention resources can thus be used to quickly select high-value information from a large amount of information, which significantly improves the efficiency and accuracy of information processing in deep learning.

Adaptive parameterized APRelu activation function
In the process of neural network learning, the activation function is considerably important and helps the neural network to learn and generalize. The most frequently used activation function is the Relu activation function [32], also known as the rectified linear unit. The convergence speed of Relu is faster than that of Sigmoid and TANh. At the same time, the Relu gradient does not saturate, thereby solving the problem of gradient disappearance. However, the Relu activation function is considerably fragile during training. Due to sunlight, dust, and other interference in the environment, frequent use will slow down the gradient change in the saturated area and cause the gradient to disappear. When x < 0, the gradient of Relu is 0, leading to the gradient being set to zero and the occurrence of neuron "necrosis". There is also the possibility that 40 percent of the neurons in the network will die and be unable to be reactivated.
To solve these problems, the PRelu activation function [31], also known as parametric Relu, was introduced. In the present study, the adaptive parameterized APRelu activation function was obtained by combining the attention mechanism with the PRelu activation function. The APRelu activation function draws on the concept of SENet. For most maize leaf feature information, the importance of each feature channel in the maize leaf feature map is likely to be different. As an example, feature channel 1 of sample A may be considerably important while feature channel 2 is not, whereas feature channel 2 is important for sample B while feature channel 1 is not. In this case, for sample A, the attention of the neural network should be focused on feature channel 1 and higher weight should be given to channel 1; for sample B, the focus should be on feature channel 2 and higher weight should be given to channel 2.
SENet, in turn, uses a small fully connected network to learn a set of weight coefficients and then weights each channel, which constitutes an attention mechanism. The structure of SENet is shown in Fig. 10. The APRelu activation function computes weights through a small fully connected network, and this set of weights is used as the coefficients of the PRelu activation function. When the neural network uses the APRelu activation function, each sample can have unique weight coefficients, i.e., a different nonlinear transformation, thereby enabling the neural network to learn more feature information in the maize leaf images and to focus on feature information of different importance. The APRelu activation function solves the problem of neuron "necrosis" in Relu and reduces instability in the process of network training. Different weight coefficients are also assigned to different channels, so that the neural network can learn richer characteristic information.
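The idea above, a small SENet-style fully connected network predicting a per-sample, per-channel negative slope for the PRelu-like activation, can be sketched as follows. The branch structure (statistics of the positive and negative parts feeding a two-layer bottleneck ending in a sigmoid) is a plausible reading of the description, not the paper's verified implementation.

```python
import torch
import torch.nn as nn

class APReLU(nn.Module):
    """Sketch of an adaptively parameterized ReLU: a small fully connected
    network predicts a per-sample, per-channel slope for the negative part,
    so each sample gets its own nonlinearity and no neuron goes fully dead."""

    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.BatchNorm1d(channels),
            nn.Sigmoid(),  # slopes in (0, 1): negative inputs are scaled, not zeroed
        )

    def forward(self, x):
        pos = torch.relu(x)
        neg = x - pos  # the negative part of x
        # Global statistics of both parts drive the learned slope
        stats = torch.cat([self.gap(pos), self.gap(neg)], dim=1).flatten(1)
        slope = self.fc(stats).unsqueeze(-1).unsqueeze(-1)
        return pos + slope * neg
```

Because the slope is always strictly positive, gradients flow through negative inputs as well, which is how the "necrosis" failure mode of plain Relu is avoided.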

Batch normalization
The images of maize leaf diseases and insect pests are considerably complex and changeable, leading to low learning efficiency of the neural network and increasing the difficulty of learning for certain neural networks. Meanwhile, as the network structure deepens, the data distribution of the hidden layers also changes substantially or even fluctuates, which adversely affects the stability of the network. As such, the BN algorithm was adopted in the present study to normalize the data of each layer to a mean of 0 and a standard deviation of 1.

Laboratory environment
To better train and test the performance of the model, the Windows 10 (64-bit) operating system, the PyTorch deep learning framework, the Python 3.6 programming environment, 16 GB of platform memory, and an Intel Core i7-9700K CPU equipped with an NVIDIA GeForce GTX 1660 Ti 6 GB GPU were used.

Experimental data and analysis
In the present study, an extended dataset consisting of four classes was used, totaling 13,078 images. The training set accounted for 80%, and the verification set accounted for 20%. The most appropriate network configuration and parameters had to be set before training, so as to accelerate network training and improve identification accuracy. As shown in Fig. 11:
1) The image recognition accuracy gradually increased with the number of iterations of the ResNet34 training process.
2) The image recognition accuracy gradually increased with the iterations of the OSCRNet training process.
It can be observed that, compared with ResNet34, the OSCRNet network could extract more feature information from the dataset within the same number of training iterations, such that the recognition accuracy was significantly improved. Additionally, the ResNet34 model needed more iterations to achieve the ideal model accuracy, whereas the OSCRNet model achieved stable identification accuracy quickly and with less fluctuation.

Ablation experiment
In order to more clearly observe the influence of the proposed improvements on the neural network, ablation experiments were conducted for comparison.
As shown in Table 4, with the octave convolution module, the network learning speed was significantly improved through the reduction of a large amount of redundant spatial information in the dataset. The APRelu module brought a smaller improvement in speed, which can be attributed to the weight calculation of its attention mechanism increasing the amount of network computation. Overall, the network speed was best improved by OSCRNet.
As shown in Table 5, with the octave convolution module alone, the accuracy of the network decreased, which can be attributed to the octave convolution module discarding a small part of the useful feature information along with the redundant spatial information when processing the features. Consequently, for the network improved with only the octave convolution module, although the receptive field was increased, the loss of part of the feature information in the maize leaf images still led to a decrease in accuracy. The accuracy of maize leaf image recognition was improved by the self-calibrated convolution module and the APRelu module, and improved steadily with OSCRNet. Through these experiments, it can be observed that although the model was improved under the separate octave convolution, self-calibrated convolution, and APRelu modules, there were still defects. At the same time, in the absence of the octave convolution operation, when only the self-calibrated convolution module was used, the recall rate was reduced in several cases.
This reduction can be attributed to the fact that, without the information screening of octave convolution, the self-calibrated convolution module is more likely to learn irrelevant or even incorrect information from the maize leaf images, which can lead to disordered learning of maize leaf image features and a reduced recognition rate. Under the combined effect of octave convolution, self-calibrated convolution, and APRelu, both the learning efficiency and the accuracy of the network were improved, in addition to ensuring the robustness of the network to a certain extent.

Comparison with other networks
To enhance the comparability of the present experiment, the same dataset was used to train different networks under the same conditions.
As shown in Table 6, OSCRNet achieved higher identification accuracy than the other networks. This can be attributed to the self-calibrated convolution module, in which feature transformation is conducted at two different scales. This feature transformation avoids the interference of irrelevant regions in the global information, makes the generated feature representation more discernible, and improves the attention of the network, thereby enabling the network to learn more critical feature information. Meanwhile, to increase the number of training samples, horizontal, vertical, random, and reverse flipping were performed on the sample images; the intensity of random translation was 0.2, the amplitude of random image scaling was set to 0.2, and normalization was then conducted. A variety of color features was provided to increase image diversity, reduce complexity, and avoid network overfitting.
As shown in Table 7, compared with other network models, OSCRNet had the highest convergence rate. Relative to ResNet34, octave convolution was introduced to reduce the redundant spatial information in the maize leaf images, such that the convergence rate was also improved. The octave convolution module could reduce the spatial resolution of the low-frequency maize leaf feature map to half of the original while expanding the receptive field by two times, reduce the redundant spatial information in the maize leaf images, and significantly reduce the computing overhead of the network.
As shown in Table 10, OSCRNet had the highest training efficiency compared with the other improved network models under the same conditions.
When octave convolution was introduced into OSCRNet, redundant spatial information in the maize leaf images was reduced, and the convergence rate was also improved. In the octave convolution module and the self-calibration module, the receptive field was expanded by two times, and the spatial resolution of the low-frequency feature map of the maize leaf was reduced to half of the original, which reduced the redundant spatial information in the maize leaf images, significantly reduced the network computing overhead, and improved the network computing speed. At the same time, OSCRNet had the highest training accuracy compared with the other improved models under the same conditions, reaching 93.53% on the enhanced dataset. Compared with the other models, the robustness of the model was improved, rendering the learning process more reliable and largely avoiding the phenomenon of network learning disorder.

OSCRNet was developed for maize image recognition, covering a variety of disease features of maize leaves. Through the comparison and analysis of multiple groups of experiments, the suitability of the proposed OSCRNet model for maize image recognition was effectively verified. Many plant images are used in the field of deep learning; however, maize image recognition technology is not yet mature, and improvements in efficiency and recognition rate have been slow. Therefore, this work is conducive to promoting the application of deep learning-based methods in plant protection. Analyzing the error information learned by the model provides useful insight for improving the proposed network. The design of the OSCRNet model classifies and mitigates the problem of similar disease features in plant images. However, in the present dataset, such problems still affect model learning, which is a serious issue in the field of plant recognition. The similar characteristics of maize diseases are shown in Fig. 12. Although OSCRNet uses the information interaction of self-calibrated convolutions to address this problem and has been greatly improved, it still cannot completely avoid it. Therefore, plant recognition methods still need improvement. To address such difficulties, future work will consider introducing an attention mechanism to alleviate this problem, and will expand the datasets as much as possible to ensure that they are more balanced and complete.
The OSCRNet network was proposed to identify maize leaf diseases and insect pests. More feature information was provided through the MSRCR maize leaf image enhancement processing, which solved the problem of feature extraction from maize leaf disease and insect pest images in complex environments.
In OSCRNet, octave convolution and self-calibrated convolution were combined to solve the problems of slow learning efficiency, low accuracy, and learning disorders in the image recognition of maize leaf diseases and insect pests. The APRelu activation function was adopted to avoid the phenomenon of neuron "necrosis". Compared with the data of other network models, the present results show that OSCRNet achieves better performance in the image recognition of maize leaf diseases and insect pests. At the same time, the OSCRNet algorithm is not limited to the field of maize leaf diseases and insect pests, but can also detect and identify other crop diseases. The OSCRNet algorithm is of significant value to crop yield and to crop disease and pest control, and opens up new possibilities for the prevention and management of crop diseases.