Fast Recognition System for Tree Images based on Caffe Platform and Deep Learning

Aiming at the difficult problem of extracting tree images from complex backgrounds, we take tree species as the research object and propose a fast recognition system for tree images based on the Caffe platform and deep learning. Within the Caffe framework, an improved Dual-Task CNN model (DCNN) is used to train an image extractor and classifier that jointly accomplish the dual tasks of image cleaning and tree classification. Compared with traditional classification methods represented by the Support Vector Machine (SVM) and with a Single-Task CNN model, the Dual-Task CNN model demonstrates superior classification performance. Then, to further improve recognition accuracy for similar species, a Gabor kernel is introduced to extract frequency-domain features of images at different scales and orientations, enhancing the texture features of leaf images and improving recognition. The improved model was tested on data sets of similar species. As the results demonstrate, the improved deep Gabor convolutional neural network (GCNN) outperforms the Dual-Task CNN classification method in tree recognition and in classifying similar trees. Finally, the recognition results can also be displayed on an application graphical interface. The interface, designed on the Ubuntu system, supports functions such as quick reading of and searching for picture files, snapshot, one-key recognition, and one-key exit.


I. BACKGROUND
With environmental pollution becoming more and more serious, trees are widely used in urban greening and play a significant role in improving and protecting the environment. After careful selection and ingenious arrangement, all kinds of trees can play a major role in protecting the environment, improving and beautifying it, and generating economic by-products [1][2]. On the contrary, if tree greening construction is not rationally planned, problems such as single-variety planting, neglect of economic benefits, and blind construction will persist, resulting in unsustainable forestry development, a stagnant forestry economy, and a lack of long-term planning in forestry construction, and may even lead to unpredictable consequences [3][4]. The analysis of tree species is an important theoretical basis for forestry planning, providing a reference for landscape engineering designers, managers, and related researchers [1][2]. From the founding of botany by Theophrastus to the modern, enduring tree classification systems, people have continually refined tree classification to better guide the use of garden trees and the development of forestry. However, owing to the similarity among trees and the diversity of surrounding landscapes, identifying tree species in forestry development has become complex and professionally demanding work. The training cycle for forestry professionals is very long, and it is difficult to attract high-end technical personnel. Traditional manual investigation methods cannot meet the requirements of modern forestry development, because they are inefficient and time-consuming and because workers easily become fatigued, all of which affect the correctness of the results [5][6][7].
Therefore, how to efficiently and quickly classify tree images has become an urgent problem to be solved.
Traditional machine learning often requires time-consuming and complex processing of tree image features: a large amount of professional knowledge is needed to manually design feature extraction methods and classifiers before identification and prediction can take place. When some features are missed during manual feature extraction, the lost information cannot be recovered during classification training, which degrades the detection results. While shallow learning models such as SVM, Boosting, and LR are superior to rule-based learning, their ability to express complex functions and to generalize is limited by finite samples and limited computational units. Shallow artificial neural networks cannot identify tree species quickly and efficiently, because they have a large number of parameters and suffer from defects such as over-fitting and long training times [8]. CNN is a deep neural network with a convolutional structure [9][10][11]; it reduces the number of weights to be trained through weight sharing and thereby reduces the computational complexity of the network. Moreover, through pooling operations, the network gains a certain invariance to local transformations of the input image, which improves its generalization ability. A convolutional neural network can process complex big data through multi-layer nonlinear transformations and replaces hand-designed features with automatically learned ones, avoiding the error accumulation of manual feature design. Through the joint optimization of feature extraction and the classifier, both the accuracy and the speed of tree classification are improved. In recent years, more and more researchers have begun to work on identifying trees quickly and effectively.
To make convolutional neural networks more effective in tree classification, researchers continue to improve existing networks in aspects such as network layer design, loss function design, activation functions, and regularization terms. Aivars Lorencs et al. attempted, through multiple clustering runs, to design a multi-class classifier with better classification results from multispectral image data collected near the tree top, but achieved only semi-automatic identification of species [12]. Du Jianxin et al. used a new moving-center hypersphere classifier based on extracted plant leaf shape features [13]. Although its classification performance improves on the 1-NN and K-NN classifiers, it is still greatly affected by the feature extraction process, and its performance leaves room for improvement. Hou Tong et al. used a BP neural network combined with plant leaf shape features to classify plants, and the average recognition rate reached 87.5% [14]. However, such networks require massive numbers of parameters and have long training times. Stephen Gangwu et al. used a probabilistic neural network to automatically identify leaves [15]. Compared with the BP network, the classification speed is greatly improved, the process is simple, and the noise tolerance is high, but this method places high demands on the representativeness of the training samples and requires more storage space. Liu Jiandu et al. transformed leaf images into time-frequency domain images by wavelet transform and then used an SVM classifier to classify the leaves, achieving better classification performance [16][17]. Wang-Su Jeon et al. used the existing Flavia dataset to train a traditional CNN model to identify plant leaves based on the TensorFlow framework.
Compared with traditional machine learning, the CNN model has better classification performance in plant identification, but the number of tree species in the data set is limited, and the data set also needs to be expanded manually for further testing [18]. Sue Han Lee et al. used a CNN to learn unsupervised feature representations for 44 different plant species. Experimental results using CNN models with different classifiers show consistency and superiority over existing solutions that rely on handcrafted features, but in image preprocessing, the traditional manual method of filtering pictures is too inefficient [19].
Based on the above references, this paper designs a rapid recognition system for tree images based on the Caffe platform and deep learning that realizes the two tasks of image cleaning and tree classification; the CNN model is a natural choice for multi-task problems because the learned convolutional features can be shared by different high-level tasks [20]. The improved multi-task Auto-Clean convolutional neural network model trains multi-label data sets whose attribute values are numerical, and obtains the joint features of the two task models [21][22][23]. The extracted features are more discriminative, the dual-task model realizes both image cleaning and tree classification, and it is less time-consuming and more accurate than the single-task model. Firstly, we input the tree image into the neural network to determine whether it is a dirty image; if it is clean, the system proceeds to predict the tree category. Secondly, the trainable feature extractor and classifier in the CNN model minimize a loss function combining the Auto and Clean tasks through forward propagation and the back-propagation error algorithm, so that the extracted features better reflect the essence of the image and the recognition model improves. At the same time, the loss function drives the model to perform self-weighted learning with mini-batch stochastic gradient descent, further adjusting the task weights to obtain a pre-trained recognition model. Finally, we optimize the classifier through hyper-parameter tuning to obtain a better recognition model. The experimental results show that, after simplified training, the Dual-Task CNN model can identify tree species accurately and in real time.
In addition, considering the influence of tree similarity on recognition, we also introduce a Gabor layer into the improved Dual-Task CNN model and propose a re-improved deep Gabor CNN model that enhances the learning of local leaf features to distinguish different tree species more effectively [24][25][26][27]. Finally, based on the Ubuntu system, this paper also designs an application graphical interface that realizes fast reading and searching of picture files, quick photo taking, one-key recognition, and one-key exit, which is convenient and intuitive.

II. DATA SET CONSTRUCTION AND PREPROCESSING
We photographed trees and collected image data from the six views of the trees to form an initial database, and then generated six recognition models by pre-training and fine-tuning the CaffeNet convolutional neural network. By comparing the classification performance of the recognition models trained on the six views, we selected the tree leaf images, which gave higher classification accuracy, to reconstruct the data set. The data set was then expanded by rotation, translation, zooming, and horizontal flipping. Finally, the data was cleaned by script classification and manual screening to form the experimental tree image data set.

1) INITIAL DATABASE CONSTRUCTION
At a botanical garden, we collected hundreds of thousands of tree images in six different views, under different environments and shapes, for the training and test sets during database construction. The six views are the entire plant, branches, flowers, fruit, leaves, and stem [28].
Several pictures of the collected trees are shown in Figure 2 below.

FIGURE 2. Example Image of 6 Views of Trees in Training Set
The specific distribution of the training set and the test set in the acquired images is as follows.

View     Training   Testing
Branch   8588       2330
Entire   15233      5308
Stem     5468       500
Flower   13458      2500
Fruit    6588       1265
Leaf     19456      5544
Total    68791      17447

2) DATA SAMPLE SCREENING
We first pre-train a model on the basis of the CaffeNet convolutional neural network, then use an image fine-tuning strategy to transfer the learned identification information into a better learning model and realize weight sharing between the models of the two tasks. The formed joint features are further iteratively optimized by the BP algorithm, and finally an Auto-Clean CNN model achieving the dual tasks of experimental image cleaning and tree classification is designed. From the training set of the six tree views, we randomly selected 5000 images for micro-training of the improved Dual-Task CNN network model, determined the best view according to the classification performance of each model, and then used the best view as the data set for later model training and optimization [29].
The classification accuracy of the six views as a function of the number of iterations is shown in Figure 3 below. It can be seen from Fig. 3(A) to Fig. 3(F) that after 5000 iterations, the recognition accuracy of the Fruit, Entire, and Branch views does not exceed 70%. The universality of these three views for species identification is generally low, and it is difficult to identify trees accurately and effectively in different scenarios. Among the six possible views, bark and stems are important organs for the tree, so sampling them for identification is not suitable. Flower and fruit images lack universal research significance for trees without flowers or fruit and cannot serve as the basis for identification. The leaves are the longest-lived of the tree's organs, their extraction does not affect the normal growth of the tree, and the scanned images of flowers and leaves achieved the higher classification accuracy among the six views, so we screened out all the tree leaf image sets for later training and optimization.

3) DATA SET EXTENSION
When training deep convolutional neural networks, a large amount of training data improves accuracy and prevents over-fitting. In practice, however, it is often difficult to obtain enough training samples; artificially expanding the data can achieve a similar effect and also enhances the generalization ability and robustness of the trained model. The data set used here includes 50 classes in total, and 100,000 images were finally obtained by means of rotation, zooming, and horizontal flipping [30][31]. In this experiment, the data set is divided into two parts: 70% is the training set and 30% the test set. The data is pre-processed by manual screening and script classification to obtain the tree image data set for the training experiments.
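The expansion described above can be sketched with a small NumPy routine. The specific transforms below (90-degree rotation, 10-pixel translation, a crude center-crop "zoom") are illustrative assumptions kept dependency-free; the paper does not specify its exact transform parameters.

```python
import numpy as np

def augment(img):
    """Generate simple augmented variants of an H x W x C image array.

    Rotation is restricted to 90-degree steps and zoom to a crude
    center-crop-and-repeat, purely to keep the sketch dependency-free.
    """
    variants = []
    variants.append(img[:, ::-1])              # horizontal flip
    variants.append(np.rot90(img))             # 90-degree rotation
    variants.append(np.roll(img, 10, axis=1))  # translation by 10 px
    # crude "zoom": crop the central half and repeat pixels back to size
    h, w = img.shape[:2]
    crop = img[h // 4: h // 4 + h // 2, w // 4: w // 4 + w // 2]
    variants.append(np.repeat(np.repeat(crop, 2, axis=0), 2, axis=1))
    return variants

img = np.arange(32 * 32 * 3).reshape(32, 32, 3)
augmented = augment(img)
```

Applying a handful of such transforms to each source image multiplies the data set size several-fold, which is how 15,000 originals can grow toward the 100,000 images used for training.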

1) SCRIPT CLASSIFICATION
Before using the convolutional neural network to classify the data set, it is necessary to sort the unorganized tree image data in the database into folders corresponding to the tree types. We use scripts to classify the trees into their corresponding folders, which facilitates the training, validation, and testing of the later models.
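A minimal version of such a sorting script is sketched below. The label source, a `label.txt` file mapping each file name to a species name, is an assumed format for illustration; the paper does not describe its script's input format.

```python
import os
import shutil

def sort_into_folders(label_file, src_dir, dst_dir):
    """Move each image into a per-species folder, creating folders on demand.

    Each line of label_file is assumed to be '<file_name> <species_name>'.
    """
    with open(label_file) as f:
        for line in f:
            name, species = line.split()
            os.makedirs(os.path.join(dst_dir, species), exist_ok=True)
            shutil.move(os.path.join(src_dir, name),
                        os.path.join(dst_dir, species, name))
```

The resulting one-folder-per-class layout is also what Caffe's data-preparation tools expect when generating the train/validation listing files.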

2) MANUAL SCREENING
A good data set is an important prerequisite for training a good model. Therefore, before inputting tree images into the training model, it is necessary to clean the data set to ensure the authenticity and robustness of the training as far as possible. When there were fewer pictures in the early period, we mainly used manual cleaning to remove unqualified dirty pictures visible to the human eye, taking whether the human eye can identify a picture as dirty as the overall screening standard. The specific cleaning rules are as follows.
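One of these screening criteria, blurriness, can also be approximated automatically. The sketch below uses the common variance-of-Laplacian sharpness measure; the threshold value is an illustrative assumption, not a rule from the paper.

```python
import numpy as np

# 3x3 Laplacian kernel: responds strongly to edges, weakly to flat regions.
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def laplacian_variance(gray):
    """Variance of the Laplacian response of a 2-D grayscale image.

    Sharp images have strong edges and hence a high variance; very low
    values suggest a blurred (dirty) picture.
    """
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return out.var()

def looks_blurred(gray, threshold=10.0):  # threshold is an assumption
    return laplacian_variance(gray) < threshold

sharp = np.zeros((16, 16)); sharp[:, 8:] = 255.0  # image with a hard edge
flat = np.full((16, 16), 128.0)                   # image with no detail
```

Such an automatic proxy cannot replace the human-eye standard described above, but it can pre-filter the obvious cases before manual review.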

A. DUAL-TASK CONVOLUTIONAL NEURAL NETWORK BASED ON CAFFE PLATFORM
Caffe provides advanced deep learning algorithms and reference models that offer a good starting point for new research and applications, making a clean and modifiable framework available to many practitioners. The classification model trained by the Caffe team on ImageNet images is already very good and worth building on. When training with our own data set, if the sample size is too small, the model's precision is too low and training takes a long time. However, using the parameters of a model already trained in Caffe as our initialization reduces the demands that training accuracy places on the data set and improves classification performance when training on the same data set. Therefore, the Auto-Clean CNN model in this paper is based on CaffeNet's network framework; we then use our own data set to train the model's feature extractor and classifier, continuously updating and optimizing the initialization parameters. After the entire fine-tuning process, we obtain our optimized model, called Auto-Clean CNN [32][33].

B. DESIGN OF DUAL-TASK CONVOLUTIONAL NEURAL NETWORK MODEL
The Auto-Clean convolutional neural network we designed, based on the famous architecture proposed by Krizhevsky in 2012, mainly includes five convolutional layers, two fully connected layers, and a softmax loss layer; between the layers there are also pooling, regularization, ReLU activation functions, and other operations. However, the network structures of the two models in the Dual-Task CNN, which implement image cleaning and multi-class tree recognition, are not identical. The first fully connected layer of each CNN model represents a feature vector of the input image, and the formed joint features are then fed into two different task-specific softmax loss layers. The classification tasks collectively use the feature vectors output by the convolutional network and jointly update the network parameters during training. The structure of the improved Dual-Task CNN model is shown in Figure 5 below. As can be seen from Figure 5, an input image with attribute tag information (left side) enters the model, and each CNN predicts a binary attribute. First, after the original image is input, the two CNN models in the Dual-Task CNN form a joint feature between the last convolutional layer and the first fully connected layer through forward propagation. Then, the joint features are input to the task-specific softmax loss layer and the multi-class softmax loss layer to complete the two tasks of tree classification and image cleaning. By learning the hidden-layer parameters with hard sharing and then iteratively optimizing the network through the back-propagation algorithm, the risk of over-fitting is reduced and the generalization ability of the model is improved [34]. For the Clean CNN network model, the training task is simpler than multi-class tree recognition, and image cleaning is a pre-processing step for multi-class prediction.
The last fully connected layer can reduce a large number of errors in the training process and improve generalization performance, but fully connected layers contain a large number of redundant parameters; the fully connected layer parameters alone can account for about 80% of the entire network's parameters. This is not conducive to rapid image cleaning, so in the Clean CNN model we remove the last fully connected layer and connect the first fully connected layer directly to the softmax layer. Although the model's performance is slightly reduced, the cleaning speed is greatly improved.
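The claim that the fully connected layers dominate the parameter count is easy to check arithmetically. The layer sizes below are those of the standard CaffeNet/AlexNet architecture and serve only as an illustration.

```python
# Weight counts (biases ignored) for a CaffeNet-like network.
conv_params = (
    11 * 11 * 3 * 96       # conv1
    + 5 * 5 * 48 * 256     # conv2 (grouped: 48 input channels per group)
    + 3 * 3 * 256 * 384    # conv3
    + 3 * 3 * 192 * 384    # conv4 (grouped)
    + 3 * 3 * 192 * 256    # conv5 (grouped)
)
fc_params = (
    6 * 6 * 256 * 4096     # fc6: 9216-dim conv output into 4096 units
    + 4096 * 4096          # fc7
    + 4096 * 1000          # fc8 (1000 output classes in the reference net)
)
total = conv_params + fc_params
fc_share = fc_params / total
print(f"fc share of all weights: {fc_share:.1%}")
```

For the full reference network the fully connected share actually comes out even higher than the roughly 80% cited above; the exact figure depends on which layers are counted and on the number of output classes, but in every variant removing an fc layer eliminates a large fraction of the weights, which is why cleaning speeds up so much.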

C. CROSS ENTROPY LOSS FUNCTION
We use the learning method of forward propagation and backpropagation in convolutional neural networks. Firstly, we calculate the cross entropy loss function of Auto task and Clean task, and then minimize the loss function of combining the two tasks and update the weight through Auto-Clean convolution model to obtain a better identification model [35][36] .
The cross-entropy loss function used during feature extraction is described as follows. For the Auto (tree classification) task with $K$ classes, over a mini-batch of $N$ samples,

$$L_{auto} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} y_{n,k}\,\log \hat{y}_{n,k},$$

where $y_{n,k}$ is 1 if sample $n$ belongs to class $k$ and 0 otherwise, and $\hat{y}_{n,k}$ is the softmax probability predicted for class $k$. The loss function of the Clean task is calculated in the same way as that of the Auto task.
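As a sanity check, this loss can be computed directly in a few lines of NumPy. This is a generic sketch of softmax cross-entropy, not code from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch; labels are integer class indices."""
    probs = softmax(logits)
    n = logits.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
labels = np.array([0, 1])
loss = cross_entropy(logits, labels)
```

A confident correct prediction drives the per-sample term toward zero, while a confident wrong prediction is penalized heavily, which is exactly the gradient signal that back-propagation feeds to the shared feature extractor.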
The two fully connected layers link tree image classification and image cleaning according to whether the input image is clean. If the input image cannot be classified as clean, the model outputs it as a dirty image; if it is clean, the model outputs the predicted tree category.

D. ADAPTIVE WEIGHT LEARNING
In an MTL-based convolutional neural network, how to set the weights of the two tasks is also an important research question. In our previous work, we either treated the tasks equally or obtained weights through brute-force search, and searching all weight combinations was time-consuming. Here we further train and optimize the model by adjusting the weights automatically, using a mini-batch stochastic gradient descent method within the convolutional neural network [37].
When the mini-batch stochastic gradient descent method is used to solve the above optimization problem, the weight adjustment is aggregated over the batch samples, and the final model obtains the weights of the two tasks through adaptive learning.
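The paper does not give its exact self-weighting update rule, so the sketch below uses one plausible instantiation: the homoscedastic-uncertainty weighting of Kendall et al., where each task is weighted by a learned log-variance and the log-variances are updated by mini-batch SGD alongside the network.

```python
import numpy as np

def combined_loss(l_auto, l_clean, s):
    """Self-weighted two-task loss with learnable log-variances s = (s1, s2).

    Follows the uncertainty weighting of Kendall et al.; this is an assumed
    formulation, not the paper's exact rule.
    """
    s1, s2 = s
    return np.exp(-s1) * l_auto + np.exp(-s2) * l_clean + s1 + s2

# Toy mini-batch SGD on the log-variances, with fixed-scale task losses
# standing in for the per-batch Auto and Clean losses.
rng = np.random.default_rng(0)
s = np.zeros(2)
lr = 0.05
for _ in range(500):
    l_auto = 2.0 + 0.1 * rng.standard_normal()  # noisier, harder task
    l_clean = 0.5                               # easier task
    # analytic gradients of combined_loss w.r.t. s1 and s2
    grad = np.array([-np.exp(-s[0]) * l_auto + 1.0,
                     -np.exp(-s[1]) * l_clean + 1.0])
    s -= lr * grad
```

At convergence each weight settles to the reciprocal of its task's loss scale, so the harder task automatically receives the smaller weight, replacing the brute-force search over weight combinations.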

1) PARAMETER CONFIGURATION OF TRAIN.PROTOTXT FILE
This parameter configuration file defines hyper-parameters such as the learning-rate multipliers of the weight and bias parameters, the convolution kernel size, the stride, the down-sampling parameters, the fully connected layer parameters, and the classification accuracy output; it specifies the objects the model needs to optimize, the training network, and the test network.
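A fragment of such a layer definition, in Caffe's prototxt format, might look like the following; the layer names and sizes are illustrative, not the paper's exact configuration.

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning-rate multipliers for the weights and for the bias
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96      # number of filters
    kernel_size: 11     # convolution kernel size
    stride: 4           # step size
    weight_filler { type: "gaussian" std: 0.01 }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }  # classification accuracy on the test network
}
```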

2) PARAMETER CONFIGURATION AND OPTIMIZATION OF SOLVER.PROTOTXT FILE
This parameter configuration file defines hyper-parameters that need to be set during network model training, such as the base learning rate, the weight decay coefficient, the number of iterations, the momentum, and whether to use the GPU or the CPU. Since the loss function is not strictly convex, the CNN model described above may find only a local optimum, so these settings also need to be optimized.
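A representative solver.prototxt along these lines is shown below; the specific values are illustrative assumptions rather than the settings used in the paper.

```protobuf
net: "train.prototxt"        # network definition to optimize
test_interval: 1000          # evaluate the test network every 1000 iterations
test_iter: 100               # number of mini-batches per test pass
base_lr: 0.001               # base learning rate
lr_policy: "step"
gamma: 0.1
stepsize: 20000              # drop the learning rate every 20000 iterations
momentum: 0.9
weight_decay: 0.0005         # weight decay (attenuation) coefficient
max_iter: 80000              # total number of iterations
snapshot: 10000
snapshot_prefix: "caffemodel/tree"
display: 100                 # display the solver state every 100 iterations
solver_mode: GPU             # use the GPU rather than the CPU
```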
The file computes the loss in the forward pass and the gradients in the backward pass, and alternately calls the forward and backward algorithms to iteratively update the parameters. The process is as follows:
Step 1: Call the train.prototxt configuration file and set up the objects to be optimized, namely the training network for learning and the test network for evaluation;
Step 2: Update the parameters by forward and backward iterations;
Step 3: Regularly evaluate the test network, according to the configured test cycle;
Step 4: Display the state of the model and solver during the optimization process.

F. COMPARISON BETWEEN CNN MODEL AND EXISTING METHODS
By reviewing the literature and previous experiments, we found that traditional learning methods such as SVM and the BP neural network are not effective enough at identifying complex tree images compared with CNN models. Even when a CNN model over-fits, its experimental results are better than those of these traditional learning algorithms. In our improvement of the CNN model, the Dual-Task convolutional neural network we designed contains two model structures, for image cleaning and tree recognition, compared with an ordinary single-task CNN network, and it can extract more representative features by exploiting the correlation between the two tasks. It not only improves the accuracy of tree identification and saves labor costs, but also reduces the burden on practitioners and promotes the long-term development of tree classification.
The recognition accuracy of the various tree classification methods is shown in Table 3 below. Figure 6 shows the average classification accuracy of the Dual-Task CNN model and the Single-Task CNN model; as the graph shows, the Dual-Task CNN model achieves the higher average accuracy. Figure 7 shows how the classification accuracy of the 25 species varies with the number of iterations under the Dual-Task and Single-Task models. At 80000 iterations, the classification accuracy increases from 91.1% for the single-task model to 95.6% for the Dual-Task CNN model, a significant improvement.

3) COMPARISON OF CLASSIFICATION PERFORMANCE WITH OR WITHOUT DATA SET PREPROCESSING
The following pictures compare the classification accuracies of the dual-task and single-task models with and without data set pre-processing. They show that high-quality training data is necessary: after data pre-processing, both the recognition time and the recognition accuracy of the models improve, and the classification accuracy of the single-task model with pre-processing is approximately equal to, and even tends to exceed, that of the dual-task model without pre-processing.

1) INFLUENCE OF SPECIES SIMILARITY ON CLASSIFICATION PERFORMANCE OF CNN MODEL
From the Single-Task CNN to the Dual-Task CNN model, the accuracy of image recognition for the 25 tree types has been greatly improved. However, analysis of the per-species classification results shows that the improvement of the Dual-Task CNN model over the ordinary CNN network is mainly concentrated on leaf species with obvious characteristics; for similar tree species, the Dual-Task CNN model is still unsatisfactory. The recognition accuracy for these tree species is shown in the table below. By analyzing the recognition accuracy of the 25 categories together with each species' similarity to other species in the data set, we can also conclude that the accuracy gain of the Dual-Task CNN over the Single-Task CNN model is mainly reflected in species with low similarity: the higher the species similarity, the lower the classification accuracy for that class. Among existing spatial information extraction techniques, the Gabor filter can enhance the robustness of feature learning across different scales and orientations, especially for images with spatial transformations. Therefore, to improve the classification of similar species by the Dual-Task model, a new deep Gabor Convolutional Neural Network (GCNN) is proposed. We introduce the Gabor filter to generate Gabor features as the input to the Dual-Task CNN, enhancing the extraction of the edge and texture features of the leaf image and improving the recognition rate of similar tree species. Moreover, the re-improved GCNN is easy to implement and compatible with the Caffe deep learning framework used in this experiment.

2) GABOR FILTERS
In image processing, the Gabor function is a linear filter for edge extraction that is well suited to expressing and separating image texture. In the spatial domain, a two-dimensional Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave and consists of a real part and an imaginary part. The real part of the Gabor filter can be expressed as

$$g(x, y; \lambda, \theta, \varphi, \sigma, \gamma) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\!\left(2\pi\frac{x'}{\lambda} + \varphi\right),$$

where the values of $x'$ and $y'$ are

$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta.$$

From a wide range of experiments, we have found that the wavelength $\lambda$ is specified in pixels; it is usually greater than or equal to 2 but cannot exceed one-fifth of the size of the input image, and what we usually use is its reciprocal, a frequency greater than or equal to 0.2. The direction parameter $\theta$ specifies the orientation of the parallel stripes of the Gabor function and takes values from 0 to 360 degrees. The phase offset $\varphi$ ranges from -180 to 180 degrees; 0 and 180 degrees correspond to the center-symmetric center-on and center-off functions respectively, while -90 and 90 degrees correspond to antisymmetric functions. The aspect ratio $\gamma$ is the spatial aspect ratio, which determines the shape of the Gabor function: when $\gamma = 1$ the shape is round, and when $\gamma < 1$ it is elongated along the direction of the parallel stripes.
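The real-part kernel above can be generated directly. The sketch below builds a small Gabor kernel bank in NumPy; the kernel size and parameter values are illustrative assumptions.

```python
import numpy as np

def gabor_kernel(ksize, wavelength, theta, phi=0.0, sigma=2.0, gamma=0.5):
    """Real part of a 2-D Gabor filter, sampled on a ksize x ksize grid."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates by the stripe orientation theta
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + gamma**2 * y_rot**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength + phi)
    return envelope * carrier

# A small bank of kernels at several orientations, as used for texture
# feature extraction at different directions.
bank = [gabor_kernel(11, wavelength=4.0, theta=t)
        for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving a leaf image with such a bank yields one response map per orientation, which is the multi-direction frequency-domain feature stack that the GCNN feeds into the subsequent convolutional layers.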
We use the Gabor filter to replace the conventional first convolutional layer, and the extracted feature vector is input into the CNN model for classifier training. In previous experiments, a learned convolutional layer was usually used as the first layer; however, our experiments show that the Gabor filter improves the results of the feature extractor. The improved deep Gabor CNN model can effectively enhance the recognition of similar species.

A. THE CONSTRUCTION OF HARDWARE AND SOFTWARE ENVIRONMENT
This study uses a desktop computer configured with an Intel i5 CPU, 8 GB of RAM, and an ROG STRIX GTX 1060 GPU. The experimental software environment is the Ubuntu 16.04.1 Desktop LTS operating system, with Caffe installed as the deep learning framework. The data set used contained 15,000 tree leaf images, pre-processed by manual screening and script classification and finally expanded to 100,000 images; the training set accounted for 80% and the test set for 20%.

1) STEPS DESCRIPTION
Experiments were carried out on the data sets using the CaffeNet CNN. To obtain better experimental results and speed up training, we tune the parameters of the pre-trained model in the network. The experimental steps are as follows:
Step 1: Clean and classify the pictures, eliminate blurred pictures and pictures with many distracting items, and generate listing files. Because there were few pictures at the beginning, we pre-processed the obvious cases manually, removing blurred, heavily cluttered, and multi-tree pictures, and then passed them to the model for secondary cleaning;
Step 2: Convert the images into the lmdb format that the Caffe framework processes efficiently, and compute their mean file to speed up training and testing and improve accuracy;
Step 3: When designing the model to be used for training, start from a training model that comes with the Caffe framework and modify its layers according to our needs;
Step 4: Configure the parameters in the train.prototxt and solver.prototxt files, alternately call the forward and backward algorithms to iteratively update the parameters and avoid local optima of the non-convex loss, and train on the data set to generate a caffemodel file;
Step 5: Adjust the deploy.prototxt file and convert the mean file to a mean.npy file;
Step 6: Call Caffe's Python interface, apply the above modifications to Caffe's own CaffeNet model, train the classification model with our data set, and then use it to identify the test images.
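The mean file in Step 2 amounts to subtracting the per-pixel average of the training set from every image before it enters the network. A dependency-free sketch of that preprocessing, independent of Caffe's binary tools, is shown below with tiny random stand-in images.

```python
import numpy as np

def compute_mean_image(images):
    """Per-pixel mean over a set of H x W x C images (the role of mean.npy)."""
    return np.mean(np.stack(images, axis=0), axis=0)

def preprocess(img, mean_img):
    """Center an image by subtracting the dataset mean, as Caffe does."""
    return img.astype(np.float64) - mean_img

# Tiny illustration with random stand-in "images".
rng = np.random.default_rng(42)
imgs = [rng.integers(0, 256, size=(8, 8, 3)) for _ in range(10)]
mean_img = compute_mean_image(imgs)
centered = [preprocess(im, mean_img) for im in imgs]
```

Centering the inputs around zero in this way is what lets training start from a well-conditioned regime, which is why the mean file speeds up and stabilizes both training and testing.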

V. RESULTS
First, the 50 CNN models were trained on the 50 tree species. As Figure 14 shows, the improved Dual-Task CNN model achieves higher classification accuracy than the Single-Task CNN model, and the re-improved GCNN model achieves better accuracy still on this data set. The DCNN and GCNN models obtained by training on two highly similar species are shown in Figure 15 and Figure 16. When the number of iterations reaches 4000, the classification accuracy rises from 87% with the DCNN model to 98% with the GCNN model. The ROC curves likewise show that the GCNN model outperforms the DCNN model in classifying similar species.

VI. CONCLUSIONS
Aiming at the problem of extracting images from existing complex backgrounds, we propose an Auto-Clean convolutional neural network model that achieves image cleaning and multi-class prediction of trees and improves the generalization and classification ability of the model. The joint features are shared through parallel training and learning of the related tasks, and the BP algorithm and the mini-batch stochastic gradient descent algorithm are then used to optimize the model. To avoid local optima of the non-convex loss, the hyper-parameters of the model are selected and tuned, and the model is trained further. Finally, with the above improvements and experiments, the training precision of the Auto-Clean dual-task model reaches 96.0%, versus 91.7% for the ordinary single-task model, and the recognition time is as low as 2 s, achieving the goal of identifying tree species efficiently and accurately.
Then, to reduce the interference of species similarity with the classification performance of the Dual-Task CNN model and to further improve the tree image recognition system, we propose a re-improved deep Gabor convolutional neural network model called GCNN. The Gabor filter is used to enhance the extraction of edge and texture features of the tree image, and the extracted features are input into the CNN model to raise the recognition accuracy for similar species. The experimental results also show that the re-improved deep Gabor convolutional neural network is more advantageous in distinguishing similar trees.
At present, tree image recognition is still hindered by many factors, such as species type, species similarity, and the complexity of the surrounding environment. However, with further research on and improvement of convolutional neural networks, we believe that more models with excellent performance will emerge and be applied to the classification and identification of trees, contributing to the improvement and protection of the ecological environment.

ACKNOWLEDGMENT
This work was supported by the National Natural Science Foundation of China (Grant No. 61703441).