Black rectangular box recognition with a target detection network
Because the black rectangular box has distinctive features, it is easy to detect. After a comparative analysis of Faster RCNN, SSD and YOLOV3, this paper selects YOLOV3, a target detection network with good recognition accuracy and speed, to detect and crop the black rectangular box, producing a data set of black rectangular box water meter images. The steps to build the YOLOV3 network are as follows:
First, calculate the width and height of each annotation box. All black rectangular boxes are manually labeled as a single class (named screen) with the labelImg tool, which generates one xml file per image. The coordinates are then read from each xml file, and the width and height of each annotation box, w and h, are computed as xmax - xmin and ymax - ymin respectively. The image labeling effect is shown in Figure 4.
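The width/height extraction from each xml file can be sketched as below (a minimal sketch: the element layout follows labelImg's PASCAL VOC output, and the demo annotation string is purely illustrative):

```python
import xml.etree.ElementTree as ET

def box_sizes(xml_text):
    """Return (w, h) = (xmax - xmin, ymax - ymin) for every bndbox
    in a labelImg (PASCAL VOC style) annotation."""
    root = ET.fromstring(xml_text)
    sizes = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        w = int(bb.find('xmax').text) - int(bb.find('xmin').text)
        h = int(bb.find('ymax').text) - int(bb.find('ymin').text)
        sizes.append((w, h))
    return sizes

# Illustrative annotation with a single 'screen' box.
demo = """<annotation><object><name>screen</name>
<bndbox><xmin>120</xmin><ymin>80</ymin><xmax>330</xmax><ymax>140</ymax></bndbox>
</object></annotation>"""
```

For an annotation file on disk, `ET.parse(path).getroot()` replaces `ET.fromstring`.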
Second, a clustering algorithm is used to design target detection boxes that conform to the standard. All w and h values of the annotation boxes are computed and clustered around 9 center points to design 9 target detection boxes.
(1) Nine target detection boxes are obtained with the K-means algorithm. Each pair of w and h forms an array object, giving 3600 array objects in total, which form a two-dimensional array. Nine data points are initialized as centroids (w1, h1), (w2, h2), (w3, h3), ..., (w9, h9), that is, nine classes. For each array object, such as (wa, ha), the distance to each centroid is calculated; if (wa, ha) is closest to (w1, h1), then (wa, ha) is assigned to the class of (w1, h1). After each pass, the average value of w and the average value of h of every class are computed to obtain a new centroid, e.g. (w1', h1'), and the distances to the new centroids are recalculated, repeating until the centroid positions no longer change. The classification effect is shown in Figure 5.
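The K-means procedure just described can be sketched in a few lines (a hedged sketch, not the paper's code; the function name and random initialization are assumptions):

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Plain K-means on (w, h) pairs: assign each box to the nearest
    centroid, then recompute each centroid as the per-cluster mean of
    w and h, until the centroids stop moving."""
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # Euclidean distance from every box to every centroid
        d = np.linalg.norm(wh[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([wh[labels == i].mean(axis=0) if (labels == i).any()
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

Calling `kmeans_anchors(sizes, k=9)` on the 3600 (w, h) pairs would yield the nine anchor centroids.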
(2) The hierarchical clustering method is used to divide the boxes into 9 target detection boxes. The 3600 array objects start as 3600 clusters. The Euclidean distance between clusters [19] is calculated to decide whether they belong to the same class. A distance threshold is defined (set to 130 here): if the distance between two clusters does not exceed the threshold, they are merged into one cluster; if it exceeds the threshold, they remain two separate classes. The centroid of each cluster is represented by the mean value of its data points.
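The threshold-based merging described above can be sketched as a naive agglomerative loop (an illustrative sketch only; the paper's actual implementation and linkage rule may differ):

```python
import numpy as np

def hierarchical_anchors(wh, threshold=130.0):
    """Naive agglomerative clustering: start with one cluster per (w, h)
    pair and repeatedly merge the closest pair of cluster centroids, until
    no two centroids are within `threshold` (Euclidean distance)."""
    clusters = [np.array([p], dtype=float) for p in wh]
    while True:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(clusters[i].mean(0) - clusters[j].mean(0))
                if d < best:
                    best, pair = d, (i, j)
        if pair is None:
            # every remaining pair of centroids exceeds the threshold
            return [c.mean(0) for c in clusters]
        i, j = pair
        clusters[i] = np.vstack([clusters[i], clusters.pop(j)])
```

This O(n^3) loop is fine for a few thousand boxes; a library routine (e.g. SciPy's `linkage`/`fcluster`) would be the practical choice at larger scale.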
Part of the effect of hierarchical clustering is shown in Figure 6.
The classification comparison between K-means and hierarchical clustering algorithm is shown in Table 1, and its distribution is shown in Figure 7.
Table 1
Comparison of clustering algorithms
Clustering algorithm | Total width difference | Total height difference | Average width | Average height |
K-means | 593.23 | 113.39 | 210.02 | 60.42 |
Hierarchical clustering | 67.58 | 14.48 | 310.51 | 84.98 |
It can be seen from the figure that, compared with the classes produced by hierarchical clustering, the categories produced by K-means are more clearly separated in width and height, and better distinguish the target boxes for large, medium and small targets. Combined with the water meter image data set, K-means therefore gives the better classification. The effect of some image target boxes is shown in Figure 8.
Third, train the model by transfer learning [20].
The specific training steps are as follows:
(1) Preprocess the water meter images to a uniform size, for example 416*416: shrink the long side to 416, shrink the short side proportionally, and fill the remainder with pixels of RGB (128, 128, 128). The processed image is shown in Figure 9.
(2) Select the pre-training model. In this experiment, model parameters pre-trained by COCO were selected for fine-tuning.
(3) Process the data set and set the data set to VOC format.
(4) Set the parameters of the experiment. Some parameter settings are shown in Table 2.
Table 2
Some network parameters for identifying the black rectangular box
Parameter Names | Parameter Values |
BATCH_SIZE | 2 |
IOU_LOSS_THRESH | 0.5 |
ANCHOR_PER_SCALE | 3 |
LEARN_RATE_INIT | 1e-4 |
LEARN_RATE_END | 1e-6 |
EPOCH | 30×1800 |
Epoch represents the number of training passes: 30 epochs of 1,800 iterations each. Batch size indicates that two images are used at each step; repeated experiments showed that a batch size of 2 meets the environmental constraints of this experiment. The learning rate decays from 1e-4 to 1e-6.
(5) Training and testing.
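The resize-and-pad preprocessing of step (1) can be sketched as follows (a numpy-only nearest-neighbour sketch assuming 3-channel input; a production pipeline would typically use an image library's resize):

```python
import numpy as np

def letterbox(img, size=416):
    """Scale the long side of a HxWx3 image to `size`, keep aspect ratio,
    and center it on a gray RGB (128, 128, 128) canvas."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize via index mapping
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((size, size, 3), 128, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```

Note that the same scale and offsets must also be applied to the annotation boxes so labels stay aligned with the padded image.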
Data augmentation of water meter images
Data augmentation expands the data by transforming the training images and synchronously transforming the label position information, without changing the label category, thereby improving data quality. Only the training set is expanded in this paper. In computer vision, many image transformation methods are used to expand data sets; this paper uses the following methods in the image preprocessing stage:
(1) The original image is flipped left and right to expand the data set and increase data diversity.
(2) The original image is cropped randomly. A random function and a cropping threshold (for example, 0.5) are set; if the value of the random function is less than the threshold, the image is cropped randomly, otherwise it is not.
(3) Translate the original image. The water meter images in the data set were randomly shifted horizontally within the range [0, width*0.1] or vertically within the range [0, height*0.1].
After the above geometric transformations, roughly three augmented versions were added per original image, expanding the whole training set to 14400 images. An example of a water meter image flipped left and right is shown in Figure 12.
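The flip and translation transforms above, with the synchronous label update, can be sketched as follows (helper names are hypothetical; the gray padding RGB (128, 128, 128) follows the preprocessing convention used earlier):

```python
import random
import numpy as np

def flip_lr(img, boxes):
    """Mirror the image left-right and remap each (xmin, ymin, xmax, ymax)
    label so it still covers the same object: x' = width - x, ends swapped."""
    W = img.shape[1]
    new_boxes = [(W - xmax, ymin, W - xmin, ymax)
                 for xmin, ymin, xmax, ymax in boxes]
    return img[:, ::-1].copy(), new_boxes

def random_translate(img, boxes, max_frac=0.1):
    """Shift the image right/down by up to max_frac of each side, padding
    the vacated strip with gray, and shift the labels by the same offset."""
    H, W = img.shape[:2]
    dx = random.randint(0, int(W * max_frac))
    dy = random.randint(0, int(H * max_frac))
    out = np.full_like(img, 128)
    out[dy:, dx:] = img[:H - dy, :W - dx]
    new_boxes = [(xmin + dx, ymin + dy, xmax + dx, ymax + dy)
                 for xmin, ymin, xmax, ymax in boxes]
    return out, new_boxes
```

A full pipeline would also clip translated boxes that fall partly outside the frame.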
Figure 12 Water meter image after geometric transformation
Construction and selection of reading recognition model
Deep learning model building environment
Faster RCNN uses candidate boxes generated by the RPN, which shares the same CNN backbone as the detection network; this reduces the number of candidate boxes to 300 while improving their quality. To address the slow speed of Faster RCNN, SSD adopts hierarchical feature extraction and performs border regression and scoring simultaneously, and integrates the anchor idea from Faster RCNN, making it suitable for multi-scale target detection and faster at the same time. YOLOV3 is a target detection network with a very balanced speed and accuracy. Combined with this experimental environment, this paper trains water meter image classification models with Faster RCNN from the two-stage target detection algorithms and with SSD and YOLOV3 from the one-stage algorithms. The experimental environment is described in Table 3.
Table 3
Introduction to the experimental environment
CPU | AMD Ryzen 7 1700 Eight-Core Processor 3.00GHz |
Memory | 16GB |
Operating System | Windows 64-bit |
GPU | NVIDIA GeForce GTX 1060 6GB |
Programming Language | Python3.7 |
Computer image vision library | OpenCV |
Deep Learning Framework | TensorFlow2 |
CUDA Version | 10.0 |
Selection of model plans
First, construct Faster RCNN to identify water meter numbers. The specific training steps are as follows:
(1) Make the data set in VOC format. There are 14400 trainval files (training set) and 400 test files (test set).
(2) Debug the environment. The .pyx files of cython_bbox and bbox are compiled to generate .c files so that they can run on Windows and generate bounding boxes.
(3) Determine the pre-training model. Select the public VGG16 as the pre-training model and fine-tune its parameters.
(4) Debugging parameters are shown in Table 4.
(5) Training and testing.
Table 4
Faster RCNN partial parameters
Parameter Names | Parameter Values |
BATCH_SIZE | 32 |
LEARNING_RATE | 0.000001 |
EPOCH | 12000 |
RPN_POSITIVE_OVERLAP | 0.7 |
RPN_NEGATIVE_OVERLAP | 0.3 |
Second, construct SSD to identify water meter images. The specific training steps are as follows:
(1) Preprocess the water meter images. Scale the images to 300*300. Make VOC data sets and convert images and labels to TFRecord format.
(2) Determine the pre-training model and fine-tune it using the public VGG16 model parameters.
(3) Set experimental parameters as shown in Table 5.
(4) Training and testing
Table 5
Partial experimental parameters of SSD
Parameter Names | Parameter Values |
BATCH_SIZE | 3 |
END_LEARNING_RATE | 1e-6 |
EPOCH | 12000 |
MATCH_THRESHOLD | 0.5 |
NUM_CLASS | 11 |
TRAIN_IMAGE_SIZE | 300 |
Since the image size is changed to 300*300, a batch size of 3 best meets the requirements of the experiment. When the learning rate is greater than 0.001, the model's loss function does not converge; through many experiments, 1e-6 was selected as the learning rate for this experiment.
Third, construct YOLOV3 to identify water meter images.
(1) Nine anchor boxes were selected by clustering. The w and h of each annotation box are computed and each (w, h) pair forms an array object, 16,402 array objects in total. The K-means algorithm and the hierarchical clustering algorithm described in the second step of Section 3.2.1 are each used to gather 9 anchor box classes. The experimental results are shown in Table 6, and the category distribution is shown in Figure 13.
Table 6
Experimental results of K-means and hierarchical clustering
Clustering algorithm | Total width difference | Total height difference | Average width | Average height |
K-means | 93.8 | 106.5 | 30.62 | 34.86 |
Hierarchical clustering | 13.88 | 16.56 | 40.54 | 47.84 |
The experimental results show that, compared with hierarchical clustering, the 9 anchor box classes gathered by the K-means algorithm are more evenly spread, covering multi-scale recognition with sizes ranging from [20, 23] to [114.03, 129.53], which better matches the design of YOLOV3. Therefore the 9 anchor boxes produced by K-means are selected.
(2) Use transfer learning. The specific training steps are as follows:
① Resize the image to 416 by 416.
② Pre-training model selection. In this experiment, model parameters pre-trained by COCO were selected for fine-tuning.
③ Process the data sets. Put the data in VOC format and generate the voc_train and voc_test text files.
④ Set experimental parameters. Parameter Settings are shown in Table 7.
⑤ Training and testing.
Table 7
Some parameters of YOLOV3
Parameter Names | Parameter Values |
BATCH_SIZE | 2 |
IOU_LOSS_THRESH | 0.5 |
ANCHOR_PER_SCALE | 3 |
LEARN_RATE_INIT | 1e-4 |
LEARN_RATE_END | 1e-6 |
EPOCH | 30×7200 |
Experimental comparison and analysis