To evaluate the performance of our proposed network, we conducted extensive experiments on the benchmark dataset SeaShips7000. We compared the proposed LSDNet with other networks through quantitative and qualitative results. The implementation details and experimental results are described as follows.
4.1 Implementation Details
4.1.1 Experimental Environment and Parameter Settings
The experiments were conducted in PyCharm on Ubuntu 22.04, using the PyTorch 1.13 deep learning framework, Python 3.9, and CUDA 11.4. The CPU is a 12th Gen Intel(R) Core(TM) i5-12490F with a base frequency of 3 GHz, and the GPU is an NVIDIA GeForce GTX 1080Ti. For the hyperparameters of our network, the initial learning rate, momentum, and weight decay are set to 0.01, 0.937, and 0.0005, respectively, with the Adam optimizer. In all experiments, the input image size is 640 × 640 pixels, the number of epochs is set to 200, and the batch size is set to 32. All other parameters use the default values of the original YOLOv7-tiny.
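For clarity, the minimal sketch below shows how these training hyperparameters could be wired together in PyTorch. The model here is a stand-in rather than the released LSDNet implementation, and mapping the momentum value to Adam's first beta coefficient is our assumption, following the convention of the YOLO training scripts.

```python
import torch
import torch.nn as nn

# Stand-in module used only to make the snippet runnable; the real network is LSDNet.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.SiLU())

# Hyperparameters reported in Section 4.1.1.
base_lr, momentum, weight_decay = 0.01, 0.937, 0.0005
img_size, epochs, batch_size = 640, 200, 32

# Assumption: with Adam, the momentum value is used as the first beta coefficient.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=base_lr,
    betas=(momentum, 0.999),
    weight_decay=weight_decay,
)
```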
4.1.2 Evaluation Indicators
Nine objective metrics are used to evaluate the effectiveness and efficiency of the proposed model: Precision, Recall, F1, mAP@0.5, mAP@0.5:0.95, inference time, parameters (Params), FLOPs, and training time.
Precision evaluates the accuracy of ship predictions, reflecting the proportion of actual positive samples among all predicted positive samples. Recall evaluates whether all ships in the test dataset have been correctly predicted, reflecting the proportion of positive samples correctly predicted by the model out of all positive samples. The F1 score is the harmonic mean of Precision and Recall. mAP@0.5 and mAP@0.5:0.95 are comprehensive indicators of the accuracy and robustness of ship detection. Params is the number of parameters, which reflects the memory usage of the model, and FLOPs is the computational cost, which reflects the complexity of the model; fewer parameters and FLOPs mean a lower deployment cost. In addition to the above indicators, we also report inference speed and training time to further demonstrate the practicality of our model.
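As a concrete illustration, the following minimal sketch computes Precision, Recall, and their harmonic mean (F1) from true-positive, false-positive, and false-negative counts; the counts shown are hypothetical and would in practice come from IoU-based matching of predictions to ground-truth boxes.

```python
def precision_recall_f1(tp: int, fp: int, fn: int, eps: float = 1e-9):
    """Compute Precision, Recall and their harmonic mean (F1) from detection counts."""
    precision = tp / (tp + fp + eps)   # fraction of predictions that are correct
    recall = tp / (tp + fn + eps)      # fraction of ground-truth ships that are found
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1

# Example with hypothetical counts from one evaluation run.
p, r, f1 = precision_recall_f1(tp=958, fp=30, fn=42)
print(f"Precision={p:.3f}, Recall={r:.3f}, F1={f1:.3f}")
```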
4.2 Dataset Settings
4.2.1 Dataset
We used the free and publicly available ship dataset SeaShips7000 [31], which consists of 7000 images covering six types of ships: ore ships, bulk cargo ships, general cargo ships, container ships, fishing ships, and passenger ships. In this experiment, the images are randomly divided into a training set, a validation set, and a testing set at a ratio of 3:1:1.
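A minimal sketch of the random 3:1:1 split is given below; the file names and the fixed random seed are illustrative assumptions, not part of the SeaShips7000 release.

```python
import random

# Hypothetical list of the 7000 annotated image names.
images = [f"{i:06d}.jpg" for i in range(1, 7001)]

random.seed(0)          # fixed seed only so this sketch is reproducible
random.shuffle(images)

n_train = int(len(images) * 3 / 5)   # 3 parts training
n_val = int(len(images) * 1 / 5)     # 1 part validation
train_set = images[:n_train]
val_set = images[n_train:n_train + n_val]
test_set = images[n_train + n_val:]  # remaining 1 part for testing

print(len(train_set), len(val_set), len(test_set))  # 4200 1400 1400
```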
Figure 6 shows a statistical analysis of the number of labeled samples for each ship type in the dataset, as well as the size and shape of the labeled boxes. As shown in Figure 6(a), the uneven distribution of the six ship types increases the difficulty of the detection task. Figure 6(b) reveals that the aspect ratios of the candidate boxes are generally large, conforming to the shape characteristics of ships. In Figure 6(c), x and y denote the center coordinates of the annotation boxes normalized by the image size. Figure 6(d) shows that the ship instances in SeaShips7000 span a wide range of sizes and are concentrated in the middle and lower parts of the images.
4.2.2 K-means
In detection tasks, appropriate anchor boxes are beneficial for improving detection accuracy and speed, while the anchor boxes in YOLOv7-tiny are derived from the COCO dataset. To make training more suitable for our dataset, we used K-means clustering [21] to automatically generate anchor boxes during the training process. K-means is essentially a data partitioning method based on Euclidean distance, in which dimensions with large mean and variance have a decisive impact on the clustering result. Its goal is to divide the given dataset into K clusters (where K is a hyperparameter) and to assign each sample to its corresponding cluster center. The core procedure alternates two steps: first, the centers are fixed and the cluster assignment of each sample is adjusted to reduce the loss function; then, the assignments are fixed and the centers are updated to reduce the loss further. These two steps alternate until the loss function converges to a (local) minimum.
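The sketch below illustrates this alternating assignment/update procedure on the (width, height) pairs of training boxes, using plain Euclidean K-means with K = 9 as described above. The box data are made-up placeholders, not the actual SeaShips7000 statistics.

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0):
    """Cluster (width, height) box sizes into k anchors with plain Euclidean K-means."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]       # initial centers
    for _ in range(iters):
        # Step 1: fix the centers and assign every box to its nearest center.
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 2: fix the assignments and move each center to the mean of its cluster.
        new_centers = np.array([
            wh[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):                      # converged
            break
        centers = new_centers
    return centers[np.argsort(centers.prod(axis=1))]               # sort anchors by area

# Made-up (width, height) samples standing in for the labeled ship boxes.
rng = np.random.default_rng(1)
boxes = np.abs(rng.normal(loc=[200.0, 60.0], scale=[120.0, 30.0], size=(500, 2)))
print(kmeans_anchors(boxes, k=9).round(1))
```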
A size comparison between the anchor boxes generated by K-means clustering and the original anchor boxes is shown in Table 1. Compared with the original anchors, the clustered anchors are larger and more consistent with the aspect ratios of ships.
Table 1 Comparison of Anchor Boxes.
| Anchors      | Big       | Medium    | Small    |
| original     | (116,90)  | (30,61)   | (10,13)  |
|              | (156,198) | (62,45)   | (16,30)  |
|              | (373,326) | (59,119)  | (33,23)  |
| K-means [21] | (412,49)  | (142,41)  | (54,13)  |
|              | (304,83)  | (232,40)  | (98,23)  |
|              | (506,87)  | (237,59)  | (186,25) |
We compared the performance of the models with the original anchor boxes and with the automatically generated anchor boxes; the experimental results are shown in Table 2, where YOLOv7-tiny* denotes YOLOv7-tiny with anchors generated by K-means clustering.
Table 2 Comparison of YOLOv7-tiny and YOLOv7-tiny*.
| Methods          | Precision | Recall | F1    | mAP@0.5 | mAP@0.5:0.95 | Inference (ms) | Params (M) | FLOPs (G) | Train time (h) |
| YOLOv7-tiny [14] | 0.970     | 0.958  | 0.964 | 0.983   | 0.750        | 1.8            | 6.02       | 13.1      | 3.118          |
| YOLOv7-tiny*     | 0.979     | 0.966  | 0.972 | 0.985   | 0.760        | 1.9            | 6.02       | 13.1      | 3.110          |
From Table 2, we can see that the anchor boxes automatically generated by K-means clustering are more suitable for the dataset used in this experiment. Compared with the original YOLOv7-tiny, the precision of YOLOv7-tiny* is improved by 0.9%, the recall by 0.8%, mAP@0.5 by 0.2%, and mAP@0.5:0.95 by 1%. This shows that better results can be achieved by using K-means clustering to generate anchor boxes, so the following experiments are all based on K-means clustering.
4.3 Training Results and Analysis
To better evaluate the performance of the proposed LSDNet, we analyzed its training results. Algorithm 1 describes the training process in detail. During training, we set the number of epochs to 200; the loss curve of LSDNet is given in Figure 7. The loss value continuously decreases and converges after 200 epochs of training. The Precision and Recall curves rise steadily and then converge, indicating that the model is well trained without overfitting.
Algorithm 1 Training Procedure for LSDNet
Input: Original images for ship detection
Output: Detection results under lightweight conditions
Initialize LSDNet D̂_θ with random weights θ.
Set the training stage: num_epochs = 200, batch_size = 16.
Prepare the dataset Seaships7000_trainval.
for i in num_epochs do
    repeat
        Take a batch of images M from Seaships7000_trainval.
        for j in batch_size do
            if random.randint(0, 3) > 0 then
                Randomly select a batch of images and their corresponding labels from the training dataset.
                Apply data augmentation techniques to the selected images.
                Use the augmented images and labels to calculate the loss.
            end if
        end for
        Update LSDNet D̂_θ according to the detection loss.
    until all images have been fed into the training model
end for
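Algorithm 1 is pseudocode; a compressed PyTorch-style sketch of the same loop is shown below. The dataset loader, augmentation routine, and loss function are hypothetical stand-ins, and the probabilistic augmentation follows the random.randint(0, 3) > 0 rule from the pseudocode (roughly a 75% chance), applied here per batch for simplicity.

```python
import random
import torch

def train_lsdnet(model, train_loader, compute_loss, augment, num_epochs=200):
    """Sketch of the Algorithm 1 loop: augment most batches, then update by the detection loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01,
                                 betas=(0.937, 0.999), weight_decay=0.0005)
    for epoch in range(num_epochs):
        for images, labels in train_loader:          # repeat until all images are used
            if random.randint(0, 3) > 0:             # apply augmentation with probability 3/4
                images, labels = augment(images, labels)
            loss = compute_loss(model(images), labels)
            optimizer.zero_grad()
            loss.backward()                          # update LSDNet according to the detection loss
            optimizer.step()
```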
We made a quantitative comparison between the proposed LSDNet and the original YOLOv7-tiny on the six types of ships in terms of precision, recall, and mAP. In addition, the P-R curves for each ship type are plotted in Figures 8 and 9. We can observe that the proposed model achieves better performance than the original YOLOv7-tiny on most ship types in terms of precision, recall, and mAP, as well as in overall mAP. This demonstrates the superiority of our network in ship detection.
The F1 score is an important indicator of model stability; the higher its value, the better the model performance. We calculated the F1 scores of YOLOv7-tiny and LSDNet and plotted the corresponding curves, as shown in Figure 10. It is obvious that our model obtains a higher F1 score than YOLOv7-tiny, indicating the better performance of LSDNet.
The confusion matrices on the validation set are shown in Figure 11 to further evaluate the detection performance of the trained model. It is obvious that, for most ship types, the proposed model achieves false positives (FP) and false negatives (FN) comparable to or even better than those of the original model, indicating that the improved model has higher classification accuracy.
4.4 Quantitative Evaluation
To verify the effectiveness of our proposed network, we made a quantitative evaluation on the SeaShips7000 dataset against the YOLO series and other lightweight backbone networks. The YOLO series includes YOLOv5n, YOLOv7-tiny, and YOLOv7-tiny*. For the lightweight comparison, the backbone of YOLOv7-tiny is replaced with the popular lightweight networks MobileNetv3 [17], ShuffleNetv2 [19], and MobileOne [20]. The experimental results are shown in Figure 12, Table 3, and Table 4.
Figure 12 shows the mAP, Precision, and Recall curves of all detection algorithms during training. It can be seen that our model has slightly higher mAP and recall values than the other models while maintaining a high level of precision. All curves rise steadily and then converge, indicating that the models are well trained and no overfitting has occurred.
Table 3 shows that our model achieves the best performance in the YOLO series, with precision, recall, and mAP all improved over the original YOLOv7-tiny. In addition, our model is more lightweight, with only 3.69M parameters, which is 61.3% of YOLOv7-tiny. As shown in Table 4, our model outperforms the other lightweight backbone networks in terms of precision, recall, and mAP. Although its computational complexity is slightly higher than that of MobileNetv3, our model has the fewest parameters. The experimental results also show that our model has the fastest inference speed among the lightweight backbone variants, indicating that it can achieve real-time monitoring; its cost-effectiveness is therefore better than that of the other models.
Table 3 Performance Comparison on the SeaShips7000 Dataset (Mainly the YOLO Series).
| Methods          | Precision | Recall | F1    | mAP@0.5 | mAP@0.5:0.95 | Inference (ms) | Params (M) | FLOPs (G) | Train time (h) |
| YOLOv5n [11]     | 0.959     | 0.959  | 0.959 | 0.980   | 0.745        | 0.9            | 1.77       | 4.2       | 2.522          |
| YOLOv7-tiny [14] | 0.970     | 0.958  | 0.964 | 0.983   | 0.750        | 1.8            | 6.02       | 13.1      | 3.118          |
| YOLOv7-tiny*     | 0.979     | 0.966  | 0.972 | 0.985   | 0.760        | 1.9            | 6.02       | 13.1      | 3.110          |
| Ours             | 0.974     | 0.970  | 0.972 | 0.985   | 0.757        | 2.5            | 3.69       | 7.4       | 4.904          |
Table 4 Performance Comparison on the SeaShips7000 Dataset (Mainly the Lightweight Backbone Series).
| Methods           | Precision | Recall | F1    | mAP@0.5 | mAP@0.5:0.95 | Inference (ms) | Params (M) | FLOPs (G) | Train time (h) |
| MobileNetv3 [17]  | 0.969     | 0.946  | 0.957 | 0.981   | 0.723        | 4.1            | 4.18       | 6.9       | 3.129          |
| ShuffleNetv2 [19] | 0.967     | 0.943  | 0.955 | 0.979   | 0.717        | 2.9            | 4.51       | 8.6       | 3.272          |
| MobileOne [20]    | 0.954     | 0.949  | 0.951 | 0.977   | 0.713        | 2.8            | 4.48       | 10.3      | 12.319         |
| Ours              | 0.974     | 0.970  | 0.972 | 0.985   | 0.757        | 2.5            | 3.69       | 7.4       | 4.904          |
4.5 Qualitative Evaluation
In the qualitative evaluation, we selected the baseline model as a reference and present the visual results of our model and the baseline, as shown in Figure 13. The first column of Figure 13 shows the original images, the second column shows the detection results of YOLOv7-tiny, and the third column shows the detection results of our model.
From Figure 13, it can be seen that our network achieves much higher detection accuracy for occluded ships (first row) than the original YOLOv7-tiny. Moreover, our network has higher detection accuracy on hazy nights (second row). Due to the small size of the fishing boats, YOLOv7-tiny mistakenly detects two fishing boats that are close together as one (third row), whereas our model is able to effectively detect both.
4.6 Ablation Experiment
To explore the effectiveness and efficiency of the proposed network, we performed an ablation study on the components of the LSDNet architecture; the experimental details are shown in Table 5. When only PConv is added, the parameters and computational cost are reduced markedly, while mAP@0.5:0.95 decreases. When only GhostConv is added, mAP remains almost unchanged; although the parameters and computational cost decrease, the model is still not lightweight enough. When PConv and GhostConv are added together, the parameters and computational complexity are further reduced, and mAP remains at a high level. In addition to PConv and GhostConv, our network also incorporates Mosaic-9 data augmentation, which enhances the robustness of the model while keeping it lightweight. It is obvious from Table 5 that the proposed LSDNet achieves performance comparable to YOLOv7-tiny while requiring fewer parameters and less computation, indicating that LSDNet achieves a better balance between detection accuracy and model complexity.
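To make the two lightweight building blocks concrete, the sketch below gives simplified PyTorch versions of partial convolution (PConv) and Ghost convolution. The layer widths, kernel sizes, and the 1/4 partial ratio are common defaults from the original papers and are assumptions here, not the exact LSDNet configuration.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: a regular 3x3 conv is applied to only a fraction of the channels."""
    def __init__(self, channels: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = channels // n_div            # channels that actually get convolved
        self.dim_keep = channels - self.dim_conv     # channels passed through untouched
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

class GhostConv(nn.Module):
    """Ghost convolution: half the outputs come from a normal conv, half from a cheap depthwise conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        hidden = out_ch // 2
        self.primary = nn.Sequential(nn.Conv2d(in_ch, hidden, 1, bias=False),
                                     nn.BatchNorm2d(hidden), nn.SiLU())
        self.cheap = nn.Sequential(nn.Conv2d(hidden, hidden, 5, padding=2,
                                             groups=hidden, bias=False),
                                   nn.BatchNorm2d(hidden), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat((y, self.cheap(y)), dim=1)

# Quick shape check on a dummy feature map.
x = torch.randn(1, 64, 80, 80)
print(PConv(64)(x).shape, GhostConv(64, 128)(x).shape)  # both keep the spatial size
```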
Table 5 Comparison of Ship Detection Performance of Different Modules.
| Number | K-means [21] | PConv [25] | GhostConv [29] | Mosaic-9 [31] | Params (M) | FLOPs (G) | mAP@0.5 | mAP@0.5:0.95 |
| N1     |              |            |                |               | 6.02       | 13.1      | 0.983   | 0.750        |
| N2     | √            |            |                |               | 6.02       | 13.1      | 0.985   | 0.760        |
| N3     | √            | √          |                |               | 4.35       | 9.2       | 0.985   | 0.738        |
| N4     | √            |            | √              |               | 5.80       | 12.3      | 0.985   | 0.759        |
| N5     | √            | √          | √              |               | 3.69       | 7.4       | 0.984   | 0.750        |
| N6     | √            | √          | √              | √             | 3.69       | 7.4       | 0.985   | 0.757        |