Based on the previous evaluation of the homologous data and of the FPS sampling points, we added the factor-of-two homologous data to the initial training dataset to form the final training dataset, used the point cloud obtained from the MMW system as the test dataset, set the number of sampling points to 8192, and trained for 300 epochs. The training results are shown in Fig. 7(a). The blue curve represents the accuracy of the training dataset, which reaches 0.998 after 300 epochs; the orange curve represents the accuracy of the test dataset, which reaches 0.996 after 300 epochs. These results show that the model training achieves a very high accuracy.
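For readers who wish to reproduce the sampling step, the farthest point sampling (FPS) used to reduce each cloud to 8192 points can be sketched as below. This is a minimal NumPy illustration of the standard greedy FPS algorithm, not the exact implementation inside PointNet++; the example file name is hypothetical.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Greedy FPS: iteratively pick the point farthest from the already-chosen set.

    points: (N, 3) array of XYZ coordinates; returns indices of the sampled points.
    """
    n_points = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    # Squared distance from every point to the current sampled set (init: infinity).
    dist = np.full(n_points, np.inf)
    selected[0] = 0                                   # start from an arbitrary point
    for i in range(1, n_samples):
        # Update distances using the most recently selected point.
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        # The next sample is the point farthest from all selected points.
        selected[i] = int(np.argmax(dist))
    return selected

# Example: downsample a raw MMW point cloud to the 8192 points used for training.
# cloud = np.loadtxt('hammer_01.txt')                 # hypothetical file name
# sampled = cloud[farthest_point_sampling(cloud[:, :3], 8192)]
```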
Additionally, the loss value is crucial in evaluating trained models, as it serves as the core metric for assessing predictive performance and guiding model optimization. In the PointNet++ classification model, cross-entropy is used as the loss function, computed as the average cross-entropy loss across all samples [24]. When the predicted and actual classes differ significantly, the loss value is high; conversely, the loss value is low when they are similar [38]. By evaluating the loss function, the model quantifies the difference between the current predictions and the true values. As shown in Fig. 7(b), the blue line represents the train mean loss and the orange line the test mean loss. The train mean loss decreases rapidly and converges within the first 30 epochs, ultimately approaching zero, indicating that the model fits the training data well. The test mean loss also decreases rapidly within the first 40 epochs, then fluctuates slightly, but eventually approaches zero, demonstrating good generalization on the test data. Moreover, the train and test losses run almost in parallel, with the test loss slightly higher than the train loss, indicating consistent performance between training and testing data and no significant overfitting. In summary, the trained model performs well on both training and test data, showing strong learning capability and generalization ability.
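The loss described above, the cross-entropy averaged over all samples in a batch, can be written compactly; the sketch below assumes PyTorch, whose nn.CrossEntropyLoss already performs this averaging over raw class logits. The tensor shapes are illustrative placeholders.

```python
import torch
import torch.nn as nn

# Assumed shapes: logits (B, 5) for the five object classes, labels (B,) integer class IDs.
criterion = nn.CrossEntropyLoss()            # default reduction: mean over the batch

logits = torch.randn(8, 5)                   # placeholder network output
labels = torch.randint(0, 5, (8,))           # placeholder ground-truth classes
loss = criterion(logits, labels)             # average cross-entropy loss across samples

# Equivalent explicit form: mean of the negative log softmax probability of the true class.
log_probs = torch.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(8), labels].mean()
assert torch.allclose(loss, loss_manual)
```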
In order to evaluate the trained model, each sample in Fig. 2(b) is transformed by scaling it down by a factor of 0.95 and shifting it to the right by 0.5 cm, and the transformed samples form an evaluation dataset that is input to the trained model. The data used for evaluation are point clouds of 11 hammers, 11 knives, 12 pistols, 13 scissors, and 12 wrenches. The prediction results are shown in Fig. 8. For the available samples, the trained model classified all objects correctly, which is consistent with the high test accuracy of 99.6%.
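The geometric perturbation used to build this evaluation set (uniform scaling by 0.95 and a 0.5 cm lateral shift) is straightforward to reproduce. The sketch below assumes coordinates stored in centimetres with x as the horizontal ("right") axis; both are assumptions for illustration only.

```python
import numpy as np

def make_eval_sample(points: np.ndarray, scale: float = 0.95,
                     shift_x_cm: float = 0.5) -> np.ndarray:
    """Scale a point cloud about its centroid and shift it along x.

    points: (N, 3) XYZ array; units and axis convention are assumed, not specified
    by the original text.
    """
    centroid = points.mean(axis=0)
    scaled = (points - centroid) * scale + centroid    # scale down by 0.95
    scaled[:, 0] += shift_x_cm                         # shift right by 0.5 cm
    return scaled
```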
To ensure that the trained model has good generalization ability, further evaluation and validation were performed. A sample not included in the dataset, consisting of a hammer, two wrenches, and a knife, was used to evaluate the trained model, as shown in Fig. 9(a). The right side of Fig. 9(a) shows the actual image of the samples, and the left side shows the point cloud obtained from the MMW imaging processing. The point clouds of the four objects are input into the trained model, and the results, shown in Fig. 9(b), indicate that all samples are classified correctly.
Considering the practical scenario in which objects containing plastic or wood would not be sprayed with metallic paint, we evaluated objects that were neither sprayed with metallic paint nor included in the dataset. The sample, shown in Fig. 10(a), contains scissors with a plastic handle, a knife with a plastic handle, and a hammer with a wooden handle. The right side of Fig. 10(a) shows the actual image of the sample, and the left side shows the point cloud obtained after the MMW imaging reconstruction process. From the point cloud image, it can be seen that the plastic and wooden parts exhibit hollow areas, because MMW signals largely penetrate these materials, which reflect only weakly. The point clouds of the three objects were input into the trained model, and the results are shown in Fig. 10(b). Even though the point cloud data have hollow defects in some parts, all samples are correctly classified, which demonstrates that the trained model has potential for classifying non-metallic objects in practical applications.
Finally, in order to verify the feasibility of this study in practical applications, a cardboard box with concealed objects was scanned by the MMW imaging system, whose composition is shown in Fig. 11. The IWR1443 module was attached to a 2-axis mechanical stage for X-Z scanning. The chirp signal in the 77–81 GHz range has a duration of 40 µs. The x-axis scanning speed is 200 mm/s, and scanning an area of 500 mm × 500 mm takes approximately 4 minutes.
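As a rough check on the quoted scan time, the raster duration can be estimated from the line length, line count, and stage speed. In the sketch below the vertical step size is an assumed parameter (it is not stated here), and turnaround and settling time per line are ignored, so the result is only an order-of-magnitude estimate.

```python
def raster_scan_time(width_mm: float, height_mm: float,
                     speed_mm_s: float, step_mm: float) -> float:
    """Approximate time (s) for a serpentine X-Z raster scan, ignoring turnarounds."""
    n_lines = int(height_mm / step_mm) + 1
    return n_lines * (width_mm / speed_mm_s)

# Example with an assumed 4 mm vertical step for a 500 mm x 500 mm area at 200 mm/s:
# raster_scan_time(500, 500, 200, 4) / 60  -> roughly 5 minutes, the same order as
# the ~4 minutes reported; the true step size and overhead set the exact value.
```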
Due to the limitations of the scanning area, the five samples were divided into two sets and scanned in separate cardboard boxes. It is worth noting that, of these five samples, the hammer, scissors, wrench, and knife are items not included in the dataset; because of the limited number of pistol samples, the pistol used in this concealed-object identification was the one included in the dataset.
The first set of concealed objects consists of a wooden-handled hammer, a plastic-handled knife, and plastic-handled scissors. The hammer at the top layer is located at approximately Y = 160 mm, as shown in Fig. 12(a), and the knife and scissors at the bottom layer are located at approximately Y = 232 mm and 235 mm, respectively, as shown in Fig. 12(b). Notably, the hammer and the two objects at the bottom layer overlap with each other. In Fig. 12(c), the reconstruction results from four-sided scanning using our previously developed MMW point cloud reconstruction algorithm are illustrated. The light blue part is the perimeter of the cardboard box, the pink part is the scissors, the dark blue part is the knife, and the red part is the hammer. The results show that even though the objects overlap each other, the four-sided reconstruction can still recover the rough outline of each object, which is unachievable with 2D imaging. Because some of the concealed objects are made of non-metallic materials such as wood and plastic, the reconstructed contours of the objects appear uneven and hollow. Since the point cloud was segmented by clustering in the first step of data processing during single-sided scanning, the box and the inner objects can be segmented directly in the result of four-sided scanning. Subsequently, the segmented objects were input into the trained model. It is worth noting that scanning all four sides of the cardboard box took a total of 16 minutes, whereas inputting the segmented point clouds into the trained model to obtain the prediction results took only about 4 seconds. Figure 12(c) presents the object classification results, and the supplementary video shows rotated 3D results, wherein the concealed objects are boxed and the classification results are displayed beneath each box. Notably, all classification outcomes are accurate.
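The clustering-based segmentation mentioned above, which separates the box shell from the concealed objects, can be illustrated with a density-based clustering pass. The sketch below uses Open3D's DBSCAN as a stand-in; the text does not specify the exact clustering algorithm or parameters, so eps and min_points are placeholders.

```python
import numpy as np
import open3d as o3d

def segment_clusters(xyz: np.ndarray, eps: float = 10.0, min_points: int = 50):
    """Split a reconstructed MMW point cloud into clusters (box shell vs. objects).

    eps (mm) and min_points are illustrative placeholders, not values from the paper.
    Returns a list of (M_i, 3) arrays, one per cluster; noise points are dropped.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(xyz)
    labels = np.array(pcd.cluster_dbscan(eps=eps, min_points=min_points))
    return [xyz[labels == k] for k in range(labels.max() + 1)]
```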
The second set of concealed objects consists of a pistol and a wrench. The wrench on the top layer is located at approximately Y = 165 mm, and the pistol on the bottom layer is located at approximately Y = 210 mm, as shown in Fig. 13(a). After scanning the six sides of the cardboard box, Fig. 13(b) shows the reconstruction results, and the supplementary video shows rotated 3D results. The light blue part is the perimeter of the cardboard box, the red part is the pistol, and the orange part is the wrench. The segmented objects inside the cardboard box are input into the trained model. Figure 13(b) also shows the object classification results, where the concealed objects are boxed and the classification outcomes are displayed beneath each box; all of them are correctly classified. These outcomes demonstrate the robust performance of our trained model for object classification on MMW point cloud data, which often contain missing, hollow, noisy, and overlapping features. The efficacy of this method in practical applications has thus been successfully validated.
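The final classification step, feeding each segmented cluster to the trained PointNet++ model, can be sketched as below. The expected input layout, the preprocessing, and the class-name order are assumptions for illustration; it reuses the FPS sketch shown earlier and assumes each cluster has at least 8192 points.

```python
import numpy as np
import torch

CLASS_NAMES = ['hammer', 'knife', 'pistol', 'scissors', 'wrench']  # assumed label order

def classify_cluster(model: torch.nn.Module, cluster_xyz: np.ndarray) -> str:
    """Run one segmented point cloud through the trained classifier (illustrative)."""
    idx = farthest_point_sampling(cluster_xyz, 8192)        # FPS sketch from above
    pts = cluster_xyz[idx]
    pts = pts - pts.mean(axis=0)                            # centre the cloud
    pts = pts / np.max(np.linalg.norm(pts, axis=1))         # normalize to the unit sphere
    x = torch.from_numpy(np.ascontiguousarray(pts.T)).float().unsqueeze(0)  # assumed (1, 3, 8192)
    with torch.no_grad():
        out = model(x)
        # Some PointNet++ implementations return (logits, features); take the logits.
        logits = out[0] if isinstance(out, tuple) else out
    return CLASS_NAMES[int(logits.argmax(dim=1))]
```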