The correlation coefficients between the targets and the predicted outputs of the FFNN and CNN models were studied for all elements, as well as for the background radiation, on both the training and test data. The study demonstrated the efficient training of both FFNN and CNN models. These trained networks were subsequently used to analyze gamma spectra obtained during the inspection of containers with various targets, as described in the following section.

## 3.1. Container inspection using trained ANN models

The 1D inspection of the container was conducted in slices with a depth of 10 cm and a 5 cm horizontal shift (see Fig. 1), with a measuring time of 10 minutes and with various targets as outlined in Table 1. The middle of each target was fixed at approximately 75 cm deep inside the container, and a gamma-ray spectrum was obtained for each slice of the container. The positions of the slices were determined using neutron time-of-flight, assuming a constant neutron velocity of 5.13 cm/ns (14-MeV neutrons). The distance of each slice from the iron wall of the container, as illustrated in Fig. 1, was measured from the center of the slice. Figures 3(a) and (b) illustrate the depth profiles of the C, N, and O count contributions for 1 kg of TNT simulant as the target in the container (without a surrounding cargo matrix), obtained by analyzing the gamma-ray spectra with the FFNN and CNN models. Similar profiles were obtained for 1 kg of each of the other simulants presented in Table 1. The error bars of the count contributions at each depth were derived from a “small” dataset of size 500 constructed for each measured spectrum, as detailed in section 2.2. In addition to C, O, and N, other elements were also inspected for material and threat identification in a container. As an example, the depth profiles of the Cl and S count contributions were obtained for the Yperite simulant (mustard gas, a chemical warfare agent) as the target (still without a surrounding cargo matrix), and the results demonstrated precise identification of this simulant. Figures 3(a) and (b) highlight the importance of further investigating the area around 75 cm (65 to 85 cm) within the container to identify the material in this region. Initially, we assessed how well the ANN models can reconstruct the gamma-ray spectrum at a distance of 75 cm, taking the TNT explosive as an example case.
The reconstructed spectra, obtained using the mean values from both FFNN and CNN models, are compared to the measured spectrum in Fig. 3(c). According to Fig. 3(c), both ANN models perform fairly well in reconstructing the measured spectrum. Furthermore, Figs. 3(d) to (i) overlay the SHAP values corresponding to the C, O, and N elements, obtained from both the FFNN and CNN models, on the measured spectrum to visualize the impact of individual energy bins on the models' predictions. The measured spectrum is drawn as a white line, with energy (MeV) on the x-axis and normalized counts on the y-axis. The color map indicates the magnitude of the SHAP values, offering insights into the contribution of each energy bin to the overall prediction. Both models primarily focus on the major gamma lines of C (4.44 MeV) and O (6.13 MeV) to predict their count contributions in the TNT gamma spectrum. However, the CNN model places a more pronounced emphasis on these gamma lines than the FFNN model. As the 5.11 MeV gamma line of N is not observed in the TNT spectrum in this measurement, the CNN model distinctly emphasizes the 2.31 MeV gamma line to predict the count contribution of N.
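
The slice geometry and time-of-flight positioning described at the start of this subsection can be sketched as follows. This is a minimal illustration, not the acquisition code used in the measurements: the function names are hypothetical, depth zero is assumed to coincide with zero neutron flight time, and the gamma-ray travel time and any electronic timing offsets are ignored. The relativistic speed calculation is included only as a consistency check on the quoted 5.13 cm/ns.

```python
import math

NEUTRON_REST_ENERGY_MEV = 939.565        # neutron rest-mass energy
SPEED_OF_LIGHT_CM_PER_NS = 29.9792458
NEUTRON_SPEED_CM_PER_NS = 5.13           # constant velocity quoted in the text


def neutron_speed_cm_per_ns(kinetic_energy_mev):
    """Relativistic speed of a neutron with the given kinetic energy."""
    gamma = 1.0 + kinetic_energy_mev / NEUTRON_REST_ENERGY_MEV
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return beta * SPEED_OF_LIGHT_CM_PER_NS


def slice_bounds(n_slices, first_start_cm=0.0, depth_cm=10.0, shift_cm=5.0):
    """Return (start, center, end) depths in cm for overlapping slices."""
    bounds = []
    for i in range(n_slices):
        start = first_start_cm + i * shift_cm
        bounds.append((start, start + depth_cm / 2.0, start + depth_cm))
    return bounds


def flight_time_window_ns(start_cm, end_cm, speed=NEUTRON_SPEED_CM_PER_NS):
    """Neutron flight times (ns) at which a slice starts and ends.

    Assumption: depth zero corresponds to zero flight time.
    """
    return start_cm / speed, end_cm / speed


# Slices covering the region around 75 cm discussed in the text.
for start, center, end in slice_bounds(n_slices=4, first_start_cm=65.0):
    t0, t1 = flight_time_window_ns(start, end)
    print(f"slice centered at {center:5.1f} cm: {t0:5.2f} to {t1:5.2f} ns")
```

With a 10 cm depth and a 5 cm shift, consecutive slices overlap by half, so each depth is covered by two slices and a 10 cm slice corresponds to a flight-time window of roughly 2 ns.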

Finally, the count contributions must be converted to weight fractions of C, N, and O for threat identification. To achieve this, a calibration line was obtained separately for each element and for each of the FFNN and CNN models. For each element, the maximum mean values in the corresponding depth profiles of three simulants, together with the (0,0) point, were used to map the count contribution to the weight fraction. The simulants RDX, Cocaine, and Peroxide methylethylketone listed in Table 1 were used for C; RDX, Nitromethane, and Tetranitromethane for N; and RDX, Cocaine, and Tetranitromethane for O.
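
As a minimal sketch of such a calibration, the snippet below fits a straight line through four points (three simulants plus the origin) and uses it to map a count contribution to a weight fraction. The fitting method is an assumption, since the text does not state how the lines were obtained; ordinary least squares with a free intercept is used here. The function names and all numerical values are hypothetical and serve only as placeholders.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_weight_fraction(count_contribution, slope, intercept):
    """Map a count contribution to a weight fraction with the fitted line."""
    return slope * count_contribution + intercept


# Hypothetical calibration points for one element and one model:
# (maximum mean count contribution, known weight fraction).
# The first point is the (0,0) point mentioned in the text.
counts = [0.0, 120.0, 250.0, 410.0]
fractions = [0.0, 0.16, 0.37, 0.56]

slope, intercept = fit_line(counts, fractions)
print(to_weight_fraction(200.0, slope, intercept))
```

One such line would be fitted per element and per model, giving six calibration lines in total for C, N, and O with the FFNN and CNN models.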

The obtained calibration lines were then applied to convert the count contributions of C, N, and O obtained using the ANN models to weight fractions for all simulants presented in Table 1. Figures 3(a) and (b) show that the count contributions vary within any selected suspicious area. To mitigate these variations, the mean of the three maximum count contributions in the selected region was taken for each element and then converted to a weight fraction using the calibration lines, followed by normalization so that the weight fractions of C, N, and O sum to one. The resulting normalized weight fractions were then plotted as a point in the CNO barycentric triangle [12, 39]. The CNO barycentric triangle illustrates material compositions based on the normalized weight fractions of C, N, and O. Each vertex represents pure C, N, or O, while points in the interior represent mixtures of these elements. Figure 4 compares, for each simulant except Yperite, the theoretical point obtained from the true weight fractions, represented by a black triangle (▲), with the measured points obtained using the FFNN and CNN models. The measured points are represented as black points (●) for the FFNN model and star symbols (★) for the CNN model for each simulant listed in Table 1. To illustrate the uncertainty in the location of the measured point in the barycentric triangle, the measured points were calculated for the “small” dataset of size 500 constructed for each measured spectrum (as detailed in section 2.2) using both ANN models. The resulting uncertainties are shown in Fig. 4 as grey points forming a “cloud” around the measured points.
Additionally, blue points represent some of the illicit drugs, red points represent some explosives, and green points represent some benign materials, illustrating the distribution of materials within the CNO barycentric triangle. As evident in Fig. 4, both ANN models predict the measured points close to the theoretical points, with the FFNN model showing better performance. The better performance of FFNN is further supported by the depth profiles (TNT results shown in Fig. 3(a) and (b)), as the FFNN exhibits smaller standard deviations for each element.
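
The steps from count contributions to a point in the CNO triangle can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code behind Fig. 4: the function names are hypothetical, and the placement of the C, N, and O vertices is an arbitrary choice that may differ from the figure. The theoretical point is computed for TNT (C7H5N3O6) from standard atomic masses.

```python
import math


def mean_of_top(values, k=3):
    """Mean of the k largest values in a suspicious region."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def normalize(c, n, o):
    """Scale the three weight fractions so that they sum to one."""
    total = c + n + o
    return c / total, n / total, o / total


# Assumed vertex layout of an equilateral triangle (not taken from Fig. 4).
VERTEX_C = (0.0, 0.0)
VERTEX_N = (1.0, 0.0)
VERTEX_O = (0.5, math.sqrt(3.0) / 2.0)


def barycentric_to_xy(w_c, w_n, w_o):
    """Convert normalized C, N, O weight fractions to plot coordinates."""
    x = w_c * VERTEX_C[0] + w_n * VERTEX_N[0] + w_o * VERTEX_O[0]
    y = w_c * VERTEX_C[1] + w_n * VERTEX_N[1] + w_o * VERTEX_O[1]
    return x, y


# Theoretical point for TNT, C7H5N3O6 (hydrogen drops out on normalization).
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999}
tnt = normalize(7 * ATOMIC_MASS["C"], 3 * ATOMIC_MASS["N"], 6 * ATOMIC_MASS["O"])
print(tnt, barycentric_to_xy(*tnt))
```

For TNT this gives normalized weight fractions of approximately 0.379 (C), 0.189 (N), and 0.432 (O). A measured point would be obtained the same way, with `mean_of_top` applied to the count contributions of each element in the suspicious region and the calibration lines applied before `normalize`.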

To evaluate the performance of the ANN models in the presence of a matrix, 10 kg of TNT was hidden within the container behind 40 cm and 80 cm of wood bales with a density of 0.2 g·cm⁻³ (mimicking an organic cargo). In another series of experiments, the same TNT target was hidden behind 40 cm and 105 cm of iron boxes filled with bundles of iron wire, with an apparent density of 0.2 g·cm⁻³. Figure 5 displays the depth profiles of the C, N, and O count contributions obtained using both FFNN and CNN models for 10 kg of TNT hidden in the wood and iron matrices. According to Fig. 5, the distances of 70–90 cm and 115–135 cm were considered as the suspicious areas for the TNT hidden at depths of 40 cm and 80 cm in the wood matrix, respectively. Furthermore, the distances of 85–105 cm and 160–180 cm were considered as the suspicious areas for the TNT hidden 40 cm and 105 cm deep inside the iron matrix, respectively. The resulting measured points in the CNO barycentric triangle, along with their uncertainties, obtained using the ANN models for TNT hidden behind the wood and iron matrices are shown in Fig. 6.

The depth profiles of C and O are obscured by the large background profiles of the wood matrix, particularly for TNT hidden in the wood matrix at a depth of 40 cm. This effect is noticeable in Fig. 6 (a) and (b), where the weight fraction of N is underestimated.

The SHAP values were calculated for the N element for both FFNN and CNN models at distances of 80 cm and 125 cm, which lie in the middle of the suspicious regions shown in Fig. 5, for TNT behind 40 cm and 80 cm of wood matrix, respectively. The overlay of the SHAP values on the measured spectra of TNT is shown in Fig. 7.

Parts (a) and (c) of Fig. 7 show that, for TNT behind 40 cm of wood, the measured gamma-ray spectrum is mainly influenced by the intense gamma lines of C and O. The effect is less pronounced for TNT behind 80 cm of wood, resulting in better visibility of the gamma lines of N and consequently facilitating the estimation of N from its major gamma lines (2.31 and 5.11 MeV) by both models.

The depth profiles, particularly for N, are significantly affected by the poor counting statistics when TNT is behind 105 cm of iron, as shown in Fig. 5 (g) and (h). This effect is evident in Fig. 6 (g) and (h), showing large uncertainties and fluctuations of the measured points.

Due to this critical situation, SHAP values were calculated for the C element for both FFNN and CNN models at distances of 95 cm and 170 cm, which are at the middle of the suspicious regions shown in Fig. 5, for TNT behind 40 cm and 105 cm of iron matrix, respectively. The overlay of SHAP values on the measured spectra of TNT for the C element is shown in Fig. 8 (a) to (d). Additionally, the overlay of SHAP values for the Fe element is depicted in Fig. 8 (e) and (f) for TNT behind 105 cm of iron matrix for both models.

It is evident in parts (a) and (c) of Fig. 8 that both models still mainly use the 4.44 MeV gamma line of the measured gamma-ray spectrum to predict C for TNT behind 40 cm of iron, albeit with a lower contrast compared to Fig. 3(d) and (g). The effect is much more pronounced for TNT behind 105 cm of iron, where no specific region is used for predicting C, but rather the entire spectrum. In fact, there is no specific peak in the measured spectrum corresponding to the C, N, and O elements, and the measured spectrum is similar to the gamma signature of pure Fe shown in Fig. 2(c). Since iron is the most abundant material inside the container in this case, both ANN models effectively predict the count contribution of Fe using its major gamma line, as shown in Fig. 8(e) and (f). Even in this situation, where the gamma rays originating from different elements are heavily obscured by the iron matrix, both ANN models could still determine fairly well the position of the threat (Fig. 5(g) and (h)) and the category of the unknown material as explosive (Fig. 6(g) and (h)).

As can be seen from the depth profiles presented in Figs. 3 and 5, both the FFNN and CNN models identify the suspicious areas at the same positions (i.e. depths) within the container. This consistency enhances the reliability of container inspection, as it indicates that two different ANN models pinpoint the same locations.

Indeed, the consistency observed in the identification of suspicious areas by both FFNN and CNN models was further supported by the results of XAI techniques. Specifically, the SHAP values overlaid on the measured spectrum provide insights into the contribution of individual energy bins to the models' predictions. Figures 3, 7, and 8 illustrate how both models primarily focus on major gamma lines of C, O, and N elements to predict their count contributions accurately. Additionally, the CNN model demonstrates a more pronounced emphasis on these gamma lines compared to the FFNN model, highlighting differences in their predictive strategies. This analysis underscores the reliability and interpretability of the models' predictions, enhancing trust in their performance in identifying suspicious areas within cargo containers.
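
To make concrete what a per-bin attribution means, the sketch below computes exact Shapley values by brute-force enumeration for a toy model with four energy bins. It is not the SHAP pipeline applied to the FFNN and CNN models: the toy linear model, its weights, the spectrum, and the single background spectrum are all hypothetical, and the exhaustive enumeration is only feasible for a handful of bins because its cost grows as 2^n.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Bins outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


# Hypothetical toy "network": a linear read-out over four energy bins.
WEIGHTS = [0.0, 2.0, 0.0, 0.5]


def toy_model(spectrum):
    return sum(w * c for w, c in zip(WEIGHTS, spectrum))


spectrum = [1.0, 3.0, 2.0, 4.0]      # hypothetical normalized counts
background = [1.0, 1.0, 1.0, 1.0]    # hypothetical reference spectrum

phi = shapley_values(toy_model, spectrum, background)
print(phi)
```

For a linear model each attribution reduces to the bin weight times the difference between the measured and background counts, so only the bins the model actually uses receive a nonzero value, and the attributions sum to the difference between the prediction for the spectrum and the prediction for the background.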

Furthermore, the results in Fig. 4, which depicts the performance of the models in positioning the measured points in the CNO barycentric triangle for simulants without matrices, and in Fig. 6, which presents the results for the TNT simulant hidden within wood and iron matrices, indicate effective performance in identifying the material category within the CNO barycentric triangle. Generally, a slightly better performance in terms of proximity to the theoretical points (except for TNT in the presence of the 105 cm iron matrix) was observed for the FFNN model. However, both models displayed robust predictive capabilities, allowing for the accurate identification of suspicious regions within cargo containers.