The initial model, Eant_1000px_200616, was constructed without overfitting, and its error was 0.136 (Supplementary Fig. 2). Verification tests of the model were performed with a test dataset prepared from sample JNOC-G501, using the confidence values for each category in all images, which were calculated by the Classification Unit using the softmax function (Supplementary Table 2). All test images, whether predicted correctly or incorrectly, were compiled in the Supplementary Image Dataset. The category of each image was predicted by selecting the one assigned the highest confidence value. In this study, of the three categories, those assigned the highest and second-highest confidence values are referred to as the 1st and 2nd categories, respectively. For convenience, the manually classified category assigned to an object when preparing the test datasets is referred to as the “True-category” in this paper. The evaluation of the model using the test dataset and the accuracy of the Eucampia Index calculated by the model are described below.
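The way confidence values lead to the 1st and 2nd categories can be sketched as follows. This is a minimal illustration, not the Classification Unit's actual code: the function names and the raw network outputs (`logits`) are hypothetical, and only the use of softmax and the ranking rule are taken from the text.

```python
import math

CATEGORIES = ["Terminal", "Intercalary", "Other particles"]

def softmax(logits):
    """Convert raw network outputs into confidence values that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_categories(confidences):
    """Return (category, confidence) pairs sorted from highest to lowest."""
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    return [(CATEGORIES[i], confidences[i]) for i in order]

# Hypothetical raw outputs for one image (not taken from the study).
logits = [2.0, 1.0, 0.1]
conf = softmax(logits)
ranked = rank_categories(conf)
first, second = ranked[0], ranked[1]  # 1st and 2nd categories
print(first, second)
```

Because the three softmax values sum to 1, the 1st-category value cannot fall below 1/3 and the 2nd-category value cannot exceed 1/2, which is consistent with the ranges 0.300–1.00 and 0.00–0.499 quoted in the verification tests.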
Verification tests of the classification model
The components of the 1st category are shown in Table 1. For the [Terminal], [Intercalary], and [Other particles] groups, 57%, 77%, and 87% of the images were predicted correctly, respectively. The overall accuracy evaluated from these results was 78.8%. This accuracy was not as high as that of the CNN model reported by Bueno et al. (2017), which described the first classification model of diatom valves. However, overall, the correct predictions made by the model tended to have higher confidence values than the incorrect predictions. The number of predictions in each confidence value range for the 1st category is shown in Fig. 2. The histogram of incorrectly identified images is roughly uniform, with tens of images occurring almost constantly throughout the confidence value range (0.300–1.00) calculated for the 1st category. The number of correctly classified images increases markedly in the confidence range of 0.800–1.00 and is much higher than the number of incorrectly classified ones. In the confidence value range of 0.300–0.599, 124 images were incorrectly classified and 86 were correctly identified (counted from Supplementary Table 2). These findings imply that there are many incorrect predictions when the confidence value for the 1st category is 0.599 or less, and that images with relatively high confidence values for the 2nd category (within its range of 0.00–0.499) include cases in which one of the other two categories, rather than the 1st category, is the true-category.
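The tally behind a histogram such as Fig. 2 can be sketched as follows. The records and the 0.1-wide bins are assumptions made for illustration; the actual values are those in Supplementary Table 2, and the bin width used for Fig. 2 is not stated here.

```python
def confidence_bin(conf):
    """Label the 0.1-wide bin (from 0.300 upward) containing a confidence value.

    Works in integer thousandths to avoid floating-point edge effects;
    the last bin covers 0.900-1.00.
    """
    milli = int(round(conf * 1000))
    k = min(max((milli - 300) // 100, 0), 6)
    return f"{(300 + 100 * k) / 1000:.3f}"

def histogram(records):
    """Count (correct, incorrect) 1st-category predictions per bin."""
    counts = {}
    for true_cat, predicted_cat, conf in records:
        key = confidence_bin(conf)
        correct, incorrect = counts.get(key, (0, 0))
        if true_cat == predicted_cat:
            correct += 1
        else:
            incorrect += 1
        counts[key] = (correct, incorrect)
    return counts

# Hypothetical records: (true-category, 1st category, 1st-category confidence).
records = [
    ("Terminal", "Terminal", 0.95),
    ("Terminal", "Intercalary", 0.55),
    ("Intercalary", "Intercalary", 0.88),
    ("Intercalary", "Other particles", 0.41),
    ("Other particles", "Other particles", 0.99),
    ("Intercalary", "Terminal", 0.60),
]
hist = histogram(records)
accuracy = sum(t == p for t, p, _ in records) / len(records)
for key in sorted(hist):
    print(key, hist[key])
print("overall accuracy:", accuracy)
```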
Table 1 also shows the type and number of images assigned to the 2nd category when the 1st category was incorrectly assigned. Approximately 70% of the incorrectly predicted images were correctly identified at the stage of selecting the 2nd category. Of the 114 true-[Terminal] images predicted incorrectly as [Intercalary], 113 were assigned [Terminal] as the 2nd category. For all 54 true-[Intercalary] images incorrectly predicted as [Terminal] in the 1st category, [Intercalary] was assigned as the 2nd category. Of the 109 true-[Intercalary] images predicted as [Other particles], 103 were assigned [Intercalary] as the 2nd category. More images in the true-[Intercalary] category than in the true-[Terminal] category were recognized as [Other particles] (Table 1), which is attributed to the greater variety in the shape of intercalary valves. Eucampia antarctica valves are asymmetrical and therefore have a wide range of aspect ratios (Allen et al. 2014).
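The proportion of incorrect predictions that are recovered by the 2nd category can be computed as in the sketch below. The four records are hypothetical and serve only to show the counting rule; the function name is not part of the miCRAD system.

```python
def second_choice_recovery(records):
    """Count misclassified images whose 2nd category is the true-category.

    records: list of (true_category, {category: confidence}) pairs.
    Returns (recovered, missed), where `missed` is the number of images
    whose 1st category is wrong.
    """
    missed = recovered = 0
    for true_cat, conf in records:
        ranked = sorted(conf, key=conf.get, reverse=True)
        if ranked[0] == true_cat:
            continue  # 1st category is already correct
        missed += 1
        if ranked[1] == true_cat:
            recovered += 1
    return recovered, missed

# Hypothetical confidence values (not taken from Supplementary Table 2).
records = [
    ("Terminal",
     {"Terminal": 0.40, "Intercalary": 0.55, "Other particles": 0.05}),
    ("Intercalary",
     {"Terminal": 0.10, "Intercalary": 0.35, "Other particles": 0.55}),
    ("Intercalary",
     {"Terminal": 0.30, "Intercalary": 0.10, "Other particles": 0.60}),
    ("Other particles",
     {"Terminal": 0.06, "Intercalary": 0.04, "Other particles": 0.90}),
]
recovered, missed = second_choice_recovery(records)
print(f"{recovered} of {missed} incorrect predictions recovered by the 2nd category")
```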
Figure 3 shows examples that represent trends recognized visually from the classification results, including original images taken by “Collection Pro” and images generated using the Keras package. The confidence values for all three categories are also shown in each image in Fig. 3. The confidence values of an image generated by Keras and of its original image differ slightly, indicating that the model identifies them as different objects. The correctly predicted objects in the [Terminal] and [Intercalary] categories with markedly higher confidence values in the 1st category had better-preserved valves, and the images were in focus (Figs. 3(a)-1–3, 3(b)-1–3). Conversely, some objects in the true-[Terminal] category that were incorrectly predicted as [Intercalary] had unclear horns (Figs. 3(c) and 3(d)). The true-[Intercalary] objects with longer and unclear horns were incorrectly predicted to be [Terminal] (Figs. 3(e) and 3(f)). Furthermore, the differences in confidence values between the [Terminal] and [Intercalary] categories in these images are smaller than those obtained for the images in Figs. 3(a) and 3(b), indicating that the classification was uncertain. For the out-of-focus images and images containing more than two particles, [Other particles] was selected as the 1st category (Figs. 3(g) and 3(h)). This classification tendency is probably caused by a criterion of the [Other particles] training dataset, which included images that were out of focus and/or contained two or more particles.
Eucampia Index comparison between automatic and manual counting
The results of the model evaluation revealed that the confidence values calculated for the test images reflect the degree of similarity to the three categories, i.e., whether the shape of the particles in an image resembles a terminal valve, an intercalary valve, or neither. Moreover, the confidence values also convey how difficult an image is to classify because of poor preservation of diatom valves or out-of-focus imaging. Shoji et al. (2018) reported that the relative abundances of each category could be expressed as the average confidence values using a CNN model that learned the outline similarity of particles in four categories. Thus, the average confidence values obtained for the [Terminal] and [Intercalary] categories in this study should likewise reflect the ratio between the two valve types in an image dataset.
To compare the Eucampia Index counted manually with that predicted by the model, the former was calculated from the abundances of true-[Intercalary] and true-[Terminal] images in the test dataset, and the latter from the average confidence values obtained for these categories by the model (Table 2). The Eucampia Index value derived from the average confidence values was 0.76, and the value estimated from the number of true-category images was 0.80. Considering that the counting probability error is < ±0.053 when a total of 100 E. antarctica valves are counted (Whitehead et al., 2005), this result shows that the Eucampia Index detected automatically using the developed model is comparable to that obtained manually.
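The two routes to the index can be sketched as follows. The formula is an assumption: this section does not restate the definition, so the sketch takes the index to be the ratio of terminal to intercalary valves. The image records are hypothetical; only the values 0.80, 0.76, and ±0.053 come from the text.

```python
def eucampia_index(terminal, intercalary):
    """Assumed definition: terminal abundance divided by intercalary abundance."""
    if intercalary <= 0:
        raise ValueError("intercalary abundance must be positive")
    return terminal / intercalary

def mean_confidences(rows):
    """Average the confidence value of each category over all images."""
    n = len(rows)
    return {cat: sum(row[cat] for row in rows) / n for cat in rows[0]}

# Hypothetical test images: (true-category, confidence values).
images = [
    ("Terminal",
     {"Terminal": 0.80, "Intercalary": 0.15, "Other particles": 0.05}),
    ("Intercalary",
     {"Terminal": 0.20, "Intercalary": 0.70, "Other particles": 0.10}),
    ("Intercalary",
     {"Terminal": 0.10, "Intercalary": 0.80, "Other particles": 0.10}),
    ("Other particles",
     {"Terminal": 0.05, "Intercalary": 0.05, "Other particles": 0.90}),
]

# Manual route: count true-category images.
n_terminal = sum(1 for cat, _ in images if cat == "Terminal")
n_intercalary = sum(1 for cat, _ in images if cat == "Intercalary")
manual_index = eucampia_index(n_terminal, n_intercalary)

# Model route: use average confidence values in place of counts.
means = mean_confidences([conf for _, conf in images])
model_index = eucampia_index(means["Terminal"], means["Intercalary"])

# Reported values from the text, compared against the counting error.
within_error = abs(0.80 - 0.76) <= 0.053
print(manual_index, model_index, within_error)
```

In the model route no image is assigned to a single category: every image contributes to both averages in proportion to its confidence values, so uncertain images are weighted rather than forced into one class.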
Future perspectives for automatic diatom detection using the miCRAD system
This study demonstrated for the first time that a model capable of detecting the ratio of two diatom species can be constructed using the miCRAD system. The Image Collection Unit in the miCRAD system enables researchers not only to obtain cropped object images for the training datasets used to construct CNN models, but also, once the models have been constructed, to classify particle images automatically at the time of capture (Itaki et al., 2020). Using the model constructed in this study, automatic detection of the Eucampia Index from a diatom slide can be applied to large-scale investigations of the variation and geographical distribution of the index in the Southern Ocean. Depending on how the classification categories are set, a similar method is relevant to investigators who have to process a large number of diatom samples, for example to detect specific species for biostratigraphic and paleoenvironmental studies.
When samples that differ in age and/or sedimentary environment from the specimens used to construct the training dataset are used in practice to detect the Eucampia Index, a loss of model accuracy is presumed to occur if the number of images in each category differs greatly. The test dataset used in this evaluation differed from a normal diatom slide in the number of [Other particles] images. The test dataset contained 1311 E. antarctica and 998 [Other particles] images. From a normal diatom slide, for example one prepared using the site G501 sediment sample, 154 and 1991 images of E. antarctica and [Other particles], respectively, were obtained using the Image Collection Unit. When [Other particles] images are overwhelmingly more abundant than E. antarctica valves in a slide, the average confidence values of [Intercalary] and [Terminal] are predicted to decrease significantly. As a result, the difference between the manually counted Eucampia Index value and the value inferred from the average confidence values may be larger because of the relatively larger errors in the average confidence values.
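The difference in composition between the two image sets can be made explicit with a short calculation. The image counts are those given in the text; the function name is introduced here for illustration only.

```python
def target_share(n_target, n_other):
    """Fraction of images that show the target taxon."""
    return n_target / (n_target + n_other)

# Counts of E. antarctica and [Other particles] images quoted in the text.
test_share = target_share(1311, 998)    # test dataset
slide_share = target_share(154, 1991)   # slide from the site G501 sample

print(f"test dataset: {test_share:.1%}, normal slide: {slide_share:.1%}")
```

E. antarctica images make up about 57% of the test dataset but only about 7% of the images from the normal slide.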
To increase the accuracy of diatom species detection, it is necessary to avoid capturing many other particles. Some studies have employed deep-learning-based automatic segmentation techniques to detect each diatom in a field of view (Pedraza et al., 2018; Tang et al., 2018; Ruiz-Santaquiteria et al., 2020). In addition, many studies have developed CNN models with sufficient accuracy for diatom classification (e.g., Bueno et al., 2017; Pedraza et al., 2017). This body of knowledge has contributed to the software side; the results of diatom classification using the miCRAD system, in turn, will contribute to the development of devices for practical use. New practical and accurate automatic identification and detection techniques are expected to be realized through further development of the miCRAD system, once automatic segmentation is implemented or CNN models constructed by other programs can be used in the Classification Unit.