Preparing an image dataset for the classification model involved the following three steps: (1) making permanent slides for diatom assemblage analysis, (2) compiling a dataset of digital images of particles in the slides (Figs. 2a–2c), and (3) constructing a deep learning model using the miCRAD system software (Figs. 2d–2g). The details of each step and the evaluation of the model performance are described below.
2.1 Slide preparation
Permanent slides for light microscopy observation were prepared from surface sediment materials. Sediments were collected using gravity and piston cores from the Technology Research Center of Japan National Oil Corporation (JNOC) on the TH83 cruise of R/V Hakurei-maru undertaken in 1983. The sediment samples investigated in this study were mainly collected from the Indian sector of the Southern Ocean and are listed in Supplementary Table 1.
The methods for sediment treatment and slide preparation were the same as those used for manual counts of fossil diatoms (Schrader and Gersonde 1978). Approximately 0.1 g of dried sediment was placed in a 200 ml beaker containing approximately 1 ml hydrogen peroxide (H2O2, 10%) and hydrochloric acid (HCl, 10%) and boiled to remove organic and calcareous materials. Distilled water was added to a volume of 200 ml, and the suspension was left for 5 h to separate the residue from the acidic water. The residue was separated by decanting the supernatant, and the beaker was then refilled with distilled water. This process was repeated four times to neutralize the suspension. All processed deposits were then preserved in 4 ml bottles, and permanent slides were made from these deposits. Approximately 10 µl of agitated suspension with 1000 µl distilled water was settled and dried on a 24 × 32 mm coverslip (NEO Micro Cover Glass, Matsunami Glass Industries Ltd., Osaka, Japan). This coverslip is larger than that typically used for manual observations and was chosen to ensure that the particles were sparsely distributed. The coverslip was mounted onto a slide (MICRO SLIDE GLASS, S1214, Matsunami Glass Industries Ltd., Osaka, Japan) with mounting medium (Norland optical adhesive No. 61, refractive index: 1.56). We prepared thirty-eight slides in total and used them to collect images for the training dataset.
2.2 Image acquisition of E. antarctica and construction of the training dataset
All particle images in the prepared slides were captured using the image collection unit of the miCRAD system, as described by Itaki et al. (2020a). The image collection unit, which is based on “Collection Pro” from Micro Support Co., Ltd., automatically acquires digital microscopic images of particles scattered in the observation field using a motorized X-Y stage microscope controlled by a computer (Fig. 2a). The field of view (Fig. 2b) was projected onto a display using a ×50 objective lens (N.A. 0.10–0.45, ZOL-50, Sigmakoki Co., Ltd., Tokyo, Japan) and a CCD camera (5 million pixels, STC-MCS500U3V, Omron Sentech Co., Ltd., Kanagawa, Japan), with the transmitted light mode of the software set to ×6. This setting yielded 2,248 × 2,048 pixels per field of view at a resolution of 0.066 µm/pixel. After a slide was scanned, binarization, opening, and closing operations were applied to each field-of-view image in the image processing software to detect individual particles, and each particle image was clipped to 512 × 512 pixels (Fig. 2c). Overlapping particles are erroneously recognized as a single individual. Therefore, larger coverslips and sparse slides were used so that most particles were isolated and captured singly in a clipped image. All training images were compiled in the Supplementary Image Dataset.
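The particle detection described above (binarization, opening, closing, then clipping) can be illustrated with a minimal sketch. It is not the miCRAD or Collection Pro implementation: the threshold value, the 3 × 3 structuring element, the 8-connected labeling, the assumption that particles appear darker than the background in transmitted light, and the centering rule in the hypothetical `crop_window` helper are all choices made for illustration only. A toy 12 × 12 image and an 8 × 8 window stand in for the real field of view and the 512 × 512 clip.

```python
from collections import deque

def binarize(gray, threshold):
    """Mark pixels darker than the threshold as foreground (assumed: dark particles)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def _morph(mask, rule):
    """Apply a 3x3 neighbourhood rule; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i] if 0 <= j < h and 0 <= i < w else 0
                      for j in range(y - 1, y + 2) for i in range(x - 1, x + 2)]
            out[y][x] = 1 if rule(window) else 0
    return out

def erode(mask):
    return _morph(mask, all)

def dilate(mask):
    return _morph(mask, any)

def opening(mask):
    """Erosion then dilation: removes specks smaller than the 3x3 element."""
    return dilate(erode(mask))

def closing(mask):
    """Dilation then erosion: fills small holes and gaps inside particles."""
    return erode(dilate(mask))

def particle_boxes(mask):
    """Return one inclusive box (top, left, bottom, right) per 8-connected particle."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            top, left, bottom, right = y0, x0, y0, x0
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for j in range(y - 1, y + 2):
                    for i in range(x - 1, x + 2):
                        if 0 <= j < h and 0 <= i < w and mask[j][i] and not seen[j][i]:
                            seen[j][i] = True
                            queue.append((j, i))
            boxes.append((top, left, bottom, right))
    return boxes

def crop_window(box, size, height, width):
    """Centre a size x size window on the box, shifted to stay inside the image.

    Returns (top, left, bottom, right) with bottom/right exclusive.
    Assumes size does not exceed the image dimensions.
    """
    cy, cx = (box[0] + box[2]) // 2, (box[1] + box[3]) // 2
    top = min(max(cy - size // 2, 0), height - size)
    left = min(max(cx - size // 2, 0), width - size)
    return top, left, top + size, left + size

# Toy field of view: bright background, one 4 x 4 dark particle, one 1-pixel speck.
field = [[255] * 12 for _ in range(12)]
for y in range(3, 7):
    for x in range(4, 8):
        field[y][x] = 40
field[10][1] = 40

mask = closing(opening(binarize(field, threshold=128)))
boxes = particle_boxes(mask)
windows = [crop_window(b, 8, 12, 12) for b in boxes]
```

In this toy case the opening step discards the isolated speck, so only the 4 × 4 particle is boxed and clipped. Two touching particles would form one connected component and receive a single box, which is the failure mode that sparse slides are meant to avoid.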
2.3 Construction and evaluation of the classification model
The classification model was constructed using the classification unit of the miCRAD system (Itaki et al. 2020a) to distinguish the intercalary (Fig. 2d) and terminal valves (Fig. 2e) of E. antarctica from other particles (Fig. 2f) on a permanent slide. The classification unit consists of the deep learning software “RAPID machine learning” (version 2.1, NEC Corp., Tokyo, Japan), which incorporates an AlexNet-based (Krizhevsky et al. 2017) convolutional neural network with approximately 10 layers. To construct a supervised learning model, images in the training dataset were manually labeled when imported into the unit (Fig. 2g).
To distinguish the intercalary and terminal valves of E. antarctica from other sediment particles, three labels were used for image classification ([Terminal], [Intercalary], and [Other particles]). Images showing one terminal or one intercalary valve of E. antarctica were assigned the labels [Terminal] and [Intercalary], respectively. E. antarctica var. recta and var. antarctica were identified following the descriptions in Fryxell and Prasad (1990) and were grouped together as E. antarctica. For the [Other particles] label, images displaying other diatom species, fragments of diatom valves, and non-diatom particles were selected randomly. In total, we manually selected 604, 606, and 600 images for [Terminal], [Intercalary], and [Other particles], respectively, for the training dataset (Supplementary Image Dataset).
The model was trained for 100 epochs with a batch size of 1 and a learning rate of 0.01, using the stochastic gradient descent (SGD) optimizer. These values were chosen because they yielded a sufficiently low training loss. Threefold cross-validation was applied to test the generalizability of the model performance. Two-thirds of the training dataset (approximately 400 images per class) was randomly selected and used to train the model and validate the training progress. Subsequently, the model classified the remaining subset (approximately 200 images per class) to evaluate its performance. In each of the three runs, augmentation was conducted using the corresponding function of the Keras package (version 2.1.4, Chollet 2015) to compensate for the limited diversity of the training images and to improve the generalization of the model. New images were generated from the original training images by random rotation (range 178 degrees), random flipping (horizontal and vertical), random brightening or darkening (channel shift range 5), and random shifting (height and width shift range 0.05). After augmentation, the training dataset contained approximately 4,000 images for each label.
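The split and augmentation settings above can be sketched as follows. This is an illustration under stated assumptions, not the procedure run inside the miCRAD classification unit: the hypothetical `threefold_splits` helper assumes three disjoint folds stratified by label, the file names are placeholders, and the augmentation values are simply written with the argument names of Keras's `ImageDataGenerator`, which is assumed (not confirmed by the text) to be the "corresponding function" used.

```python
import random

def threefold_splits(images_by_label, seed=0):
    """Yield (train, test) dicts for three disjoint folds, stratified by label."""
    rng = random.Random(seed)
    folds = {}
    for label, images in images_by_label.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        # Deal images round-robin so the three folds differ in size by at most one.
        folds[label] = [shuffled[k::3] for k in range(3)]
    for held_out in range(3):
        train = {
            label: [img for k, part in enumerate(parts) if k != held_out for img in part]
            for label, parts in folds.items()
        }
        test = {label: list(parts[held_out]) for label, parts in folds.items()}
        yield train, test

# Augmentation values quoted in the text, keyed by the argument names of
# keras.preprocessing.image.ImageDataGenerator. Whether the study used exactly
# this class is an assumption.
AUGMENTATION_KWARGS = {
    "rotation_range": 178,
    "horizontal_flip": True,
    "vertical_flip": True,
    "channel_shift_range": 5,
    "width_shift_range": 0.05,
    "height_shift_range": 0.05,
}

# Placeholder file names standing in for the 604, 606, and 600 labeled images.
dataset = {
    "Terminal": [f"terminal_{i:04d}.png" for i in range(604)],
    "Intercalary": [f"intercalary_{i:04d}.png" for i in range(606)],
    "Other particles": [f"other_{i:04d}.png" for i in range(600)],
}
splits = list(threefold_splits(dataset))
```

Each split leaves roughly 400 images per label for training and roughly 200 for testing, matching the proportions given in the text. Reaching approximately 4,000 training images per label from about 400 originals implies roughly a tenfold expansion by augmentation.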
We repeated this process three times and calculated averages of the following classical measures:
Overall accuracy: the proportion of images identified as true labels to the total number of images classified.
Precision: the proportion of images belonging to a true label among the images classified into the label.
Recall: the proportion of images identified as true labels among the total images belonging to the label.
F1 score: the harmonic mean of precision and recall.
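The four measures can be computed from a confusion matrix as in the sketch below. The function name `classification_metrics` is hypothetical, and the counts in the example matrix are invented purely to exercise the code; they are not results from this study.

```python
def classification_metrics(confusion, labels):
    """Compute overall accuracy and per-label precision, recall, and F1.

    confusion[i][j] is the number of images whose true label is labels[i]
    and which the model classified as labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(labels)))
    metrics = {"overall_accuracy": correct / total, "per_label": {}}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        classified_as_label = sum(row[k] for row in confusion)  # column sum
        belonging_to_label = sum(confusion[k])                  # row sum
        precision = true_positive / classified_as_label if classified_as_label else 0.0
        recall = true_positive / belonging_to_label if belonging_to_label else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
        else:
            f1 = 0.0
        metrics["per_label"][label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

labels = ["Terminal", "Intercalary", "Other particles"]
# Invented counts for illustration only (rows: true label, columns: predicted label).
example_confusion = [
    [180, 15, 5],
    [10, 185, 5],
    [10, 0, 190],
]
example_metrics = classification_metrics(example_confusion, labels)
```

Averaging over the three cross-validation runs then amounts to calling the function once per run and taking the mean of each measure.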
2.4 Classification test using an experimental dataset
To assess how the model performs in an actual counting workflow, we prepared an experimental dataset containing almost all particle images acquired while scanning a permanent slide and tested the classification accuracy of the model. Surface sediments (0–1 cm) from core site G501 off Prydz Bay in East Antarctica were selected for the particle images of the experimental dataset (Supplementary Table 1). The site of the experimental dataset differed from those of the training dataset so that the test reflected the application of the constructed model to unknown sediment samples.
Each particle image in the experimental dataset was acquired using the image collection unit to simulate practical use. We acquired 3,010 images under the same capture conditions as those used for the training dataset. In total, 35, 104, and 2,881 images of [Terminal], [Intercalary], and [Other particles], respectively, were manually identified from all particle images.
This classification test used a total of 125 images of E. antarctica, which is sufficient to represent the Eucampia index of each sediment sample (Whitehead et al. 2005). Only valves in girdle view (Figs. 2d and 2e) were counted as images of E. antarctica in the test. Images of valve fragments smaller than half of a valve were removed from the training and experimental datasets.