Preliminary study for counting fossil diatoms using a deep learning system: An approach to automated estimation of a paleoenvironmental index

doi:10.21203/rs.3.rs-2469147/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Two types of valves (intercalary and terminal) of Eucampia antarctica, a diatom species, have shown potential as paleoenvironmental tools in the Southern Ocean. Taxonomists have counted the valves manually; however, assessing the relationship between the valve ratio and environmental factors in this way requires considerable time. Here, we present an end-to-end automatic approach for counting E. antarctica using the microfossil classification and rapid accumulation device (miCRAD) system, which enables model classification while acquiring microscopic images. We constructed a deep learning-based model for identifying the intercalary and terminal valves of E. antarctica in a diatom assemblage. Additionally, we tested whether the constructed model can substitute for manual counting using an experimental image dataset containing all particle images acquired during whole-slide scanning of permanent slides. In cross-validation on the training images, the model accuracy reached 0.92. The proportion of intercalary valves to all E. antarctica valves (i.e., the total of terminal and intercalary valves) calculated from the model counts was 0.55 on average, a difference of +0.05 from the actual value of 0.50. However, on the experimental dataset, the model performed worse than estimated from the cross-validation. The lower performance was attributed to the class imbalance of the dataset obtained from whole-slide scanning, which includes many other particles. This experiment demonstrated that the classification model constructed with the miCRAD system has performance comparable to manual counting in predicting E. antarctica valves; however, screening images before the classification step will be necessary to completely automate the classification.

Keywords: Artificial intelligence, Automated classification, Diatoms, Microfossils, Paleoenvironment

1 Introduction

Numerous fossil diatom studies have focused on assemblage structure, abundance, and morphometric changes. Such analyses provide evidence for ecological, paleoenvironmental, and chronological constraints. Among sea surface environmental indicators, Eucampia antarctica, a diatom species endemic to the Southern Ocean, has shown potential as an index for estimating paleo sea ice coverage and sea surface temperature (SST) (Kaczmarska et al. 1993; Whitehead et al. 2005; Allen 2014). Eucampia antarctica cells consist of two valves, grow in chains (Fig. 1), and vary in chain length across the Southern Ocean (Fryxell and Prasad 1990). The difference in chain length mainly arises from the ratio of two morphologically and biogeographically distinct varieties, E. antarctica (Castracane) Mangin var. antarctica and E. antarctica var. recta (Mangin) Fryxell & Prasad (Fryxell and Prasad 1990; Fryxell 1991). The chain length of E. antarctica can be estimated by counting two types of valves: intercalary valves, which are located at connecting points between adjacent cells, and terminal valves, which are located at the ends of chains (Fig. 1). However, counting these valves in diatom slides to estimate chain length is a time-consuming manual process, and a substantial amount of time is required when the relative abundance of E. antarctica in a diatom assemblage and/or the diatom abundance in a sediment sample is low. In general, 100–300 counted valve data points are needed from hundreds of sites to elucidate the relationship between the occurrence of a particular species and environmental factors in the Southern Ocean (e.g., Zielinski and Gersonde 1997; Armand et al. 2005; Gersonde et al. 2005; Esper and Gersonde 2014).

To reduce this manual workload, some studies have introduced machine learning techniques to automate the classification of modern microorganisms and microfossils (e.g., Culverhouse et al. 1996; Yu et al. 1996; Beaufort and Dollfus 2004; Schulze et al. 2013). The current approach involves the application of deep learning, a supervised machine learning technique, to automate taxonomic classification (Dollfus and Beaufort 1999; Pedraza et al. 2017; Hsiang et al. 2019; Bourel et al. 2020; Itaki et al. 2020a; Marchant et al. 2020; Tetard et al. 2020). Constructing and using a deep learning model for image classification has become user-friendly for taxonomists owing to the increased availability of deep learning software libraries (e.g., Keras) and the simple adjustment of classification criteria, such as labeling images and loading them into the software. However, few studies have applied deep learning to automatic diatom classification (e.g., Pedraza et al. 2017; Kloster et al. 2020), as most studies have relied on discriminants that describe the shape and texture of each species (e.g., du Buf and Bayer 2002; Pappas and Stoermer 2003; Bueno et al. 2017).

Automated image acquisition has helped to rapidly construct deep learning models and to automate the general analysis workflow, including classifying and counting each particle in a slide. In the case of fossil radiolarians and foraminifera, Tetard et al. (2020) and Marchant et al. (2020) presented a classification workflow using the deep learning classification software "SYRACO" and an automatic image acquisition system. Combining the automated workflow of image acquisition, classification, counting, and image analysis, Beaufort et al. (2022) prepared assemblage data using millions of biometric features of fossil coccoliths. However, these studies employed separate software for automated processing, image acquisition, and classification. Recently, Itaki et al. (2020a) developed the "microfossil classification and rapid accumulation device (miCRAD)" system, which enables an end-to-end automated workflow by performing model classification while acquiring microscopic images. The miCRAD system was used to detect the relative abundance of Cycladophora davisiana in fossil radiolarian assemblages (Itaki et al. 2020b). These radiolarian studies have proposed a helpful process for assemblage analysis; however, fossil diatoms have yet to be examined.

In this study, we used the miCRAD system to construct a diatom classification model for automatically counting the two types of E. antarctica valves in a permanent slide. First, we present a protocol that acquires images, constructs a classification model, and classifies particle images from the scanned permanent slides. Subsequently, we show the model performance when the constructed model classifies an unknown image dataset (referred to as the experimental dataset in this study). Finally, we discuss the limitations of a deep learning model for counting E. antarctica and the measures needed for its practical use.

2 Methods

Preparing an image dataset for the classification model involved the following three steps: (1) making permanent slides for diatom assemblage analysis, (2) compiling a dataset of digital images of particles in the slides (Figs. 2a–2c), and (3) constructing a deep learning model using the miCRAD system software (Figs. 2d–2g). The details of each step and the evaluation of the model performance are described below.

2.1 Slide preparation

Permanent slides for light microscopy observation were prepared from surface sediment materials. Sediments were obtained from gravity and piston cores of the Technology Research Center of Japan National Oil Corporation (JNOC), collected on the TH83 cruise of the R/V Hakurei-maru in 1983. The sediment samples investigated in this study were mainly collected from the Indian sector of the Southern Ocean and are listed in Supplementary Table 1.

The methods for sediment treatment and slide preparation were the same as those used for manual counts of fossil diatoms (Schrader and Gersonde 1978). Approximately 0.1 g of dried sediment was placed in a 200 ml beaker containing approximately 1 ml each of hydrogen peroxide (H2O2, 10%) and hydrochloric acid (HCl, 10%) and boiled to remove organic and calcareous materials. Distilled water was added to a volume of 200 ml and the suspension was left for 5 h to separate the residue from the acidic water. The residue was separated by decanting the supernatant, and the beaker was then refilled with distilled water. This process was repeated four times to neutralize the suspension. All processed deposits were then preserved in 4 ml bottles, and permanent slides were made from these deposits. Approximately 10 µl of agitated suspension with 1000 µl of distilled water was settled and dried on a 24 × 32 mm coverslip (NEO Micro Cover Glass, Matsunami Glass Industries Ltd., Osaka, Japan). This coverslip is larger than that typically used for manual observations and was chosen to ensure that the particles were sparsely distributed. The coverslip was mounted onto a slide (MICRO SLIDE GLASS, S1214, Matsunami Glass Industries Ltd., Osaka, Japan) with mounting medium (Norland optical adhesive No. 61, refractive index: 1.56). We prepared thirty-eight slides in total and used them to collect images for the training dataset.

2.2 Image acquisition of E. antarctica and constructing of the training dataset

All particle images in the prepared slides were captured using the image collection unit of the miCRAD system, as described by Itaki et al. (2020a). The image collection unit, which is based on "Collection Pro" from Micro Support Co., Ltd., automatically acquires digital microscopic images of particles scattered in the observation field using a motorized X-Y stage microscope controlled by a computer (Fig. 2a). The field of view (Fig. 2b) was projected onto a display using a ×50 objective lens (N.A. 0.10–0.45, ZOL-50, Sigmakoki Co., Ltd., Tokyo, Japan) and a CCD camera (5 million pixels, STC-MCS500U3V, Omron Sentech Co., Ltd., Kanagawa, Japan), with the transmitted light mode of the software set to ×6. This setting gave 2,248 × 2,048 pixels per field of view at a resolution of 0.066 µm/pixel. After scanning a slide, individual particle images were clipped to 512 × 512 pixels (Fig. 2c). To crop individual particles, we applied binarization, opening, and closing in the image processing software to each field-of-view image. Overlapping particles are erroneously recognized as a single individual. Therefore, we chose larger coverslips and prepared sparse slides so that most particles were isolated and captured singly in a clipped image. All training images were compiled in the Supplementary Image Dataset.
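The cropping step described above can be illustrated with a minimal sketch. It is not the miCRAD or "Collection Pro" implementation, which the text does not detail: the function names, the threshold value, and the tiny test image are assumptions made for illustration, and the opening and closing steps are omitted for brevity. The sketch binarizes a grayscale field of view, finds connected dark regions, and places a fixed-size crop window around each one.

```python
from collections import deque

def binarize(image, threshold):
    """Mark pixels darker than `threshold` as particle pixels (True)."""
    return [[pixel < threshold for pixel in row] for row in image]

def find_particles(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited particle pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def crop_window(box, size, height, width):
    """Square window of side `size` centred on a box, shifted to stay in the image."""
    top, left, bottom, right = box
    cy, cx = (top + bottom) // 2, (left + right) // 2
    y0 = min(max(cy - size // 2, 0), height - size)
    x0 = min(max(cx - size // 2, 0), width - size)
    return y0, x0, y0 + size, x0 + size

# Hypothetical 6 x 8 field of view: bright background (200) with two dark particles.
field = [
    [200, 200, 200, 200, 200, 200, 200, 200],
    [200,  50,  50, 200, 200, 200, 200, 200],
    [200,  50,  50, 200, 200, 200, 200, 200],
    [200, 200, 200, 200, 200, 200,  60, 200],
    [200, 200, 200, 200, 200,  60,  60, 200],
    [200, 200, 200, 200, 200, 200, 200, 200],
]
boxes = find_particles(binarize(field, threshold=128))
windows = [crop_window(box, 4, len(field), len(field[0])) for box in boxes]
```

In this sketch two touching particles would merge into one connected region and receive a single window, which mirrors the reason given above for preparing sparse slides.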

2.3 Construction and evaluation of the classification model

The classification model was constructed using the classification unit of the miCRAD system (Itaki et al. 2020a) to distinguish the intercalary (Fig. 2d) and terminal valves (Fig. 2e) of E. antarctica from other particles (Fig. 2f) on a permanent slide. The classification unit consists of the deep learning software "RAPID machine learning" (version 2.1, NEC Corp., Tokyo, Japan), which incorporates an AlexNet-based (Krizhevsky et al. 2017) convolutional neural network with approximately 10 layers. To construct a supervised learning model, images in the training dataset were manually labeled when imported into the unit (Fig. 2g).

To distinguish the intercalary and terminal valves of E. antarctica from other sediment particles, three labels were used for image classification ([Terminal], [Intercalary], and [Other particles]). The images showing one terminal or intercalary valve of E. antarctica were assigned the labels [Terminal] and [Intercalary], respectively. This study identified the E. antarctica var. recta and var. antarctica following descriptions in Fryxell and Prasad (1990) and classified them into one group of E. antarctica. For the [Other particles] label, images displaying other diatom species, fragments of diatom valves, and other particles that were not diatoms were selected randomly. Finally, we manually selected 604, 606, and 600 images for [Terminal], [Intercalary], and [Other particles], respectively, in the training dataset (Supplementary Image Dataset).

The model was trained for 100 epochs with a batch size of 1 and a learning rate of 0.01 using the stochastic gradient descent (SGD) optimizer. These values were chosen because they yielded a sufficiently low training loss. Threefold cross-validation was applied to test the generalizability of the model performance. Two-thirds of the training dataset (approximately 400 images per class) was randomly selected and used to train the model and validate the training progress. Subsequently, the model classified the remaining subset (approximately 200 images per class) to evaluate its performance. For each of the three runs, augmentation was conducted using the corresponding function of the Keras package (version 2.1.4, Chollet 2015) to compensate for the lack of diversity in the training images and improve the generalization of the model. New images were generated from the original training images by random rotation (range 178°), random flipping (horizontal and vertical), random brightening or darkening (channel shift range 5), and random shifting (height and width shift range 0.05). After augmentation, the training dataset contained approximately 4,000 images for each label.
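The threefold split can be sketched as follows. This is an illustration under stated assumptions rather than the procedure implemented in the miCRAD software: it assumes the three held-out subsets are disjoint and drawn separately for each label, and the function name and image identifiers are placeholders. Only the class sizes (604, 606, and 600) come from the text.

```python
import random

def threefold_splits(image_ids_by_label, seed=0):
    """Yield (train, test) dicts for three folds, split separately per label."""
    rng = random.Random(seed)
    folds_by_label = {}
    for label, ids in image_ids_by_label.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        # Deal the shuffled images round-robin into three nearly equal folds.
        folds_by_label[label] = [shuffled[i::3] for i in range(3)]
    for held_out in range(3):
        train, test = {}, {}
        for label, folds in folds_by_label.items():
            test[label] = folds[held_out]
            train[label] = [
                img for i, fold in enumerate(folds) if i != held_out for img in fold
            ]
        yield train, test

# Class sizes reported in the text; the identifiers themselves are placeholders.
dataset = {
    "Terminal": [f"ter_{i}" for i in range(604)],
    "Intercalary": [f"int_{i}" for i in range(606)],
    "Other particles": [f"oth_{i}" for i in range(600)],
}
splits = list(threefold_splits(dataset))
```

Each run then trains on roughly 400 images per label and evaluates on roughly 200, as described above. Augmentation would be applied to the training portion only, so that the held-out images remain unaltered.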

We repeated this process three times and calculated averages of the following classical measures:

Overall accuracy: the proportion of images identified as true labels to the total number of images classified.

Precision: the proportion of images belonging to a true label among the images classified into the label.

Recall: the proportion of images identified as true labels among the total images belonging to the label.

F1 score: the harmonic mean of precision and recall.
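The four measures defined above can be computed from a confusion matrix as in the following sketch. The function name and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(confusion):
    """confusion[true_label][predicted_label] -> image count.

    Returns the overall accuracy and a dict of per-label precision, recall, and F1.
    """
    labels = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label][label] for label in labels)
    per_label = {}
    for label in labels:
        true_positives = confusion[label][label]
        predicted = sum(confusion[true][label] for true in labels)  # column sum
        actual = sum(confusion[label].values())                     # row sum
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, per_label

# Hypothetical counts for illustration only.
example = {
    "Terminal": {"Terminal": 8, "Intercalary": 2, "Other particles": 0},
    "Intercalary": {"Terminal": 1, "Intercalary": 9, "Other particles": 0},
    "Other particles": {"Terminal": 1, "Intercalary": 1, "Other particles": 18},
}
accuracy, scores = classification_metrics(example)
```

Precision is taken over the column of a predicted label and recall over the row of a true label, so the two can diverge sharply when one class dominates the dataset.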

2.4 Classification test using an experimental dataset

As an assessment of the model application for the actual count process, we prepared an experimental dataset that contains almost all particle images acquired throughout the scanning of a permanent slide and tested the model classification accuracy. Surface sediments (0–1 cm) from core site G501 off Prydz Bay in East Antarctica were selected for the particle images of the experimental dataset (Supplementary Table 1). The site of the experimental dataset was differentiated from that of the training dataset because the application of the constructed model to unknown sediment samples was considered.

Each particle image in the experimental dataset was prepared using an image collection unit to simulate practical use. We acquired 3,010 images under the same capture conditions as those used for the training dataset. In total, 35, 104, and 2,881 images of [Terminal], [Intercalary], and [Other particles], respectively, were manually identified from all particle images.

This classification test used a total of 125 images of E. antarctica, which is sufficient to represent the Eucampia index of each sediment sample (Whitehead et al. 2005). Only valves in girdle view (Figs. 2d and 2e) were counted as images of E. antarctica in the test. Images of valve fragments smaller than half a valve were removed from the training and experimental datasets.

3 Results and Discussion

3.1 Model performance

The results of the model training are listed in Table 1. The number of images predicted by the models and all the measures calculated for each run are listed in Supplementary Table 2. The precisions for [Terminal], [Intercalary], and [Other particles] were estimated to be 0.94, 0.85, and 0.99, respectively, and the recalls were calculated as 0.86, 0.95, and 0.96, respectively. The F1-scores for [Terminal], [Intercalary], and [Other particles] were 0.90, 0.90, and 0.97, respectively. The overall accuracy was 0.92, indicating an acceptable performance comparable to that reported for diatom classification models (Pedraza et al. 2017; Kloster et al. 2020).

The precision and recall of [Intercalary] were comparable to those of [Terminal], which, together with the high F1-scores, is desirable for accurately estimating the ratio of terminal to intercalary valves. In this experiment, the average proportion of intercalary valves to the Eucampia valves (i.e., the total number of terminal and intercalary valves) was 0.55, a difference of +0.05 from the actual value of 0.50. Considering that the standard error for the proportion was reported as ± 0.044 (Whitehead et al. 2005), the model prediction showed a value comparable to that of manual counting.
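The proportion discussed above and the scale of its counting uncertainty can be illustrated with a short sketch. The binomial standard error used here is an assumption made for illustration; the ± 0.044 value cited from Whitehead et al. (2005) may have been derived differently. The function names and the example counts are hypothetical.

```python
import math

def intercalary_proportion(n_terminal, n_intercalary):
    """Proportion of intercalary valves among all counted E. antarctica valves."""
    total = n_terminal + n_intercalary
    if total == 0:
        raise ValueError("no E. antarctica valves counted")
    return n_intercalary / total

def binomial_standard_error(proportion, n_valves):
    """Standard error of a proportion under a simple binomial counting model."""
    return math.sqrt(proportion * (1 - proportion) / n_valves)

# Hypothetical counts that give the proportion of 0.55 mentioned in the text.
predicted = intercalary_proportion(n_terminal=90, n_intercalary=110)

# Counting uncertainty for a true proportion of 0.50 and 125 counted valves.
standard_error = binomial_standard_error(0.50, 125)
```

Under this assumption, a count of 125 valves at a proportion of 0.50 gives a standard error of about 0.045, which is of the same order as the cited value.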

Table 1. Classification performance results shown by the trained model using the miCRAD system. The average accuracy and proportion of intercalary valves were calculated as 0.92 and 0.55, respectively, based on the model prediction results shown in Supplementary materials.

3.2 Evaluating the classification test using an experimental dataset

The model evaluated in the previous section was applied to the experimental dataset. Table 2 presents a confusion matrix that includes the number and proportion of images used for the classification test. The overall accuracy was 0.84, which was lower than the performance estimated by cross-validation (Table 1). The recalls for [Terminal], [Intercalary], and [Other particles] were 0.80, 0.77, and 0.85, respectively, so recall was largely maintained. Conversely, the precisions for [Terminal], [Intercalary], and [Other particles] were calculated to be 0.08, 0.25, and 1.00, respectively, indicating an imbalance in performance between the three labels. Owing to these lower precision values, the F1-scores for [Terminal], [Intercalary], and [Other particles] were 0.15, 0.38, and 0.92, respectively. The proportion of intercalary to total Eucampia valves was predicted to be 0.57, which differs from the true value of 0.81. This difference could be sufficient to lead to a misleading paleoenvironmental inference from the Eucampia index model (Whitehead et al. 2005).

Table 2. Confusion matrix of the classification test using the experimental dataset. The values shown in the table are the proportions of the selected labels to the total true labels. The numbers of images used in the test and classified by the model are shown in parentheses.

The classification results for the experimental dataset showed lower precision and accuracy than the cross-validation results (Table 1). A model accuracy of 0.84 may yield classification results that are unacceptable to taxonomic specialists. The decrease in these values could be due to slight differences in lighting conditions, fossil preservation, and the composition of other particles. As Marchant et al. (2020) have shown, changes in the background of the particles can cause misclassification; therefore, a consistent imaging setup is essential to maintain accurate classification. The images in the training dataset vary in brightness depending on the sediment composition and operating time (Figs. 2d–2f). Therefore, it is necessary to establish a technique for collecting training data optimized for the samples to be analyzed; for example, collecting training images from samples taken at the rim of a core, which contain mixed components from layers throughout the core.

The precision of [Terminal] and [Intercalary] was markedly lower than the recall (Table 2), and the degree of error differed between the two labels. Consequently, the model prediction had an error of −0.24 in estimating the proportion of intercalary to total Eucampia valves. This error was mainly attributed to the number of false positives for [Terminal] and [Intercalary]. [Other particles] accounted for 2,881 of the 3,010 images in the experimental dataset. Therefore, the number of [Other particles] images classified as [Terminal] or [Intercalary] was much larger than the number of true positives for these labels, even though the recall of [Other particles] was high (Table 2).
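The effect of class imbalance on precision can be illustrated with a simplified two-class sketch. The false positive rate of 0.05 is a hypothetical value chosen for illustration and is not a result from this study. The sketch also ignores confusion between [Terminal] and [Intercalary], so it does not reproduce the values in Table 2. Only the class sizes (35 and 2,881 for the experimental dataset, about 600 per class for the training dataset) and the recall of 0.80 come from the text.

```python
def expected_precision(n_target, recall, n_other, false_positive_rate):
    """Expected precision for one target label in a simplified two-class setting.

    n_target: images that truly belong to the target label
    recall: fraction of target images that the model recovers
    n_other: images that do not belong to the target label
    false_positive_rate: fraction of non-target images assigned to the target label
    """
    true_positives = n_target * recall
    false_positives = n_other * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Balanced case, similar in size to the training classes.
balanced = expected_precision(600, 0.80, 600, 0.05)

# Imbalanced case, using the [Terminal] and [Other particles] counts from the text.
imbalanced = expected_precision(35, 0.80, 2881, 0.05)
```

With the same recall and the same hypothetical false positive rate, precision falls from roughly 0.94 in the balanced case to roughly 0.16 in the imbalanced case, solely because non-target images are far more numerous.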

The model classification results using the experimental dataset revealed that the performance (i.e., accuracy, recall, and precision) estimated by cross-validation was not consistent with that calculated using imbalanced datasets obtained from full scans of permanent slides. Classification models with high accuracy have been reported for diatom assemblages (Pedraza et al. 2017; Kloster et al. 2020); however, those results were obtained by the usual method of cross-validation within the image dataset used for model construction, rather than by a test using full scans of permanent slides. Although these models are suitable for classifying image datasets that include well-preserved diatom valves, they may not be sufficient to automatically classify fossil diatoms in sediments, which comprise various components.

Considering that the classification results for the experimental (unknown) dataset are insufficient for E. antarctica analysis despite adequate model performance in cross-validation, end-to-end automated classification of E. antarctica should introduce two approaches: adding more diversified images to the training dataset and setting up a screening process for images before classification. The latter is the most technically feasible and suitable multipurpose approach. One improvement is the combination of image segmentation with existing morphometric analysis. The morphometric approach has been used to quantify the differences among diatom taxa in numerous studies (e.g., Pappas et al. 2014), and the utility of automated measurements has been assessed (Spaulding et al. 2012; Kloster et al. 2014, 2017). Another improvement could be the use of a cell sorter to obtain suspension samples consisting of diatom aggregates (Ijiri et al. 2021) before slide preparation. This approach is limited to users who need to classify disc-shaped diatoms with species-specific optical properties. Challenges remain in the preparation of fossil diatom samples and images to completely automate classification workflows using deep learning.
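A screening step of the kind proposed above could, for example, discard particles whose size or shape falls outside the range expected for the target valves before they reach the classifier. The following sketch is purely illustrative: the `Particle` fields, the function names, and all threshold values are hypothetical placeholders, not measurements or settings from this study.

```python
from dataclasses import dataclass

@dataclass
class Particle:
    image_id: str
    length_um: float  # long axis of the particle outline, in micrometres
    width_um: float   # short axis of the particle outline, in micrometres

def passes_screen(particle, min_length_um=20.0, max_length_um=100.0,
                  max_aspect_ratio=4.0):
    """Keep particles within a size and aspect-ratio window.

    All threshold defaults are hypothetical placeholders.
    """
    if particle.width_um <= 0:
        return False
    aspect_ratio = particle.length_um / particle.width_um
    in_size_range = min_length_um <= particle.length_um <= max_length_um
    return in_size_range and aspect_ratio <= max_aspect_ratio

def screen(particles, **thresholds):
    """Return only the particles that should be passed to the classifier."""
    return [p for p in particles if passes_screen(p, **thresholds)]

# Hypothetical measurements for three segmented particles.
candidates = [
    Particle("obj_001", length_um=45.0, width_um=30.0),  # plausible valve size
    Particle("obj_002", length_um=8.0, width_um=6.0),    # small fragment
    Particle("obj_003", length_um=90.0, width_um=5.0),   # elongated particle
]
kept = [p.image_id for p in screen(candidates)]
```

Removing a large share of non-target particles in this way would reduce the class imbalance seen by the classifier. Suitable thresholds would need to be calibrated on measured valves.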

4 Conclusion

Here, a deep learning-based model for distinguishing intercalary and terminal valves of the diatom species E. antarctica from other particles was constructed using the miCRAD system, which previously succeeded in automating radiolarian assemblage analysis. The model performance was estimated using three-fold cross-validation. Subsequently, as a test for practical use, an experimental dataset prepared from the scanning of permanent slides was classified by the constructed model. Based on the three-fold cross-validation, the accuracy of model classification was estimated to be 0.92, and the proportion of intercalary valves to the Eucampia valves (i.e., the total number of terminal and intercalary valves) was predicted as 0.55 on average, a difference of +0.05 from the actual value of 0.50. The model classification results using the experimental dataset showed lower performance than that estimated by cross-validation because whole-slide scanning of permanent slides captures numerous other particles and yields a class-imbalanced dataset. This experiment established that the classification model constructed using the miCRAD system has performance comparable to manual counting in predicting E. antarctica valves; however, screening images will be necessary before and after using the miCRAD system to fully automate the classification. As the next step toward practical use of the system, we plan to improve the image collection step by combining automatic slide scanning with image segmentation to introduce morphometric analysis.

Declarations

Availability of data and material

The training datasets and the constructed CNN model used in this study are available upon reasonable request from the corresponding author.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by JSPS KAKENHI Grant Numbers 17H06318, 18H01329, and 21H01201.

Authors' contributions

SI proposed the topic and conceived and carried out the experimental study. TI led the development of the microscope system, helped in the interpretation of the experimental study, and acquired funding for the experiment. DH and YT are developers of "RAPID machine learning" (NEC Corp.) and helped with the model-construction process. All authors read and approved the final manuscript.

Acknowledgements

The authors are grateful to Minoru Ikehara of Kochi University and Saiko Sugisaki of the Geological Survey of Japan for compiling information on the JNOC sediment cores, and to Masato Ito of the Japan Agency for Marine-Earth Science and Technology for obtaining sediment samples and data during the 59th Japanese Antarctic Research Expedition. We would also like to thank Hitomi Yamazaki for their assistance in the laboratory experiments.

References

Allen CS (2014) Proxy development: a new facet of morphological diversity in the marine diatom Eucampia antarctica (Castracane) Mangin. J Micropalaeontol 33:131–142. 10.1144/jmpaleo2013-025
Armand LK, Crosta X, Romero O, Pichon JJ (2005) The biogeography of major diatom taxa in Southern Ocean sediments: 1. Palaeogeogr Palaeoclimatol Palaeoecol 223:93–126. 10.1016/j.palaeo.2005.02.015
Beaufort L, Bolton CT, Sarr AC, Suchéras-Marx B, Rosenthal Y, Donnadieu Y, Barbarin N, Bova S, Cornuault P, Gally Y, Gray E, Mazur JC, Tetard M (2022) Cyclic evolution of phytoplankton forced by changes in tropical seasonality. Nature 601:79–84. 10.1038/s41586-021-04195-7
Beaufort L, Dollfus D (2004) Automatic recognition of coccoliths by dynamical neural networks. Mar Micropaleontol 51:57–73. 10.1016/j.marmicro.2003.09.003
Bourel B, Marchant R, de Garidel-Thoron T, Tetard M, Barboni D, Gally Y, Beaufort L (2020) Automated recognition by multiple convolutional neural networks of modern, fossil, intact and damaged pollen grains. Comput Geosci 140:104498. 10.1016/j.cageo.2020.104498
Bueno G, Deniz O, Pedraza A, Ruiz-Santaquiteria J, Salido J, Cristóbal G, Borrego-Ramos M, Blanco S (2017) Automated diatom classification (Part A): handcrafted feature approaches. Appl Sci 7:753. 10.3390/app7080753
Chollet F (2015) Keras: Deep learning library for Theano and TensorFlow. https://github.com/fchollet/keras. Accessed 18 February 2021
Culverhouse PF, Simpson RG, Ellis R, Lindley JA, Williams R, Parisini T, Reguera B, Bravo I, Zoppoli R, Earnshaw G, McCall H, Smith G (1996) Automatic classification of field-collected dinoflagellates by artificial neural network. Mar Ecol Prog Ser 139:281–287. 10.3354/meps139281
Dollfus D, Beaufort L (1999) Fat neural network for recognition of position-normalised objects. Neural Netw 12:553–560. 10.1016/S0893-6080(99)00011-8
du Buf H, Bayer MM (2002) Automatic diatom identification, vol 51. World Scientific. 10.1142/4907
Esper O, Gersonde R (2014) Quaternary surface water temperature estimations: new diatom transfer functions for the Southern Ocean. Palaeogeogr Palaeoclimatol Palaeoecol 414:1–19. 10.1016/j.palaeo.2014.08.008
Fryxell GA, Prasad AKSK (1990) Eucampia antarctica var. recta (Mangin) stat. nov. (Biddulphiaceae, Bacillariophyceae): life stages at the Weddell Sea ice edge. Phycologia 29:27–38. 10.2216/i0031-8884-29-1-27.1
Gersonde R, Crosta X, Abelmann A, Armand L (2005) Sea-surface temperature and sea ice distribution of the Southern Ocean at the EPILOG Last Glacial Maximum—a circum-Antarctic view based on siliceous microfossil records. Quat Sci Rev 24:869–896. 10.1016/j.quascirev.2004.07.015
Hsiang AY, Brombacher A, Rillo MC, Mleneck-Vautravers MJ, Conn S, Lordsmith S, Jentzen A, Henehan MJ, Metcalfe B, Fenton IS, Wade BS, Fox L, Meilland J, Davis CV, Baranowski U, Groeneveld J, Edgar KM, Movellan A, Aze T, Dowsett HJ, Miller CG, Rios N, Hull PM (2019) Endless forams: > 34,000 modern planktonic foraminiferal images for taxonomic training and automated species recognition using convolutional neural networks. Paleoceanogr Paleoclimatol 34:1157–1177. 10.1029/2019PA003612
Ijiri A, Izumi T, Morono Y, Kato Y, Terada T, Ikehara M (2021) Purification of Disc-Shaped Diatoms from the Southern Ocean Sediment by a Cell Sorter to Obtain an Accurate Oxygen Isotope Record. ACS Earth Space Chem 5:2792–2806. 10.1021/acsearthspacechem.1c00201
Itaki T, Taira Y, Kuwamori N, Maebayashi T, Takeshima S, Toya K (2020a) Automated collection of single species of microfossils using a deep learning–micromanipulator system. Prog Earth Planet Sci 7:1–7. 10.1186/s40645-020-00332-4
Itaki T, Taira Y, Kuwamori N, Saito H, Ikehara M, Hoshino T (2020b) Innovative microfossil (radiolarian) analysis using a system for automated image collection and AI-based classification of species. Sci Rep 10:21136. 10.1038/s41598-020-77812-6
Kaczmarska I, Barbrick NE, Ehrman JM, Cant GP (1993) Eucampia Index as an indicator of the Late Pleistocene oscillations of the winter sea-ice extent at the ODP Leg119 Site 745B at the Kerguelen Plateau. In: van Dam H. (eds) Twelfth International Diatom Symposium. Developments in Hydrobiology, Springer, Dordrecht. doi:10.1007/978-94-017-3622-0_13
Kloster M, Esper O, Kauer G, Beszteri B (2017) Large-scale permanent slide imaging and image analysis for diatom morphometrics. Appl Sci 7:330. 10.3390/app7040330
Kloster M, Kauer G, Beszteri B (2014) Sherpa: an image segmentation and outline feature extraction tool for diatoms and other objects. BMC Bioinform 15:218. 10.1186/1471-2105-15-218
Kloster M, Langenkämper D, Zurowietz M, Beszteri B, Nattkemper TW (2020) Deep learning-based diatom taxonomy on virtual slides. Sci Rep 10:14416. 10.1038/s41598-020-71165-w
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60:84–90. 10.1145/3065386
Marchant R, Tetard M, Pratiwi A, Adebayo M, de Garidel-Thoron T (2020) Automated analysis of foraminifera fossil records by image classification using a convolutional neural network. J Micropalaeontol 39:183–202. 10.5194/jm-39-183-2020
Pappas J, Kociolek P, Stoermer EF (2014) Quantitative morphometric methods in diatom research. Nova Hedwig 143:281–306. 10.1127/1436-7270/2014/015
Pappas JL, Stoermer EF (2003) Legendre shape descriptors and shape group determination of specimens in the Cymbella cistula species complex. Phycologia 42:90–97. 10.2216/i0031-8884-42-1-90.1
Pedraza A, Bueno G, Deniz O, Cristóbal G, Blanco S, Borrego-Ramos M (2017) Automated diatom classification (Part B): a deep learning approach. Appl Sci 7:460. 10.3390/app7050460
Schrader H, Gersonde R (1978) Diatoms and silicoflagellates. In: Zachariasse (ed) Micropaleontological counting methods and techniques: an exercise on an eight metre section of the lower Pliocene of Capo Rossello, Sicily. Utrecht Micropal Bull, vol 17, pp 129–176
Schulze K, Tillich UM, Dandekar T, Frohme M (2013) PlanktoVision-an automated analysis system for the identification of phytoplankton. BMC Bioinform 14:115. 10.1186/1471-2105-14-115
Spaulding SA, Jewson DH, Bixby RJ, Nelson H, McKnight DM (2012) Automated measurement of diatom size. Limnol Oceanogr Methods 10:882–890. 10.4319/lom.2012.10.882
Tetard M, Marchant R, Cortese G, Gally Y, de Garidel-Thoron T, Beaufort L (2020) A new automated radiolarian image acquisition, stacking, processing, segmentation and identification workflow. Clim Past 16:2415–2429. 10.5194/cp-16-2415-2020
Whitehead JM, Wotherspoon S, Bohaty SM (2005) Minimal Antarctic sea ice during the Pliocene. Geology 33:137–140. 10.1130/G21013.1
Yu S, Saint-Marc P, Thonnat M, Berthod M (1996) Feasibility study of automatic identification of planktic foraminifera by computer vision. J Foram Res 26:113–123. 10.2113/gsjfr.26.2.113

Supplementary Files

Graphicalabstractfig2.tif
Supplementarytable1.xlsx
Supplementary Table 1. Sediment sample list used for the training and testing datasets of the constructed model. JNOC: Technology Research Center of Japan National Oil Corporation. JARE 59: 59th Japanese Antarctic Research Expedition conducted between 2017 and 2018.
Supplementarytable2.xlsx
Supplementary Table 2. Three-fold cross-validation results were obtained when the model was constructed. ter: terminal valve; int: intercalary valve; oth: other particles.
Supplementarytable3.xlsx
Supplementary Table 3. Classification results using the experimental dataset. “True labels” are the manually classified label used to compare the model classification. Object names correspond to those in the Supplementary image dataset.
Supplementaryimagedataset.docx
