Automatic identification of harmful algae based on multiple convolutional neural networks and transfer learning

The monitoring of harmful phytoplankton is essential for maintaining the aquatic ecological environment. Traditional algae monitoring methods require professionals with substantial experience in algae species and are time-consuming, expensive, and limited in practice. In this work, the automatic classification of algae cell images and the identification of harmful phytoplankton images were realized by combining multiple convolutional neural networks (CNNs) with deep learning techniques based on transfer learning. Eleven common harmful and 31 harmless phytoplankton genera were collected as input samples; five CNN classification models (AlexNet, VGG16, GoogLeNet, ResNet50, and MobileNetV2) were fine-tuned to automatically classify the phytoplankton images, and the average accuracy improved by 11.9% compared to models without fine-tuning. To monitor harmful phytoplankton that can cause red tides or produce toxins severely polluting drinking water, a new identification method that combines the recognition results of the five CNN models was proposed, and its recall rate reached 98.0%. The experimental results validate that transfer learning can significantly improve the recognition performance for harmful phytoplankton, and that the proposed identification method is effective in the preliminary screening of harmful phytoplankton and greatly reduces the workload of professional personnel.


Introduction
Phytoplankton is one of the main primary producers in oceans and lakes and is of great value to humans and fishery resources. However, some cyanobacteria can produce toxins that are harmful to other organisms, including humans (Wood 2016), and cyanotoxins can bioaccumulate and their toxic effects may be amplified in the food chain (Ettoumi et al. 2011); these are serious hazards to marine wildlife, coastal shellfish, fish, and other aquatic organisms (Liu et al. 2013). Harmful phytoplankton such as Microcystis sp., Anabaena sp., Aphanizomenon sp., and Oscillatoria sp. are common cyanobacteria (Paerl et al. 2001), some of which can produce neurotoxins and hepatotoxins that directly threaten the water supply and the safety of animals and humans.
In recent years, the number of recorded harmful red tides along the coasts of China has been increasing year by year and the polluted area is constantly expanding (Xiuwen 2003). The eutrophication of water triggers the excessive reproduction of phytoplankton and leads to red tides, which destroy the ecological environment and the balance of marine ecosystems. Short-term early warning of the location and magnitude of harmful algal blooms is valuable to the aquaculture industry, the protection of sustainable fisheries, and human health (Wells et al. 2020). Therefore, the early or preliminary monitoring of phytoplankton that are toxic or easily cause red tides is an important measure to protect water ecosystems and the safety of human life. Governments and environmental protection agencies around the world have gradually devoted more effort to containing the environmental problems caused by red tides, yet previous methods for phytoplankton monitoring rely on professionals with considerable training and practical experience in phytoplankton identification, which is a time-consuming, costly, and labor-intensive process that can inspect only a limited number of samples. Hence, a variety of automatic identification methods for phytoplankton images have been proposed and have made great progress as the related technologies matured, such as flow cytometry camera systems and related convolutional neural network (CNN) analysis methods (Dunker et al. 2018). Other analysis and classification methods have also been developed, such as support vector machines (Giraldo-Zuluaga et al. 2018) and other types of neural networks (Henrichs et al. 2021; Liang et al. 2021).
Classification is normally based on the analysis of morphology and texture features in algal microscopic images, fluorescence images, and spectra, and CNN methods have been introduced to perform the classification (Deglint et al. 2018). However, the number and types of phytoplankton involved in these identification methods are relatively limited, and few experiments have been conducted with them to recognize the harmfulness of phytoplankton. Inspired by the methods proposed in previous works, we aim to learn the various characteristics of phytoplankton cell images through five classical CNN models (AlexNet, VGG16, GoogLeNet, ResNet50, and MobileNetV2), using transfer learning to fine-tune and train the models under limits on the number of samples and the training cost. Transfer learning transfers knowledge from related tasks learned in a source domain to a target domain to achieve better learning results and reduce the learner's reliance on large amounts of data in the target domain (Zhuang et al. 2020). In addition, harmful phytoplankton were identified with the proposed method for identifying the harmfulness of phytoplankton cells by combining the identification results of multiple models. Figure 1 compares the process of manual identification and counting of phytoplankton species with the automatic identification method proposed in this paper.
The most common early monitoring approach is to sample and process water, after which the distribution of phytoplankton in the water samples is observed by eye with the help of microscopes and other laboratory instruments; the method proposed in this work can automatically recognize and classify microscopic images of phytoplankton and identify the harmful phytoplankton images in the samples, saving a great deal of time and effort.
The proposed phytoplankton image classification method has three major contributions compared to the existing methods as follows.
First contribution: the proposed method is the first application of combining multiple CNNs to classify various phytoplankton cell images and to identify the harmfulness of phytoplankton images; it effectively synthesizes the representation learning capabilities of multiple CNN models to achieve better recognition results.

Second contribution: the proposed method leverages transfer learning to train the CNN classification models, fine-tuning only the fully connected layers of each CNN; this addresses the difficulty of training a classification model with strong generalization performance when the number of phytoplankton images and the available computing power are limited.

Third contribution: the proposed method identifies the harmfulness of the phytoplankton cells in an image based on the recognition results of multiple CNN classification models, enabling a careful identification of harmful phytoplankton that plays a key role in the preliminary screening stage of water quality monitoring.
The remainder of this paper is organized as follows. The "Material and methods" section describes the use of the five CNN models based on transfer learning and the proposed harmful phytoplankton identification method. The "Results and discussion" section describes, analyzes, and discusses the experimental results. Finally, the "Conclusion" section summarizes the effectiveness and practicability of our proposed method and puts forward some ideas for future work.

Material and methods
The proposed method aims to explore the feasibility of using deep learning and multiple CNN classification models based on transfer learning to automatically classify different species of phytoplankton and identify harmful phytoplankton. The model architecture of this study is illustrated in Fig. 2. First, the phytoplankton image samples were collected mainly from the database resources of aquatic biology research centers and from some enthusiasts' image resources, and were integrated into a phytoplankton image database through image augmentation processing. Images of 11 species of harmful phytoplankton and 31 species of common harmless phytoplankton widely distributed along the coasts of China were used in this study and were split into a training set and a testing set at a ratio of 70:30. The convolutional layers in a CNN address two problems: first, they retain the shape of the input so that correlations between image pixels in both the height and width directions can be effectively identified; second, they reuse the same convolution kernel at different positions of the input through a sliding window, which keeps the number of parameters from growing too large. Five classical CNN models (AlexNet, VGG16, GoogLeNet, ResNet50, and MobileNetV2) are leveraged to learn the features of the input phytoplankton images layer by layer. In combination with transfer learning, pretrained parameters are used to initialize each network; all the convolutional layers are then frozen and only the fully connected layers are trained, yielding five CNN classification models for phytoplankton recognition.
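The freeze-and-retrain strategy described above can be sketched in miniature. The following toy NumPy example is not the paper's code: a fixed random projection stands in for a pretrained convolutional backbone, and the image size is reduced to 32 × 32 × 3 to keep the sketch cheap (the actual inputs are 224 × 224 × 3). It illustrates how only the fully connected head is updated while the feature extractor stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random projection standing in for the
# pretrained convolutional layers (its weights are never updated).
in_dim, feat_dim, n_classes = 32 * 32 * 3, 64, 42
W_frozen = rng.normal(size=(in_dim, feat_dim)) * 0.01

def backbone(x):
    # x: (batch, in_dim) flattened images -> (batch, feat_dim) ReLU features
    return np.maximum(x @ W_frozen, 0.0)

# Trainable head: a single fully connected layer, mirroring the setup
# where only the classifier of a pretrained CNN is fine-tuned.
W_head = np.zeros((feat_dim, n_classes))

def train_head(x, y, lr=0.1, epochs=100):
    global W_head
    feats = backbone(x)                     # computed once: backbone is frozen
    for _ in range(epochs):
        logits = feats @ W_head
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0      # softmax cross-entropy gradient
        W_head -= lr * feats.T @ p / len(y)  # only the head is updated

x = rng.normal(size=(8, in_dim))            # 8 random stand-in "images"
y = rng.integers(0, n_classes, size=8)
train_head(x, y)
pred = (backbone(x) @ W_head).argmax(axis=1)
```

In the actual experiments the same idea applies per model: load ImageNet-pretrained weights, freeze the convolutional stack, and optimize only the fully connected layers.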
Different from other studies, considering that the monitoring center pays more attention to harmful phytoplankton in actual water monitoring, the above five CNN classification models based on transfer learning are used to conduct harmfulness identification experiments of the phytoplankton image samples.

Data collection and processing
The image library (7859 images) in this study has two sources. Approximately 35% of the library (2765 images) was sampled, recorded, and manually identified between 2013 and 2020, using a standard phytoplankton identification and counting procedure developed by the Chinese Research Academy of Environmental Sciences (CRAES). Various fixative solutions and Lugol's solution were used for sample preparation, and high-resolution microscopic photography was then performed using an eyepiece magnification of 10-20 times and an objective magnification of 40-160 times. Image quality was generally better in these samples, but there were differences in image color and resolution caused by the use of various microscopic equipment within the institute. The remaining 65% of the library (5094 images) was collected from the internet, uploaded by environmental researchers and enthusiasts, and covers a broader range of genera. Optical character recognition (OCR) and semantic recognition technology were used to identify the description text of the phytoplankton in the original microscopic images, and the classifications were confirmed by manual re-examination. A total of 5 researchers and students with experience in phytoplankton image recognition participated in the manual re-examination. The team spent approximately 1.5-2 h a day on re-examination, and it took about 1 month to complete the re-examination of the aforementioned 5094 pictures.
Thus, 2765 CRAES images and 5094 images from the internet, a total of 7859 phytoplankton images covering 159 genera, all manually annotated and confirmed, were included in the image library for the deep learning training and analysis in this paper. The database used in this paper is available for download from www.kaggle.com/mengyuy/the-algae-cell-images.
An immediate problem after the database was created is that the number of collected images across the 159 genera is significantly imbalanced: some common species have more than 200 images, some have fewer than 50, and some relatively rare species have fewer than a dozen. Since the negative impact of category imbalance on classification performance grows with the size of the task (Buda et al. 2018), image augmentation was needed to expand the number of images for under-represented phytoplankton species, improving the accuracy of the model and reducing over-fitting. Augmentation methods such as random rotation, brightness variation, and noise addition (random noise and Gaussian blur), as shown in Fig. 3, were randomly combined or superimposed to process the phytoplankton images. Despite these efforts, 11 of the 159 species still yielded fewer than 100 images, and further augmentation for these species would only produce near-duplicate images that are not meaningful for deep learning model training; these 11 species were therefore removed from the dataset. The standard dataset, including 148 species of phytoplankton with 150 images each (22,000 original and augmented images), was finally established after further manual screening to ensure image quality and to verify that the pictures were in the correct categories.
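As a rough illustration of this augmentation pipeline, the following NumPy sketch expands a small class to the 150-image target by randomly rotating, rescaling brightness, and adding noise. It is hypothetical: the paper does not specify its rotation angles, brightness range, or noise parameters, so the values here (90-degree rotations, ±20% brightness, Gaussian noise with σ = 5) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img):
    """Apply a random combination of the augmentations used in the paper:
    rotation, brightness variation, and additive noise (values assumed)."""
    out = img.astype(np.float32)
    # random 90-degree rotation (a stand-in for arbitrary-angle rotation)
    out = np.rot90(out, k=rng.integers(0, 4), axes=(0, 1)).copy()
    # brightness variation: scale pixel intensities by +-20%
    out *= rng.uniform(0.8, 1.2)
    # additive Gaussian noise
    out += rng.normal(0.0, 5.0, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)

def expand_class(images, target=150):
    """Grow a class with few originals up to the per-class target count
    (150 images per species in the paper) using augmented copies."""
    pool = list(images)
    while len(pool) < target:
        pool.append(augment(images[rng.integers(0, len(images))]))
    return pool

# toy class of 30 random "micrographs"
imgs = [rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8) for _ in range(30)]
expanded = expand_class(imgs)
```

Augmenting only the original images (rather than augmenting augmented copies) keeps the synthetic samples closer to real micrographs.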
Finally, based on some preliminary experiments, we found that image recognition and classification modeling for all 148 species is very time-consuming, even on state-of-the-art servers. After discussion with experts in algae field exploration, we selected 42 species of phytoplankton, covering an estimated 90-95% of the phytoplankton that environmental protection agencies encounter on a daily basis in eastern China. The dataset includes all 11 known species of harmful phytoplankton and 31 species of common harmless phytoplankton, with a total of 6300 images (4410 training images and 1890 test images). The input pictures are 224 × 224 × 3 pixels (RGB color images of 224 pixels in height and width) and are used to train and test the classification performance of the five CNN models.

Automatic classification and identification of phytoplankton
With a standard phytoplankton dataset established, the objective is to use five CNN classification models to achieve automatic identification of various phytoplankton genera. Among these, the toxic phytoplankton species are Microcystis sp., Aphanizomenon sp., Anabaena sp., Nostoc sp., Oscillatoria sp., and Nodularia sp. (Chorus et al. 2000; Grant and Hughes 1953), and the phytoplankton species that easily cause harmful red tides in China as dominant species are Gymnodinium sp., Noctiluca sp., Karenia sp., Skeletonema sp., and Prorocentrum sp. (Baohong et al. 2021), as seen in Fig. 4. As mentioned in the "Introduction," typical CNNs include convolutional layers, nonlinear layers, pooling layers, and fully connected layers (Lee and Park 2015). The five CNNs used in this study are AlexNet, VGG16, GoogLeNet, ResNet50, and MobileNetV2, which are widely used in image classification; their network structure, activation function, and main features are summarized in Table 1. The recognition and classification error rate of a single CNN is relatively high, mainly because the external morphology and other characteristics of some phytoplankton species are similar.
Therefore, the five CNNs were used to construct multi-class classification models for comparative experiments on phytoplankton images, mainly to compare the recognition performance of the five models and to explore the influence of transfer learning on model performance. In addition, a method for identifying harmful phytoplankton was proposed in this work, which integrates the results of the five CNN classification models on the phytoplankton cell images in the test dataset.

Five CNN classification models
Convolutional neural networks, widely used in the field of visual image recognition, were proposed in 1998 (LeCun et al. 1998); a CNN is a deep artificial neural network combining artificial neural networks and deep learning techniques. In 2012, AlexNet was proposed (Krizhevsky et al. 2012), which contains 5 convolutional layers, 3 max pooling layers, and 3 fully connected layers; it introduced the ReLU activation function, which alleviates the gradient diffusion and overfitting problems of the Sigmoid function in deep networks and greatly improves the training and convergence speed of the network.
Unlike AlexNet, VGG (Simonyan and Zisserman 2014) stacks two or three 3 × 3 convolution kernels in place of 5 × 5 and 7 × 7 kernels, respectively, reducing the number of parameters while keeping the same local receptive field; this deepens the network structure and improves the generalization performance of the model. The VGG16 network used in this paper contains 13 convolutional layers, 5 pooling layers, and 3 fully connected layers.
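The parameter saving from this substitution is easy to check. The sketch below is illustrative (bias terms are omitted): it counts the weights of a single k × k convolution with c input and c output channels and compares the large kernels against the equivalent 3 × 3 stacks.

```python
def conv_params(k, c):
    # parameters of one k x k convolution with c input and c output
    # channels (biases omitted for simplicity): k * k * c * c
    return k * k * c * c

c = 64
# one 5x5 layer vs a stack of two 3x3 layers (same 5x5 receptive field)
p5 = conv_params(5, c)        # 25 * c^2
p33 = 2 * conv_params(3, c)   # 18 * c^2
# one 7x7 layer vs a stack of three 3x3 layers (same 7x7 receptive field)
p7 = conv_params(7, c)        # 49 * c^2
p333 = 3 * conv_params(3, c)  # 27 * c^2
```

The stacked layers also interleave extra nonlinearities, which is the added benefit the text refers to.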
The network layers in GoogLeNet (Szegedy et al. 2015) are deeper and the training speed is faster than the two CNNs mentioned above, yet the weight parameters are only 1/12 of AlexNet's on account of the introduced inception structure, which integrates feature information of different scales. The inception structure is a topological structure composed of four parallel branches: filters of sizes 1 × 1, 3 × 3, and 5 × 5 are used in the first three branches to extract feature information of different spatial sizes; the second and third branches are pre-convolved with 1 × 1 filters to limit the number of input channels and reduce the computational cost; and the fourth branch is a max pooling layer followed by a 1 × 1 convolutional layer that changes the number of channels. Finally, the outputs of the four branches are concatenated and passed to the next inception module. The structure of GoogLeNet is shown in Fig. 5.
In this paper, GoogLeNet InceptionV1 was selected for experimentation, and the batch normalization (BN) layer (Ioffe and Szegedy 2015) was added to reduce the size of the parameters and the influence of parameter initialization. The BN can be expressed as:

x̂^(k) = (x^(k) − E(x^(k))) / sqrt(var(x^(k)))  (1)

where E(x^(k)) is the mean of the k-th feature over the training batch, var(x^(k)) is its variance, and x^(k) is the k-th feature of the inputs in a training batch.
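A minimal sketch of Eq. (1), assuming per-feature statistics computed over the batch dimension; the learned scale and shift parameters of a full BN layer are omitted, and a small epsilon is added for numerical stability as in standard implementations.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch dimension as in Eq. (1);
    the learnable gamma/beta parameters are omitted for clarity."""
    mean = x.mean(axis=0)   # E(x^(k)) per feature
    var = x.var(axis=0)     # var(x^(k)) per feature
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(128, 16))  # batch of 128, 16 features
y = batch_norm(x)
```

After normalization, each feature has approximately zero mean and unit variance over the batch, which stabilizes training regardless of parameter initialization.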
The problems of gradient disappearance, gradient explosion, and degradation caused by increasing the number of network layers were solved by the emergence of ResNet (He et al. 2016). A shortcut structure is added that quickly connects different layers of the network and is summed with the output of the convolutional layers, so that the underlying structure is fully trained and the accuracy improves significantly as the depth increases.

Table 1 The network structure, activation function, and main features of the five CNN models

Model | Structure | Activation | Main features
AlexNet | 5 convolutional + 3 max pooling + 3 fully connected layers | ReLU | Introduced the ReLU activation function, which improves the training and convergence speed of deep networks.
VGG16 | 13 convolutional + 5 pooling + 3 fully connected layers | ReLU | Stacks of small 3 × 3 kernels replace 5 × 5 and 7 × 7 kernels to deepen the network with fewer parameters.
GoogLeNet | Inception modules | ReLU | The inception structure replaces the fully connected layer with a 1 × 1 convolutional layer to achieve a multilayer nesting of (convolutional layer + 1 × 1 convolutional layer) structures, which greatly reduces the number of parameters.
ResNet50 | VGG-style network + residual structure | ReLU | A "short-circuit" structure is proposed, i.e., the original signal skips some network layers and passes directly into the deeper layers to avoid signal distortion, which accelerates training and improves training accuracy.
MobileNetV2 | Depthwise convolution + pointwise convolution | ReLU6 | Depth-separable convolution decomposes a standard convolution into a depthwise convolution and a pointwise convolution, with each convolutional layer followed by batch normalization (BN) and a ReLU activation function.

Fig. 5 The structure of GoogLeNet

When the input and output dimensions are the same, the input can be directly added to the output, which can be calculated as follows:

x_L = f(x_l + F(x_l, W_l))  (2)

where F is the residual function, x_l and x_L are the input and the output of the residual unit, respectively, W_l is the weight of the residual branch, and f is the aforementioned ReLU activation function. When the input and output dimensions are inconsistent, a convolution with stride = 2 is used to match the dimensions before the addition, which can be expressed as:

x_L = f(W_s x_l + F(x_l, W_l))  (3)

where W_s is a dimensional transformation of x_l.

In 2017, Google's MobileNet (Howard et al. 2017) met the lightweight requirements of small parameter counts and low computation for applying deep learning on mobile and embedded terminals: depthwise separable convolution was proposed to reduce the number of parameters, and high accuracy was achieved while keeping model complexity acceptable on mobile terminals. CNNs have also made great progress in marine environmental pollution tasks, such as the recognition of floating plastics at the shoreline or on the sea surface (Kylili et al. 2020). The five CNNs above are relatively mature and strong network models in the field of image classification and recognition, able to effectively extract feature information from large-scale complex images. Comprehensively considering the characteristics of phytoplankton cell images and the practical difficulties of phytoplankton recognition, these classical CNN models were chosen to extract both the low-dimensional and high-dimensional features of phytoplankton images for automatic identification, and transfer learning was used to address the problem of accurately identifying small samples of phytoplankton images.
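The shortcut connection of Eqs. (2)-(3) can be illustrated with a minimal fully connected stand-in for a residual unit. This is a sketch, not the actual ResNet50 block, which uses convolutions and batch normalization; the weight shapes here are assumptions chosen so the identity shortcut applies.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, W1, W2):
    """x_L = f(x_l + F(x_l, W_l)), Eq. (2): the input skips the weight
    layers and is added to their output before the final activation."""
    F = relu(x @ W1) @ W2   # residual function F(x_l, W_l)
    return relu(x + F)      # identity shortcut + activation f

d = 32
x = rng.normal(size=(4, d))
W1 = rng.normal(size=(d, d)) * 0.01
W2 = rng.normal(size=(d, d)) * 0.01
y = residual_unit(x, W1, W2)
```

Because the shortcut carries the input forward unchanged, the unit only needs to learn the residual F, which is what makes very deep stacks trainable.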

Various phytoplankton classification with transfer learning
A model trained on a large dataset has excellent classification performance: the basic features of an image, such as edges and contours, are well extracted by the shallow convolutional layers of a CNN, and high-level abstract features are extracted by the deep convolutional layers. Therefore, using model-based transfer learning and retraining only the fully connected layers, which combine these features for classification, saves computing time and resources and accelerates the convergence of the model. The weight parameters pre-trained on the ImageNet dataset were used as the initial weights of the five CNN models in this paper; the convolutional layers of each network were frozen first, and only the fully connected layers were adjusted and trained to classify the phytoplankton cell images. The network parameters of the fully connected layers are shown in Table 2.
In AlexNet, three fully connected layers follow the convolutional layers; to reduce the co-dependence of neurons and prevent overfitting during training, a dropout layer was added to the structure with a dropout rate of 0.5. Dropout randomly deactivates a portion of the nodes and the connections between them during training, which reduces the risk of overfitting in machine learning systems with a large number of parameters (Srivastava et al. 2014). In the VGG model, more nonlinear transformations were added while reducing the network parameters to enhance the ability to learn high-dimensional image features; three fully connected layers with dropout layers were used for the VGG16 model in this work. GoogLeNet maintains the sparsity of the network structure while exploiting the high computational performance of dense matrices, so only the fully connected layers were trained and the dropout rate was set to 0.4. ResNet50 uses multiple parameterized layers to learn the residual representation between input and output, ensuring that layer l + 1 contains more information than layer l; this increases the time complexity and model size but accelerates training, so two fully connected layers and two dropout layers were used after the convolutional layers of the network. The MobileNetV2 selected in this paper introduces a residual network on the basis of V1 and adds the inverted residual structure with a linear bottleneck, which reduces the information loss of the activation layer, greatly saves memory, and reduces the main-memory access requirements of embedded hardware designs. A dropout layer with a rate of 0.5 was used after the convolutional layers of MobileNetV2 during training.
A Softmax classifier, mainly used in multi-classification problems, was used as the last layer of all five classification models; it is a generalization of the logistic regression model, and its output is a probability distribution over all categories in the experiment. The loss function can be expressed as:

J(θ) = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{k} 1(y_i = j) log p(y_i = j | x_i; θ)  (4)

where N is the total number of samples, y_i (y_i ∈ {1, 2, …, k}) is the label of sample i, j (j ∈ {1, 2, …, k}) ranges over the possible labels, θ is the training parameters of the model, x is the input, and 1(y_i = j) is an indicator function whose value is 1 when the condition in parentheses holds and 0 otherwise. p(y_i = j | x_i; θ) is the conditional probability of recognizing the type of input sample x_i as j, which can be expressed as:

p(y_i = j | x_i; θ) = exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i)  (5)

The classification effect is best when p(y_i = j | x_i; θ) for the true class of each sample is the largest. Finally, the error function is minimized by the Adam optimizer. Some source code for the five CNN classification models is available at https://github.com/MengyuY330/Mulitiple-CNNsfor-Algae-Images-Classification.
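A numerically stable sketch of Eqs. (4)-(5); the logits and labels below are made-up toy values, and in the actual models the Softmax input comes from the fully connected layers.

```python
import numpy as np

def softmax(logits):
    """Eq. (5): convert logits to a probability distribution per sample."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Eq. (4): mean negative log probability of the true class; the
    indicator 1(y_i = j) reduces the inner sum to the true-class term."""
    p = softmax(logits)
    n = len(labels)
    return -np.log(p[np.arange(n), labels]).mean()

logits = np.array([[2.0, 0.5, 0.1],   # sample 0, true class 0
                   [0.2, 3.0, 0.3]])  # sample 1, true class 1
labels = np.array([0, 1])
loss = cross_entropy(logits, labels)
```

Minimizing this loss (with Adam, as in the paper) pushes the probability of the true class toward 1 for each training sample.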

Harmful phytoplankton identification with transfer learning
Harmful phytoplankton can quickly become dominant species and multiply in aquatic ecosystems because of their unique physiological structure and ecological habits, thereby squeezing the living space of other aquatic flora and fauna. When a situation arises that has a great impact on the water environment, such as disinfection or sudden weather changes, harmful phytoplankton are often the first to appear. In view of the hazards described above, the relevant monitoring departments are more sensitive to harmful phytoplankton in actual water quality monitoring; therefore, the phytoplankton image dataset was reclassified into harmful and harmless phytoplankton in this paper to identify the harmfulness of phytoplankton. Similar to the above-mentioned classification experiments on the 11 harmful and 31 harmless phytoplankton classes, the five CNN classification models were trained, and the trained weights were loaded to classify the test dataset of phytoplankton images, yielding classification results from each of the five CNN models. The harmful phytoplankton in the samples were then identified using the proposed harmfulness identification method, which combines the five identification results; the process is shown in Fig. 6. Based on the actual needs of phytoplankton monitoring, the new identification method was defined as follows: if the cells in a phytoplankton image are recognized as a harmful species by at least one CNN model, the cells are finally identified as harmful phytoplankton; only if all five CNN models predict a harmless species are the cells finally identified as harmless phytoplankton. This corresponds to an "OR" operation over the five harmful predictions (equivalently, an "AND" over the harmless predictions), as shown in Fig. 6.
This identification method effectively reduces the possibility of missing harmful phytoplankton during monitoring, and the preliminary screening of a large number of phytoplankton cells greatly reduces the workload of professionals.
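The combination rule reduces to a single Boolean reduction over the five per-model predictions; a minimal sketch (the model order named in the comment is only for illustration):

```python
def identify_harmful(predictions):
    """Final harmfulness decision for one image, combining the outputs
    of the five CNN classifiers: flag the image as harmful if ANY model
    predicts a harmful species ("OR"); it is harmless only if ALL models
    predict a harmless species ("AND" over harmless predictions)."""
    return any(predictions)

# predictions: True = harmful, one entry per CNN
# (e.g., AlexNet, VGG16, GoogLeNet, ResNet50, MobileNetV2)
assert identify_harmful([False, False, True, False, False]) is True
assert identify_harmful([False] * 5) is False
```

Biasing the decision toward "harmful" in this way trades some precision for recall, which matches the monitoring goal of minimizing missed detections.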

Model evaluation
The generalization ability of a model needs to be measured by performance metrics; the strengths and weaknesses of a model can be understood by comparing the indicators of different models in order to further adjust the parameters and gradually optimize the models. Evaluating performance amounts to judging how well the predicted labels match the true labels. Indicators such as accuracy, precision, and recall are commonly used in the field of image recognition to evaluate models.
Accuracy is the percentage of correctly predicted results among the total samples and is calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (6)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. However, high accuracy alone does not reveal whether the performance of a model is good or bad when the samples are unbalanced, so two further indicators are used: precision and recall.
Precision represents the proportion of true positives among the samples predicted to be positive, which can be expressed as:

Precision = TP / (TP + FP)  (7)

In general, higher precision means a better model.
Recall is the proportion of actually positive samples that are predicted as positive, which can be expressed as:

Recall = TP / (TP + FN)  (8)

Considering that the sensitivity to harmful phytoplankton species is higher in actual phytoplankton monitoring, we focus more on the recall rate (true positive rate) and the false negative rate to measure the recognition performance for harmful phytoplankton. The false negative rate (FNR) can be expressed as:

FNR = FN / (TP + FN)  (9)
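Eqs. (6)-(9) translate directly into code; a small sketch with self-checking examples (the confusion counts used below are made-up toy values):

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)   # Eq. (6)

def precision(tp, fp):
    return tp / (tp + fp)                    # Eq. (7)

def recall(tp, fn):
    return tp / (tp + fn)                    # Eq. (8)

def false_negative_rate(tp, fn):
    return fn / (tp + fn)                    # Eq. (9), equals 1 - recall

# toy confusion counts
assert accuracy(8, 2, 0, 0) == 1.0
assert precision(9, 1) == 0.9
```

Note that recall and FNR share the denominator TP + FN (the number of actually positive samples), so the two always sum to one.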

Results and discussion
Utilizing the transfer learning method and multiple CNN classification models, two sets of experiments were carried out. In the first set, a variety of phytoplankton images were classified and recognized and finally divided into 12 categories, and the classification performance of the five CNN models on the phytoplankton images before and after using transfer learning was analyzed. In the second set, harmful phytoplankton were identified using the proposed identification method with the same CNN and transfer learning algorithms as in the first experiment. The phytoplankton images were randomly divided into five groups of training and testing sets used to conduct five training and testing experiments, respectively; the training samples were randomly shuffled during each experiment to achieve better training, and the learning rate was 0.0001.

The results and discussion of various phytoplankton classification experiments
The various phytoplankton classification results of the different CNN models are shown in Table 3. For the 1890 target images to be recognized, the recognition accuracy rates of the five CNN models with transfer learning are better than those without transfer learning, and the average accuracy rate improves by 11.9%, which proves that leveraging transfer learning can significantly improve the recognition performance of the classification models. Comparing the accuracy rates of the five CNN models with transfer learning, it can be observed that the accuracy of the GoogLeNet model was the highest. Given the better performance observed with transfer learning, the method is further utilized to identify the various phytoplankton; Fig. 7a-c show histograms of the recognition precision rates (refer to Eq. (7)), recall rates (refer to Eq. (8)), and false negative rates (refer to Eq. (9)) of the five optimized CNN models, respectively. Figure 7a clearly shows that the recognition precision rates of Oscillatoria sp. and Gymnodinium sp. are relatively poor compared with the other phytoplankton species across the five classification models, while those of Noctiluca sp. are relatively high. As seen in Fig. 7b-c, the recall rates of Gymnodinium sp. are relatively poor and its false negative rates are higher than those of the others, mainly because of the similarity in morphology and texture between Gymnodinium sp. and some harmless phytoplankton; in contrast, the recognition accuracy rates of Noctiluca sp. are the highest and its false negative rates are the lowest. Therefore, Oscillatoria sp. and Gymnodinium sp. deserve particular attention from monitoring personnel, because low accuracy rates and high false negative rates mean that these two species are easily classified as harmless, increasing the risk of missed detection. Figure 7 demonstrates that all five models perform well in identifying harmless phytoplankton, and the GoogLeNet classification model achieves relatively high recognition accuracy rates and recall rates across the various phytoplankton; its classification effect is the best even for Oscillatoria sp. and Gymnodinium sp., which are difficult for all five models to identify.
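Assuming Eqs. (7)-(9) follow the standard definitions of per-class precision, recall, and false negative rate (the full equations are not reproduced in this section), the three metrics plotted in Fig. 7 can be computed from per-class counts as follows; the counts in the example are hypothetical and not taken from Fig. 7.

```python
def per_class_metrics(tp, fp, fn):
    """Per-class precision (Eq. (7)), recall (Eq. (8)), and false
    negative rate (Eq. (9)), computed from the true positive (tp),
    false positive (fp), and false negative (fn) counts of one
    phytoplankton class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    false_negative_rate = fn / (tp + fn)   # equals 1 - recall
    return precision, recall, false_negative_rate

# Hypothetical counts for one genus (illustration only)
p, r, fnr = per_class_metrics(tp=45, fp=5, fn=5)
assert (p, r, fnr) == (0.9, 0.9, 0.1)
```

A class such as Gymnodinium sp., with many false negatives, would show exactly the pattern described above: low recall and a correspondingly high false negative rate.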

The results and discussion of harmful phytoplankton identification experiments
We reclassified the phytoplankton image samples into harmful and harmless genera to verify the effectiveness of the harmful phytoplankton identification method proposed in this paper. The above experiments show that classification based on transfer learning is significantly better than using the CNN models alone, so the same five transfer-learning-based CNN models as in the various-phytoplankton identification experiments were trained on the training set and applied to the test samples with the proposed harmful phytoplankton identification method. The resulting recognition accuracy rates, precision rates, recall rates, and false negative rates are shown in Tables 4, 5, and 6. Table 4 shows that, for the target harmful and harmless phytoplankton images to be recognized, the individual accuracy rates of the five CNN models reached 95.3%, 95.5%, 98.1%, 92.7%, and 92.6%, respectively, with an average accuracy rate of 94.8%. Comparing the recognition accuracy of the five CNN models, the GoogLeNet model was again the most accurate and thus the best-performing model for harmful phytoplankton identification. Table 5 shows whether the harmful and harmless phytoplankton images among the 1890 test samples were correctly classified by the proposed harmful phytoplankton identification method: 485 of the 495 harmful phytoplankton images were correctly classified as harmful, while 10 were misjudged as harmless; among the 1395 harmless phytoplankton images, 1016 were correctly classified as harmless and 379 were mistaken for harmful. The accuracy rate, precision rate, recall rate, and false negative rate of the harmful phytoplankton identification experiment can then be calculated from the classification results in Table 5, as shown in Table 6.
It can be observed that the accuracy rate was 79.4%, the recall rate (true positive rate) for harmful phytoplankton was 98.0%, and the corresponding false negative rate was 2.0%. These results show that the proposed harmful phytoplankton identification method greatly reduces the number of harmful phytoplankton misjudged as harmless, largely meeting the requirement that no harmful phytoplankton be overlooked and enabling a preliminary screening of harmful phytoplankton. Although the proposed method for harmful phytoplankton detection is shown to be effective and practical, several problems remain in the current research. At present, all test and validation datasets in this paper consist of high-quality images, but under real-world conditions image quality will be significantly lower, with problems such as overlapping cells and missing cell shapes. Another trend is that fully automatic algae classification and detection instruments are entering the stage of practical application; they generate more data, but their image quality is not necessarily higher than that of manually processed images. How to filter and utilize these abundant but low-quality images will be the focus of follow-up research.
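As a consistency check, the Table 6 figures can be reproduced directly from the Table 5 counts. A short sketch of that arithmetic:

```python
# Counts reported in Table 5 for the 1890 test images
tp, fn = 485, 10      # harmful images: correctly / wrongly classified
tn, fp = 1016, 379    # harmless images: correctly / wrongly classified

total = tp + fn + tn + fp                 # 1890 test samples
accuracy = (tp + tn) / total              # overall accuracy
recall = tp / (tp + fn)                   # true positive rate (harmful)
false_negative_rate = fn / (tp + fn)      # harmful missed as harmless

assert total == 1890
assert round(accuracy * 100, 1) == 79.4
assert round(recall * 100, 1) == 98.0
assert round(false_negative_rate * 100, 1) == 2.0
```

The low overall accuracy relative to Table 4 reflects the deliberate trade-off of the combined method: the 379 harmless images flagged as harmful cost accuracy, but only 10 harmful images are missed.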

Conclusion
In this paper, five CNN models (AlexNet, VGG16, GoogLeNet, ResNet50, and MobileNetV2) were established to classify and identify harmless phytoplankton and 11 species of harmful phytoplankton that are toxic or easily cause red tides, and comparative experiments were conducted to determine whether transfer learning should be used. The results show that the average accuracy of the five CNN models using transfer learning improved by 11.9% compared with the traditional method, which proves the effectiveness of transfer learning for the classification performance of the models. In addition, the same transfer-learning-based CNN models were used to judge the harmfulness of phytoplankton, and a new harmful phytoplankton identification method was proposed: the phytoplankton cells in an image are judged to be harmful only when more than one CNN model predicts them as harmful. This method achieved a recall rate of 98.0% and a false negative rate of 2.0%, which indicates that the harmful phytoplankton identification method integrating multiple CNNs proposed in this work is reliable for phytoplankton identification and can play a crucial role in the preliminary screening stage of practical phytoplankton monitoring, greatly reducing the workload of monitoring personnel. However, the phytoplankton images used in this research do not constitute a complete standard dataset, which may have a negative impact on the final recognition performance. To ensure the consistency of the samples in future experiments, a more complete and carefully curated phytoplankton dataset will be leveraged, which may achieve a higher accuracy rate.
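The multi-CNN decision rule can be sketched as a simple vote over the five model predictions. The threshold is kept as a parameter here because the paper's phrasing ("more than one CNN model") admits more than one reading; the default of two votes, and the example predictions, are assumptions for illustration.

```python
def judge_harmful(model_predictions, min_votes=2):
    """Ensemble rule sketch: flag a phytoplankton cell as harmful when
    at least min_votes of the CNN models predict 'harmful'.
    model_predictions is one label per model, e.g. from the five
    fine-tuned CNNs."""
    votes = sum(1 for p in model_predictions if p == "harmful")
    return votes >= min_votes

# Hypothetical predictions from the five models for one cell image
preds = ["harmful", "harmless", "harmful", "harmless", "harmless"]
assert judge_harmful(preds) is True          # two models suffice
assert judge_harmful(["harmless"] * 5) is False
```

Lowering `min_votes` raises recall at the expense of precision, which matches the paper's observed trade-off between the 98.0% recall rate and the 79.4% overall accuracy.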