Unsupervised deep learning of bright-field images for apoptotic cell classification

The classification of apoptotic and living cells is significant in drug screening and in treating various diseases. Conventional supervised methods require a large amount of prelabelled data, which is often costly and labour-intensive to produce in the biological field. In this study, unsupervised deep learning algorithms were used to extract cell characteristics and classify cells. A model integrating a convolutional neural network and an autoencoder network was used to extract cell characteristics, and a hybrid clustering approach was employed to obtain cell feature clustering results. Experiments on both public and private datasets revealed that the proposed unsupervised strategy performs well in cell categorisation. For instance, on the public dataset, our method obtained a precision of 96.72% with only 1000 unlabelled cells. To the best of our knowledge, this is the first time unsupervised deep learning has been applied to distinguish apoptotic and live cells with high accuracy.


Introduction
Apoptosis is the autonomous, orderly death of cells that occurs to resist external stimulation and maintain the homeostasis of the internal environment; it is often referred to as programmed cell death. Unlike other modes of cell death, apoptosis is a self-protection mechanism that is activated, expressed and regulated by specific genes [1,2]. The term apoptosis was first coined by Kerr et al. in 1972 to describe a new morphological feature of cell death [3]. The study of programmed cell death in Caenorhabditis elegans has led to a better understanding of apoptosis in mammals [4]. Physiologically, apoptosis plays a crucial role in the growth, development and evolution of organisms [5]. The process of apoptosis maintains homeostasis and the dynamic balance of cell number in the body, but it can also be used as a defence mechanism to eliminate unnecessary or unwanted cells [6]. During apoptosis, the cell shrinks and its volume decreases, the nucleus condenses, the nuclear membrane and nucleoli break down and the DNA fragments; the cell then cleaves into apoptotic bodies formed by membrane-enclosed cytoplasm, organelles and nuclear fragments, and eventually the apoptotic bodies are recognised, engulfed and degraded by surrounding macrophages. The cell membrane remains intact during apoptosis, no contents are spilled, no cytokines are released, the duration is short, and there is no inflammatory reaction in the surrounding tissue. Several signalling pathways are involved in apoptosis: it can be triggered by either the caspase-mediated extrinsic or the intrinsic pathway.
(Correspondence: Tongsheng Chen, chentsh@scnu.edu.cn. 1 Guangdong Provincial Key Laboratory of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China. 2 SCNU Qingyuan Institute of Science and Technology Innovation Co., Ltd., South China Normal University, Qingyuan 511500, China.)
Both pathways converge to activate effector apoptotic caspases, ultimately resulting in morphological and biochemical cellular alterations and characteristics of apoptosis [7].
Apoptosis is a rational and active decision to sacrifice specific cells for the greater benefit of an organism [8]. Detection of apoptotic cell death in cells and tissues has become of paramount importance in many fields of modern biology, including studies of embryonic development, degenerative disease and cancer biology [9]. Transmission electron microscopy (TEM) is regarded as the gold standard for confirming apoptosis, because the categorisation of an apoptotic cell is irrefutable if the cell exhibits certain ultrastructural morphological characteristics [10]. The major disadvantages of TEM are its cost, the time required and the fact that only a small area can be analysed at a time. The TUNEL (terminal dUTP nick end-labelling) method assays endonuclease cleavage products by enzymatically end-labelling the DNA strand breaks [11]. Terminal transferase is used to add labelled dUTP to the 3′-end of the DNA fragments. The dUTP can then be labelled with a variety of probes to allow detection by light microscopy, fluorescence microscopy or flow cytometry. These techniques are costly, are potentially phototoxic and may even interfere with the cell death process itself [12,13]. As a result, novel alternative approaches to studying apoptosis and distinguishing apoptotic from living cells are required.
In 1998, LeCun proposed the convolutional neural network (CNN) and demonstrated excellent performance in image recognition [14]. Following that, improved CNNs with deeper layers were designed and achieved satisfactory image recognition results, including AlexNet, VGGNet, GoogLeNet and ResNet [15,16]. CNNs mainly comprise an input layer, an output layer, convolutional layers and pooling layers. The most important layers in CNNs are the convolutional layers themselves: different convolution kernels filter an input image, or a feature map output by the previous layer, without interfering with each other, and each filter effectively extracts a particular type of feature. In recent years, CNNs have been widely applied to medical imaging analysis [17,18]. Machine learning algorithms have been demonstrated to be successful in classifying apoptotic and live cells [19]. To achieve good performance in supervised learning, such as with CNNs, a large amount of high-quality human-labelled data from the target domain is required. In the bioimage informatics domain, structured and machine-readable labelling is still uncommon and generally very expensive to produce. To address the issue of collecting high-quality labelled data while utilising the large amount of existing unlabelled data, we applied an unsupervised deep learning method in our model.
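To make the kernel-filtering idea concrete, the following is a minimal NumPy sketch (illustrative only, not the paper's implementation) of a single convolution kernel extracting one feature type, a vertical edge, from a toy image:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image and
    take the weighted sum of each neighbourhood (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel: responds strongly where intensity changes left-to-right.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Toy "image": dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 5))
img[:, 3:] = 1.0

feature_map = conv2d(img, sobel_x)
print(feature_map.shape)  # (3, 3)
```

A different kernel (e.g. a horizontal-edge or blob filter) applied to the same input would extract a different feature type; in a CNN these kernel weights are learned rather than hand-designed.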
As an unsupervised pretraining method for artificial neural networks (ANNs), the autoencoder was originally called the "autoassociative learning module" in the 1980s [20]. As a method of learning features without supervision, autoencoders have been widely adopted, and their outputs are often used as inputs to other networks and algorithms to improve performance. An autoencoder generally comprises two modules: an encoder and a decoder. The encoder module encodes the input signals into a latent space, while the decoder module transforms signals from the latent space back into the original domain. The main applications of autoencoders are dimensionality reduction and feature extraction [21][22][23]. In this study, we used an unsupervised deep learning classification model that integrates a CNN and an autoencoder network, known as the constraint convolutional autoencoder (CCAE), to extract cell characteristics, and a hybrid clustering approach was applied to obtain the final clusters [24]. Certain changes were made to improve the performance of the CCAE, such as adding a new loss function to aid model convergence and improving the network's fundamental structure. Experiments on public and private datasets revealed that this unsupervised method performs well in cell categorisation. We also created a bright-field imaging dataset of HeLa cells and calculated the accuracy of classifying apoptotic cells to confirm the effectiveness of our method. To verify the generalisation of the network, we tested it on a public dataset with the same parameters. This method yielded positive results for both datasets.
The remainder of this paper is organised as follows. In Sect. 2, we introduce the main method, including the data preprocessing, the CCAE neural network model structure and the hybrid clustering strategy. In Sect. 3, we present the results of the main method on the datasets, together with details such as the configuration of the experimental environment, the specific configuration of the experimental models and the performance evaluation of each model. Conclusions and discussion are presented in Sect. 4.

Cells and materials
Human cervical cancer cells (HeLa), the human colon cancer cell line HCT116 and the human breast tumour cell line MCF7 were cultured in DMEM (Gibco, USA) supplemented with 10% foetal bovine serum (Sijiqing, China) in a 37 °C incubator containing 5% CO2. Staurosporine (STS) was purchased from Sigma-Aldrich (USA). The cells were incubated with 1 µM STS for 1, 2, 4 and 8 h, respectively, to induce apoptosis, and bright-field images were subsequently acquired using a Zeiss microscope (Axio Observer 7, Germany). Fluorescence microscopy was used to confirm that apoptosis had been successfully induced.

Image preprocessing
Generally, bright-field images of cells are susceptible to interference such as noise, uneven greyscale and uneven image contrast. Cell image preprocessing comprises global equalisation transformation, threshold processing, adaptive binarisation and denoising to obtain high-quality images. A Gaussian blur filter and contrast equalisation are initially used to remove high-frequency noise and boost image contrast. Subsequently, adaptive binarisation is used to maximise the difference between the representations of foreground and background and improve image quality. Finally, minimising the background noise in the images emphasises the morphological characteristics of the cells, as shown in Fig. 1.

Fig. 1 Preprocessing of HeLa cells treated with STS for 8 h and control.
An overview of the changes made to bright-field images of apoptotic and living cells. Image greying and a Gaussian blur filter remove high-frequency noise and boost image contrast. Adaptive binarisation and denoising are used to maximise the difference between foreground and background and improve image quality

Deep learning model
This section introduces the CCAE convolutional autoencoder structure and the modifications we made to improve the performance of the model. CCAE is a variant of the autoencoder (AE) model. The AE coding block is replaced by convolutional layers in CCAE, allowing CCAE to effectively compress images, and a constraint is applied to the hidden parameter to assist the model in learning more effective features.
The AE is an efficient method for extracting valid information from raw data [25]. The main information is preserved while the input dimension is reduced by encoding and decoding blocks composed of multiple fully connected layers. Normally, the encoding block can be regarded as a nonlinear mapping that maps the input data x to the latent vector y, y = σ(Wx + b), where σ(·) denotes the nonlinear activation function, and W and b are the weight and bias of the encoding block, respectively. The decoding block, which is another nonlinear mapping, maps the latent vector y back to the reconstructed data x̂ = σ(W′y + b′), where W′ and b′ are the weight and bias of the decoding block. By adjusting the weights in the encoding and decoding blocks, the AE minimises the difference between the reconstructed data x̂ and the input data x. The convolutional operation is an effective method of extracting image features; for instance, AlexNet [15] and LeNet-5 [14], as variants of the CNN, improved image classification efficiency significantly. It has been shown that applying a constraint to the hidden parameter y helps the AE learn more effective features [26]. Therefore, in the variational autoencoder (VAE), the hidden parameters are usually sampled from a Gaussian distribution. In addition, Aytekin et al. [27] sampled the latent vector parameters from the unit ball space.
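The encode/decode mappings above can be sketched in a few lines of NumPy; the dimensions, tanh activation and random initialisation below are illustrative assumptions, not the CCAE's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = np.tanh  # nonlinear activation function

# Encoder y = sigma(W x + b): compress an 8-dimensional input to 3 latent units.
W = rng.normal(0.0, 0.5, (3, 8))
b = np.zeros(3)
# Decoder x_hat = sigma(W2 y + b2): map the latent vector back to input space.
W2 = rng.normal(0.0, 0.5, (8, 3))
b2 = np.zeros(8)

x = rng.normal(0.0, 1.0, 8)       # input data
y = sigma(W @ x + b)              # latent vector (the compressed code)
x_hat = sigma(W2 @ y + b2)        # reconstruction
loss = np.mean((x - x_hat) ** 2)  # training minimises this difference
```

Training adjusts W, b, W2 and b2 by gradient descent on this reconstruction loss; in the CCAE the fully connected encoder is replaced by convolutional layers, but the encode/decode structure is the same.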
To assist model convergence and enhance the network's fundamental structure, we used the triplet loss instead of the L2 loss to optimise the effective features of cell images. The triplet loss pulls the positive sample pair together and pushes the negative sample pair apart, such that features of images with the same label can be aggregated in the feature space [28]. In this algorithm, the anchor picture is treated as the positive image, whereas all other images are treated as negative images. Extracting all morphological features of cells is redundant for cell classification and leads to a decrease in classification accuracy; in other words, the model must retain valid features while discarding invalid ones. In this model, the latent vector y was sampled from a Gaussian distribution with a mean value of 0. The constraint is transformed into an additional term in the loss function,

Loss = (1/M) Σ_i ||x_i − x̂_i||² + (1/N) Σ_i max(0, ||f(x_i) − f(x_i^a)||² − ||f(x_i) − f(x_i^n)||² + α) + (1/L) Σ_j y_j²,

where M denotes the number of cell images, N the number of triplets, L the length of the latent vector, x_i the input image, x̂_i the decoded data, x_i^a the anchor image, x_i^n the negative image and α the margin between positive and negative pairs. The structure of the model and the parameter settings are shown in Fig. 2.
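The two extra loss terms described above can be sketched as follows in NumPy; the embeddings and margin value are illustrative, not taken from the paper:

```python
import numpy as np

def triplet_loss(f_anchor, f_pos, f_neg, alpha=0.2):
    # Hinge on squared embedding distances: the positive pair must be
    # closer than the negative pair by at least the margin alpha.
    d_pos = np.sum((f_anchor - f_pos) ** 2)
    d_neg = np.sum((f_anchor - f_neg) ** 2)
    return max(0.0, d_pos - d_neg + alpha)

def gaussian_constraint(y):
    # Penalises latent vectors that drift from the zero-mean prior.
    return np.mean(y ** 2)

# Illustrative embeddings: positive near the anchor, negative far away.
f_a = np.array([0.0, 0.0])
f_p = np.array([0.1, 0.0])
f_n = np.array([1.0, 1.0])

easy = triplet_loss(f_a, f_p, f_n)                   # already separated -> 0
hard = triplet_loss(f_a, f_p, np.array([0.1, 0.1]))  # violates the margin -> positive loss
```

The hinge means that triplets already separated by more than the margin contribute no gradient, so training concentrates on the pairs that are still confusable.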

Hybrid clustering algorithm
This section introduces a new clustering strategy, known as the hybrid clustering algorithm, for cell clustering. A single clustering method groups subsamples from a single perspective, which can lead to misclassification. By considering the clustering results of various types of models, the subsamples can be grouped from a comprehensive, multi-perspective view; thus, the clustering results are more reliable [29]. The encoded data were used to divide the cell data into two categories (apoptotic and live cells) using different clustering methods with different clustering effects. The hybrid clustering algorithm, a bagging algorithm based on the multi-clustering model, is then used to secure the optimum clustering results. In other words, a cell was considered apoptotic only if all of the clustering techniques used classified it as apoptotic.

Fig. 2 Structure of the CCAE. The CCAE uses only convolutional and deconvolutional layers instead of filtering the features with pooling or upsampling methods that may lose important information. The figure shows the steps within the CCAE, starting from an image. The cropped cells are resized to 512 × 512 pixels before being submitted to the CCAE. To extract information from these images, several convolution blocks are applied after the input has been resized: by applying a convolution matrix on top of the original image, the image is recalculated based on the values of surrounding pixels. A few deconvolution layers then reconstruct the image from 32 × 32 pixels back to 512 × 512 pixels. The latent vector is sampled from a Gaussian distribution with a mean value of 0. Lastly, the input image, the reconstructed image and the latent vector are all used as parameters of the triplet loss, helping the CCAE learn more effective features of cells
Three traditional clustering methods were used in this study: the Euclidean distance k-means [30], agglomerative (AGG) [31] and balanced iterative reducing and clustering using hierarchies (BIRCH) [32] algorithms. The k-means algorithm divides the dataset samples into k clusters using a distance measure. The cluster centres are obtained by initialising the mean vectors at the beginning; a greedy strategy then minimises the distance between each sample and its cluster centre, and the centres are updated until the final clustering results are obtained. The AGG and BIRCH algorithms belong to the family of hierarchical clustering algorithms, which divide the dataset at different levels into a tree-like structure. The AGG algorithm is a bottom-up aggregation strategy.
First, each sample in the dataset is considered an initial cluster; then, at each step of the algorithm, the two clusters that are closest to each other are merged. The merging process is repeated until the preset number of clusters is reached. The BIRCH algorithm uses a clustering feature (CF) tree to perform hierarchical clustering: the algorithm constructs a CF tree from the input data, and then clustering and outlier processing are performed on the leaf nodes. At the end of the clustering process, each leaf node becomes a cluster of sample sets. Although the bagging algorithm based on the multi-clustering model generates more rejected data, the hybrid clustering algorithm can significantly improve clustering accuracy, as described in the following sections. Figure 3 provides a brief overview of the main methods used for cell classification.
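The agreement-based rejection rule can be sketched as follows. The three label arrays are hypothetical stand-ins for the outputs of k-means, AGG and BIRCH; since cluster IDs are arbitrary, the labellings are aligned before voting:

```python
import numpy as np

def align_labels(ref, labels):
    # Binary cluster IDs are arbitrary: flip a labelling if flipping
    # makes it agree with the reference on more samples.
    flipped = 1 - labels
    return labels if np.sum(labels == ref) >= np.sum(flipped == ref) else flipped

def hybrid_cluster(label_sets):
    # Keep a sample only when every aligned clusterer assigns it the
    # same class; mark disagreements as rejected (-1).
    ref = label_sets[0]
    aligned = np.stack([align_labels(ref, l) for l in label_sets])
    agree = np.all(aligned == aligned[0], axis=0)
    consensus = np.where(agree, aligned[0], -1)
    return consensus, 1.0 - agree.mean()  # labels and reject rate

# Hypothetical outputs of three clusterers on six cells.
kmeans = np.array([0, 0, 1, 1, 1, 0])
agg    = np.array([1, 1, 0, 0, 1, 1])  # same partition up to an ID flip, except sample 4
birch  = np.array([0, 0, 1, 1, 1, 1])  # disagrees with kmeans on sample 5

consensus, reject_rate = hybrid_cluster([kmeans, agg, birch])
print(consensus)    # [ 0  0  1  1 -1 -1]
print(reject_rate)  # 2 of 6 samples rejected
```

Samples 4 and 5 are rejected because at least one clusterer disagrees; the remaining labels are the unanimous, and therefore more reliable, predictions.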

Results
In this section, we provide evaluation indicators for several models. A highlight here is the transfer of the supervised-learning evaluation method to the unsupervised task. All our models used the Adam [33] optimiser with a learning rate of 3 × 10⁻⁴ and were trained for 1000 epochs on the datasets. The experimental environment used for training and testing was as follows: the operating system was Linux Ubuntu 18.04.5; the hardware consisted of an Intel Xeon E5-2690 v3 CPU at 2.60 GHz, 64 GB of memory and two NVIDIA GeForce RTX 3090 GPUs; and the TensorFlow 2.4.0 deep learning framework was used. A table is provided comparing performance on several datasets to show the difference between the existing supervised methods and our model. Table 1 reveals that the CCAE achieves higher accuracy than the supervised method without any prelabelled training dataset. In addition, the model can accurately determine whether cells are apoptotic even when the morphology of the cells changes only slightly.
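One common way to transfer supervised accuracy to clustering output (a sketch of the general idea, not necessarily the paper's exact protocol) is to score the best one-to-one mapping of arbitrary cluster IDs to class labels:

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    # Cluster IDs are arbitrary, so try every one-to-one assignment of
    # cluster IDs to true class labels and keep the best accuracy.
    ids = np.unique(y_pred)
    classes = np.unique(y_true)
    best = 0.0
    for perm in permutations(classes, len(ids)):
        mapping = dict(zip(ids, perm))
        remapped = np.array([mapping[p] for p in y_pred])
        best = max(best, float(np.mean(remapped == y_true)))
    return best

# Hypothetical ground truth and cluster assignments for six cells.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1])  # IDs swapped, one mistake

print(clustering_accuracy(y_true, y_pred))  # 5/6, under the flipped mapping
```

With two classes this amounts to scoring both labellings (as-is and flipped) and keeping the better one; precision and recall can be transferred the same way once the best mapping is fixed.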

Effectiveness of the hybrid clustering algorithm
In this subsection, we compare the results of the single clustering model with the hybrid clustering model, as summarised in Table 2, to demonstrate the usefulness of the hybrid clustering technique. Table 2 reveals that the accuracy of the hybrid clustering model is at least 1.08% higher than that of the single clustering model at the cost of at most 6.40% rejected subsamples.

Analysis of CCAE
This section delves deeper into the efficacy of the CCAE; in particular, the triplet loss restriction is discussed in detail. Our triplets consist of two matching cell bright-field images and a non-matching cell bright-field image, and the loss aims to separate the positive pair from the negative pair by a distance margin. Specifically, we seek an embedding f(x) from an image x into a feature space such that the squared distance between all cells of the same identity, independent of imaging conditions, is small, whereas the squared distance between a pair of cell images of different identities is large. In this algorithm, the anchor picture is treated as the positive image, whereas all other images within the margin are treated as negative images. We compare the triplet loss to the l2 loss, as shown in Table 3. On the public dataset and our dataset, we also tried the k-means, BIRCH, AGG and hybrid clustering methods individually. The strong prior operation is favourable for model feature learning. We used the t-SNE (t-distributed stochastic neighbour embedding) [34] visualisation method to directly show the improvement brought by the strong prior operation. The technique is a variation of stochastic neighbour embedding [35]. t-SNE can capture much of the local structure of high-dimensional data while also revealing global structure, such as the presence of clusters at multiple scales. The results of the t-SNE visualisation are shown in Fig. 4.

A constraint is added to the encoding network so that the generated latent variables roughly follow the standard normal distribution, potentially improving recognition accuracy and helping the CCAE learn more effective features of cells [26]. A constraint on the latent vector is a mechanism for "forgetting" redundant cell characteristics. We therefore add a Gaussian constraint to the encoded data to help our model learn more effective features.

Fig. 3 Flow chart of the main method. The first step of the model is cell image preprocessing, including Gaussian blur, adaptive binarisation and denoising. The processed images are then resized to 512 × 512 pixels before being submitted to the CCAE model. After all the convolution and deconvolution steps, the encoded data (at the end of the convolutions, of size 32 × 32 × 2) are flattened for use in the following step of the model. The hybrid clustering algorithm, comprising the Euclidean distance k-means algorithm, the agglomerative (AGG) algorithm and the balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm, is applied to obtain the clustering results

Table 1 The first row displays the classification results of a current supervised deep learning method for apoptotic and live cells. The second row displays the classification results of the CCAE on 1000 randomly chosen images from the same dataset. The remaining rows show the classification results of the CCAE on our own dataset. Our method yielded positive results on all datasets. R.r is the reject rate, the percentage of the total number of subsamples rejected by the hybrid clustering mechanism

Table 2 Experimental results on both datasets show that, compared to the single clustering methods, the hybrid clustering method generally achieves the best classification accuracy. By considering the clustering results of various kinds of models, one can group subsamples from a comprehensive, multi-perspective view; thus, the clustering result is more reliable

Table 3 The first and second rows report the classification results on the DLAN dataset using the CCAE with different loss functions; classification results of the CCAE with different losses on our dataset are presented in the remaining rows. On the same dataset, the triplet loss function not only achieves greater recognition accuracy than the l2 loss but also rejects fewer images

Fig. 4 Effect of preprocessing on data compression. To demonstrate the effectiveness of preprocessing, the t-SNE method is used to visualise the feature maps of the CCAE on the same dataset. In the map of the CCAE with preprocessing, the samples are almost completely separated in two dimensions. In contrast, with the features from the CCAE without preprocessing it is challenging to distinguish the two classes. This result suggests that the features extracted with preprocessing are more discriminative than those extracted without preprocessing

Conclusions and discussions
In this study, we combined a convolutional neural network and an autoencoder to extract cell characteristics and used the hybrid clustering algorithm to cluster cells. The encoder utilises convolutional layers to learn cell features from images, and the decoder uses deconvolutional layers to reconstruct cell images. Following FaceNet [28], which uses the triplet loss to learn highly discriminating face features, we introduced the triplet loss to guide the model to learn recognisable cell features. We employed a generative model instead of a discriminative model to extract cell structural information in its entirety. The goal of the discriminative model is to locate a representation of the data in a high-dimensional space and then transfer it to a low-dimensional space for classification and discrimination; the resulting representation may be missing structural properties of the data. In contrast, the generative model can compress the data dimension while the generation module maintains the structural integrity of the data. This is a more sensible method of data compression, with greater resilience and interpretability for downstream data matching and classification.
Prior studies have shown that supervised deep learning performs well in discriminating cell images. Verduijn's method used a training set of ~35,000 cells to tune the model towards its objective of detecting and discriminating between live, apoptotic and necroptotic cancer cells [19]. Deep learning algorithms such as the convolutional neural network can be used for image classification; however, these methods require an extensive amount of well-annotated training data. Compared with the existing methods, our algorithm uses only 1000 randomly selected cell images from Verduijn's dataset and achieves higher accuracy. Although our method can only distinguish living and apoptotic cells, unlike other deep learning models it does not require any annotated images and can achieve a considerably high classification accuracy. This is the first study to investigate the potential of unsupervised deep learning methods for apoptotic cell classification. The triplet loss function is introduced into the CCAE loss module, which uses image similarity to enhance the feature extraction ability. We demonstrated the performance on the published dataset, and the results show that our proposed method can achieve a high classification accuracy. Compared with some existing supervised methods, our method is more applicable to domains in which it is difficult to obtain high-quality, balanced datasets. The approach described here is expected to have a significant impact on research in cancer diagnosis, drug screening and cell characterisation in general.
This study has several limitations that should be acknowledged. Our approach has so far been tested only on a limited set of images. Furthermore, classifying other forms of cell death, such as ageing, necrosis and ferroptosis, could help in a variety of applications, such as developing and screening new drugs and detecting dangerous toxins. It remains to be seen whether unsupervised deep learning approaches work well on other cell lines and forms of cell death. We intend to investigate the effect of our method on a larger cell dataset in the future and make it applicable to multiple cell classification problems.
For future work, we will consider different network architectures, such as residual [36] and inception [37] networks, which have been shown to be highly effective in finding latent structures and features. Furthermore, we plan to enlarge the capabilities and generality of the method and migrate the model to other domains.

Availability of data and materials The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.