An Artificial Intelligence-powered Digital Inline Holographic Microscopy and characterization scheme.

Digital Inline Holography (DIH)-based microscopy is a well-established technique for the characterization of nano- and microparticles such as biological cells, artificial microparticles, and quantum dots. Owing to its simplicity and cost-effectiveness, various practical solutions have been developed on this platform, such as automated complete blood count (CBC), cell viability testing, and 3D cell tomography. In our previous work, we demonstrated the feasibility of this system for complete blood count, along with automated characterization of cell lines and shape and size characterization of microparticles. However, its performance suffered from the weak signals of some of the cells, owing to their poor signatures and the presence of background noise. The auto-characterization technique therein was based on parameters determined from our empirical findings, which limits the system's cell-line recognition power. In this work, we address these issues by leveraging an artificial-intelligence-powered automatic signal-enhancement scheme together with an adaptive cell-characterization technique. A performance comparison of our proposed method with the existing analytical model shows an increase in accuracy to >98%, along with a signal enhancement of >5 dB, for most cell types such as Red Blood Cells (RBC) and White Blood Cells (WBC), except the cancer cells (HepG2 and MCF-7), for which the accuracy is about 84%.

Further, we developed an auto-characterization method based on a convolutional neural network 19,20 (CNN) architecture to classify the various cell lines from the DIH micrographs.
In this article, we describe the detailed methods adopted for the design and optimization of the various parameters to devise a suitable model with better accuracy. The optimized models are simple, lightweight, and require a smaller number of samples to learn the cell signatures effectively. The details are given in the following sections.

Denoising modality:
For denoising of the DIH micrographs, we adopted the concept of the autoencoder 24 . An autoencoder is an unsupervised scheme that strives to recreate the input at its output. It consists of an input layer (x), an output layer (r), and a hidden layer (h). The hidden layer h, termed the code layer, represents the input in a reduced dimension. The whole network structure can be divided into two parts. The first part is an encoder, which tries to code the input as h = f(x), and the second part is a decoder, which tries to recreate the input from the reduced code layer as r = g(h), where r is the reconstructed version of the input x (see figure 1(d)). Basically, the network tries to attain r = g(f(x)). However, this is not a trivial identity mapping, since the model is forced to learn the significant features of the input in order to encode it into the code layer (in reduced dimension).
In this work, we specifically used the denoising variant of the autoencoder. A traditional autoencoder tries to minimize the loss L(x, g(f(x))), whereas the denoising autoencoder attempts to minimize the loss L(x, g(f(x'))), where x' is a noisy version of the input x.
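As a minimal illustration of this idea (not the network used in this work), a single-hidden-layer denoising autoencoder can be sketched in NumPy; the layer sizes, toy data, and plain gradient descent are assumptions made purely for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 64-dimensional "clean" signals x, corrupted to noisy inputs x'.
n, d, h = 200, 64, 16                    # samples, input dim, code-layer dim (assumed)
x = rng.normal(size=(n, d))
x_noisy = x + rng.normal(scale=0.3, size=x.shape)

# Encoder f and decoder g, each a single affine layer; tanh in the encoder.
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)

def forward(xn):
    code = np.tanh(xn @ W1 + b1)         # h = f(x')
    return code, code @ W2 + b2          # r = g(h)

def loss():                              # L(x, g(f(x'))): compare to the CLEAN input
    return np.mean((forward(x_noisy)[1] - x) ** 2)

loss_before = loss()
lr = 0.01
for _ in range(500):                     # plain gradient descent (assumption for the sketch)
    code, r = forward(x_noisy)
    err = (r - x) / n
    gW2, gb2 = code.T @ err, err.sum(0)
    d_code = (err @ W2.T) * (1 - code ** 2)
    gW1, gb1 = x_noisy.T @ d_code, d_code.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
loss_after = loss()
```

The key point is the loss target: the network sees the corrupted input x' but is penalized against the clean x, which is what forces it to learn denoising rather than the identity.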
In this work we tried two different methods to design the denoising architectures, namely, extreme learning machine (ELM) and convolutional neural network (CNN).
ELM: This is a single-hidden-layer, fully connected architecture 25 . In this method, the input weights are initialized randomly and kept fixed; only the output weights take part in the learning process, through a straightforward learning mechanism [25][26][27] . For N arbitrary input samples xi ∈ R n and their corresponding targets ti ∈ R m , the ELM achieves this mapping using the relation shown in equation (1).

Hβ = T    (1)

Here, H is the hidden-layer output matrix, β is the output weight matrix, i.e. between the hidden layer and the output layer, and T is the target matrix, i.e. the matrix of desired outputs 26 . From equation (1), we can obtain β using the Moore-Penrose pseudoinverse H† 25 , as shown in equation (2).

β = H†T    (2)
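A batch ELM fit therefore reduces to a random hidden projection followed by a single pseudoinverse solve; a NumPy sketch (the toy regression task and layer sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression task: N samples, n input features, m output targets.
N, n_in, m, L = 100, 8, 2, 40          # L = number of hidden neurons (assumed)
X = rng.normal(size=(N, n_in))
T = np.column_stack([np.sin(X[:, 0]), X[:, 1] ** 2])   # targets t_i in R^m

# Random, fixed input weights and biases (never trained in ELM).
W_in = rng.normal(size=(n_in, L))
b_in = rng.normal(size=L)

H = np.tanh(X @ W_in + b_in)           # hidden-layer output matrix; equation (1): H @ beta = T
beta = np.linalg.pinv(H) @ T           # equation (2): beta = pinv(H) @ T

train_mse = np.mean((H @ beta - T) ** 2)
```

Because only beta is solved for, training is a single linear-algebra step, which is the source of ELM's speed.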
In the extended, sequential-learning form of ELM, β can be updated sequentially. This provides the added advantage of updating the learning whenever a new type of sample becomes available, thus providing the flexibility of transfer of learning. The update mechanism 28,29 is as shown in equation (3):

K_(n+1) = K_n + H_(n+1)^T H_(n+1) ,    β_(n+1) = β_n + K_(n+1)^(-1) H_(n+1)^T (T_(n+1) − H_(n+1) β_n)    (3)

with the initialization K_0 = H_0^T H_0 and β_0 = K_0^(-1) H_0^T T_0, where H_0 is the hidden-layer output with the 1st sample or 1st batch of samples 30 .

CNN:
Convolutional neural networks (CNN) 6,19,20 are a type of neural network widely used in the analysis of spatial data, such as image classification and object segmentation. In this network, two-dimensional kernels are used to extract the spatial features from the input patterns through a convolution operation between the kernel and the input. The typical architecture of a CNN is shown in figure 1(d). Here, the kernel is shared spatially across the input or the feature map. The feature at location (i, j) in the k-th feature map of the l-th layer can be evaluated as shown in equation (6).

z^l_(i,j,k) = (w^l_k)^T x^l_(i,j) + b^l_k    (6)

Here, w^l_k and b^l_k are the weights and the bias of the k-th filter in the l-th layer. The weight layer is shared spatially, which reduces the complexity. x^l_(i,j) is the input patch centred at location (i, j) of the l-th layer. The nonlinearity in this network is obtained by introducing an activation function, denoted here as a(·). The activated output can be represented as shown in equation (7).

a^l_(i,j,k) = a(z^l_(i,j,k))    (7)
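Equation (6) amounts to a sliding dot product with a spatially shared kernel; a NumPy sketch of one valid-mode convolutional layer with a ReLU activation (the kernel contents and input values are arbitrary choices for the example):

```python
import numpy as np

def conv2d_valid(x, w, b):
    """z[i, j] = w . x_patch(i, j) + b  -- an equation-(6)-style feature map
    (single input channel, single filter, no padding)."""
    kh, kw = w.shape
    H, W = x.shape
    z = np.empty((H - kh + 1, W - kw + 1))
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            z[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b  # same weights at every location
    return z

relu = lambda z: np.maximum(z, 0.0)     # the activation a(.) of equation (7)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.array([[1.0, 0.0], [0.0, 1.0]])  # simple 2x2 kernel (assumed for the example)
feat = relu(conv2d_valid(x, w, b=0.0))  # a 4x4 input with a 2x2 kernel yields a 3x3 map
```

Because the same w slides over every location, the layer has only kh*kw + 1 parameters regardless of the input size, which is the complexity reduction noted above.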
Additionally, there are pooling layers that introduce shift invariance by reducing the resolution of the activated feature maps. Each pooling layer connects its feature map to that of the preceding convolutional layer. The expression for pooling is as shown in equation (8):

y^l_(i,j,k) = pool(a^l_(m,n,k)),  ∀ (m, n) ∈ R_(i,j)    (8)

where R_(i,j) is the local neighbourhood around location (i, j), and pool(·) is typically the maximum or average operation.
In our work, we used CNNs for both denoising and classification. The details of their architectures and their impact are discussed in the results section. A detailed description of the CNN is given in the supplementary section.

Results and Discussion:
Performance of denoising algorithms: For efficient and adaptive denoising, we analysed various autoencoder schemes, starting with the fully connected autoencoder. In our first iteration, we experimented with a fully connected network having three hidden layers with 512, 256, and 512 neurons, respectively. The input layer is the 1D vectorized array of the input image (e.g., of 66 × 66 pixels). The input to the model is the noisy version of the image, and the expected target output is the original image. The noisy images were created using zero-mean Gaussian noise with variance ranging from 100 to 600 (refer to the supplementary section for details).
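The training pairs described above can be generated along these lines; the image size and variance range follow the text, while the stand-in image content and the 8-bit clipping are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_training_pair(img, var):
    """Return (noisy_input, clean_target) for denoiser training.
    Noise is zero-mean Gaussian with the given variance, as in the text."""
    noise = rng.normal(loc=0.0, scale=np.sqrt(var), size=img.shape)
    noisy = np.clip(img + noise, 0, 255)   # keep values in 8-bit range (assumption)
    return noisy, img

clean = rng.uniform(100, 200, size=(66, 66))   # stand-in for a 66 x 66 DIH crop
pairs = [make_training_pair(clean, v) for v in range(100, 700, 100)]  # variances 100..600
```

Sampling over the whole variance range exposes the autoencoder to light and heavy corruption alike, so a single model can denoise adaptively.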
Further, we experimented with an increased network size having five hidden layers with 256, 128, 64, 128, and 256 neurons, respectively. In all these networks, the rectified linear unit (ReLU) was used as the activation function, while the mean squared error (MSE) [31][32][33][34] was used to calculate the loss. The Adam optimizer 35,36 (chosen after iterating over other optimizers) was found to deliver better convergence and hence was used to optimize the weights and biases. The denoising performance was quantified in terms of the improvement in SNR, measured in dB and denoted here by SNR imp , as given by equation (9) 37 :

SNR imp = SNR out − SNR in    (9)
where SNR out = 10 log10( Σ_i s_i^2 / Σ_i (s_i − ŝ_i)^2 ) and SNR in = 10 log10( Σ_i s_i^2 / Σ_i (s_i − s̃_i)^2 ). Here, s_i is the value of sampling point i in the original DIH signal, s̃_i is the value of sampling point i in the noisy DIH signal, ŝ_i is the value of sampling point i in the denoised version of the same image, and N is the total number of sampling points in that DIH image.
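These expressions translate directly into a small metric function; the signal and noise values below are made up for illustration (a toy "denoiser" that exactly halves the noise should gain 10·log10(4) ≈ 6.02 dB):

```python
import numpy as np

def snr_improvement(clean, noisy, denoised):
    """SNR_imp = SNR_out - SNR_in, both in dB, per equation (9)."""
    sig = np.sum(clean ** 2)
    snr_in = 10 * np.log10(sig / np.sum((clean - noisy) ** 2))
    snr_out = 10 * np.log10(sig / np.sum((clean - denoised) ** 2))
    return snr_out - snr_in

rng = np.random.default_rng(4)
s = rng.uniform(0, 1, size=4356)            # stand-in for a vectorized 66 x 66 DIH image
noise = rng.normal(scale=0.2, size=s.size)
s_noisy = s + noise
s_denoised = s + 0.5 * noise                # toy "denoiser" that halves the noise
imp = snr_improvement(s, s_noisy, s_denoised)
```

Note the signal power term cancels in the difference, so SNR_imp depends only on how much the residual error shrinks.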
The fully connected network, for both of the above configurations, showed no significant improvement in SNR imp after saturating at around -10.08 dB. For further improvement, we experimented with CNN architectures using various models with different numbers of convolutional layers and distinct kernel sizes. The configuration of the model that accomplished the best outcomes is 3 × 3, 3 × 3, 5 × 5, 5 × 5, 7 × 7, 7 × 7, 1 × 1, with 32 filters in each layer except the last. The last layer consists of a single-pixel (1 × 1) filter that condenses the output across all 32 filters. Here, the input and output sizes are the same; padding was used to maintain the original size at the output of each convolutional layer. The Adam optimizer was used to minimize the mean-squared-error loss.
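The size-preserving behaviour of that stack comes from "same" zero-padding, and the final 1 × 1 filter is simply a per-pixel weighted sum across the 32 feature maps; a NumPy sketch of both ideas (the kernel contents are arbitrary, and this is an illustration of the mechanism rather than the trained model):

```python
import numpy as np

def conv2d_same(x, w):
    """Single-channel convolution with zero padding so output size == input size."""
    kh, kw = w.shape
    ph, pw = (kh - 1) // 2, (kw - 1) // 2    # 'same' padding for odd kernel sizes
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(w * xp[i:i + kh, j:j + kw])
    return out

rng = np.random.default_rng(5)
x = rng.normal(size=(50, 50))                # the 50 x 50 input size chosen in the text
maps = np.stack([conv2d_same(x, rng.normal(size=(3, 3))) for _ in range(32)])  # 32 feature maps

w_1x1 = rng.normal(size=32)
condensed = np.tensordot(w_1x1, maps, axes=1)  # 1x1 filter: weighted sum over the 32 maps
```

Because every layer preserves the spatial size, the network's output can be compared pixel-for-pixel with the clean target, which is what the MSE loss requires.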
The CNN results show a better reconstruction, as shown in figure 2. Further, we tried the ELM architecture, which is well known for its fast convergence 25 . We also compared the denoising performance for different input sizes (figure 3): the 40 × 40 input gives a higher SNR value, but its variation is about 2, compared with about 1.5 for the 50 × 50 size, which is the lowest among all the sizes tested. Since the CNN shows substantial performance with lower variance for the 50 × 50 input size, we fixed it as the optimal size for all further study and comparisons.

Performance of classification algorithm:
Since the diffraction patterns of cells and microparticles in a DIH micrograph depend upon their physical and optical properties, the diffraction patterns carry the unique signature of each cell type, as shown in the 2D contour plots in figure 2. These unique signatures can be utilized for the classification of the cell types.

Since our earlier results showed that the CNN works better for denoising, we experimented with the same modality for classification as well. To determine the optimal CNN architecture for cell-line recognition, we first found the optimal depth of the network by studying the classification performance while increasing the depth (adding convolutional and pooling layers) and varying the number and size of the kernels, until performance saturated. We experimented with and evaluated various shallow and deep CNN models to classify the cell lines. The details of the model architecture are described in figure 4. The per-class scores were high, e.g., HepG2 (~0.96). From these results, it can be inferred that the classifier is working well, especially for RBC, WBC, and the 10 µm and 20 µm beads.