Detecting objects behind scattering media using vortex beams and deep learning

Reconstructing objects behind scattering media is a challenging problem with applications in biomedical imaging, non-destructive testing, computer-assisted surgery, and autonomous vehicular systems. The main challenge in such systems is the multiple scattering of photons in the angular and spatial domains, which blurs the image. Previous work has tried to improve reconstruction using deep learning algorithms, with some success. We enhance these methods by illuminating the set-up with several modes of vortex beams, obtaining a series of time-gated images, one for each mode. The images are then accurately reconstructed by a deep learning algorithm that analyzes the patterns captured by the camera. This study shows that using vortex beams instead of Gaussian beams improves the deep learning algorithm's image reconstruction in terms of peak signal-to-noise ratio (PSNR) by ~2.5 dB and ~1 dB for low- and high-scattering scatterers, respectively.


Introduction
The interaction of light with matter is often characterized by multiple scattering and absorption events. The effects of light scattering in media manifest, for example, as low visibility in foggy conditions, blurred images, and loss of information in medical imaging. The amount of scattering in a given medium depends on the structure, intensity, and wavelength of the light source and on the optical properties of the medium. Scattering can be measured, and these measurements offer deep insight into the properties of the medium and the behavior of light propagating through it. Such studies have led to optical detection using visible and infrared light, which has become hugely popular in biomedicine owing to its robustness and to tremendous advances in computer science [1][2][3][4][5][6][7][8]. Compared with classical biomedical techniques such as X-ray imaging [9], optical imaging using visible and near-infrared (NIR) light is non-ionizing and causes no harm to the samples during screening. It is also cheaper to implement than conventional biomedical imaging techniques such as magnetic resonance imaging (MRI) [10]. However, optical imaging systems that use visible and NIR light suffer from optical blurring and photon noise [11][12][13]. These effects, caused primarily by the interaction of light with tissue and its subsequent propagation, degrade the image, reducing its resolution and blurring its edges [14]. Absorption and multiple scattering in all directions also cause loss of the information carried by light that does not reach the camera's aperture [15]. Reconstructing displayed objects in tissue from the camera images is further complicated, and made time-consuming, by the complex mathematical models involved in solving the inverse problem [16,17].
Hence, an optical imaging system needs to retrieve maximum information about the displayed object, improve resolution, and reduce the complexity of the analytical models used to reconstruct the object. The first two objectives can be achieved by using structured light sources, and the third can be addressed by implementing deep learning algorithms. Studies have shown that imparting a topological charge to beams used for optical imaging enhances optical imaging systems [18][19][20]. It has also been shown that vortex beams have higher transmittance through scattering media than Gaussian beams [21].
Moreover, recent studies indicate that optical imaging can also be improved with deep learning algorithms [22][23][24][25][26]. This article proposes a proof-of-concept simulation model that enhances imaging through scattering media using multiple vortex beams and convolutional neural networks.
Combining vortex beams with a deep learning algorithm gives significant insight into how far vortex beams can enhance image reconstruction quality, and into their potential for imaging through diffuse media in applications such as biomedical imaging, imaging through fog, non-destructive testing, computer-assisted surgery, and autonomous vehicular systems.
The structure of the paper is as follows. The theory and simulations are described in section 2: the optical imaging methodology in section 2.1, the advantages of vortex beams over Gaussian beams in section 2.2, and the convolutional neural network (CNN) used with the imaging system in section 2.3. The simulation results are presented in section 3, and we finish with the conclusions.

Theory and simulations
The concept of imaging using vortex beams, the simulation to validate the concept, and the deep learning architecture used to reconstruct displayed objects behind scatterers are described in this section.

Imaging through scattering media: Numerical simulations
The simulation set-up used in this study and the proposed imaging methodology are shown schematically in figure 1. The imaging apparatus is a lensless imaging system, and the input beams have a spot size of 1.26 mm and an input power of 5 mW. The Gaussian beams (three beams whose spot sizes correspond to the LG modes used) and the LG beams are generated using MATLAB code. Each generated beam passes through a phase screen containing the displayed object, a digit from the MNIST dataset [27]. The light leaving this phase screen, which carries information about the object, then passes through a scatterer, which scatters the beam before it forms an image on the camera. The scatterer is modeled as a phase screen placed after the object and in front of the camera to degrade the image, and a different scatterer is generated for each iteration. The scatterers used in this study are modeled according to reference [15]; however, because this study is purely numerical, certain modifications can be made in modeling them. In particular, the amplitude and phase terms governing the impulse response of the scatterer to the input beam are randomized, so that each image acquired at the camera, even for the same beam parameters, is unique and differs slightly from the others. This simulates the randomness in the structure and properties of particles in real-life objects which, while sharing the same general properties, are still unique [15,28-30]. The randomness of the scatterers also makes it harder for the machine learning algorithm to reconstruct the object; an algorithm optimized for this harder problem should therefore handle less complicated problems more efficiently.
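The study generated its beams in MATLAB; the following Python/NumPy sketch shows one way to build the input fields. It assumes the standard waist-plane (z = 0) form of the LG_p^l mode with analytic normalization, a square grid with the 6.45 µm camera pixel pitch, and scaling to the stated 5 mW input power. The function name `lg_mode` is hypothetical, and because the text does not say whether the 1.26 mm spot size is a radius or a diameter, the waist `w0` is left as a parameter (the value in the usage lines is illustrative only).

```python
import numpy as np
from scipy.special import eval_genlaguerre, factorial

def lg_mode(N, dx, w0, l, p=0, power=5e-3):
    """Laguerre-Gaussian LG_p^l field at its waist (z = 0) on an N x N grid.

    l = p = 0 gives a Gaussian beam. The field is scaled so that
    sum(|E|^2) * dx^2 = power, i.e. |E|^2 is an irradiance in W/m^2.
    """
    x = (np.arange(N) - N // 2) * dx           # grid centred on pixel N // 2
    X, Y = np.meshgrid(x, x)
    rho = 2.0 * (X**2 + Y**2) / w0**2          # 2 r^2 / w0^2
    phi = np.arctan2(Y, X)                     # azimuthal angle
    # Normalisation constant that makes the integral of |E|^2 over the plane equal to 1
    C = np.sqrt(2.0 * factorial(p) / (np.pi * factorial(p + abs(l)))) / w0
    E = (C * rho ** (abs(l) / 2) * eval_genlaguerre(p, abs(l), rho)
         * np.exp(-rho / 2)                    # Gaussian envelope exp(-r^2 / w0^2)
         * np.exp(1j * l * phi))               # helical phase carrying OAM
    return np.sqrt(power) * E

# Illustrative parameters: 512 x 512 grid, 6.45 um pixels, hypothetical 0.4 mm waist
N, dx, w0 = 512, 6.45e-6, 0.4e-3
beams = {l: lg_mode(N, dx, w0, l) for l in (0, 2, 4)}
```

For the equivalent Gaussian beams, the same function can be called with `l = 0` and the Gaussian spot size given by equation (1) as the waist.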
The impulse response of the scatterer to the field carrying the information about the object behind it is a function of the input power, the wavelength of light, and the phase responses of the scatterer and the object. It depends on the input field at pixel (i, j), which is either the Gaussian or a Laguerre-Gaussian mode; on the phase response of the object behind the diffuser to the input field, which is varied between 0 and 2π; and on the phase response of the scatterer at pixel (i, j), which interacts with the field. The scatterer phase response takes values from 0 (free space) to 0.62π (the highest scattering used in the study); beyond 0.62π, all information about the object is lost. Here, i and j are the pixel indices in the x and y directions, respectively. The impulse response also depends on the wavelength of the light and the propagation distance.
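The text specifies the range of the scatterer phase but not its spatial statistics, so the following is only a sketch under stated assumptions: a phase-only screen (the randomized amplitude term is omitted), Gaussian-correlated noise with a hypothetical correlation length `corr_px`, min-max rescaling to [0, φ_max], and a fresh random realization for every simulated image. `random_scatterer` is a hypothetical name, not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def random_scatterer(N, phi_max, corr_px=2.0, rng=None):
    """Random thin phase screen (radians) with values spanning [0, phi_max].

    phi_max = 0 reproduces free space; 0.62*pi was the strongest scatterer in the study.
    corr_px is a hypothetical transverse correlation length in pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = gaussian_filter(rng.standard_normal((N, N)), sigma=corr_px)
    noise = (noise - noise.min()) / (noise.max() - noise.min())   # rescale to [0, 1]
    return phi_max * noise

# One independent realisation per simulated image, as described in the text
screens = [random_scatterer(512, 0.62 * np.pi, rng=np.random.default_rng(seed))
           for seed in range(3)]
transmission = np.exp(1j * screens[0])   # complex transmission applied to the field
```

Smoothing before rescaling controls how strongly the screen scatters into large angles; a per-pixel white-noise screen (no smoothing) would scatter most strongly for a given φ_max.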
Scatterers with different response functions are therefore used to analyze the effect of the degradation on the final image and on the reconstruction algorithm. The beams are propagated using the split-step Fourier method [29], which is easily implemented in MATLAB and is used to numerically solve partial differential equations for which a general analytical solution is difficult to obtain. Imaging is performed separately for each mode of the vortex beams, and the three resulting images are then combined into a complete pattern containing all the information about the number. The same procedure is repeated using three Gaussian beams whose spot sizes correspond to LG modes 0, 2 and 4. The spot size of the Gaussian beam corresponding to a vortex beam carrying topological charge l is given by equation (1), where $w_0$ is the waist size of the beams used. Vortex beams and the advantages of using them in an imaging set-up are discussed in section 2.2. Figure 1 shows the concept of the set-up used in this study.
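A compact Python sketch of the per-mode imaging loop follows. It assumes thin phase screens for the object and the scatterer, free-space Fresnel (paraxial) propagation between them applied in the Fourier domain (the diffraction step of the split-step Fourier method, here with only two screens), summation of the per-mode camera intensities as the combination rule, and, for the equivalent Gaussian beams, a waist of w_0·√(|l| + 1), which matches the second-moment radius of an LG_0^l mode. That waist is an assumed form of equation (1) and may differ from the authors' formula. The wavelength, distances, waist, toy object, and uniform random scatterer in the usage lines are hypothetical.

```python
import numpy as np

def lg0_field(N, dx, w0, l):
    """LG_0^l field at its waist (l = 0 gives a Gaussian), normalised to unit power."""
    x = (np.arange(N) - N // 2) * dx
    X, Y = np.meshgrid(x, x)
    r2 = X**2 + Y**2
    E = ((np.sqrt(2 * r2) / w0) ** abs(l) * np.exp(-r2 / w0**2)
         * np.exp(1j * l * np.arctan2(Y, X)))
    return E / np.sqrt(np.sum(np.abs(E) ** 2) * dx**2)

def diffract(E, dx, wavelength, z):
    """Diffraction step of the split-step method: Fresnel transfer function in k-space."""
    f = np.fft.fftfreq(E.shape[0], d=dx)
    FX, FY = np.meshgrid(f, f)
    H = np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))  # constant exp(ikz) dropped
    return np.fft.ifft2(np.fft.fft2(E) * H)

def camera_image(E, obj_phase, scat_phase, dx, wavelength, z1, z2):
    """Object screen -> propagate z1 -> scatterer screen -> propagate z2 -> camera intensity."""
    E = diffract(E * np.exp(1j * obj_phase), dx, wavelength, z1)
    E = diffract(E * np.exp(1j * scat_phase), dx, wavelength, z2)
    return np.abs(E) ** 2

def combined_image(modes, w0, obj_phase, scat_phase, dx, wavelength, z1, z2, gaussian=False):
    """Image each mode through the same scatterer and sum the intensities (assumed rule)."""
    N = obj_phase.shape[0]
    total = np.zeros((N, N))
    for l in modes:
        if gaussian:  # equivalent Gaussian with waist w0*sqrt(|l|+1) (assumed eq. (1))
            E = lg0_field(N, dx, w0 * np.sqrt(abs(l) + 1), 0)
        else:
            E = lg0_field(N, dx, w0, l)
        total += camera_image(E, obj_phase, scat_phase, dx, wavelength, z1, z2)
    return total

# Hypothetical example: toy phase object, uniform random scatterer of strength 0.62*pi
N, dx = 512, 6.45e-6
rng = np.random.default_rng(0)
obj = np.zeros((N, N))
obj[200:312, 240:272] = np.pi
scat = rng.uniform(0.0, 0.62 * np.pi, (N, N))
args = (obj, scat, dx, 633e-9, 0.05, 0.05)
img_vortex = combined_image((0, 2, 4), 0.3e-3, *args)
img_gauss = combined_image((0, 2, 4), 0.3e-3, *args, gaussian=True)
```

Because both screens are phase-only and the transfer function has unit modulus, each mode's total power is conserved, which is a convenient sanity check on any implementation of this step.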

Orbital angular momentum of light and imaging using vortex beams
Conventional laser beams usually have a spherical wavefront, for which the azimuthal phase index, or topological charge l, is zero. However, the wavefront of a laser beam can be changed by imparting a topological charge. Any beam carrying a topological charge possesses orbital angular momentum (OAM); such beams have a helical wavefront and are called vortex beams. Laguerre-Gaussian (LG) modes are the most common examples of light beams carrying OAM. They are obtained mathematically by solving the paraxial wave equation in cylindrical coordinates [32][33][34][35]. Their complex amplitude is expressed in terms of the wavenumber k, the radius of curvature R of the phase front, the beam waist w of the Gaussian term, and the mode indices m and n, which satisfy n = m + l and n + m = 2p + l = N, where l, N, and p are the topological charge, degree, and order, respectively. LG modes are the easiest type of vortex beam to generate in the laboratory, and excellent reviews exist on the theory of OAM, the generation of LG beams, and their applications [36,37]. Vortex beams have been studied extensively and have found ample applications in communications and optical imaging, among other fields [35,38-46]. Compared with a traditional Gaussian beam, vortex beams in an imaging system significantly improve image quality. Figure 2 shows that imaging with vortex beams improves Pratt's figure of merit [47] by up to 10% when a scatterer of high phase response (0.62π) is used. This improvement results from the selective illumination of the object by non-overlapping vortex modes, each of which enhances specific parts of the sample; the resulting images are combined digitally to give a complete image of the sample. Different OAM modes do not interfere with each other in free-space propagation.
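Pratt's figure of merit compares a detected edge map with an ideal one: FOM = (1 / max(N_I, N_D)) Σ_k 1 / (1 + α d_k²), where N_I and N_D are the numbers of ideal and detected edge pixels and d_k is the distance from the k-th detected edge pixel to the nearest ideal edge pixel. A minimal sketch follows, assuming boolean edge maps are already available (the edge detector used for figure 2 is not stated) and the conventional scaling constant α = 1/9; `pratt_fom` is a hypothetical name.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(ideal_edges, detected_edges, alpha=1.0 / 9.0):
    """Pratt's figure of merit between two boolean edge maps (1 = perfect match)."""
    ideal = np.asarray(ideal_edges, dtype=bool)
    detected = np.asarray(detected_edges, dtype=bool)
    # Distance from every pixel to the nearest ideal edge pixel (ideal pixels are the zeros).
    dist = distance_transform_edt(~ideal)
    d = dist[detected]
    return np.sum(1.0 / (1.0 + alpha * d**2)) / max(ideal.sum(), detected.sum())

# A detected edge displaced by one pixel from the ideal edge scores 1 / (1 + 1/9) = 0.9
ideal = np.zeros((10, 10), dtype=bool)
ideal[:, 4] = True
shifted = np.zeros((10, 10), dtype=bool)
shifted[:, 5] = True
```

Dividing by the larger of the two edge counts penalizes both missing edges and spurious extra edges.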
However, when OAM beams enter diffuse media, scattered photons move into the propagation paths of photons from other modes. Increasing the thickness of the medium, or the scattering phase and amplitude functions of the scatterer, significantly reduces the number of ballistic and snake photons reaching the camera, which are essential for a beam to retain its shape. Eventually, the output pattern no longer resembles the original mode [48]. In figure 3, the LG modes are color-coded red (l = 0), green (l = 1 and l = 2), and blue (l = 3 and l = 4). For the weaker scatterer, the scattering is weaker, but there is still a slight overlap between the output beams of different orders. For the highly scattering scatterer, a large fraction of the photons is scattered; however, a few photons still propagate through the scatterer ballistically and preserve the pattern of the projected number. After considering different combinations of modes for imaging the test object, we use LG beams with topological charges l = 0, 2 and 4, since these modes overlap the least in this set-up and thus minimize the aberrations arising from mode overlap. Imaging is also performed using three Gaussian beams with spot sizes calculated according to equation (1).
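One simple way to quantify how much the intensity rings of different modes overlap, offered here as an illustrative metric rather than necessarily the criterion behind figure 3, is the normalized overlap (cosine similarity) of their intensity profiles. The sketch assumes p = 0 modes with a common waist, evaluated at the waist plane in free space; the function names are hypothetical.

```python
import itertools
import numpy as np

def lg0_intensity(l, N=512, half_width=4.0):
    """Waist-plane intensity of LG_0^l on a grid in units of w0 (overall scale irrelevant)."""
    x = np.linspace(-half_width, half_width, N)
    X, Y = np.meshgrid(x, x)
    u = 2.0 * (X**2 + Y**2)                  # 2 r^2 / w0^2 with w0 = 1
    return u ** abs(l) * np.exp(-u)

def overlap(Ia, Ib):
    """Normalised overlap (cosine similarity) of two intensity maps; 1 = identical shape."""
    return np.sum(Ia * Ib) / np.sqrt(np.sum(Ia**2) * np.sum(Ib**2))

def max_pairwise_overlap(modes):
    """Largest overlap among all pairs of modes in the set."""
    I = {l: lg0_intensity(l) for l in modes}
    return max(overlap(I[a], I[b]) for a, b in itertools.combinations(modes, 2))

for modes in [(0, 1, 2), (0, 2, 4)]:
    print(modes, round(max_pairwise_overlap(modes), 3))
```

For p = 0 modes with charges a and b and equal waists, this overlap has the closed form (a + b)! / √((2a)! (2b)!), so the largest pairwise overlap is about 0.87 for l = (0, 1, 2) and about 0.73 for l = (0, 2, 4).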

Deep learning architecture
To detect the displayed numbers even behind a highly scattering scatterer, we use a convolution-based neural network to classify and reconstruct them. The convolutional neural network (CNN) receives the images recorded by the camera and learns to identify the image that was placed in front of the scatterer. The CNN architecture is based on the encoder-decoder structure of the U-Net architecture [49]. Although this architecture was originally developed for segmentation, recent studies have shown that it is very efficient for image restoration tasks such as this one [15,50,51]. However, several changes have been made to the original architecture. First, the input image enters three parallel convolutional layers, each with eight kernels of size 5x5: one is not dilated, one has a dilation rate of 2, and one has a dilation rate of 3. The three resulting feature maps are concatenated into one feature map, which enters a down-sampling layer (2x2 average pooling). From this point, the following process repeats five times in the encoder. The feature maps enter a transition layer consisting of a batch normalization (BN) layer, a ReLU activation layer, and a convolution with $2^{4+s}$ kernels of size 3x3, where s denotes the step number. The feature maps then enter a dense block consisting of four convolution layers with 16 kernels of size 3x3, with a BN layer and a ReLU activation layer between each pair of convolution layers. In addition, the input to each convolution layer in the block is connected, via skip connections, to the outputs of all previous convolution layers in the block and to the feature maps entering the block. At the end of each step, the feature maps enter a down-sampling layer (2x2 max pooling). The bottleneck consists of one transition layer and one dense block.
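The three-branch input stage can be sketched with NumPy and SciPy to make the dilation concrete: dilating a 5x5 kernel by d inserts d - 1 zeros between its taps, so its footprint grows to 5, 9, and 13 pixels for d = 1, 2, 3 while it keeps 25 weights. This is a forward-pass illustration only, assuming a single-channel input, "same" zero padding, random weights in place of trained ones, and no activation in the stem (the text mentions none); the function names are hypothetical.

```python
import numpy as np
from scipy.signal import correlate2d

def dilate_kernel(k, d):
    """Insert (d - 1) zeros between kernel taps: a 5x5 kernel with d = 3 spans 13x13 pixels."""
    n = k.shape[0]
    kd = np.zeros((d * (n - 1) + 1,) * 2)
    kd[::d, ::d] = k
    return kd

def stem(image, kernels_per_branch=8, size=5, dilations=(1, 2, 3), rng=None):
    """Three parallel 5x5 conv branches (dilation 1, 2, 3), concatenated, then 2x2 average pooling."""
    rng = np.random.default_rng(0) if rng is None else rng
    maps = []
    for d in dilations:
        for _ in range(kernels_per_branch):
            k = dilate_kernel(rng.standard_normal((size, size)), d)  # random stand-in weights
            maps.append(correlate2d(image, k, mode="same"))          # zero padding keeps H x W
    x = np.stack(maps)                                               # channel concatenation
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))      # 2x2 average pooling

out = stem(np.random.default_rng(1).random((64, 64)))   # -> 24 channels at half resolution
```

Running the branches in parallel lets the network see the same input at three receptive-field sizes at the cost of only 3 x 8 x 25 weights.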
The decoder then follows, with five steps corresponding to the five encoder steps. The output of the bottleneck enters an up-sampling layer (2x2 transposed convolution). To preserve information, the feature maps obtained after the up-sampling layer are combined, via skip connections, with the feature maps obtained at the end of the corresponding encoder step. The result then passes through a transition layer and a dense block, which are identical in structure to those of the encoder. Finally, the CNN performs one more up-sampling and a final convolution to obtain the network output. It has recently been shown that combining dilated convolution layers with wide-kernel convolution layers allows better information extraction from images [15]. Additionally, DenseNet-style connectivity has been shown to improve network performance and convergence [52].
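The spatial sizes and channel counts implied by this description can be traced with a few lines of Python. The sketch assumes a 256x256 input, a step index s running from 1 to 5, transition layers with 2^(4+s) kernels, dense blocks that output their input concatenated with the outputs of their four 16-kernel convolutions (the usual DenseNet convention), and a single-channel output; the bottleneck channel count is not given in the text and is left unspecified. `unet_trace` is a hypothetical name.

```python
def unet_trace(size=256, steps=5, growth=16, layers=4, stem_channels=3 * 8):
    """Return (stage, spatial size, channels) for each stage of the described CNN."""
    trace = [("stem", size, stem_channels)]          # three 8-kernel branches concatenated
    size //= 2                                       # 2x2 average pooling after the stem
    enc_sizes = []
    for s in range(1, steps + 1):
        # transition layer: 2**(4 + s) kernels; dense block appends layers * growth maps
        trace.append((f"encoder step {s}", size, 2 ** (4 + s) + layers * growth))
        enc_sizes.append(size)
        size //= 2                                   # 2x2 max pooling
    trace.append(("bottleneck", size, None))         # channel count not stated
    for s in range(steps, 0, -1):
        size *= 2                                    # 2x2 transposed convolution
        assert size == enc_sizes[s - 1], "skip connection needs matching spatial sizes"
        trace.append((f"decoder step {s}", size, 2 ** (4 + s) + layers * growth))
    trace.append(("output", size * 2, 1))            # final up-sample + convolution
    return trace

for stage, hw, ch in unet_trace():
    print(f"{stage:16s} {hw:4d} x {hw:<4d} channels: {ch}")
```

The six 2x2 poolings (one after the stem, five in the encoder) reduce the 256-pixel input to a 4x4 bottleneck, and the five decoder up-samplings plus the final one restore a 256x256 output, which is why the skip connections line up.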
Since the network solves a regression problem, the choice of loss function is essential for optimal network performance. Although the mean squared error (MSE) and mean absolute error (MAE) loss functions are common for regression problems, studies have shown that they are less suitable here because the input images bear little direct resemblance to the ground truth. We therefore chose the negative Pearson correlation coefficient (NPCC) loss function, which is effective for this type of problem [15,31,53-55]:

$$\mathrm{NPCC} = -\frac{\sum_{i=1}^{W}\sum_{j=1}^{H}\left(G(i,j)-\tilde{G}\right)\left(Y(i,j)-\tilde{Y}\right)}{\sqrt{\sum_{i=1}^{W}\sum_{j=1}^{H}\left(G(i,j)-\tilde{G}\right)^{2}}\;\sqrt{\sum_{i=1}^{W}\sum_{j=1}^{H}\left(Y(i,j)-\tilde{Y}\right)^{2}}}$$

where W and H are the width and height of the compared images, G is the ground truth, Y is the CNN output, and G̃ and Ỹ are the mean values of the ground truth and the CNN output, respectively. The CNN was built and trained on a computer with an Intel Core i7-9700K processor and an NVIDIA GeForce RTX 2070 graphics processor. The network was trained for 20 epochs using the Adam optimizer with a learning rate of 0.0001. It was trained twice: once on the camera-recorded images for the Gaussian beams and once on the camera-recorded images for the different LG beam modes. Figure 4 shows the network structure used in this work. Supervised learning with CNNs requires a large amount of labeled data to train the learnable parameters well; we therefore used a data set containing a large collection of different images. To train and evaluate the network, we used a Digits data set containing 10,000 images of handwritten digits from 0 to 9 (1,000 for each digit), in which each image is rotated by some angle [27]. Each image is 28x28 pixels, with pixel values from 0 to 255, and is resized to 512x512 pixels to match the camera resolution. We rescaled the pixel values to the phase change that each pixel imparts to the beam. The original images were used as ground-truth labels for training.
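The NPCC loss is short to implement. The NumPy sketch below evaluates it for a single image pair, with a small hypothetical `eps` guarding against division by zero for constant images; in training, the same expression would be written with the deep learning framework's tensor operations so that it can be differentiated.

```python
import numpy as np

def npcc_loss(G, Y, eps=1e-12):
    """Negative Pearson correlation coefficient between ground truth G and CNN output Y.

    Returns -1 for a reconstruction that is a positively scaled and shifted copy of G,
    0 for no linear correlation, and +1 for an inverted (negated) image.
    """
    g = np.asarray(G, dtype=float) - np.mean(G)   # G(i, j) - G~
    y = np.asarray(Y, dtype=float) - np.mean(Y)   # Y(i, j) - Y~
    return -np.sum(g * y) / (np.sqrt(np.sum(g**2)) * np.sqrt(np.sum(y**2)) + eps)

rng = np.random.default_rng(0)
G = rng.random((28, 28))
```

Because the Pearson coefficient is invariant to a positive scaling and offset of the intensities, the loss rewards reconstructions with the right shape even when their brightness differs from the ground truth.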
Our simulation produces the images recorded by a camera with 512x512 resolution. In this simulation, the pixel size of both the camera and the number image is 6.45 µm. Due to computational limitations, the camera images were downsampled to 256x256 pixels, which leads to some loss of information because the downsampling algorithm averages each 2x2 block of neighboring pixels. 10% of the data was randomly selected to test the performance of the network (test data) at the end of all training processes. The remaining 90% was used for training (training data), and at the beginning of each training run, 10% of the training data was randomly allocated for validation (validation data).
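The data-preparation steps described above can be sketched as follows. The resizing method is not stated, so nearest-neighbour upscaling is assumed; grey levels are mapped linearly onto object phases in [0, 2π), consistent with the object phase range given in section 2.1; the 2x2 block average reproduces the 512x512 to 256x256 downsampling; and the split holds out 10% for testing once and then draws 10% of the remainder for validation. All function names are hypothetical.

```python
import numpy as np

def digit_to_phase(img28, out=512):
    """Nearest-neighbour upscale of a 28x28 grey-level digit to out x out, as a phase in [0, 2*pi)."""
    idx = np.arange(out) * img28.shape[0] // out   # source pixel for each output row/column
    big = img28[np.ix_(idx, idx)].astype(float)
    return 2 * np.pi * big / 256.0                 # /256 keeps level 255 distinct from level 0

def downsample2x2(image):
    """Average each 2x2 block of neighbouring pixels (512x512 -> 256x256)."""
    H, W = image.shape
    return image.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def split_test(n, test_frac=0.1, rng=None):
    """Hold out a fixed test set once; returns (train_pool, test) index arrays."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return perm[n_test:], perm[:n_test]

def split_validation(train_pool, val_frac=0.1, rng=None):
    """Draw a validation subset from the training pool; call again at each training run."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(train_pool)
    n_val = int(round(val_frac * len(perm)))
    return perm[n_val:], perm[:n_val]

pool, test = split_test(10000, rng=np.random.default_rng(0))
train, val = split_validation(pool, rng=np.random.default_rng(1))
```

With 10,000 images this yields 1,000 test images, and each training run then uses 8,100 images for training and 900 for validation.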

Image restoration and enhancement using CNN
In this section, we present the training and validation of the CNN described in section 2.3. As described in section 2.1, two scatterers are used to simulate beam propagation through low- and high-scattering media, and images are obtained for the three modes of the vortex beams. The CNN is trained for 20 epochs in each run using the NPCC loss function. With each epoch, the algorithm learns more from the training set, which comprises 90% of all the images obtained; the remaining 10% are used for testing. Figure 5 shows the training graphs and the convergence of the NPCC loss with each epoch for the low- and high-scattering scatterers: a) and b) show the convergence for images obtained using Gaussian and vortex beams, respectively, with a scatterer of phase response 0.5π, and c) and d) show the corresponding convergence graphs for Gaussian and vortex beams imaging through a scatterer of phase response 0.62π. In comparison, the LG beams yield reconstructed images of noticeably higher quality. This higher performance of the CNN with vortex beams can be attributed to the orthogonality of the different LG modes and to the distribution of the intensity patterns of the vortex beams relative to the Gaussian beams. For the low-scattering medium, the mean squared error of the reconstruction is 0.00413 with Gaussian beams and 0.00231 with vortex beams. The vortex beams also improve the CNN's reconstruction ability for the highly scattering medium, with mean squared errors of 0.0127 and 0.0102 for Gaussian and vortex beams, respectively.
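The PSNR gains quoted in the abstract follow directly from these MSE values: with PSNR = 10 log10(MAX² / MSE) and the same peak value MAX for both beam types (an assumption), the gain reduces to 10 log10(MSE_Gaussian / MSE_vortex). The short check below also gives the fractional MSE reduction relative to the Gaussian case.

```python
import math

# Reported reconstruction MSE values: (Gaussian beams, vortex beams)
mse = {"low": (0.00413, 0.00231),    # scatterer phase response 0.5*pi
       "high": (0.0127, 0.0102)}     # scatterer phase response 0.62*pi

results = {}
for case, (g, v) in mse.items():
    reduction = (g - v) / g                 # fractional MSE reduction relative to Gaussian
    psnr_gain_db = 10 * math.log10(g / v)   # MAX^2 cancels when both share the same peak
    results[case] = (reduction, psnr_gain_db)
    print(f"{case} scattering: MSE reduced by {reduction:.1%}, PSNR gain {psnr_gain_db:.2f} dB")
```

This gives MSE reductions of about 44% and 20% and PSNR gains of about 2.5 dB and 1.0 dB for the low- and high-scattering cases, respectively.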
The training and validation processes are also considerably smoother, and the CNN is more robust, when vortex beams are used, as can be observed from the training and validation graphs in figure 5. The Sørensen-Dice coefficient [56,57] is computed for the reconstructed images, and a Dice mask is applied to check the similarity between the reconstructed images and the ground-truth images. The mask shows, in different colors, the parts of the ground truth missing from the reconstructed images and the parts that have been reconstructed but are absent from the ground truth. In figure 6, red shows the extra parts and blue shows the missing parts. The Sørensen-Dice coefficients for the Gaussian and vortex beams are 0.920 and 0.936, respectively, when the scatterer with phase response 0.5π is used. When the scatterer with phase response 0.62π is used, the vortex beams still yield a slightly higher overlap with the ground truth, with Sørensen-Dice coefficients of 0.845 and 0.849 for images reconstructed using Gaussian and vortex beams, respectively.
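The Dice coefficient and the two-color mask can be sketched as below, assuming reconstructions and ground truths normalized to [0, 1] and binarized at a hypothetical threshold of 0.5. The white color for correctly reconstructed pixels is also an assumption, since the text specifies only red (extra) and blue (missing); `dice_with_mask` is a hypothetical name.

```python
import numpy as np

def dice_with_mask(recon, truth, threshold=0.5):
    """Sørensen-Dice coefficient of binarised images and an RGB error mask.

    Red: reconstructed pixels absent from the ground truth (extra parts).
    Blue: ground-truth pixels missing from the reconstruction.
    White: pixels present in both.
    """
    R = np.asarray(recon) > threshold
    T = np.asarray(truth) > threshold
    total = R.sum() + T.sum()
    dice = 1.0 if total == 0 else 2.0 * np.logical_and(R, T).sum() / total
    mask = np.zeros(R.shape + (3,))
    mask[R & ~T] = (1.0, 0.0, 0.0)   # extra
    mask[~R & T] = (0.0, 0.0, 1.0)   # missing
    mask[R & T] = (1.0, 1.0, 1.0)    # correctly reconstructed
    return dice, mask

# Toy example: a 4x4 square reconstructed one pixel to the right of its true position
T = np.zeros((10, 10)); T[2:6, 2:6] = 1
R = np.zeros((10, 10)); R[2:6, 3:7] = 1
d, m = dice_with_mask(R, T)   # 12 shared pixels out of 16 + 16 -> Dice = 0.75
```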

Conclusions
This article investigated whether beams carrying topological charges enhance the image reconstruction ability of a CNN, and the potential of vortex beams for biomedical photonics applications. Scatterers with varying phase responses are used to account for scattering in tissues. Figure 2 shows that using vortex beams in the imaging system significantly enhances the imaging and detection quality, which in turn enhances the reconstruction ability of the CNN. This was achieved by exploiting the non-overlapping helical wavefronts of multiple vortex beams, which selectively illuminate different areas of the object. A direct consequence of the improved image quality is an improved reconstruction ability of the CNN, as seen in figures 5 and 6, where the CNN reconstructs noticeably better when vortex beams are used instead of Gaussian beams. The parameters used to assess the improvement, and the improvement in terms of each, are given in table 1. In conclusion, we have shown that using vortex beams instead of Gaussian beams reduces the MSE of the reconstructed images by ~44% in low-scattering media and ~20% in highly scattering media. Using vortex beams also improved the PSNR of the images by ~2.5 dB and ~1 dB in low- and high-scattering media, respectively. The imaging was performed on a dataset of 10,000 images with both low- and high-scattering media. The results suggest that an imaging system that uses beams carrying topological charges, together with a CNN for image reconstruction, has numerous potential applications in medical imaging, among other fields.

Fig. 2. Enhancement shown by OAM beams compared to Gaussian beams.
Fig. 3. Imaging using various modes of OAM beams and reasons to select modes (l = 0,2,4).
Fig. 4. Block diagram of CNN architecture.
Fig. 5. Training graphs of the CNN.
Fig. 6. Reconstructed images from Gaussian and Vortex beams.