Image Denoising to Enhance Character Recognition Using Deep Learning

— In this paper, we propose a Deep Convolutional Neural Network (DCNN) that maps a noisy character image to its clean counterpart. The overall process is divided into two stages: noise type classification and image denoising. First, the noise type classifier identifies the type of noise, and based on this type a dedicated denoising model is selected, which improves denoising performance. During training, the denoising network takes a noisy image as input and its clean counterpart as target. After the mapping function is trained, the generated model performs character image denoising. For colour images, the trained mapping function denoises each band independently of the others, and the bands are then reassembled to generate a clean image. The MNIST and Chars74K handwritten character datasets, diluted with artificial noise divided into ten types, are used for experimentation. Our experimental results show that the proposed technique outperforms existing methods in both image noise type classification and image denoising. Overall character recognition accuracy increased by 66% after applying the proposed denoising technique.


INTRODUCTION
Digital imaging is a predominant method of information storage [1], [2]. However, due to the presence of noise introduced during image capture, storage, transformation and delivery, the quality of digital images typically decreases [3]. To counter this flaw, image denoising is often applied. Traditional image recovery attempts to predict a clean image from its noisy observation [4][5][6]. The model can be formalized as

y = x + n

where a clean image x is corrupted by noise n, resulting in the noisy image y. Image denoising techniques are applied to y to eliminate the noise and recover an estimate of x [7].
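The additive degradation model above can be sketched in a few lines (a minimal NumPy illustration; the Gaussian noise term and the 8-bit clipping range are illustrative assumptions, not prescribed by the model):

```python
import numpy as np

def degrade(x: np.ndarray, sigma: float = 25.0) -> np.ndarray:
    """Corrupt a clean image x with additive noise n, giving y = x + n."""
    n = np.random.normal(0.0, sigma, size=x.shape)   # noise term n
    y = np.clip(x.astype(np.float64) + n, 0, 255)    # noisy observation y
    return y.astype(np.uint8)

# A denoiser then tries to estimate the clean image x given only y.
```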
Image denoising improves image quality, enhances the underlying data, preserves critical information, and aids the understanding and identification of images by highlighting key features and removing secondary ones [8]. Applying efficient denoising techniques therefore improves Optical Character Recognition (OCR).
During image capture and transmission, digital images are often distorted by unwanted noise. Noise also accumulates over time on physical documents, which tend to degrade under conditions such as unfavourable weather, folding marks and watermarks. Only a robust and effective image denoising algorithm can restore a clean image by removing this redundant data [9].
The importance of image denoising has motivated many different approaches. Most of them attempt to find an optimal mapping between a clean image and its noisy counterpart using algorithms such as histogram enhancement. However, these methods fail to generalize due to their heuristic nature and sparsity assumptions. Deep Neural Networks (DNNs) have shifted the paradigm by learning a highly non-linear mapping function between noisy and noise-free images. However, such complex noise-pattern mappings may perform significantly better on the supervised training data but not always on images unseen during training.
To overcome this deficiency, we propose decomposing the noisy input image into its three RGB channels. Each colour channel exhibits a simpler, less complicated noise pattern, so mapping it becomes easier. A Deep Convolutional Neural Network (DCNN) learns the mapping function between the noisy image and its clean counterpart independently in each channel. After the DCNN has been trained, the generated model removes the noise pattern from each channel, and the channels are recombined to produce a noise-free image in the original input format.
The experiments use the MNIST and Chars74K datasets of handwritten digits and characters. The original datasets are noise-free; since we want to perform and evaluate image denoising, we introduce artificial noise into them. The experimental datasets are split into training and testing sets. Denoising performance is assessed using the Peak Signal-to-Noise Ratio (PSNR). To show that our proposed denoising algorithm improves OCR accuracy, character recognition was evaluated independently on the noisy and the denoised images. Denoising improves OCR performance accuracy by 66%.

RELATED WORK
Image denoising has been widely researched in recent decades, and many denoising techniques, from simple pixel manipulation to complex pixel mapping, have been proposed. A brief review follows.
2.1: Statistical Analysis Based Image Denoising: A simple yet efficient way of denoising images, the median filter [10], moves through the image pixel by pixel, replacing each value with the median of its neighbouring pixels. However, the median filter tends to blur edges. To counter this problem, a median filter with a sorted sequence was proposed in [11]. Donoho's wavelet thresholding approach [12] led to an extensive variety of other wavelet-based image denoising methods [13,14].
However, wavelet-based image denoising only works well with additive noise such as Gaussian noise and struggles with other varieties of noise. A hyperspectral image denoising algorithm was recently suggested by Chen and Qian [15], introducing principal component analysis prior to wavelet shrinkage. Other main approaches to image denoising include a partial differential equation (PDE) diffusion system [16], Catte's selective smoothing method based on Perona-Malik diffusion [17], a non-linear total variation method based on pixel intensity [18], and an approach that exploits similarity in pixel structure [19]. Although anisotropic diffusion techniques can eliminate noisy pixels from a digital image without blurring edges, they take longer to converge to the optimum solution [20]. These statistically based denoising methods fail to recover the corresponding clean image when more than one type of noise is present [21].

2.2: Non-Local Patch-based Methods:
Under this modelling scheme, the interrelationship between pixel patches can be exploited in two ways. The first is the explicit patch-model-based method, which directly defines regularisation as the correlation between non-local patches. To determine the linear relation between isolated similar patches, Buades et al. [22] propose taking a mean value among the non-local patches. In the sparse-representation setting, Dong et al. [23] incorporate the isolated non-local mean to achieve a non-locally centralised sparse representation model for denoising. Zoran et al. [24] model the distribution of pixel patches with a Gaussian-based technique. Dabov et al. [25] propose the well-known block-matching and 3D filtering (BM3D) method, which exploits the sparseness of similar pixel patches and is claimed to increase performance. Gu and Chen et al. [26] exploit the low-rank properties of non-local patches. Besides these patch models, some researchers also introduce tensor decomposition into image denoising. Rajwade et al. [27] use higher-order singular value decomposition (HOSVD) to restore noisy images with hard thresholding. Gomes et al. [28] propose a tensor-based technique that successively performs edge smoothing, image denoising and image reconstruction. To capture non-local similarity in noisy images, Wu et al. [29] propose a decomposition method based on weights assigned to the tensor rank.

2.3: Deep Learning Based Methods:
Deep learning based methods attempt to denoise the input noisy image by finding the optimal relationship between the clean image and its noisy counterpart in an end-to-end way. On one hand, some works propose deep neural network architectures that learn latent information to generate clean images directly from noisy observations. For example, Mao et al. [30] build a very deep encoder-decoder network with symmetric skip connections. Zhang et al. [31] propose a deep convolutional residual denoising network. On the other hand, certain works [32,33] use deep neural networks to learn a proper image regularizer and incorporate it into a traditional restoration model. Zhang et al. [34], for example, learn an image prior with a deep network and combine it with a half-quadratic splitting process [35]. Kim et al. [36] integrate an aggregation-based deep network into an alternating minimization approach for denoising. Koziarski et al. [37] combine image denoising with recognition, resulting in better optimization of the loss function. Jain and Seung [38] show that CNNs are more powerful than MRF-based techniques, demonstrating that CNNs perform strongly across image processing tasks. Vincent [39] uses stacked denoising autoencoders (SDA) for image denoising, obtaining high accuracy. Burger et al. [40] map noisy images to their clean counterparts using the widely used multi-layer perceptron (MLP). Lore [41] maps features of low-light images using a deep learning autoencoder. Chen [42] focuses on a non-linear model specifically for learning different types of noise patterns; a wide range of image denoising and recovery tasks were considered, and an output relatively close to BM3D [43] was achieved in their study.

METHODOLOGY
The proposed methodology can be broadly classified into three stages: training the noise classification model, training the denoising model, and testing. The noise classification model analyses the noise pattern in the image and classifies it accordingly. The denoising model removes noise by learning a feature mapping between a noisy image and its clean counterpart. In the testing stage, a noisy test image is input to the system and its noise type is identified using the classification model obtained during training. The denoising model corresponding to the identified noise type is then selected to perform image denoising. This architecture enhances noise removal by narrowing the noise spectrum and applying a denoising method suited to each part of it. The overall methodology is given in Fig. 1. Fig. 1 (A) classifies the type of noise corrupting the input image, such as Gaussian noise, salt-and-pepper noise and speckle noise. Fig. 1 (B) trains the CNN autoencoder to denoise character images. This denoising model is trained with two inputs and one target: the inputs are the noisy image produced by the noise generator and the output of the noise classification model, and the target is the corresponding clean image from the MNIST dataset. Fig. 1 (C) performs image denoising using both the classification model and the denoising model obtained in Fig. 1 (A) and Fig. 1 (B), respectively. To evaluate the denoising performance of our proposed technique, we used the MNIST dataset mixed with the Chars74K dataset, both of which are publicly available. The initial stage, training the noise classification model, can be subdivided into noise generation, pre-processing and the classification CNN.
In noisy image generation, we generate ten types of noise independently of each other, falling under two categories, single-type and mixed-type noise, as given in Table 1. In pre-processing, each noisy image is labelled with its noise type. This enables the proposed system to analyse and study each noise pattern in detail, which in turn enhances noise removal performance. In the classification stage, we train the Deep Convolutional Neural Network using the generated noisy images and the labels obtained from the previous two stages. Once the network is trained to identify noisy images, we use it for noise classification, and image denoising is performed based on its output. Denoising conditioned on the classified noise type narrows the noise spectrum to a specific range compared with general noise reduction; this enhances noise removal performance because each denoising model is tuned to a specific type of noise. The overall performance is evaluated in the testing phase and shows promising results compared with a stand-alone denoising deep neural network.
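The classify-then-denoise dispatch described above can be sketched as follows (a minimal sketch; the function and parameter names are illustrative placeholders, not the paper's actual implementation):

```python
NUM_NOISE_TYPES = 10  # Type 1 .. Type 10, as in Table 1

def denoise(image, classify, models):
    """Route a noisy image to the denoising model tuned for its noise type.

    classify: trained noise-classification network (returns a type in 1..10)
    models:   mapping from noise type to its dedicated denoising model
    """
    noise_type = classify(image)
    return models[noise_type](image)
```

The key design choice is that `models` holds one autoencoder per noise type, so each model only has to cover a narrow slice of the noise spectrum.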

Noise Generation:
The noise generation process is the initial stage in image denoising; in this stage, noisy images are generated from the MNIST dataset used for experimentation. The introduction of Gaussian noise (Gn) to a clean image is represented as:

y = x + G(μ, σ²)

where y is the corrupted noisy image, x is the character image obtained from the MNIST dataset (clean image), and G(μ, σ²) is the Gaussian distribution with mean μ and variance σ². The process of adding salt-and-pepper noise (Sn) can be represented as follows: out of the N pixels of the character image, we randomly select d·N pixels and assign each the value 0 or 255; the remaining N − d·N pixels remain untouched. The process of adding speckle noise (Sp) is represented as:

y = x + x·n

where the clean character image x is multiplied by noise n with mean μ and variance σ². The process of adding the last kind of noise, Poisson (Po) noise, is given as:

y = P(x)

where P(x) is noise generated by drawing from a Poisson distribution against the clean image.
The above four types of noisy image are used for training the noise classification network. However, to obtain robust noise removal, the noise is further classified into single and mixed noise types. Mixed noise is a combination of the above noises added together to produce a different pattern. The characteristic classification of noise is given in Table 1. For single-type noise, 10% of each noise type is diluted into the original clean image to produce its noisy counterpart; for mixed-type noise, the noise percentage is increased to 20%. Each noise type is labelled within the range Type 1 to Type 10. After these noisy image variants are generated, each label along with each noisy image is used for training, so as to obtain a mapping that classifies an image by its noise type.
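The four base noise types defined above can be generated roughly as below (a hedged NumPy sketch on images normalized to [0, 1], so the salt-and-pepper extremes become 0 and 1 rather than 0 and 255; parameter defaults are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(x, mu=0.0, var=0.01):
    """y = x + G(mu, var): additive Gaussian noise."""
    return np.clip(x + rng.normal(mu, np.sqrt(var), x.shape), 0.0, 1.0)

def salt_pepper_noise(x, d=0.10):
    """Set a fraction d of randomly chosen pixels to 0 or 1 (pepper/salt)."""
    y = x.copy()
    mask = rng.random(x.shape) < d          # d*N corrupted pixels
    y[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return y

def speckle_noise(x, var=0.01):
    """y = x + x*n: multiplicative noise with zero mean and variance var."""
    return np.clip(x + x * rng.normal(0.0, np.sqrt(var), x.shape), 0.0, 1.0)

def poisson_noise(x, scale=255.0):
    """y ~ Poisson(x): signal-dependent counting noise on rescaled intensities."""
    return rng.poisson(x * scale) / scale
```

Mixed types (Type 5 to Type 10) would then be produced by composing these generators at the higher 20% dilution.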

Pre-processing
In this stage, each noisy image is paired with its label in order to train the classification neural network. The noisy images, of size [28, 28, 1] and totalling 8000 characters, are labelled with their associated noise types. The labelling process is performed by selecting one image and labelling each transformed version with its corresponding noise type. For instance, one image from the MNIST dataset representing the character 5 is converted into 10 images with different types of noise; these 10 noisy images are labelled Type 1 to Type 10 during this stage. This pre-processing stage is vital, as inaccurate or incorrect labelling of noise types degrades noise removal performance. The overall process is depicted in Fig. 2.
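The labelling step, expanding each clean image into ten labelled noisy variants, might look like this (an illustrative sketch; `noise_fns` stands in for the ten noise generators of Table 1, and zero-based labels are an implementation convenience):

```python
import numpy as np

def make_labelled_set(clean_images, noise_fns):
    """Expand each clean image into one noisy variant per noise type.

    noise_fns: list of 10 noise generators (Type 1 .. Type 10).
    Returns (noisy_images, labels) with labels 0..9 for the 10 types.
    """
    noisy, labels = [], []
    for img in clean_images:
        for label, fn in enumerate(noise_fns):
            noisy.append(fn(img))   # the transformed (noisy) image
            labels.append(label)    # its noise-type label
    return np.stack(noisy), np.array(labels)
```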

Noise Classification Network
Noisy images are highly content-dependent: depending on the pixel arrangement and intensity values, the same noise dilution can have different effects, as stated in [28]. This encourages us to use a Deep Neural Network based approach to isolate these characteristics and conduct noise type classification. CNNs are strong at recognising such patterns; besides, CNNs can also learn to distinguish content-based noise by collecting features from noisy images of various kinds.
Our noise classification network contains four convolution layers, two down-sampling layers and two fully connected layers. The configuration of the network is given in Fig. 3. The convolution layers are Conv1-2 and Conv3-4, and the pooling layers are Pool1 and Pool2. The output is one of the ten possible noise types under consideration. Our system resizes the input noisy images to 16 × 16 pixels before feeding them into the classification network. The first two layers each have 32 convolution kernels of size 3 × 3 with a stride of 1. Similarly, 16 convolution kernels occur in the third and fourth layers, with the same kernel size and stride as the first two convolution layers. An LReLU [44] activation function is applied between any two convolution layers. The last layer predicts the noise type, using a softmax [45] activation to output the probability corresponding to each type of noise. During training, a loss function evaluates performance; as accuracy increases, the predicted probability for the correct type approaches 1.
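A PyTorch sketch of the classifier just described is given below (four 3 × 3 convolutions with 32, 32, 16 and 16 kernels at stride 1, two pooling layers, two fully connected layers, LReLU activations and a softmax over the ten types; the fully connected width, LReLU slope and padding are assumptions not specified in the text):

```python
import torch
import torch.nn as nn

class NoiseClassifier(nn.Module):
    """Sketch of the noise-type classification CNN for 16x16 inputs."""

    def __init__(self, num_types=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(32, 32, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
            nn.Conv2d(32, 16, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(16, 16, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),                              # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 64), nn.LeakyReLU(0.1),
            nn.Linear(64, num_types),  # logits; softmax is applied in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Training it against a cross-entropy loss drives the softmax probability of the correct noise type toward 1, as described above.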

Denoising Network
In our proposed methodology, we treat image denoising as a discriminative problem: separating the noise from the input image. To achieve this, a feed-forward CNN is implemented as a denoising autoencoder. This autoencoder learns the latent features of the noisy image and tries to generate a clean image as close as possible to the target clean image. CNNs are used for two reasons. First, a CNN with a deep architecture improves the capability and consistency of exploiting image features. Second, there has been a great deal of progress in regularisation and training methods, including batch normalization [45], LReLU [46] and residual learning [47]. These approaches allow CNNs to accelerate and optimise the training of denoising networks efficiently.
In many recent studies, autoencoder-based denoising has been extensively proposed [48] to learn a model from a separate training dataset (by separating the training and testing data). However, such models tend to degrade in denoising performance when they encounter a kind of noise not present in the training dataset. To counter this shortcoming, we propose a guided CNN: our noise classification network identifies the type of noise, and based on the identified type, a corresponding autoencoder network is used to perform image denoising. Our denoising network, composed of an autoencoder, takes a noisy image as input; the encoder maps the input image to a hidden layer by extracting its latent features, and from these latent features the decoder generates the clean counterpart. During training, the ground truth of a clean image is used. The overall process is given in Fig. 4, where the encoder f maps the input image to its latent representation z. The latent representation is the most important transformation: a better latent feature results in a better noise cleaning process. The decoder takes this latent representation and transforms it into the clean counterpart image. The encoder and decoder are trained jointly using the backpropagation algorithm, with the clean counterpart of the character image as the ground truth for the loss function. The overall proposed CNN architecture for the denoising network is represented in Fig. 5: three encoder layers produce the latent feature, and another three decoder layers are attached at the end of the encoder. Skip connections are implemented in the proposed architecture for two reasons. First, deeper layers tend to smooth pixel values, which can make fine pixel detail difficult to restore in the decoder; skip links therefore allow finer picture information to be recovered. Second, skip connections also benefit backpropagation, making training of the deep neural network considerably simpler.
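The 3-layer encoder, 3-layer decoder and skip connections might be sketched as follows in PyTorch (channel widths, activations and padding are illustrative assumptions; only the 3 + 3 layer structure and the skip links follow the description above):

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Sketch of the guided denoising autoencoder with skip connections."""

    def __init__(self):
        super().__init__()
        act = lambda: nn.LeakyReLU(0.1)
        self.enc1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), act())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), act())
        self.enc3 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), act())
        self.dec1 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), act())
        self.dec2 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), act())
        self.dec3 = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        z = self.enc3(e2)          # latent representation z
        d1 = self.dec1(z) + e2     # skip link: fine detail flows forward,
        d2 = self.dec2(d1) + e1    # gradients flow backward more easily
        return torch.sigmoid(self.dec3(d2))
```

One such model would be trained per classified noise type, with the clean image as the regression target.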

Character recognition
The main objective of our experiment is to improve character recognition accuracy by applying the proposed denoising technique. To analyse the results, we compare a CNN-based recogniser with and without our proposed denoising. CNNs are specially designed for image recognition tasks. We resize the input images to 16 × 16: a smaller sample size results in poorer recognition accuracy, while a larger sample size incurs an expensive computational cost. Every layer in our network has kernels of size 3 × 3. Each convolution layer is followed by batch normalization (BN), a rectified linear unit (ReLU) and finally max pooling. The proposed DCNN architecture is presented in Fig. 6.
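A minimal PyTorch sketch of such a recogniser, with each 3 × 3 convolution followed by BN, ReLU and max pooling on 16 × 16 inputs (the depth and channel counts here are assumptions, since Fig. 6 is not reproduced):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """One recogniser stage: 3x3 conv -> BN -> ReLU -> 2x2 max pooling."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class Recognizer(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(1, 32),                 # 16x16 -> 8x8
            conv_block(32, 64),                # 8x8  -> 4x4
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```

Comparing this recogniser's accuracy on noisy versus denoised inputs is what yields the improvement reported later.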

Dataset
Our experiments use the publicly available MNIST and Chars74K datasets, which are widely used for training character recognition models. The MNIST dataset has a total of 70,000 handwritten samples across 10 classes (0-9). The Chars74K dataset comprises 64 classes (A-Z, a-z, 0-9); 7,705 characters were obtained from natural images such as street signs, house numbers and vehicle number plates, 3,410 are hand-drawn characters and 62,992 are characters synthesized from computer fonts.
In our use case, we added four different types of noise to these image datasets to analyse character recognition accuracy. Our noise classification network is trained by adding the 10 types of noise shown in Table 1, each noise type being labelled using a 10 × 10 matrix. The noise classification network is trained using 65% of the MNIST and 65% of the Chars74K datasets. Our denoising network generates 10 models, one for each specific kind of noise. For the denoising network, we use 5% of the combined MNIST and Chars74K dataset for validation and the remaining 30% for testing accuracy.
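The 65/5/30 split described above can be sketched as follows (an illustrative NumPy helper, not the paper's code; the random seed is an assumption for reproducibility):

```python
import numpy as np

def split_dataset(images, labels, seed=0):
    """Split into 65% training, 5% validation and 30% testing sets."""
    n = len(images)
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(0.65 * n), int(0.05 * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return ((images[train], labels[train]),
            (images[val], labels[val]),
            (images[test], labels[test]))
```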

Noise Classification Network Results
The noise classification network is trained using 45,500 (65%) images from MNIST and 48,100 (65%) from Chars74K, a total of 93,600 images. As there are 10 types of noise, each noise type is trained with 9,360 images.
A batch resize is performed in the initial stage of training: images from both MNIST and Chars74K are reduced to 16 × 16 pixels. The learning rate is fixed at 0.01. For single-type noise (Types 1-4) we trained for 40 epochs, and for the remaining mixed-type noise (Types 5-10) for 60 epochs. The model generated by this training is evaluated by comparing noise classification performance across the 10 generated noise types. We selected LeNet [49] and AlexNet [50] as baselines for our classification models. The noise classification comparison is given in Fig. 7 for MNIST and Fig. 8 for Chars74K. AlexNet has marginally higher classification accuracy for specific types of noise from Chars74K than our proposed network; however, our noise classification network performs better in general. The noise classification network is not itself the intended end result: the denoising network built on top of it, whose performance is given in Fig. 10, outperforms the above. The confusion matrix for the trained noise classification network is given in Fig. 9.

Denoising Network Results
The performance of our denoising network is analysed and evaluated in this section. To visualise the contrast before and after applying the denoising methods, we use both the MNIST and Chars74K datasets. The 10 types of noise described in Table 1 were used at varying noise levels (γ = 30 and 60). The input image size is adjusted to 16 × 16 pixels to fit our denoising network.
To collect ample information for denoising, we set the depth of our proposed network to 15 and train for 60 epochs on both MNIST and Chars74K. We used Stochastic Gradient Descent (SGD) [51] with the learning rate set to 0.01, momentum of 0.6 and a batch size of 256. The denoising performance for the four types of generated noise is given in Fig. 10. Our denoising network generates 10 models, one per noise type: Type 1 to Type 4 are single noise types and Type 5 to Type 10 are mixed noise types. Denoising greyscale images under noise Types 1-4 achieved excellent performance; however, when denoising colour images under mixed noise (Types 5-10), we see a slight degradation in performance. To counter this degradation on colour images (Chars74K), we conducted an additional experiment by splitting the colour image into its R, G and B bands and performing image denoising on each band separately. After denoising, the bands are aggregated to retrieve the colour image. Denoising by band splitting sacrifices some isolated pattern information compared with the original noisy image; however, losing isolated pixel patterns does not impact character recognition performance. To evaluate our denoising network, we compare it with existing denoising methods such as BM3D [52] and MemNet [53]; our proposed technique achieves better PSNR values. Although MemNet achieves strong PSNR values on certain images, it has its own limitations: it is computationally heavy and not ideal for parallel computation, and it is tuned to a particular noise level rather than to generalized noise. A detailed comparison is given in Fig. 11. The central objective of this study is to increase the accuracy of character recognition.
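The band-splitting experiment and the PSNR metric can be sketched as follows (`denoise_band` stands in for a trained per-channel denoising model; the PSNR formula is the standard 10·log10(MAX²/MSE)):

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the clean image."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def denoise_colour(image, denoise_band):
    """Denoise an RGB image one band at a time, then reassemble the channels."""
    bands = [denoise_band(image[..., c]) for c in range(3)]
    return np.stack(bands, axis=-1)
```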
Image enhancement with noise removal increases recognition accuracy by an average of 66% across the different noise types in our experimental data, and could undoubtedly enhance real-world image recognition processes. In this experiment, we implement character recognition using a multi-layer perceptron neural network trained with backpropagation. The network is trained using a single font, Times New Roman at 12 points [54], and evaluated both before and after noise removal on the MNIST and Chars74K datasets. Fig. 12 gives the character recognition accuracy for Type 1 to Type 10 noise.

CONCLUSION
The proposed models achieve remarkable results in both image noise type classification and image denoising. Colour images, compared to greyscale, tend to suffer more in restoration, as three channels (RGB) must be processed in parallel; increasing the training sample size may improve performance. Also, the mixed noise types (Types 5-10) increase the complexity of the optimization process. Nevertheless, overall performance shows remarkable results, and our denoising model achieves higher PSNR values than existing techniques. In future, we plan to develop a universal denoising model irrespective of noise pattern type, which will reduce time and space complexity.

Availability of data and material: Not applicable
Code availability: Not applicable