4.1.1 Data Gathering and Dataset Creation
As described in the previous chapter, an A4 sheet bearing printed Twi characters, with spaces provided for writers to reproduce each character in their own handwriting, was presented to participants as shown in Figure 2; the writers numbered three hundred (300). Since there are forty-four (44) characters, counting both capital and small letters, a total of 13,200 handwritten characters was obtained. The 300 sheets were scanned in grayscale for onward processing, and the individual characters were then cropped out of each scanned sheet to obtain images in JPEG format. After all the characters had been cropped, a software tool called Light Image Resizer v6.0.6.0 was used to resize every image to a uniform size of 32 x 32 pixels. Further effects were applied to enhance the images before training: the colours of each image were inverted, auto-enhancement was applied, and the brightness was adjusted to 100 percent.
After these effects were applied, the outcome for sampled images is as shown in Figure 3.
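A minimal PIL sketch of equivalent preprocessing is given below. The authors used Light Image Resizer and a photo-editing tool rather than code, so this is only an illustrative reconstruction; the enhancement factor and file paths are assumptions.

    from PIL import Image, ImageOps, ImageEnhance

    def preprocess_character(src_path, dst_path):
        """Illustrative equivalent of the preprocessing described above:
        grayscale, resize to 32x32, invert colours, auto-enhance, brighten."""
        img = Image.open(src_path).convert("L")          # grayscale
        img = img.resize((32, 32))                       # uniform 32x32 pixels
        img = ImageOps.invert(img)                       # invert colours
        img = ImageOps.autocontrast(img)                 # auto-enhancement
        img = ImageEnhance.Brightness(img).enhance(2.0)  # brightness boost (factor assumed)
        img.save(dst_path, format="JPEG")

    # Hypothetical usage:
    # preprocess_character("scans/A_writer001.jpg", "dataset/training/A_upper/A_writer001.jpg")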
All of this preprocessing was done to generate a dataset ready for the neural network. The entire dataset was then split into two (2) major folders, namely training and testing. All letters of the same class were placed in a single folder; for example, all cropped handwritten images of the capital letter "A" were put into a common folder, and the same was done for every other character. This gave a total of 44 folders, and hence 44 classes (22 upper-case and 22 lower-case). Each subfolder in the training folder contained two hundred and forty (240) images, while each subfolder in the testing folder contained sixty (60) images, so the training and testing folders held a total of ten thousand, five hundred and sixty (10,560) images and two thousand, six hundred and forty (2,640) images respectively. The dataset was then ready for training and testing. The characters in the training dataset are different from those in the testing dataset; this separation makes it possible to compare accuracies and losses on trained and unseen images, and so to judge whether the network can correctly find similarities within the classified folders and make good predictions. Note that the higher the accuracy on both the training and testing datasets, the better the predictions.
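A dataset organised this way can be consumed directly from the folder structure. The sketch below assumes the Keras ImageDataGenerator API from TensorFlow (a library the implementation uses); the directory names are assumptions.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Rescale pixel values to [0, 1]; directory names are assumed.
    datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_set = datagen.flow_from_directory(
        "dataset/training",        # 44 subfolders, 240 images each
        target_size=(32, 32),
        color_mode="grayscale",
        class_mode="categorical",  # one-hot labels for the 44 classes
        batch_size=32,
    )
    test_set = datagen.flow_from_directory(
        "dataset/testing",         # 44 subfolders, 60 images each
        target_size=(32, 32),
        color_mode="grayscale",
        class_mode="categorical",
        batch_size=32,
    )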
4.2 System Simulation
During the implementation stage, the following tools were installed on the machine to build the model, with Python as the programming language used to carry out the operations:
i. Anaconda: this distribution contains packages such as Jupyter Notebook, which serves as the platform for training the datasets.
ii. Jupyter Notebook: this environment gives access to the libraries used for the deep-learning implementation, of which we made use of TensorFlow, NumPy, PIL (Python Imaging Library) and OpenCV (cv2).
Step 1: Importing the TensorFlow libraries as shown in Figure 4.
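The exact contents of Figure 4 are not reproduced here; a plausible equivalent import cell, based on the libraries named above, might look as follows.

    # Typical import cell for this kind of model (assumed equivalent of Figure 4).
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout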
Step 2: Creating the CNN architecture
As shown in Figure 5, the neural network contained the following parameters and harboured three (3) convolutional layers:
First Convolutional Layer: the initial convolutional layer has 32 neurons (filters), takes an input image of size 32 x 32 pixels, and uses a 5 x 5 kernel for feature extraction. It applies the ReLU activation function followed by a 2 x 2 max-pooling layer, and its results are passed on to the second convolutional layer.
Second Convolutional Layer: this layer has 64 neurons and a 3 x 3 kernel, again applying the ReLU activation function and a 2 x 2 max-pooling layer. Its results are passed on to the third convolutional layer.
Third Convolutional Layer: this layer has 128 neurons and a 3 x 3 kernel, still using the ReLU activation function and a 2 x 2 max-pooling layer. Its results are then sent to fully-connected layer 1.
Fully-Connected Layer 1: the network flattens the output into a one-dimensional array and applies a dense layer of 128 neurons with the ReLU activation function.
Fully-Connected Layer 2: this layer maintains a dense layer of 128 neurons and the ReLU activation function, with a dropout of 0.5.
Fully-Connected Layer 3: this is the final layer, used for predicting the various characters. It bears a dense layer of 44 neurons, matching the 44 classes generated from our dataset, since the final dense layer in a multi-class problem must have one neuron per class. This layer uses the softmax activation function for prediction. A sketch of the full architecture follows.
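The sketch below is an assumed Keras reconstruction from the layer descriptions above, not the code in Figure 5.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

    model = Sequential([
        # First convolutional layer: 32 filters, 5x5 kernel, 32x32x1 input.
        Conv2D(32, (5, 5), activation="relu", input_shape=(32, 32, 1)),
        MaxPooling2D(pool_size=(2, 2)),
        # Second convolutional layer: 64 filters, 3x3 kernel.
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        # Third convolutional layer: 128 filters, 3x3 kernel.
        Conv2D(128, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        # Fully-connected layers 1 and 2.
        Flatten(),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dropout(0.5),
        # Final layer: one neuron per class, softmax for prediction.
        Dense(44, activation="softmax"),
    ])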
Step 3: Compile the CNN model
After defining the model and stacking the layers, the model must be configured, and this is done in the compilation step. Before training, the model is compiled with a loss function, an optimizer, and prediction metrics. The CNN architecture used the Adam optimizer, categorical cross-entropy as the loss argument, and accuracy as the metric, since we are interested in the accuracy levels of both the training and testing datasets during training.
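A sketch of this compilation step, using the standard Keras compile call with the settings named above:

    # Adam optimizer, categorical cross-entropy loss, accuracy as the metric.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )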
Step 4: Training the CNN model
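A sketch of the training call, assuming the data generators from the dataset-loading sketch above and the 15 epochs reported in Step 5:

    # Train for 15 epochs, validating on the held-out testing set.
    history = model.fit(
        train_set,
        validation_data=test_set,
        epochs=15,
    )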
Step 5: Training dataset and testing dataset performances
Figure 8 shows the accuracies and losses for both datasets over just 15 epochs: the first two (2) columns indicate the loss (loss) and accuracy (accuracy) for the training dataset, and the last two (2) show the loss (val_loss) and accuracy (val_accuracy) for the testing dataset. The CNN model achieved an average of 88.15% accuracy on the training dataset and 79.31% on the testing dataset. These high percentages for both datasets indicate that the model predicts well.
Step 6: Saving the Model
The trained model is saved so that it can later be tested on new characters that were not part of either the training or the testing dataset. A JSON file was created to store the saved model, as shown in Figure 9.
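A sketch of this saving step, assuming the standard Keras to_json and save_weights calls; the file names are assumptions.

    # Save the architecture to JSON and the learned weights separately.
    with open("model.json", "w") as f:
        f.write(model.to_json())
    model.save_weights("model_weights.h5")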
Step 7: Creating a graphical user interface to draw a character for prediction
Figure 10 shows the code for creating an interface that serves as a platform to draw any preferred letter, testing whether the CNN model can correctly predict new hand-drawn characters that belong to neither the training dataset nor the testing dataset. The code sets the parameters for the interface and calls the paint interface to allow the characters to be drawn.
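The code in Figure 10 is not reproduced here; a minimal Tkinter sketch of such a paint interface (the window size and brush radius are assumptions) might look like this.

    import tkinter as tk

    def paint(event):
        # Draw a small filled circle wherever the mouse is dragged.
        r = 8
        canvas.create_oval(event.x - r, event.y - r, event.x + r, event.y + r,
                           fill="black", outline="black")

    root = tk.Tk()
    root.title("Draw a Twi character")
    canvas = tk.Canvas(root, width=200, height=200, bg="white")
    canvas.pack()
    canvas.bind("<B1-Motion>", paint)
    root.mainloop()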
Figure 11 shows hand-drawn Twi letters ready to be passed to the CNN model so that it can predict each letter. The figure shows the target size of the drawn image, which is saved in JPEG format into a folder called "Singleprediction"; the model then picks up the saved image, analyses it against the learnt (trained) patterns, and predicts the character.
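A sketch of this prediction step follows; the image file name and the use of train_set.class_indices for the label mapping are assumptions.

    import numpy as np
    from tensorflow.keras.preprocessing import image

    # Load the drawn character saved by the interface (file name assumed).
    img = image.load_img("Singleprediction/drawn.jpeg",
                         target_size=(32, 32), color_mode="grayscale")
    x = image.img_to_array(img) / 255.0   # scale like the training data
    x = np.expand_dims(x, axis=0)         # add the batch dimension

    probs = model.predict(x)
    class_index = int(np.argmax(probs))
    # train_set.class_indices maps folder names to indices; invert it to
    # recover the predicted character label.
    labels = {v: k for k, v in train_set.class_indices.items()}
    print("Predicted character:", labels[class_index])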
Figure 12 shows successful predictions of the ɔ and Ɛ letters respectively.
4.3 Discussion of Results
The convolutional neural network architecture achieved an average accuracy of 88.15% on a comparatively small dataset of 13,200 grayscale images, each 32 pixels in height and width. The network harboured three (3) convolutional layers for feature extraction, each with the ReLU activation function, and three (3) fully-connected layers, one of which is preceded by a flatten layer; the final fully-connected layer uses a softmax activation function for prediction. As shown in Figure 8, accuracy rose steadily from one epoch to the next over just 15 epochs of training, and the training and validation accuracies remained close at each step of the training period. The accuracy of the proposed model (see Figure 5, the neural network used to classify Twi handwritten letters) was tested on two separate datasets: of the 13,200 data points, 80 percent were used for training and 20 percent for testing the classification of the Twi handwritten letters. Training the convolutional neural network, with its three (3) convolutional layers and three (3) fully-connected layers, took into account the learning rate, the number of hidden neurons, and a batch size of 32. The results showed that enlarging the network increased its efficiency, but with one big drawback: over-fitting, seen in the prolonged training period and the large gaps between training and testing accuracies. On the other hand, by adjusting the batch size it is feasible to reach the model's ideal state. When the batch size was raised beyond a certain amount, such as 50, the model could no longer be trained effectively; moreover, the usable batch size was limited by the amount of memory available. The impact of batch size is depicted in Figure 7. To prevent the problems highlighted above, the trained model was fine-tuned to produce this good outcome.
4.3.1 Model Summary
Each layer in a traditional neural network is made up of a group of neurons. The input to such a network is transformed through a series of hidden layers, each coupled by its neurons to the previous and subsequent layers. The CNN was chosen for the proposed Asante Twi dataset because of its notably higher performance. The model's summary is given below:
The CNN model was given input images of size 32 x 32 pixels from the Twi dataset. The model's first layer was a 2D convolutional layer with a 5 x 5 kernel, operating on the pixels of every input image; its result was a 28 x 28 feature map, which was integrated with geometrical features to create a feature vector. Every output of the convolutional layer was activated using the rectified linear unit (ReLU) as the activation function. ReLU was chosen for its capacity to mitigate the vanishing-gradient problem; it applies a simple intrinsic threshold, loosely analogous to the firing behaviour of neurons in the human brain. The outputs of the initial convolutional layer were sent to a max-pooling layer with a stride of 2 for nonlinear down-sampling, and the resulting 14 x 14 feature maps were fed into the second convolutional layer, where a 3 x 3 kernel was applied to the input. This layer's output was again routed through a max-pooling layer with a stride of 2 for nonlinear down-sampling, producing 6 x 6 outputs that were sent into the third 2D convolutional layer. Flattening was then used to create a 1D feature vector from the output of these layers; the flatten layer is required after the convolutional and max-pooling layers in order to use a fully-connected layer. Two fully-connected (dense) layers are employed towards the end of the model, and the final fully-connected layer, with 44 neurons, uses a softmax activation function to classify the output according to a probability distribution over the classes. The Adam optimizer was used for adaptive learning, as shown in Figure 6; it was chosen for our model because it scales the learning rate using squared gradients, and during this implementation the optimizer's default values produced the best results. Various numbers of layers, learning rates, optimizers, and input sizes of 32 x 32 were examined throughout the experimentation phase, but the finest outcomes were attained with the model presented in Table 1: with fewer parameters the model performed better, and computation and resource usage were minimal.
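The shape progression described above can be checked against the model summary; the expected output shapes in the comments below are derived from the stated kernel and pooling sizes, assuming valid padding.

    # Verify the shape progression described in the model summary.
    model.summary()
    # Expected output shapes (valid padding, derived from the text):
    #   Conv2D 5x5   -> (28, 28, 32)
    #   MaxPool 2x2  -> (14, 14, 32)
    #   Conv2D 3x3   -> (12, 12, 64)
    #   MaxPool 2x2  -> (6, 6, 64)
    #   Conv2D 3x3   -> (4, 4, 128)
    #   MaxPool 2x2  -> (2, 2, 128)
    #   Flatten      -> (512,)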