Deep Convolutional Neural Network Algorithm for Prediction of the Mechanical Properties of Friction Stir Welded Copper Joints from its Microstructures

Convolutional Neural Networks (CNNs) are a special type of Artificial Neural Network that take input in the form of an image. Like Artificial Neural Networks, they consist of weights that are estimated during training, neurons (activation functions), and an objective (loss function). CNNs are finding various applications in image recognition, semantic segmentation, object detection, and localization. The present work deals with the prediction of the welding efficiency of Friction Stir Welded joints on the basis of microstructure images, carried out by training on 3000 microstructure images and testing on a further 300 microstructure images. The obtained results showed an accuracy of 80% on the validation dataset.


Introduction
Images have become ubiquitous in all fields, which means that vast amounts of information can be extracted from imagery. While image classification has now become prevalent in fields like computer vision, self-driving cars, robotics, etc., it is fairly new in the field of microstructures [1][2][3].
Although the above-mentioned applications differ in various ways, they share the common process of correctly annotating an image with one label, or a probability over labels, corresponding to a set of classes or categories. This is known as image classification. The process of identifying the type of microstructure in diverse image-based tasks has taken advantage of machine learning techniques applied to the development of neural networks.
Deep learning is a class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input. It allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction [4][5]. It discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer. The CNN is one of the oldest deep neural network architectures; it contains hidden layers between the general layers that help the weights of the system learn more about the features found in the input image.
The basic architecture of a CNN, shown in Fig. 1, consists of a convolutional layer that separates and identifies the various features of the image for analysis, in a process called feature extraction, and a fully connected layer that utilizes the output from the convolution process and predicts the class of the image based on the features extracted in the previous stages.
An image is composed of pixels, each of which holds a numerical value giving its intensity within the RGB color space. The key idea is that relationships exist between neighboring pixels, and these relationships act as features. In a traditional fully connected network, the spatial arrangement of the input features has no impact on the model, so no relationship between the individual features is exploited. The Convolutional Neural Network, by contrast, treats the pixels as variables with a natural topology. The architecture should also exhibit translation invariance, so that the position and size of a given object in an image do not affect the working of the network.
Kernels are used to capture the relationships between different features, i.e., the different pixels composing the image. A kernel is a grid of weights that is overlaid on a portion of the image centered around a single pixel. Once the kernel is overlaid, each weight in the kernel is multiplied by the pixel value beneath it. As shown in Eq. 1, the output for the central pixel is the sum of all those products between the kernel weights and their respective pixels.

Output(i, j) = Σ_m Σ_n K(m, n) · I(i + m, j + n)    (1)
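As a concrete illustration of Eq. 1, the weighted sum for a single pixel can be sketched in NumPy; the patch and kernel values here are purely illustrative, not taken from the study:

```python
import numpy as np

# Hypothetical 3x3 grayscale patch centered on the pixel of interest
patch = np.array([[1, 2, 1],
                  [0, 5, 0],
                  [1, 2, 1]], dtype=float)

# An illustrative 3x3 kernel (grid of weights)
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]], dtype=float)

# Eq. 1: the output for the central pixel is the sum of the
# element-wise products of kernel weights and pixel values
output = np.sum(kernel * patch)
print(output)  # → 21.0
```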
The main types of layers that make up a CNN are: 1. Convolutional Layer: This is the first layer, used to extract various features from the input images. In this layer, the mathematical operation of convolution is performed between the input image and a filter of a particular size M x M, which is slid over the input image; the dot product is taken between the filter and the part of the input image under it. The output generated is known as the feature map, which gives information about the image. This feature map is then fed to the following layers.
2. Pooling Layer: A convolutional layer is followed by a pooling layer, which is used to decrease the size of the convolved feature map and thereby reduce the computational cost. This is performed by decreasing the connections between layers and operating on each feature map independently. There are several pooling techniques, such as max pooling (the largest element is taken from each patch of the feature map) and average pooling (the average of the elements is taken). 3. Fully Connected Layer: This layer takes the flattened features from the preceding layers and generates the final output. 4. Activation Function: It usually follows a fully connected layer and introduces non-linearity into the network. Commonly used activation functions include ReLU, Softmax, tanh, and the Sigmoid function.
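The max and average pooling operations described above can be sketched as follows; the feature-map values are illustrative:

```python
import numpy as np

# Hypothetical 4x4 feature map
fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 8, 6]], dtype=float)

def pool2x2(fm, op):
    """Apply a 2x2 pooling operation with stride 2."""
    h, w = fm.shape
    out = np.empty((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = op(fm[i:i + 2, j:j + 2])
    return out

max_pooled = pool2x2(fmap, np.max)   # largest element per patch
avg_pooled = pool2x2(fmap, np.mean)  # average of elements per patch
```

Both variants halve each spatial dimension of the feature map while keeping the number of feature maps unchanged.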
In Computer Vision, datasets are divided into two main categories: a training dataset, on which the algorithm learns to perform the desired task, and a testing dataset, on which the algorithm is evaluated. The dataset used in the present study consists of two classes: microstructure images with a welding efficiency less than 80 percent, and microstructure images with a welding efficiency greater than or equal to 80 percent. Microstructure images for both classes are shown in Fig. 2a and Fig. 2b.

Results And Discussion
The basic architecture of the model used in the present study is shown in Fig. 3. Firstly, it should be noted that in Neural Networks the convolution operation is performed between two tensors: two tensors are taken as inputs and the result is an output tensor. The convolution operation is denoted by the "*" operator. Eq. 2 defines the convolution operation.

(A * B)(i, j) = Σ_m Σ_n A(i + m, j + n) · B(m, n)    (2)
The input microstructure image can be denoted by X and the filter by f. The convolution operation can then be written as Eq. 3.

Z = X * f,  where Z(i, j) = Σ_m Σ_n X(i + m, j + n) · f(m, n)    (3)
The convolution operation is shown in Fig. 4.
The mathematical operation is as follows: if the dimension of the input image is n x n and the dimension of the applied filter is f x f, then the dimension of the output image is given by (n - f + 1) x (n - f + 1).
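A minimal sketch of this sliding-window operation, confirming the (n - f + 1) output dimension; the image and filter used here are illustrative, not from the study:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid 2-D convolution as used in CNNs: slide the kernel over
    the image and take the sum of element-wise products at each
    position (Eq. 3)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.empty((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # n = 6
kernel = np.ones((3, 3)) / 9.0                    # f = 3 (mean filter)
out = conv2d_valid(image, kernel)
print(out.shape)  # → (4, 4), i.e. (n - f + 1) x (n - f + 1)
```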
The features extracted from the data by the convolution layers are sent to the fully connected layer, which generates the final output. Generally, the fully connected layer in a Convolutional Neural Network corresponds to a traditional Neural Network.
The convolution layer produces its output in the form of a two-dimensional matrix, but the fully connected layer can only operate on one-dimensional data. So the value generated by applying Eq. 3 is first converted into a one-dimensional format, as shown in Fig. 5.
After the conversion into a one-dimensional array, the values are forwarded to the fully connected layer. Each value obtained is treated as a separate feature representing the image. The incoming data is then subjected to two operations by the fully connected layers: a linear transformation followed by a non-linear transformation. Firstly, the data is subjected to the linear transformation of Eq. 4, Z = W · X + b, where b is the bias, W is the weight matrix (a matrix of randomly initialized numbers), and X is the input. Putting the values of Eqs. 5, 6 and 7 into Eq. 4, the following equation is obtained. Now, in order to capture complex relationships, a non-linear transformation in the form of an activation function is incorporated. In the present work, the sigmoid activation function has been used, which is represented by Eq. 9.

σ(Z) = 1 / (1 + e^(−Z))    (9)
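The flatten, linear transformation (Eq. 4), and sigmoid (Eq. 9) steps can be sketched as follows; the feature-map values and the random weight initialization are illustrative:

```python
import numpy as np

def sigmoid(z):
    # Eq. 9: sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2x2 feature map from the last convolution layer
feature_map = np.array([[0.5, -1.0],
                        [2.0, 0.0]])

x = feature_map.flatten()             # 2-D matrix -> 1-D vector (Fig. 5)

rng = np.random.default_rng(0)
W = rng.standard_normal((1, x.size))  # randomly initialized weights
b = 0.1                               # bias

z = W @ x + b                         # Eq. 4: linear transformation
y = sigmoid(z)                        # Eq. 9: non-linear transformation
```

The sigmoid squashes the linear output into (0, 1), which is why it suits the binary classification at the end of the network.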
Deep networks need large amounts of training data to achieve good performance. In order to obtain a large dataset, image augmentation was required. Image augmentation artificially creates training images through different ways of processing, or combinations of multiple kinds of processing, such as random rotations, shifts, shears, and flips. This can be done easily with the help of the ImageDataGenerator API in Keras.
Both classes of the dataset were first divided into train and test sets. The datasets were then augmented to increase the dataset size, using a base script that generated images by varying the augmentation parameters. Following this procedure, around 3000 training images and 300 test images were generated. For training the model, a CNN algorithm was used, and the Keras library was used for writing the code. In Python programming, the model type most commonly used is the Sequential type; it is the easiest way to build a CNN model and permits building the model layer by layer, with the add() function used to add layers. In the present model, there are three convolution and pooling pairs, followed by a flatten layer, which is usually used as the connection between the convolution and dense layers. Further, two dense layers were added with a dropout layer. A dense layer is the regular, deeply connected neural network layer; a dropout layer is used for regularization and reduces overfitting. The optimizer used was Adam.
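Since the original augmentation and model-building code is not reproduced here, the following is a sketch of what such a Keras pipeline could look like; the specific augmentation parameter values, filter counts, and input size are assumptions, not taken from the paper:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

# Augmentation via random rotations, shifts, shear and flips
# (parameter values here are illustrative assumptions)
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    horizontal_flip=True,
)

# Three convolution/pooling pairs, a flatten layer, and two dense
# layers with dropout, as described in the text
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),   # single node for binary output
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```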
The batch size was chosen to be 16 and the model was trained with class mode set to binary, as there are two classes to be classified. The number of epochs taken for training was 50; one epoch refers to one cycle through the full training dataset. The model was then evaluated on the test dataset, with a classification report used for determining the accuracy of the model. The model was further checked by taking an image from the test dataset and predicting its class label. The basic architecture of the model is shown in Fig. 3: three convolution layers, each followed by a pooling layer.
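The training and single-image prediction steps described above can be sketched as follows; a miniature model and synthetic data stand in for the study's actual CNN and microstructure images, and only 2 of the 50 epochs are run to keep the sketch fast:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Minimal stand-in model; the study used the full three-block CNN
model = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Synthetic placeholder data (the study used ~3000 train / 300 test images)
x_train = np.random.rand(32, 32, 32, 3).astype('float32')
y_train = np.random.randint(0, 2, size=(32,))

# Batch size 16 and binary labels as in the text
model.fit(x_train, y_train, batch_size=16, epochs=2, verbose=0)

# Predicting the class label of a single image
sample = np.random.rand(1, 32, 32, 3).astype('float32')
prob = model.predict(sample, verbose=0)[0][0]  # sigmoid output in (0, 1)
label = 1 if prob >= 0.5 else 0
```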
Convolutional layers in a CNN systematically apply learned filters to input images in order to create feature maps that summarize the presence of those features in the input. A pooling layer is added after a convolutional layer to downsample the feature maps output by the convolutional layer. The pooling layer operates on each feature map separately to create a new set of the same number of pooled feature maps. This involves selecting a pooling operation. The size of the pooling filter is smaller than the size of the feature map (we have used 2x2 pixels with a stride of 2 pixels), which means the pooling layer will always reduce the size of each feature map. The pooling operation used was max pooling, which takes the maximum value from each patch of the feature map.
Further, a flatten layer was added, which is usually used as the connection between the convolution and dense layers. Flattening converts the data into a one-dimensional array for input to the next layer: the output of the convolutional layers is flattened to create a single long feature vector, which is then connected to the final classification model, called the fully connected layer. A dense layer was added, followed by a dropout layer and another dense layer. A dense layer is deeply connected, meaning each neuron in the dense layer receives input from all neurons of the previous layer. It performs a matrix-vector multiplication, and the values in the matrix are parameters that are trained and updated with the help of backpropagation. The activation function used for this layer was ReLU, which applies an element-wise activation function. The dropout layer was used to reduce overfitting.

Conclusion
Convolutional Neural Networks are a powerful tool for image recognition, with a great capability for complex problem-solving. While training such models, a few points should be kept in mind; for binary classification problems, the final layer has a single node with a sigmoid activation function. The present research work focuses on building a basic CNN model for the classification of the two classes of microstructures. The model achieved an accuracy of 80%, which can be improved. Further improvement in the model can be made by: • using more images for the dataset; • adjusting the learning rate; • adjusting the batch size, which will allow the model to recognize the patterns better. If the batch size is low, the patterns will repeat less and hence convergence will be difficult, whereas if the batch size is high, learning will be slow.

Figure 4 Representation of the basic convolution operation
Figure 5 Conversion of two-dimensional matrix to one-dimensional matrix
Plot of accuracy with respect to the number of epochs