A convolutional neural network has an input layer, an output layer, and hidden layers. The hidden layers consist of convolutional layers, a flattening layer, and a fully connected layer. Fig. 2 shows the architecture of the proposed convolutional neural network.
Deep learning is a rapidly growing branch of machine learning based on learning multiple levels of representation. The convolutional neural network (CNN) is one kind of deep neural network, and it can be trained in parallel. This work presents a detailed analysis of the CNN algorithm, covering both the forward pass and back-propagation. The network is then applied to a typical diabetic data set, implemented in Java with Weka. In addition, by measuring the actual time of the forward and backward evaluation, the maximal speed-up and parallel efficiency are analysed theoretically.
4.1 Role of Convolutional Neural Networks
In general, a CNN comprises two kinds of layers. The first is the feature extraction layer: the input of every neuron is connected to the local receptive field of the previous layer, from which the local feature is extracted. Once a local feature has been extracted, its positional relationship to the other features is evaluated. The second is the feature map layer: each computational layer of the network consists of multiple feature maps, every feature map is a plane, and the neurons in a plane share the same weights.
The feature map uses the sigmoid function as the activation function of the convolutional network, which makes the extracted features shift-invariant and reduces the number of free parameters in the network. Each convolutional layer in the CNN is followed by a computational layer that calculates the local average and performs a second extraction; this two-stage feature extraction structure reduces the size of the data matrix. Multi-dimensional input vectors from the heart disease data set can enter the network directly, which avoids the complexity of data reconstruction in the feature extraction and classification procedure.
4.2 Feature Selection Algorithm
Several feature ranking and feature selection algorithms have been proposed in the machine learning literature. The purpose of these algorithms is to remove unfit or unnecessary features from the feature vector. In this work, feature ranking and selection follow the first two steps of the overall architecture: subset generation and subset evaluation, which rank all features in each data set. A filter approach was used to evaluate the subsets, as sketched below.
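The following is a minimal Weka sketch of this filter-based ranking step, under the assumption that the data set is available as an ARFF file (the file name here is illustrative); InfoGainAttributeEval is used as the filter measure in line with Sect. 4.3.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FeatureRanking {
    public static void main(String[] args) throws Exception {
        // Subset generation input: the full feature vector of the data set
        Instances data = new DataSource("heart-disease.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Subset evaluation in filter mode: rank every feature by its merit
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());   // filter measure
        selector.setSearch(new Ranker());                     // ranking search
        selector.SelectAttributes(data);

        // rankedAttributes() returns {attribute index, merit} pairs, best first
        for (double[] row : selector.rankedAttributes()) {
            System.out.printf("%-20s %.4f%n",
                    data.attribute((int) row[0]).name(), row[1]);
        }
    }
}
```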
4.3 Information Gain
In the proposed feature selection, both the class membership and the presence or absence of a particular term are treated as random variables, and one evaluates how much information about the class membership is gained by knowing the presence/absence statistics, as in decision tree induction. So, if the class membership is taken as a random variable C with two values, positive and negative, and a term is likewise treated as a random variable T with two values, present and absent, then applying the information-theoretic definition of mutual information gives,
\(\mathrm{IG}(T) = H(C) - H(C \mid T) = \sum_{\tau,\,c} P(C = c, T = \tau)\,\ln\!\left[\dfrac{P(C = c,\, T = \tau)}{P(C = c)\,P(T = \tau)}\right]\) (1)
Here, τ ranges over {present, absent} and c ranges over {c+, c−}. As pointed out above, this is the measure of information about C (the class label) that is gained by knowing T (the presence or absence of a term).
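As an illustration of Eq. (1), the following is a minimal Java sketch that computes the information gain of a binary term from the 2×2 class/term contingency counts; the method name and argument layout are ours, and the natural logarithm matches the ln in Eq. (1).

```java
/** Information gain IG(T) = H(C) - H(C|T) of a binary term T for a binary
 *  class C, computed from 2x2 contingency counts.
 *  counts[c][t]: c = 0 (negative) or 1 (positive) class,
 *  t = 0 (term absent) or 1 (term present). */
static double informationGain(double[][] counts) {
    double total = counts[0][0] + counts[0][1] + counts[1][0] + counts[1][1];
    double hC = 0.0;          // class entropy H(C)
    double hCgivenT = 0.0;    // conditional entropy H(C|T)
    for (int c = 0; c < 2; c++) {
        double pC = (counts[c][0] + counts[c][1]) / total;
        if (pC > 0) hC -= pC * Math.log(pC);
    }
    for (int t = 0; t < 2; t++) {
        double pT = (counts[0][t] + counts[1][t]) / total;
        for (int c = 0; c < 2; c++) {
            double pCT = counts[c][t] / total;           // joint P(C=c, T=t)
            if (pCT > 0) hCgivenT -= pCT * Math.log(pCT / pT);
        }
    }
    return hC - hCgivenT;
}
```

For example, `informationGain(new double[][]{{30, 10}, {5, 55}})` returns the gain for a term that is mostly present in positive instances and absent in negative ones.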
4.4 Back Propagation Algorithm
The CNN is a type of multilayer perceptron that is specifically designed for the recognition of two-dimensional data. It has several layers: an input layer, convolution layers, sub-sampling layers, and an output layer. The CNN algorithm has two primary procedures: convolution and sub-sampling. The convolution procedure applies a trainable filter Fx to convolve the input data (in the first stage the input is the raw data; for later convolutions the input is the feature map of the preceding layer), then adds a bias bx to obtain the convolution layer Cx. The sub-sampling procedure pools the n points of each neighbourhood into a single point, weights the result by the scalar Wx+1, adds a bias bx+1, and passes it through an activation function, producing a feature map Sx+1 that is n times smaller.
The central ideas of the CNN are local receptive fields, weight sharing, and sub-sampling in time or space; together they extract features while reducing the number of training parameters. The benefit of the CNN technique is that it avoids explicit feature extraction and learns the features implicitly from the training data. Because the neurons on the same feature map share weights, the network can learn in parallel and its complexity is reduced. Sub-sampling in time or space gives the network a degree of robustness to scale and distortion. The input data and the network topology can match very well, which gives the CNN particular advantages in speech recognition and data set processing. The output of the neuron at row x, column y in the lth convolution layer and kth feature map is:
\(O_{x,y}^{l,k} = \tanh\left(\sum_{t=0}^{f-1}\sum_{r=0}^{k_h}\sum_{c=0}^{k_w} W_{(r,c)}^{(k,t)}\, O_{(x+r,\, y+c)}^{(l-1,\, t)} + \mathrm{Bias}^{(l,k)}\right)\) (2)
Here, f is the number of convolution kernels in a feature map. The output of the neuron at row x, column y in the lth sub-sampling layer and kth feature map is:
\(O_{x,y}^{l,k} = \tanh\left(W^{k}\sum_{r=0}^{s_h}\sum_{c=0}^{s_w} O_{(x\times s_h + r,\; y\times s_w + c)}^{(l-1,\, k)} + \mathrm{Bias}^{(l,k)}\right)\) (3)
The output of the jth neuron in the lth hidden layer H is:
\(O_{(l,j)} = \tanh\left(\sum_{k=0}^{s-1}\sum_{x=0}^{s_h}\sum_{y=0}^{s_w} W_{(x,y)}^{(j,k)}\, O_{(x,y)}^{(l-1,\, k)} + \mathrm{Bias}^{(l,j)}\right)\) (4)
Here, s is the number of feature maps in the sub-sampling layer. The output of the ith neuron in the lth output layer F is:
\(O_{(l,i)} = \tanh\left(\sum_{j=0}^{H} O_{(l-1,\, j)}\, W_{(i,j)}^{l} + \mathrm{Bias}^{(l,i)}\right)\) (5)
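To make Eqs. (2) and (3) concrete, the following is a minimal Java sketch of the forward pass of one convolution layer and one sub-sampling layer for a single feature map; the array shapes, method names, and the use of average pooling with a single scalar weight are our assumptions, not part of the original formulation.

```java
/** Convolution forward pass for one feature map (Eq. (2), single input map):
 *  slide the kernel over the input, add the bias, apply tanh. */
static double[][] convolveForward(double[][] in, double[][] kernel, double bias) {
    int kh = kernel.length, kw = kernel[0].length;
    int oh = in.length - kh + 1, ow = in[0].length - kw + 1;
    double[][] out = new double[oh][ow];
    for (int x = 0; x < oh; x++)
        for (int y = 0; y < ow; y++) {
            double s = bias;
            for (int r = 0; r < kh; r++)
                for (int c = 0; c < kw; c++)
                    s += kernel[r][c] * in[x + r][y + c];   // weighted local receptive field
            out[x][y] = Math.tanh(s);
        }
    return out;
}

/** Sub-sampling forward pass (Eq. (3)): pool each sh-by-sw neighbourhood,
 *  weight by the scalar w, add the bias, apply tanh. */
static double[][] subSampleForward(double[][] in, int sh, int sw, double w, double bias) {
    int oh = in.length / sh, ow = in[0].length / sw;
    double[][] out = new double[oh][ow];
    for (int x = 0; x < oh; x++)
        for (int y = 0; y < ow; y++) {
            double s = 0;
            for (int r = 0; r < sh; r++)
                for (int c = 0; c < sw; c++)
                    s += in[x * sh + r][y * sw + c];        // sum over the neighbourhood
            out[x][y] = Math.tanh(w * s + bias);
        }
    return out;
}
```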
4.5 Modified Back-Propagation
The training speed depends on the technique used to evaluate the output of the neural network; in particular, matrix notation is a convenient way of expressing the computations used in back-propagation. Back-propagation is a neural network learning algorithm. The neural networks field was originally developed by psychologists and neurobiologists who wanted to build and test computational analogues of neurons. A neural network is a set of connected input/output units in which every connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so that it can predict the correct class label of the input tuples.
Output deviation of the kth neuron in output layer O:
d(\({\text{O}}_{\text{k}}^{0})= {\text{y}}^{\text{k}}- {{\tau }}^{\text{k}}\) (6)
Input deviation of the kth neuron in output layer:
\(d(I_{k}^{0}) = (y^{k} - \tau^{k})\,\varphi(v_{k}) = d(O_{k}^{0})\,\varphi(v_{k})\) (7)
Weight and bias variation of the kth neuron in the output layer O:
∆\({\text{W}}_{\text{k},\text{x}}^{0}\) = d(\({\text{I}}_{\text{k}}^{0})\) \({\text{y}}_{\text{k},\text{x}}\) (8)
∆\({\text{B}\text{i}\text{a}\text{s}}_{\text{k}}^{0 }=\)d(\({\text{I}}_{\text{k}}^{0})\) (9)
Output bias of the kth neuron in the hidden layer H:
\(d(O_{k}^{H}) = \sum_{i} d(I_{i}^{0})\, W_{i,k}\) (10)
where the sum runs over all neurons i of the output layer. Input bias of the kth neuron in the hidden layer H:
d(\({\text{I}}_{\text{k}}^{\text{H}})\) = φ(\({\text{v}}_{\text{k}}\)) d(\({\text{O}}_{\text{k}}^{\text{H}})\) (11)
Weight and bias variation for the connection from the neuron at row x, column y of the mth feature map in the previous layer to the kth neuron in the hidden layer H:
∆\({\text{W}}_{\text{m},\text{x},\text{y}}^{\text{H},\text{k}}\) = d(\({\text{I}}_{\text{k}}^{\text{H}})\) \({\text{y}}_{\text{x}.\text{y}}^{\text{m}}\) (12)
∆\({\text{B}\text{i}\text{a}\text{s}}_{\text{k}}^{\text{H} }=\)d(\({\text{I}}_{\text{k}}^{\text{H}})\) (13)
Output bias of the neuron at row x, column y in the mth feature map of the sub-sampling layer S:
\(d(O_{x,y}^{S,m}) = \sum_{k} d(I_{k}^{H})\, W_{m,x,y}^{H,k}\) (14)
Input bias of the neuron at row x, column y in the mth feature map of the sub-sampling layer S:
d(\({\text{I}}_{\text{x},\text{y}}^{\text{S},\text{m}})\) = φ(\({\text{v}}_{\text{k}}\)) d(\({\text{O}}_{\text{x},\text{y}}^{\text{S},\text{m}})\) (15)
Weight and bias variation of the mth feature map in the sub-sampling layer S:
∆\(W^{S,m} = \sum_{x=0}^{f_h}\sum_{y=0}^{f_w} d\!\left(I_{\frac{x}{2},\frac{y}{2}}^{S,m}\right) O_{x,y}^{C,m}\) (16)
Here, C denotes the convolution layer:
∆\({\text{B}\text{i}\text{a}\text{s}}^{\text{S},\text{m}}= \sum _{\text{x}=0}^{\text{f}\text{h}}\sum _{\text{y}=0}^{\text{f}\text{w}}\text{d}\)(\({\text{O}}_{\text{x},\text{y}}^{\text{S},\text{m}})\) (17)
Output bias of the neuron at row x, column y in the kth feature map of the convolution layer C:
\(d(O_{x,y}^{C,k}) = d\!\left(I_{\frac{x}{2},\frac{y}{2}}^{S,k}\right) W^{k}\) (18)
Input bias of the neuron at row x, column y in the kth feature map of the convolution layer C:
\({\text{d}(\text{I}}_{\text{x},\text{y}}^{\text{c},\text{k}})\) = φ(\({\text{v}}_{\text{k}}\)) d(\({\text{O}}_{\text{x},\text{y}}^{\text{C},\text{k}})\) (19)
Weight variation of row r, column c in the mth convolution kernel, corresponding to the kth feature map in the lth convolution layer C:
∆\(W_{r,c}^{k,m} = \sum_{x=0}^{f_h}\sum_{y=0}^{f_w} d\!\left(I_{x,y}^{C,k}\right) O_{x+r,\, y+c}^{l-1,\, m}\) (20)
Total bias variation of the convolution kernel:
∆\({\text{B}\text{i}\text{a}\text{s}}^{\text{C},\text{k}}= \sum _{\text{x}=0}^{\text{f}\text{h}}\sum _{\text{y}=0}^{\text{f}\text{w}}\text{d}\)(\({\text{I}}_{\text{x},\text{y}}^{\text{C},\text{k}})\) (21)
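As a minimal illustration of Eqs. (6)–(9), the sketch below applies the output-layer deltas and the corresponding weight and bias updates in plain Java; tanh is assumed as the activation φ (so its derivative is 1 − tanh²(v)), and the variable names and learning-rate step are illustrative rather than part of the original formulation.

```java
/** Output-layer back-propagation step, Eqs. (6)-(9):
 *  delta = (y - target) * phi'(v), then dW = delta * hidden input, dBias = delta.
 *  output  : network outputs y^k          target : desired outputs tau^k
 *  preAct  : pre-activation values v_k    hidden : outputs of the hidden layer
 *  w, bias : output-layer parameters      lr     : learning rate */
static void outputLayerUpdate(double[] output, double[] target, double[] preAct,
                              double[] hidden, double[][] w, double[] bias, double lr) {
    for (int k = 0; k < output.length; k++) {
        double dOut = output[k] - target[k];                          // Eq. (6)
        double t = Math.tanh(preAct[k]);
        double dIn = dOut * (1.0 - t * t);                            // Eq. (7), tanh derivative
        for (int x = 0; x < hidden.length; x++)
            w[k][x] -= lr * dIn * hidden[x];                          // Eq. (8), gradient step
        bias[k] -= lr * dIn;                                          // Eq. (9), bias step
    }
}
```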
4.6 CNN Algorithm Process Steps
Step 1: Load the ARFF data set
Step 2: Select features using information gain and ranking
Step 3: Apply the classification algorithm
Step 4: Compute the filter response Fx of the input layer for every feature
Step 5: Calculate the bias bx for every feature
Step 6: Pass the resulting feature map forward from the input layer
Step 7: Evaluate the convolution kernels of each feature map
Step 8: Compute the sub-sampling layer and its feature values
Step 9: Back-propagate the input deviation of the kth neuron in the output layer
Step 10: Finally, output the selected features and the classification results
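A minimal end-to-end sketch of these steps in Java with Weka is given below; the ARFF file name is illustrative, and since stock Weka does not ship a CNN classifier, a MultilayerPerceptron is used here only as a stand-in for the convolutional network described above.

```java
import java.util.Random;
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CnnPipelineSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: load the ARFF data set (file name is illustrative)
        Instances data = new DataSource("heart-disease.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Step 2: information-gain ranking and feature selection
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        selector.setSearch(new Ranker());
        selector.SelectAttributes(data);
        Instances reduced = selector.reduceDimensionality(data);

        // Steps 3-10: train and evaluate the classifier on the reduced feature set
        // (MultilayerPerceptron stands in for the CNN described in Sect. 4.4-4.5).
        MultilayerPerceptron net = new MultilayerPerceptron();
        Evaluation eval = new Evaluation(reduced);
        eval.crossValidateModel(net, reduced, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```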