In the proposed model, the unsupervised deep autoencoder was implemented on a Core i3 laptop with a 2.30 GHz CPU and 4 GB of RAM, using the Keras library with TensorFlow as the backend in a Python 3.7 environment. The performance of the designed model is measured using the following quantities and metrics:
True Positive (TP) denotes that fake dosages are correctly predicted as fake.
True Negative (TN) denotes that genuine dosages are correctly predicted as genuine.
False Positive (FP) denotes that genuine dosages are wrongly predicted as fake.
False Negative (FN) denotes that fake dosages are wrongly predicted as genuine.
1) Accuracy
Accuracy is defined as the ratio of the number of correctly predicted samples to the total number of samples, and it is calculated using Eq. (1).
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN} \tag{1}$$
2) Precision
Precision is the ability of the classifier to avoid labelling genuine dosages as fake, that is, the fraction of dosages predicted as fake that are actually fake. Eq. (2) is used to calculate the precision of the classifier.
$$Precision=\frac{TP}{TP+FP} \tag{2}$$
3) Recall or Detection rate
Recall, or detection rate, is defined as the fraction of fake dosages that are correctly detected. Eq. (3) is used to calculate the recall of the classifier.
$$Recall=\frac{TP}{TP+FN} \tag{3}$$
4) F-measure
F-measure is defined as the harmonic mean of precision and recall. Eq. (4) is used to calculate the F-measure of the classifier.
$$F\text{-}measure=2\times\frac{Precision\times Recall}{Precision+Recall} \tag{4}$$
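As a minimal sketch of how Eqs. (1) to (4) can be evaluated from the four confusion counts, the following Python function may be used. The function name `classification_metrics` and the counts in the example are illustrative assumptions and are not results from this study; returning 0.0 for an empty denominator is also an assumed convention.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    Ratios with an empty denominator are returned as 0.0 (an assumed convention).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=90, tn=880, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F-measure combines precision and recall, it drops whenever either of the two is low, which makes it more informative than accuracy when fake dosages are rare.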
5.1. Dataset Description
The dataset contains several logs of insulin pump values for 70 different patients and is publicly available in the UCI repository [23]. Each patient record contains nearly 1000 samples, recorded either automatically by the system or manually. As mentioned earlier in Section (), the dataset has only four attributes, which are treated as inputs; the output attribute ‘Label’ has to be added manually by assigning a value of either ‘0’ or ‘1’.
For binary classification, the labels are assigned according to the value of the attribute ‘code’. If the code is 48, 57 or 72, the record belongs to the unspecified category and is labelled ‘1’ (fake dosage); all remaining code values are considered genuine dosages and are labelled ‘0’.
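The binary labelling rule can be sketched in Python as follows. Only the three codes 48, 57 and 72 and the attribute names ‘code’, ‘value’ and ‘Label’ come from the text; the helper name `label_from_code`, the record layout and the sample records are hypothetical.

```python
# Codes that the text assigns to the unspecified category (label 1, fake dosage).
UNSPECIFIED_CODES = {48, 57, 72}


def label_from_code(code):
    """Return 1 (fake dosage) for an unspecified code, otherwise 0 (genuine)."""
    return 1 if int(code) in UNSPECIFIED_CODES else 0


# Hypothetical records: the values and the remaining codes are illustrative only.
records = [
    {"code": 33, "value": 9},
    {"code": 48, "value": 120},
    {"code": 58, "value": 101},
    {"code": 57, "value": 98},
    {"code": 72, "value": 0},
]

# Add the 'Label' attribute to a copy of every record.
labelled = [dict(record, Label=label_from_code(record["code"])) for record in records]
for row in labelled:
    print(row)
```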
For multilabel classification, adversarial samples were introduced to simulate four types of attack, namely long resume, single acute overdose, single acute underdose and chronic overdose. For this purpose, the input attribute ‘value’ is modified according to the behaviour of the attacks described below. Table 1 shows the sample distribution of the generated dataset.
Long Resume
Long resume is a kind of attack in which the attacker sends the same insulin value to the insulin pump over a period of a week or a month. This can lead to serious illness for the patient.
Single acute overdose/underdose
This attack sends a manipulated insulin value, either an underdose or an overdose, to the insulin pump. The manipulated dosage is injected into the patient occasionally rather than continuously over a particular duration.
Chronic overdose
This attack is carried out by injecting an overdose into the patient over a period of one week or one month. It is more serious than the attacks mentioned above and can be life-threatening for the patient.
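One possible way to generate such adversarial samples from a sequence of ‘value’ readings is sketched below. This is an illustration of the attack behaviours described above and not the generation procedure used in this study: the function names, the scaling factors, the window lengths and the example doses are all assumptions.

```python
def long_resume(values, start, length):
    """Repeat the value at `start` for `length` consecutive samples."""
    out = list(values)
    for i in range(start, min(start + length, len(out))):
        out[i] = values[start]
    return out


def single_acute(values, index, factor):
    """Scale one sample: factor > 1 gives an overdose, factor < 1 an underdose."""
    out = list(values)
    out[index] = out[index] * factor
    return out


def chronic_overdose(values, start, length, factor):
    """Scale `length` consecutive samples upwards, starting at `start`."""
    if factor <= 1:
        raise ValueError("an overdose requires factor > 1")
    out = list(values)
    for i in range(start, min(start + length, len(out))):
        out[i] = out[i] * factor
    return out


# Hypothetical insulin values; factors and window lengths are assumed.
doses = [4.0, 6.0, 5.0, 7.0, 3.0, 8.0]
print(long_resume(doses, start=1, length=3))
print(single_acute(doses, index=2, factor=3.0))   # single acute overdose
print(single_acute(doses, index=2, factor=0.5))   # single acute underdose
print(chronic_overdose(doses, start=2, length=3, factor=2.0))
```

Each function returns a modified copy, so the genuine sequence stays available for labelling the unmodified samples.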
Table 1
Sample Distribution of Dataset

| Type | Number of Samples (Training) | Number of Samples (Testing) |
|---|---|---|
| Fake | 2318 | 579 |
| Genuine | 20757 | 5189 |
5.2. Evaluation of Binary Classification
Fig. 5 illustrates the model loss during the training and testing phases. The blue line indicates the training loss, whereas the red line indicates the validation loss. The two curves converge as the number of epochs increases, and the training and validation loss values remain very close to each other. Fig. 6a shows the distribution of the reconstruction error calculated for fake dosages alone, whereas Fig. 6b shows it for both genuine and fake dosages. Both plots show that the reconstruction error for fake dosages is higher than that for genuine dosages.
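Since fake dosages yield a higher reconstruction error, a sample can be flagged by comparing its error with a threshold. The sketch below assumes that the error is the mean squared error between an input and its reconstruction and that a fixed threshold is used; neither choice is stated in the text, and the vectors and the threshold of 0.05 are hypothetical.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between one input vector and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def flag_fake(errors, threshold):
    """Label a sample 1 (fake) when its error exceeds the threshold, else 0."""
    return [1 if error > threshold else 0 for error in errors]


# Hypothetical scaled inputs and autoencoder outputs (four attributes each).
inputs = [[0.2, 0.4, 0.1, 0.3], [0.9, 0.1, 0.8, 0.7]]
reconstructions = [[0.2, 0.5, 0.1, 0.3], [0.4, 0.3, 0.2, 0.1]]

errors = [reconstruction_error(x, x_hat) for x, x_hat in zip(inputs, reconstructions)]
predictions = flag_fake(errors, threshold=0.05)  # assumed threshold
print(errors, predictions)
```

Raising the threshold trades recall for precision, which is the trade-off that the ROC and precision-recall curves summarise.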
Fig. 7 shows the ROC curve, which plots the true positive rate against the false positive rate. For binary classification using the deep autoencoder, an area under the curve (AUC) value of around 0.646 is obtained. The AUC indicates how well the model separates normal instances from attack instances in an unsupervised manner, where 0.5 corresponds to random guessing and 1.0 to perfect separation.
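For illustration, the AUC can be computed directly from anomaly scores as the probability that a randomly chosen fake sample receives a higher score than a randomly chosen genuine one, with ties counted as one half. The sketch below uses this pairwise definition; the function name `roc_auc` and the example labels and scores are hypothetical, and the study itself may have used a library routine.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for fake (positive), 0 for genuine (negative).
    scores: higher means more likely fake, e.g. the reconstruction error.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, not data from this study.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration but slow on large test sets.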
Fig. 8 shows the precision-recall curve of the deep autoencoder for binary classification. The curve shows the relationship between precision and recall across different threshold values. A larger area under the precision-recall curve indicates that the classifier achieves both high precision and high recall. The average precision obtained from this curve is 0.887.
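Average precision summarises the precision-recall curve as the mean of the precision values obtained at each rank where a fake sample is retrieved. A minimal sketch under that definition follows; it assumes that samples are ranked by descending score and makes no special provision for tied scores, and the function name and example data are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision: mean precision at the rank of each positive sample."""
    total_positives = sum(1 for y in labels if y == 1)
    if total_positives == 0:
        raise ValueError("at least one positive sample is required")
    # Rank samples from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Hypothetical labels and scores, not data from this study.
print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```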
Table 2
Comparative analysis for various parameters (Binary Classification)

| Layers | No. of Neurons | Accuracy (%) | Loss | Val. Acc (%) | Val. Loss |
|---|---|---|---|---|---|
| 1 | 16 | 26.959 | 4375.794 | 26.116 | 4625.173 |
| 1 | 24 | 26.959 | 4375.794 | 26.116 | 4625.173 |
| 1 | 32 | 26.959 | 4375.794 | 26.116 | 4625.173 |
| 1 | 64 | 26.959 | 4375.794 | 26.116 | 4625.174 |
| 2 | 16 | 99.97 | 2.984 | 99.97 | 2.967 |
| 2 | 24 | 99.97 | 2.984 | 99.97 | 2.967 |
| 2 | 32 | 99.97 | 2.485 | 99.98 | 2.465 |
| 2 | 64 | 99.97 | 9.648 | 99.97 | 9.411 |
| 3 | 16 | 99.98 | 3.536 | 99.98 | 3.524 |
| 3 | 24 | 99.97 | 2.981 | 99.97 | 2.965 |
| 3 | 32 | 99.97 | 2.476 | 99.97 | 2.456 |
| 3 | 64 | 99.97 | 9.640 | 99.97 | 9.403 |
| 4 | 16 | 99.97 | 3.528 | 99.97 | 3.516 |
| 4 | 24 | 99.97 | 2.978 | 99.97 | 2.962 |
| 4 | 32 | 99.98 | 2.477 | 99.98 | 2.457 |
| 4 | 64 | 99.98 | 9.647 | 99.98 | 9.419 |
In Table 2, the performance of the autoencoder is evaluated for different numbers of hidden layers and neurons for binary classification. Based on this evaluation, the best configuration is 3 layers with 16 neurons.
5.3. Evaluation of Multilabel Classification
Table 3 shows the comparative analysis for multilabel classification using various parameter values. The optimized configuration uses 64 neurons in the encoding layer and is highlighted in the table. These observations show that, for both kinds of classification, accuracy and loss vary only slightly when the number of layers and neurons is changed.
Table 3
Comparative analysis for various parameters (Multilabel Classification)

| Layers | No. of Neurons | Accuracy (%) | Loss | Val. Acc (%) | Val. Loss |
|---|---|---|---|---|---|
| 3 | 32 | 95.349 | 0.699 | 94.913 | 0.765 |
| 3 | 64 | 95.349 | 0.146 | 94.913 | 0.171 |
| 3 | 128 | 95.304 | 0.581 | 94.913 | 0.671 |
| 4 | 32 | 95.349 | 0.229 | 94.913 | 0.264 |
| 4 | 64 | 95.349 | 0.702 | 94.913 | 0.766 |
| 4 | 128 | 95.278 | 0.160 | 94.720 | 0.189 |
| 5 | 64 | 95.323 | 0.210 | 94.913 | 0.241 |
| 5 | 128 | 95.053 | 0.184 | 94.354 | 0.215 |
| 5 | 256 | 95.284 | 0.180 | 94.913 | 0.208 |
| 6 | 128 | 95.349 | 0.197 | 94.913 | 0.230 |
| 6 | 256 | 95.349 | 0.200 | 94.913 | 0.231 |
| 6 | 512 | 95.362 | 0.167 | 94.913 | 0.191 |
|
Fig. 10 compares the performance of existing machine/deep learning solutions designed for insulin pump systems with that of the proposed solution, and shows that the proposed model outperforms the existing solutions. In Table 4, the performance metrics of the proposed DL model are compared with those of existing machine learning models. This analysis shows that the proposed DL model outperforms the existing machine learning classifiers.
Table 4
Comparative analysis of Proposed model with ML models

| Methodology | Accuracy | Precision | Recall | F1-measure |
|---|---|---|---|---|
| Autoencoder (Binary classification) | 99.98 | 89.85 | 91.05 | 93.45 |
| Autoencoder (Multilabel classification): Normal | 95.701 | 0.95 | 1.000 | 0.97 |
| Autoencoder (Multilabel classification): Long resume | 94.223 | 0.95 | 1.000 | 0.96 |
| Autoencoder (Multilabel classification): Single acute overdose | 95.113 | 0.95 | 1.000 | 0.98 |
| Autoencoder (Multilabel classification): Single acute underdose | 95.212 | 0.94 | 99.99 | 0.97 |
| Autoencoder (Multilabel classification): Chronic overdose | 94.123 | 0.94 | 99.99 | 0.96 |
| Support Vector Machine (SVM) | 89.851 | 0.898 | 1.000 | 0.946 |
| Decision Tree | 95.052 | 0.972 | 0.972 | 0.972 |
| Naïve Bayes | 43.019 | 0.926 | 0.397 | 0.556 |