Fog computing can be regarded as a computing paradigm that runs IoT applications at the edge of the network. It improves QoS metrics such as bandwidth efficiency and energy consumption, and it reduces latency. The main mission of fog computing is to deliver data and place it closer to the user.
4.3.1. The proposed GDM module
The proposed GDM module is composed of two main sub-modules: (i) Data Finding Methodology (DFM), and (ii) Explainable Prediction Algorithm (EPM) using DNN.
- Data Finding Methodology (DFM)
The DFM replaces unused data to free cache space for new incoming data items. Cache replacement is very important in healthcare systems because vital signs arrive frequently and cached items must be replaced continuously. Caching in a fog environment is constrained by limited bandwidth, power, and cache space. A sound replacement mechanism is therefore needed to discriminate between data items that should be preserved in the cache and those that should be discarded when the cache is full.
The network is divided into fog regions, and each region has a Master Node (MN) that manages communication within that region. The MN collects the required features of each fog node (FN), such as: (i) Existing Data (ED), (ii) Time-To-Live (TTL), and (iii) Cache Size (CS). The MN periodically checks the features of each data item and deletes the items whose TTL is zero. If the cache of the fog server is full and new data arrives, the MN can decide to remove a data item according to some criteria. Each FN has a table called the Data Cache Table (DCT), which contains information about each data item (di) in its cache memory: the data item (di), Access Time (TA), Size of data (S), Access Frequency (FA), Access Count (AC), Time-To-Live (TTL), and Cache Free Size (CFS), as shown in Table 1.
Table 1. Data Cache Table (DCT)

| Field | Description |
|---|---|
| Data item (di) | Refers to the data item's number. |
| Access Time (TA) | The time at which the data item enters the cache of the FN. |
| Size of data (S) | Size of the data item. |
| Access Frequency (FA) | The number of times di was accessed. It indicates the importance of the di. |
| Access Count (AC) | Each data item maintains a count that gives the number of FNs having the same data. |
| Time-To-Live (TTL) | The time period given to each di when it is placed in this cache. |
| Cache Free Size (CFS) | The free space in the cache. |
Using a PNN, the algorithm decides whether to remove a data item and replace it with new incoming data according to the item's features. The inputs to the PNN are TA, AC, and FA. The output of the PNN is Data Replace (DR), which is either Yes or No. The steps of the PNN-based cache replacement strategy are shown in Algorithm 1.
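The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the class names `DCTEntry` and `FogCache` are hypothetical, and the PNN is represented only by a caller-supplied `should_replace` function that returns the DR decision (True for Yes), since the network itself is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DCTEntry:
    """One row of the Data Cache Table (field names follow Table 1)."""
    di: int      # data item number
    ta: float    # access time: when the item entered the FN cache
    s: int       # size of the data item
    fa: int      # access frequency: how many times di was accessed
    ac: int      # access count: number of FNs holding the same data
    ttl: float   # remaining time-to-live


class FogCache:
    """Hypothetical FN cache; `should_replace` stands in for the PNN."""

    def __init__(self, capacity: int,
                 should_replace: Callable[[DCTEntry], bool]):
        self.capacity = capacity              # cache size (CS)
        self.should_replace = should_replace  # assumed DR decision function
        self.table: Dict[int, DCTEntry] = {}

    @property
    def cfs(self) -> int:
        """Cache Free Size: capacity minus the space currently in use."""
        return self.capacity - sum(e.s for e in self.table.values())

    def purge_expired(self) -> None:
        """Periodic MN check: drop every item whose TTL has reached zero."""
        self.table = {k: e for k, e in self.table.items() if e.ttl > 0}

    def insert(self, new: DCTEntry) -> bool:
        """Admit `new`, evicting items flagged DR = Yes until it fits."""
        self.purge_expired()
        for key in list(self.table):
            if self.cfs >= new.s:
                break
            if self.should_replace(self.table[key]):
                del self.table[key]
        if self.cfs < new.s:
            return False  # nothing replaceable: the new item is not cached
        self.table[new.di] = new
        return True
```

For example, `FogCache(capacity=10, should_replace=lambda e: e.fa < 2)` would treat rarely accessed items as replaceable; a trained classifier over TA, AC, and FA could be plugged in at the same point.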
- Explainable Prediction Algorithm (EPM) using DNN
This section proposes the EPM model, which aims to detect the incidence of GDM among pregnant women and to provide an understandable explanation of the predicted output. We evaluated the model on the MIMIC III dataset. As shown in Figure 3, the proposed EPM consists of four main steps: (a) Data Collection: the required dataset is collected using PostgreSQL by extracting data from several tables, including patients, chartevents, D_items, labevents, and input_events. (b) Data Preprocessing: the output of the first step is cleaned and preprocessed through several steps, including outlier removal, standardization, and balancing. (c) Feature Extraction: the feature subsets are extracted for the DNN classification model that detects the incidence of GDM. (d) Developing DL model: the DNN is built, and its output decision is passed to a SHAP explainer to provide an understandable explanation of the decision. The performance of the model is evaluated on unseen data to verify that the proposed model is accurate and explainable.
a) Data Collection
Medical Information Mart for Intensive Care III (MIMIC III) is a benchmark dataset developed by the MIT Lab for Computational Physiology. It includes EHR data for patients in the ICU. MIMIC III is accessible after obtaining approval from PhysioNet. It includes data for 53,422 distinct patients, with 4750 measurements and 390 laboratory tests. As shown in Figure 3, in this study we extract from MIMIC III the patients' demographics (e.g., age, gender, BMI), vital signs (e.g., heart rate, respiratory rate, glucose level), and laboratory tests (e.g., albumin, creatinine, cholesterol, sodium). The present study was conducted on 8740 pregnant women selected according to the following inclusion criteria: (i) adult female (age > 20); (ii) recorded as pregnant in the MIMIC III database (item_id: pregnant = 225082, pregnant due date = 225083); (iii) gestational age between 6 and 26 weeks; and (iv) availability of the required vital signs and laboratory tests. The features used in EPM are detailed in Table 2.
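The inclusion criteria can be expressed as a simple record filter. The sketch below assumes each patient has already been flattened into a dictionary; the keys (`gender`, `age`, `item_ids`, `gestational_age_weeks`, `has_required_measurements`) are hypothetical and are not MIMIC III column names. Only the two item IDs come from the text.

```python
from typing import Iterable, List, Mapping

# Item IDs quoted in the text: pregnant, pregnant due date.
PREGNANCY_ITEM_IDS = {225082, 225083}


def meets_inclusion_criteria(patient: Mapping) -> bool:
    """Apply the four inclusion criteria to one (hypothetical) patient record."""
    return bool(
        patient["gender"] == "F"
        and patient["age"] > 20                                   # (i) adult female
        and PREGNANCY_ITEM_IDS & set(patient["item_ids"])         # (ii) recorded as pregnant
        and 6 <= patient["gestational_age_weeks"] <= 26           # (iii) gestational age
        and patient["has_required_measurements"]                  # (iv) vitals and labs exist
    )


def select_cohort(patients: Iterable[Mapping]) -> List[Mapping]:
    """Keep only the patients that satisfy every criterion."""
    return [p for p in patients if meets_inclusion_criteria(p)]
```

In practice the same conditions would more likely be written as a SQL query against the PostgreSQL instance; the filter form is used here only to make each criterion explicit.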
Table 2. Features used in EPM

| Feature_ID | Feature_Name | Unit of measure | Average for GDM | Average for non-GDM | p-value |
|---|---|---|---|---|---|
| | BMI | - | 28 ± 6.2 | 21.66 ± 3.2 | <0.05 |
| 3692 | Weight change | kg | 12 ± 12.8 | 10 ± 7.9 | <0.05 |
| 3583 | Previous weight | kg | 75 ± 15.3 | 66 ± 7.2 | <0.05 |
| 3446 | Gestational age | weeks | 24.4 ± 1.2 | 18 ± 2.3 | <0.05 |
| 1127 | WBC (4-11,000) | ×10³/µL | 9.48 ± 2.6 | 8.87 ± 1.3 | <0.05 |
| 626 | Neutrophil | % | 69.21 ± 8.9 | 71 ± 8.7 | <0.05 |
| 220635 | PCT | % | 0.20 ± 0.05 | 0.17 ± 0.61 | <0.01 |
| 220645 | Sodium | mEq/L | 142 ± 3.2 | 135 ± 4.2 | <0.05 |
| 223830 | pH (Arterial) | - | 7.45 ± 0.2 | 7.35 ± 2.2 | <0.01 |
| 223751 | Non-invasive blood pressure | mmHg | 125 ± 5.8 / 90 ± 5.6 | 115 ± 3.8 / 75 ± 3.4 | <0.05 |
| 2381, 220045 | Heart rate | beats per minute | 70 ± 23 | 60 ± 22 | <0.05 |
| 646, 5820 | SpO2 | % | 95 ± 4.2 | 95 ± 5 | |
| 1126 | Platelet | ×10³/µL | 231.0 ± 62.6 | 198.0 ± 62.6 | <0.05 |
| 783 | Lymphocyte | % | 25.9 ± 7.4 | 24.8 ± 6.9 | <0.05 |
| 772, 227456 | Albumin (>3.2) | g/L | 44 ± 9.8 | 3412.2 | <0.05 |
| 1529 | Glucose | mg/dL | 100 ± 25.3 | 90 ± 22.7 | <0.05 |
| 1525 | Creatinine | mg/dL | 0.7 ± 0.5 | 0.6 ± 0.2 | <0.02 |
| 1523 | Chloride | mEq/L | 100 ± 3.2 | 96 ± 4.4 | <0.05 |
| 3684 | Vitamin E | mg/L | 9.20 ± 2.37 | 10.80 ± 5.01 | <0.05 |
| 1522 | Calcium | mg/dL | 9.3 ± 1.8 | 8.6 ± 2.2 | <0.05 |
b) Data preprocessing
The output of the first step is cleaned and preprocessed through several steps, including outlier removal, standardization, and balancing [65]. The preprocessing steps are as follows: (i) Data balancing: class imbalance is a common problem, especially in medical datasets. In MIMIC III, only a small number of pregnant women have GDM, which leads to an imbalanced dataset. Two main families of techniques are commonly used to handle this issue: oversampling [66] and under-sampling [67]. Oversampling techniques, such as the synthetic minority oversampling technique, increase the number of samples in the minority class, whereas under-sampling techniques, such as Tomek links and random under-sampling, remove samples from the majority class. In this study we used random under-sampling to keep the data balanced. The main advantage of under-sampling is that it does not add any noise to the dataset.
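Random under-sampling can be illustrated with a short standard-library sketch. The function name `random_undersample` and the fixed `seed` parameter are assumptions made for illustration; the paper does not state which implementation or sampling ratio was used, so the sketch simply shrinks every class to the size of the smallest one.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def random_undersample(samples: Sequence, labels: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List, List]:
    """Randomly drop samples so every class matches the smallest class size."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    n_min = min(len(xs) for xs in by_class.values())
    rng = random.Random(seed)  # seeded so the selection is reproducible

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # The minority class is kept whole; larger classes are sampled down.
        for x in rng.sample(xs, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Because only existing records are selected, no synthetic samples are created; the cost is that majority-class records are discarded.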
(ii) Handling missing values: about 15-20% of the data in MIMIC is missing. Several statistical techniques can be used to impute missing values, such as expectation maximization [68] and hot-deck imputation [69]. In this study we removed data with more than 50% missing values and selected only patients who have at least one record of each vital sign per day. Forward and backward filling were then used to fill in each patient's data. (iii) Scaling data: the extracted features span different value ranges, and these variations usually affect classifier performance. Therefore, in this study we scaled all features to the range 0 to 1 using min-max scaling [70].
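The filling and scaling steps can be sketched for a single feature series as follows. The function names are hypothetical, missing values are represented by `None`, and min-max scaling is taken in its usual form, x' = (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1.

```python
from typing import List, Optional, Sequence


def forward_backward_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps with the previous observation, then leading gaps with the next one."""
    filled = list(values)
    # Forward pass: carry the last observed value forward.
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    # Backward pass: fill any leading gaps from the first observed value.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    return filled  # a series with no observations at all stays all None


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values linearly so that min maps to 0 and max maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, a heart-rate series `[None, 60.0, None, 100.0]` would be filled to `[60.0, 60.0, 60.0, 100.0]` and then scaled to `[0.0, 0.0, 0.0, 1.0]`.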
c) Feature Extraction
In this section we extracted two feature subsets, A and B, as shown in Table 3. Feature set A includes the main vital signs (heart rate, glucose level, SpO2, blood pressure, etc.) and some lab tests (PCT, total bilirubin, etc.). Feature set B includes all features in set A, in addition to pregnancy-related features such as gestational age and weight change, and other lab features such as lymphocyte, sodium, vitamin E, and neutrophil. These features have a critical effect on GDM detection. For example, vitamin E is critical for maintaining the body's metabolism and for scavenging free radicals. Vitamin E deficiency during pregnancy may lead to vascular endothelial dysfunction, GDM, and hypertension, in addition to placental complications and premature birth [51]. Considering vitamin E is therefore important in GDM prediction. The same holds for lymphocytes: the count decreases during the first and second trimesters and increases during the third. An increased lymphocyte count may also contribute to irregular glucose levels.
Table 3. Features used in model A and model B

| Model | Features |
|---|---|
| Model A | Age, BMI, Respiratory rate, Heart rate, Glucose level, SpO2, Blood pressure, Calcium, Sodium, pH (Arterial), Total Bilirubin, PCT |
| Model B | Age, BMI, Weight change, Gestational age, Respiratory rate, Heart rate, Glucose level, Blood pressure, BUN, pH (Arterial), SpO2, PTT, Vitamin E, Neutrophil, Lymphocyte, Glucose, Creatinine, Calcium, Sodium, PCT |
d) Developing DL model
The DL model takes 20 input dimensions and is built from dense and dropout layers. A dense layer is a fully connected layer: each neuron receives the outputs of all neurons in the previous layer. Dense layers are also used to change the dimension of the feature vector. A dropout layer is a regularization approach that randomly ignores some neurons during training to avoid overfitting [71]. As shown in Figure 4, the hidden layers use the rectified linear unit (ReLU) activation function, a piecewise-linear function that outputs its input directly if it is positive and outputs zero otherwise. In the last layer, we use the sigmoid activation function for binary classification [72]. This results in a robust network that generalizes well and is less likely to overfit.
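The forward pass of such a network can be sketched in NumPy. Only the 20 input dimensions, the ReLU hidden activations, the dropout layers, and the sigmoid output come from the text; the hidden layer sizes (32 and 16), the dropout rate of 0.5, and the weight initialization are assumptions chosen for illustration, and training is not shown.

```python
import numpy as np


def relu(x):
    """Output the input where it is positive and zero elsewhere."""
    return np.maximum(0.0, x)


def sigmoid(x):
    """Squash a real value into (0, 1), read as the probability of GDM."""
    return 1.0 / (1.0 + np.exp(-x))


def dropout(x, rate, rng, training):
    """Inverted dropout: zero a random fraction `rate` of units while training."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)  # rescale so the expected activation is unchanged


def init_params(sizes, seed=0):
    """Create (weights, bias) pairs for consecutive dense layers (assumed init)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x, params, rate=0.5, rng=None, training=False):
    """Dense + ReLU + dropout hidden layers, then a sigmoid output layer."""
    if rng is None:
        rng = np.random.default_rng(0)
    h = x
    for w, b in params[:-1]:
        h = dropout(relu(h @ w + b), rate, rng, training)
    w_out, b_out = params[-1]
    return sigmoid(h @ w_out + b_out)


# 20 inputs as in the text; hidden sizes 32 and 16 are hypothetical.
params = init_params([20, 32, 16, 1])
```

Calling `forward(batch, params)` on an array of shape `(n, 20)` returns `n` probabilities; dropout is active only when `training=True`, which mirrors how dropout layers behave at inference time.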