## 2.5 Fusion of the multi-regression models

This paper proposes a method for fusing multiple regression models based on the histogram of the blood glucose values in the training set. Figure 3 shows the procedure of the proposed algorithm. The details of the algorithm are as follows.

Step 1: Fig. 4 shows the histogram of the blood glucose values in the training set. The centers of the second to the sixth columns of the histogram are found. For this training set, these centers are 68.1 mg/dl, 96.8 mg/dl, 126 mg/dl, 154 mg/dl and 183 mg/dl.
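As an illustration of Step 1, the following minimal sketch computes the column centers of a histogram with NumPy and keeps the second to the sixth columns. The glucose readings, the number of columns (`bins=7`) and the helper name `histogram_bin_centers` are assumptions made for the example; the text does not state how the histogram in Fig. 4 was binned.

```python
import numpy as np

def histogram_bin_centers(values, bins):
    """Return the counts and the center of each histogram column."""
    counts, edges = np.histogram(values, bins=bins)
    # The center of a column is the midpoint of its two edges.
    centers = 0.5 * (edges[:-1] + edges[1:])
    return counts, centers

# Hypothetical glucose readings in mg/dl (illustrative only, not the paper's data).
glucose = np.array([45.0, 60.0, 72.0, 88.0, 95.0, 101.0,
                    118.0, 130.0, 149.0, 160.0, 185.0, 210.0])

# The number of columns is an assumption for this example.
counts, centers = histogram_bin_centers(glucose, bins=7)

# Columns 2 to 6 (1-based) are indices 1 to 5 (0-based).
selected_centers = centers[1:6]
print(selected_centers)
```

With equal-width columns the selected centers are evenly spaced, which matches the roughly constant spacing of the five centers reported in Step 1.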

Step 2: The random forest regression model is established using the training set.

Step 3: The feature vectors in the first validation set are used for performing the blood glucose estimation. Each pair of the \({i^{th}}\) reference blood glucose value and the \({i^{th}}\) estimated blood glucose value in the first validation set is treated as a Cartesian coordinate. Figure 5 shows these coordinates. The ideal coordinates based on the centers of the 5 histogram columns found in Step 1 are also plotted in the figure. In particular, these 5 ideal coordinates are (68.1, 68.1), (96.8, 96.8), (126, 126), (154, 154) and (183, 183). They lie on the straight line that passes through the origin with a slope equal to one, and they are shown as the black dots in Fig. 5.

Step 4: Let a1, a2, a3, a4 and a5 be 5 non-overlapping subsets of the first validation set defined based on these 5 ideal coordinates. In particular, the Euclidean distances between each coordinate from Step 3 and these 5 ideal coordinates are computed. The \({i^{th}}\) feature vector and the \({i^{th}}\) reference blood glucose value are assigned to the subset whose ideal coordinate is the nearest, that is, using the minimum Euclidean distance rule.
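The minimum Euclidean distance rule of Step 4 can be sketched as follows. The 5 centers are those reported in Step 1; the reference and estimated values, as well as the function name `assign_to_nearest_center`, are hypothetical and only serve to illustrate the rule.

```python
import numpy as np

def assign_to_nearest_center(reference, estimated, centers):
    """Return, for each (reference, estimated) pair, the index of the nearest ideal coordinate."""
    points = np.column_stack([reference, estimated])   # shape (n, 2)
    ideal = np.column_stack([centers, centers])        # shape (k, 2), points on the line y = x
    # Pairwise Euclidean distances between every point and every ideal coordinate, shape (n, k).
    dists = np.linalg.norm(points[:, None, :] - ideal[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

centers = np.array([68.1, 96.8, 126.0, 154.0, 183.0])      # from Step 1
reference = np.array([70.0, 100.0, 120.0, 160.0, 190.0])   # hypothetical values, mg/dl
estimated = np.array([75.0, 90.0, 130.0, 150.0, 170.0])    # hypothetical values, mg/dl

labels = assign_to_nearest_center(reference, estimated, centers)
print(labels)  # index 0 corresponds to a1, index 4 to a5
```

Since every pair receives exactly one index, the resulting subsets do not overlap. A subset can be empty if no pair falls nearest to its ideal coordinate.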

Step 5: An individual random forest regression model is established using each subset of the first validation set. Hence, 5 random forest regression models are established in total.

Step 6: For each feature vector in the second validation set, the blood glucose value is estimated using each of the 5 random forest regression models established in Step 5. Hence, there are 5 estimated blood glucose values for each feature vector in the second validation set.

Step 7: Let b1, b2, b3, b4 and b5 be 5 non-overlapping subsets of the second validation set defined based on the 5 random forest regression models established in Step 5. In particular, the Euclidean distances between the reference blood glucose value and each of the 5 estimated blood glucose values are computed for each measurement in the second validation set. Since these values are scalars, each distance is the absolute difference between the reference value and the estimated value. The \({i^{th}}\) feature vector and the \({i^{th}}\) reference blood glucose value are assigned to the subset corresponding to the regression model with the smallest distance, that is, using the minimum Euclidean distance rule.
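A minimal sketch of the labelling in Step 7 is given below. The reference values and the 5 estimates per measurement are hypothetical, and `best_model_labels` is a name introduced only for this example.

```python
import numpy as np

def best_model_labels(reference, predictions):
    """Return, for each measurement, the index of the model whose estimate is closest to the reference.

    reference:   shape (n_samples,)
    predictions: shape (n_samples, n_models), one column per regression model
    """
    # For scalar values the Euclidean distance is the absolute difference.
    errors = np.abs(predictions - reference[:, None])
    # np.argmin returns the lowest index when several models are equally close.
    return np.argmin(errors, axis=1)

reference = np.array([80.0, 150.0, 110.0])   # hypothetical values, mg/dl
predictions = np.array([                     # hypothetical estimates of the 5 models
    [70.0,  95.0, 120.0, 150.0, 180.0],
    [72.0,  99.0, 128.0, 149.0, 176.0],
    [65.0, 108.0, 125.0, 155.0, 190.0],
])

labels = best_model_labels(reference, predictions)
print(labels)  # index 0 corresponds to b1, index 4 to b5
```

These indices are the classification labels that Step 8 uses as targets.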

Step 8: A random forest classification model is established using all the feature vectors in the second validation set and all the classification labels obtained in Step 7. Here, the classification label of a feature vector refers to the index of the regression model established in Step 5 to which it was assigned.

Step 9: For each feature vector in the test set, the index of the regression model is predicted using the random forest classification model established in Step 8.

Step 10: For each feature vector in the test set, the blood glucose value is estimated using the regression model established in Step 5 whose index was found in Step 9.
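To make the flow of Steps 2 to 10 concrete, the following sketch chains them together using scikit-learn. It is an illustration under stated assumptions rather than the implementation used in this paper: the data are synthetic, the hyperparameters (`n_estimators=20`, `random_state`) are arbitrary, the function names `fit_fused_model` and `predict_fused` are hypothetical, and regression models are trained only for non-empty subsets, a case that the steps above do not address.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def fit_fused_model(X_tr, y_tr, X_v1, y_v1, X_v2, y_v2, centers, seed=0):
    # Step 2: one regression model on the training set.
    base = RandomForestRegressor(n_estimators=20, random_state=seed).fit(X_tr, y_tr)
    # Steps 3 and 4: assign each first-validation pair to the nearest ideal coordinate.
    points = np.column_stack([y_v1, base.predict(X_v1)])
    ideal = np.column_stack([centers, centers])
    dists = np.linalg.norm(points[:, None, :] - ideal[None, :, :], axis=2)
    subset = np.argmin(dists, axis=1)
    # Step 5: one regression model per subset (assumption: empty subsets are skipped).
    experts = [
        RandomForestRegressor(n_estimators=20, random_state=seed)
        .fit(X_v1[subset == k], y_v1[subset == k])
        for k in np.unique(subset)
    ]
    # Steps 6 and 7: label each second-validation measurement with its best model.
    preds_v2 = np.column_stack([m.predict(X_v2) for m in experts])
    labels = np.argmin(np.abs(preds_v2 - y_v2[:, None]), axis=1)
    # Step 8: classification model that predicts the model index from the features.
    selector = RandomForestClassifier(n_estimators=20, random_state=seed).fit(X_v2, labels)
    return experts, selector

def predict_fused(experts, selector, X):
    # Step 9: predicted model index for each feature vector.
    idx = selector.predict(X)
    # Step 10: keep the estimate of the selected model only.
    preds = np.column_stack([m.predict(X) for m in experts])
    return preds[np.arange(len(X)), idx]

# Synthetic data (illustrative only): 4 features, glucose-like targets in mg/dl.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 125.0 + 30.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(scale=5.0, size=400)
X_tr, X_v1, X_v2, X_te = X[:100], X[100:200], X[200:300], X[300:]
y_tr, y_v1, y_v2 = y[:100], y[100:200], y[200:300]

centers = np.array([68.1, 96.8, 126.0, 154.0, 183.0])  # from Step 1
experts, selector = fit_fused_model(X_tr, y_tr, X_v1, y_v1, X_v2, y_v2, centers)
y_hat = predict_fused(experts, selector, X_te)
print(y_hat[:5])
```

In this sketch the three data splits play distinct roles: the training set fits the initial model, the first validation set trains the per-subset regression models, and the second validation set trains the classifier that selects among them.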