Longitudinal modeling of fasting blood sugar variation over time among adult diabetic patients: the case of Adama Hospital Medical College

Background: Diabetes Mellitus (DM) is a chronic, progressive disease characterized by elevated levels of blood glucose. Although most international associations and organizations have given attention to diabetes control and prevention by health professionals, diabetes and its complications affecting the heart, blood vessels, eyes, kidneys and nerves remain a major cause of premature death and disability across the world. The overall aim of this study was to assess fasting blood sugar variation over time and its determinants among diabetic patients. Methods: Data were obtained from diabetic patients at Adama Hospital Medical College who were active in follow-up treatment from September 1, 2018 to August 30, 2019. The data consist of basic demographic and clinical characteristics of 312 DM patients selected using simple random sampling, of whom 177 were male and 135 were female. A linear mixed effects model for longitudinal data was used to take the correlation between Fasting Blood Sugar (FBS) levels of the same patient into account. Linear mixed models with a random intercept and with a random intercept and slope were used for fitting the data. Results: The linear mixed model with unstructured covariance structure showed that a one-month increase in time decreased the log FBS level (FBS in mg/dl) by 0.0111267. A unit increase in Body Mass Index (BMI) of a patient on treatment increased the log FBS level by 0.0434. Similarly, a unit increase in Diastolic Blood Pressure (DBP) increased the log FBS level by 0.0004749. Having tertiary or secondary education decreased the log FBS level by 0.0058844 and 0.0055161, respectively, compared with patients with no education.
Conclusion: Age, educational status, dietary type, drug type, history of hypertension, BMI, DBP, time, and the interactions of time with age, history of hypertension, dietary type and other comorbidity at baseline were significant determinants of the change in mean FBS level of diabetes patients over time. Based on the findings of our study and the WHO recommendation, maintaining a healthy body weight by eating a healthy diet, along with keeping blood glucose low, is essential to control blood sugar and to prevent long-term complications.

Background
Diabetes mellitus is a serious, chronic disease that occurs either when the pancreas does not produce enough insulin (a hormone that regulates blood glucose), or when the body cannot effectively use the insulin it produces. DM is a chronic, progressive disease characterized by elevated levels of blood glucose. Diabetes can lead to serious damage in many parts of the body, such as the heart, blood vessels, eyes, kidneys and nerves, and can increase the overall risk of dying prematurely (WHO, 2018; Nazir et al, 2018). Diabetes mellitus may present with characteristic symptoms including excessive excretion of urine, thirst, constant hunger, weight loss, vision changes and fatigue.
Early diagnosis can be performed through relatively inexpensive testing of blood sugar (Ogurtsova et al, 2017).
There are three major categories of diabetes based on etiology: type I diabetes, type II diabetes and gestational diabetes. Type I diabetes occurs when the body fails to produce the insulin it requires, and type II diabetes occurs when blood sugar (glucose) is high as a result of defective insulin production or ineffective insulin. Gestational diabetes is a type of diabetes that occurs during pregnancy (WHO, 2006). This study focused on type I and type II diabetes. The number of people with diabetes has risen steadily over the past few decades, due to population growth, the increase in the average age of the population, and the rise in prevalence of diabetes at each age (WHO, 2018). Currently, 463 million people between 20 and 79 years of age are living with diabetes worldwide (Roglic, 2016). In 2019, 50% (231.9 million) of the 463 million adults living with diabetes (overwhelmingly type 2 diabetes) did not know that they had the disease. About 79.4% of diabetic patients live in low- and middle-income countries, and most of them are aged 40 to 59 years. In 2019, an estimated 4.2 million adults aged 20 to 79 years died because of diabetes, which means one death due to diabetes every eight seconds. Diabetes accounted for 11.3% of deaths from all causes worldwide. Almost half (46.2%) of deaths associated with diabetes occur in people under the age of 60 years, the working age group (Federation, 2019; Ogurtsova et al, 2017).
Essentially, the development of diagnostics and therapeutics for diabetes patients has increased survival rates and decreased diabetes-related deaths and complications such as heart disease, stroke, retinopathy, vision loss, chronic kidney disease and hypertension by controlling blood sugar. However, the complete prevention of complications is not possible (Mahmoudi, 2006).

Where N is the total sample size, d is the effect size (0.8), m is the number of time points at which repeated measurements were taken, $\rho$ is the correlation between repeated measurements and $\sigma^2$ is the variance of the outcome variable, all taken from a previous study. In our case we assumed a significance level of 0.05, a power of 0.8, m = 9 time points, $\rho$ = 0.5 and an effect size of 0.8; from a study conducted in Jimma we have $\sigma^2$ = 12.72 (random intercept model) (Aniley et al, 2019). Using $z_{\alpha/2}$ = 1.96 and $z_{\beta}$ = 0.842 and inserting all quantities in the formula:

Outcome Variable
The outcome variable is a continuous, longitudinally repeated measurement per individual: the fasting blood sugar level, measured in milligrams per deciliter (mg/dl).

Independent Variables
The predictor variables assumed to influence the fasting blood sugar of diabetes patients were age, sex, marital status, residence, educational status, exercise activity, frequency of meals, dietary type, history of hypertension, alcohol use, body mass index, family history of DM, systolic blood pressure, diastolic blood pressure, co-morbid condition and the initial drug given to the patient (Islam, 2017; Taylor & Lobel, 1989; Ikezaki et al, 2002; Baltazar et al, 2004; Miller et al, 2002; Alemu, 2015).

Data Collection Procedure
This study used secondary data routinely recorded in patients' charts in the study area. The health management information system (HMIS) card number was used to identify individual patient cards.

Method of Data Analysis
Longitudinal data are data in which repeated measurements are taken on a subject through time (Carey & Wang, 2011). For longitudinal data, there are two sources of variation: within-subject variation, the variation in the measurements within each subject, which allows changes over time to be studied; and between-subject variation, the variation in the data between different subjects (Laird & Ware, 1982). In longitudinal data analysis, observations on an individual over time are correlated with each other because multiple measurements are taken on the same subject at different points in time (Diggle et al, 2002); correlation can also arise from clustering, where measurements are taken on subjects that share a common category or characteristic. Longitudinal data can be either continuous or discrete (for example, binary) (Diggle et al, 2002). This study focused on both binary and continuous repeated measurements.
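The two sources of variation described above can be illustrated with a minimal sketch. The patient identifiers and FBS readings below are invented for illustration and are not taken from the study data; the decomposition shown is a simple descriptive one, not the model-based estimate.

```python
from statistics import mean, pvariance

# Hypothetical fasting blood sugar readings (mg/dl) for three patients,
# each measured at four visits. These values are invented for illustration.
fbs = {
    "patient_a": [180.0, 170.0, 165.0, 160.0],
    "patient_b": [140.0, 138.0, 135.0, 130.0],
    "patient_c": [220.0, 210.0, 205.0, 195.0],
}

# Between-subject variation: spread of the patient-level means.
patient_means = [mean(values) for values in fbs.values()]
between_var = pvariance(patient_means)

# Within-subject variation: average spread of each patient's readings
# around that patient's own mean.
within_var = mean([pvariance(values) for values in fbs.values()])

print(f"between-subject variance: {between_var:.1f}")
print(f"within-subject variance:  {within_var:.1f}")
```

In this toy data set the patients differ from one another far more than any one patient varies across visits, which is the situation in which a subject-level random effect is useful.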

Exploring Mean Structure
The average evolution describes how the profiles of relevant subpopulations (or of the population as a whole) evolve over time. The results of this exploration are used to choose a fixed-effects structure for the linear mixed model and to decide which forms of the time effect (linear, quadratic and so on) to include (Shek & Ma, 2011).

Exploring the Correlation Structure
Exploring the correlation structure helps to describe how measurements within an individual correlate. Pairwise scatter plot matrices and pairwise correlation matrices are used for this purpose (Pusponegoro, 2017; Liu et al, 2012).

Exploring the Variance Structure
Three plots are used to explore the variance structure of the data. The first shows the average evolution of the variance as a function of time. The second shows the individual profiles, which indicate whether there is considerable within-subject and between-subject variability. The third is the interaction plot, which shows the variance functions separately for different groups (for example, sex: male, female) as a function of time. The covariance structure selected reflects the correlation between successive FBS levels. The four most commonly used covariance structures are compound symmetry (CS), Toeplitz (TOEP), unstructured (UN) and first-order autoregressive (AR(1)) (Liu et al, 2012; Shek and Ma, 2011).
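As a rough sketch of how the structured covariance matrices named above differ, the following builds small CS, AR(1) and Toeplitz matrices and counts the free parameters of each structure. The function names are hypothetical, and the parameter counts assume the homogeneous-variance version of each structure.

```python
def compound_symmetry(n, sigma2, rho):
    """Equal variance on the diagonal, one common covariance elsewhere."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(n)]
            for i in range(n)]


def ar1(n, sigma2, rho):
    """Covariance decays geometrically with the lag |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def toeplitz(first_row):
    """One free value per lag; first_row[k] is the covariance at lag k."""
    n = len(first_row)
    return [[first_row[abs(i - j)] for j in range(n)] for i in range(n)]


def n_parameters(structure, n):
    """Number of free covariance parameters for n time points."""
    counts = {"CS": 2, "AR(1)": 2, "TOEP": n, "UN": n * (n + 1) // 2}
    return counts[structure]


for row in ar1(4, 1.0, 0.5):
    print(row)

# With 9 time points the unstructured matrix is by far the richest.
for name in ("CS", "AR(1)", "TOEP", "UN"):
    print(name, n_parameters(name, 9))
```

The unstructured form makes no assumption about how variances and correlations change over time, at the cost of many more parameters, which is why it is usually compared against the simpler structures with an information criterion.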

Linear Mixed Model for Longitudinal Data
Mixed models provide a flexible and powerful tool for the analysis of data with complex covariance structure, such as longitudinal correlated data (Guo, 2004).
Linear mixed effects models for repeated measures data formalize the idea that an individual's pattern of responses is likely to depend on many characteristics of that individual, including some that are unobserved. These unobserved effects are included in the model as random variables, or equivalently, random effects (Der & Everitt, 2005; Van Montfort et al., 2010). The name mixed model indicates that the model contains both a fixed (mean model) component and a random component. Variable effects can be treated as either fixed or random, depending on how the levels of the variables that appear in the study are selected (Laird and Ware, 1982). The model can also be used for data with an unequal number of measurements per subject. A fixed effect is considered to be a constant that we wish to estimate, whereas a random effect is considered to be an effect drawn from a population of effects (Fitzmaurice et al, 2008).
Generally, the linear mixed model is defined as
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i$$
where $Y_i$ is the response vector for the i-th subject, $X_i$ is the $n_i \times p$ fixed between-subject design matrix, $\beta$ is the $p \times 1$ vector of fixed effects (population averaged) assumed common to all subjects, and $Z_i$ is the $n_i \times q$ random within-subject design matrix (subject-specific design matrix). $b_i$ is a q-dimensional vector of random-effect parameters with $b_i \sim N(0, D)$, and $b_1, \ldots, b_N$ are independent. $\varepsilon_i$ is an $n_i \times 1$ vector of random errors with $\varepsilon_i \sim N(0, \Sigma_i)$. $D$ and $\Sigma_i$ are variance components (Hallahan, 2003).
For our data, with fasting blood sugar (FBS) over time as the response variable, the model can take the following forms.

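Random intercept model:
A random-intercept model augments the linear predictor with a single random effect for subject i (Kuznetsova et al., 2017). It is expressed as
$$Y_{ij} = x_{ij}^{T}\beta + b_{10i} + \varepsilon_{ij}$$
where $Y_{ij}$ is the FBS response of subject i at the j-th measurement, $x_{ij}^{T}$ is the j-th row of $X_i$ and $b_{10i}$ is the random intercept of subject i.

Random intercept and slope model:
The vector of random effects is assumed to follow a normal distribution with mean vector 0 and variance-covariance matrix D. The linear predictor is written as (Kuznetsova et al., 2017; Zeger and Huang, 2014)
$$Y_{ij} = x_{ij}^{T}\beta + b_{10i} + b_{11i} t_{ij} + \varepsilon_{ij}$$
where $t_{ij}$ is the time of the j-th measurement on subject i. The fixed effects are the same as in the random intercept model, and $b_{10i}$ and $b_{11i}$ are assumed to have a bivariate normal distribution as follows:
$$\begin{pmatrix} b_{10i} \\ b_{11i} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, D \right), \qquad D = \begin{pmatrix} \sigma^2_{b_{10}} & \sigma_{b_{10} b_{11}} \\ \sigma_{b_{10} b_{11}} & \sigma^2_{b_{11}} \end{pmatrix}$$

Fixed Effect Variables:
Fixed effects are the covariate effects that are fixed across subjects in the study sample. These are the effects of particular interest. For example, sex (male or female), age group, marital status, baseline fasting blood sugar, baseline hemoglobin level and baseline BMI are considered fixed effects.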

Random Effect Variables:
Random effects are the covariate effects that vary among subjects; the corresponding model parameters are random variables. A random effect is generally something that can be expected to have a nonsystematic, unpredictable, or "random" influence on the data.
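To make the role of the random effects concrete, the sketch below computes the marginal covariance between repeated measurements implied by a random intercept and slope model, V = Z D Z' + sigma2 * I. It assumes independent residual errors with a common variance sigma2 (a simplification of the general error covariance), and the function name and numeric values are hypothetical.

```python
def marginal_covariance(times, D, sigma2):
    """Covariance of one subject's responses under a random intercept
    and slope model with independent, equal-variance residual errors.

    times  : measurement times for the subject
    D      : 2x2 covariance matrix of (random intercept, random slope)
    sigma2 : residual variance
    """
    d00, d01, d11 = D[0][0], D[0][1], D[1][1]
    V = []
    for j, tj in enumerate(times):
        row = []
        for k, tk in enumerate(times):
            # Row j of Z is (1, tj), so (Z D Z')[j][k] expands to:
            cov = d00 + d01 * (tj + tk) + d11 * tj * tk
            if j == k:
                cov += sigma2  # residual variance only on the diagonal
            row.append(cov)
        V.append(row)
    return V


# Illustrative values only, not estimates from the study.
V = marginal_covariance([0, 1, 2], [[1.0, 0.1], [0.1, 0.04]], 0.5)
for row in V:
    print([round(v, 2) for v in row])
```

The sketch shows that a random slope makes both the variance and the covariance depend on time, which a random intercept alone cannot do.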

Methods of Parameter Estimation
The estimation methods used in longitudinal studies are maximum likelihood (ML) and restricted maximum likelihood (REML) (Guo, 2002). The two are asymptotically equivalent and often give very similar results. Both are based on the likelihood principle, which has the properties of consistency, asymptotic normality and efficiency. They differ in how the likelihood function is constructed: REML applies ML estimation to the likelihood of a set of "error contrasts" rather than to that of the original observations. It therefore accounts for the degrees of freedom lost in estimating the fixed effects and gives less biased estimates of the variance components. The bias cannot be neglected, especially when the number of parameters is not small relative to the total number of observations (Verbeke, 1997).
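The difference between ML and REML is easiest to see in the simplest possible case, a model with only an intercept and independent errors, where both estimators have closed forms. The sketch below uses invented values; in a full mixed model the estimates are obtained numerically rather than from these formulas.

```python
from statistics import mean, variance

# Hypothetical log FBS values; invented for illustration.
y = [4.9, 5.1, 5.3, 5.0, 5.6, 4.8]

n = len(y)
p = 1  # number of fixed-effect parameters (here, the intercept only)

y_bar = mean(y)
rss = sum((value - y_bar) ** 2 for value in y)

# ML divides the residual sum of squares by n and is biased downward.
sigma2_ml = rss / n

# REML divides by n - p, accounting for the estimated fixed effects.
sigma2_reml = rss / (n - p)

print(f"ML estimate:   {sigma2_ml:.4f}")
print(f"REML estimate: {sigma2_reml:.4f}")
```

In this special case the REML estimate coincides with the usual sample variance, and the gap between the two estimators grows as p increases relative to n.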

Missing Data in Longitudinal Studies
Missing data are common in longitudinal studies (Ibrahim & Molenberghs, 2009) and are a main problem in the analysis of longitudinal measurements, since missing data may hide true values that are important for estimation. Ignoring missing values can therefore lead to biased estimates.

Methods for handling missing data
There are different methods and strategies available to handle missing data. Graham (2009) defines three conditions that should be satisfied in a good method. First, the method should yield unbiased estimates of a variety of different parameters. Second, the method should include a way to assess the uncertainty about the parameter estimates, and third, the method should have good statistical power.
Multiple imputation is a popular method for handling missing values in data. It replaces each missing value with five or more acceptable values, representing a distribution of possibilities (Rubin, 2000). Multiple imputation inference involves three distinct phases (Rubin, 2000): first, the missing data are filled in m times to generate m complete data sets; then the m complete data sets are analyzed using appropriate procedures; and finally, the results from the m complete data sets are combined for inference.
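The final combining phase is commonly carried out with Rubin's rules. The sketch below assumes that approach (the text does not state which pooling formulas were used); the function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    estimates : point estimates, one per imputed data set
    variances : squared standard errors, one per imputed data set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)          # average of the m estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Illustrative values for m = 5 imputations; not results from the study.
estimate, total_var = pool_rubin(
    [0.48, 0.52, 0.50, 0.47, 0.53],
    [0.010, 0.012, 0.011, 0.009, 0.013],
)
print(f"pooled estimate: {estimate:.3f}, standard error: {total_var ** 0.5:.3f}")
```

The between-imputation term is what carries the extra uncertainty caused by the missing values; analysing a single imputed data set would omit it.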

Model Comparison Techniques
The aim of model comparison is to choose the most parsimonious model that provides the best fit to the data. Although there are different techniques for selecting the best-fitting model, in this paper we used Akaike's information criterion (AIC), the Bayesian information criterion (BIC) and the likelihood ratio test.
Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) are given by
$$\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln N$$
where L is the maximized likelihood of the model, k denotes the number of parameters in the model and N the total number of observations used to fit the model. If we use AIC and BIC to compare two or more models for the same data, we prefer the model with the lowest AIC and BIC values (Zhang and Davidian, 2001; Pourahmadi and Daniels, 2002; Lin and Lee, 2008; Akaike, 1974; McQuarrie & Tsai, 1998; Schwarz, 1978).
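A minimal sketch of the two criteria follows, using their standard definitions. The log-likelihoods, parameter counts and number of observations are invented for illustration.

```python
import math


def aic(loglik, k):
    """Akaike's information criterion for maximized log-likelihood loglik."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n_obs):
    """Bayesian information criterion; n_obs is the number of observations."""
    return -2.0 * loglik + k * math.log(n_obs)


# Two hypothetical candidate models fitted to the same data.
candidates = {
    "simpler model": {"loglik": -520.0, "k": 4},
    "richer model": {"loglik": -505.0, "k": 12},
}
n_obs = 900  # hypothetical number of observations

for name, fit in candidates.items():
    print(name,
          round(aic(fit["loglik"], fit["k"]), 1),
          round(bic(fit["loglik"], fit["k"], n_obs), 1))
```

Because the BIC penalty grows with the number of observations, BIC penalizes extra parameters more heavily than AIC once the sample has more than about seven observations, so the two criteria can disagree.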
The likelihood ratio test is constructed by comparing the maximized log-likelihoods of the saturated (full) model and the reduced model, and the test statistic is expressed as
$$-2\ln\left(\frac{L_0}{L_m}\right) = -2\left(\ln L_0 - \ln L_m\right)$$
where $L_0$ and $L_m$ are the maximized values of the likelihood functions of the reduced model and the full (saturated) model, respectively (Shek and Ma, 2011).
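As an illustration, the statistic can be computed directly from two maximized log-likelihoods. The sketch below also gives the p-value for the special case of 2 degrees of freedom, where the chi-square survival function has the closed form exp(-x/2); this corresponds to adding a random slope variance and an intercept-slope covariance. The log-likelihood values are hypothetical, and because a variance is tested on the boundary of its parameter space, this plain chi-square reference tends to be conservative.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic, -2 * (ln L0 - ln Lm)."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)


# Hypothetical fits: random intercept only versus random intercept and slope.
loglik_intercept_only = -512.3
loglik_intercept_slope = -504.1

statistic = lrt_statistic(loglik_intercept_only, loglik_intercept_slope)
p_value = chi2_sf_df2(statistic)
print(f"LRT statistic: {statistic:.2f}, p-value (2 df): {p_value:.5f}")
```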

Model Diagnostics
In the linear mixed effects model, the random effects and the residual errors are assumed to be normally distributed and uncorrelated with each other. Violation of these assumptions affects the parameter estimates and their standard errors. Residual plots can be used to check the normality of these effects visually and to identify any outlying effect categories. Examining the plot of the standardized residuals versus fitted values by any covariates of interest can give a better picture (Lange & Ryan, 1989; Zhang & Davidian, 2001). The assumption of normality for the within-group error was assessed with the normal quantile plot of the residuals by covariates.

Distribution of Fasting Blood Sugar Level Data
The normality of the data was checked with boxplots. The actual fasting blood sugar levels were not normally distributed at any time point, as the test revealed a significant deviation from the assumption of normality. In Figure 1, the boxplot of the actual FBS data shows that the distribution of FBS is skewed to the right (toward high FBS levels), especially for a small number of points. Hence, a transformation of the actual fasting blood sugar was needed. After the logarithmic transformation (right-side plot), there were fewer outlying observations in log FBS than in the actual fasting blood sugar level, so the transformed data were closer to normality than the actual values. Therefore, the log-transformed fasting blood sugar level was preferred to the actual FBS.
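The effect of the logarithmic transformation on right skew can be sketched numerically with the moment-based skewness coefficient. The FBS values below are invented to mimic a right-skewed sample and are not the study data; the natural logarithm is assumed.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    centre = mean(values)
    n = len(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed FBS readings in mg/dl.
fbs = [95.0, 110.0, 120.0, 130.0, 145.0, 160.0, 190.0, 260.0, 410.0]
log_fbs = [math.log(v) for v in fbs]

print(f"skewness of FBS:     {skewness(fbs):.2f}")
print(f"skewness of log FBS: {skewness(log_fbs):.2f}")
```

A skewness closer to zero after the transformation is consistent with the boxplot comparison described above, although it does not by itself establish normality.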

Exploring Variance Profiles
The variability structure plays a major role in identifying the pattern of the outcome variable and indicates what variance-covariance structure to expect.
The variance structure for log FBS shows an irregular pattern over time. As Figure 4 shows, there was high variability at baseline and low variation between patients.

Exploring Correlation Structure Matrix
The correlation matrix shows the dependence between repeated measurements of the response over time. The correlation matrix in Table 1 revealed a positive correlation between any two repeated measurements, and the correlation appears to decrease over time. Since the off-diagonal correlations are not constant over time, this suggests that the correlation structure may be unstructured.
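A pairwise correlation matrix of this kind can be computed from data arranged with one list of values per visit. The sketch below uses invented log FBS values for five patients at three visits; the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations between columns (one column per visit)."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Hypothetical log FBS values: each inner list is one visit,
# with the same five patients in the same order.
visits = [
    [5.2, 4.9, 5.6, 5.0, 5.4],
    [5.1, 4.9, 5.4, 5.1, 5.2],
    [5.0, 4.8, 5.1, 5.1, 5.1],
]

for row in correlation_matrix(visits):
    print([round(r, 2) for r in row])
```

Reading along the first row shows how strongly later visits correlate with baseline; unequal off-diagonal entries are the kind of pattern that points away from compound symmetry.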

Variance Covariance Structure
The variance-covariance structure is an important issue in longitudinal data analysis. The candidate structures were compared using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and the structure with the smallest AIC and BIC values was chosen. As displayed in Table 2, the unstructured correlation structure best explained the data, so the unstructured variance-covariance structure was used in the model.

Random effect model
In longitudinal data analysis, random effects should be included in the model to account for between-individual variability. The need for a model with a random intercept and slope was checked using AIC and the likelihood ratio test. As Table 3 shows, the inclusion of a random intercept and a random slope is reasonable, so the final model for this study included both a random intercept and a random slope with a linear time effect.

Multi-variable Linear Mixed Model
After a reasonable model was chosen, the variables selected in the univariate and fixed effects models using backward elimination were incorporated into the multivariable linear mixed effects model, and the model with the significant covariates was then fitted. The restricted maximum likelihood estimates of the covariates, their standard errors and the corresponding significance values (P-values) are given in Table 4.

The highest variability came from the random intercepts. The standard deviation of the random intercepts was higher than that of the random slopes, indicating higher between-patient variability than within-patient variability. The value of the intra-class correlation, which is the ratio of the between-patient variance to the total variance, is presented in Table 4. It gives the proportion of the total variance in log FBS level (68.7%) that is accounted for by variance among patients. The value of the intra-class correlation indicates that a mixed model is necessary to fit the data.
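The intra-class correlation described above is a simple ratio of variance components. The sketch below uses arbitrary illustrative values, not the estimates behind Table 4, and the function name is hypothetical.

```python
def intraclass_correlation(between_var, within_var):
    """Share of total variance attributable to differences among patients."""
    return between_var / (between_var + within_var)


# Arbitrary illustrative variance components on the log FBS scale.
between_patient_var = 2.0
within_patient_var = 1.0

icc = intraclass_correlation(between_patient_var, within_patient_var)
print(f"intra-class correlation: {icc:.3f}")
```

An intra-class correlation near zero would mean that repeated measurements on the same patient are nearly independent, in which case an ordinary regression would be adequate; a large value supports modelling patient-level random effects.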

Model Diagnostic Checking
For model diagnostic checking in longitudinal data analysis, we used residual plots to evaluate the validity of the model assumptions, as shown in Figure 5. The normal quantile plot shows that the residuals do not exhibit a departure from normality. The "Residual fit" plot is concentrated around zero, implying that the linear mixed effects model fitted the FBS data well.

Discussion
The linear mixed model revealed that people who mostly eat meat were more exposed to DM and could not control their FBS to attain a normal fasting blood sugar level, due to a lack of the mineral content needed for disease protection. To control and manage blood sugar, diets based on cereals, vegetables and fruits are recommended, while dietary types based on eggs, meat and proteins increase the blood sugar level, which is related to diabetes and its complications. The results also indicated that the combination of the two drugs was more important in reducing and controlling fasting blood sugar than insulin or OHA alone. Compared with patients who had no history of hypertension, those with a history of hypertension had higher fasting blood sugar levels. The interaction of history of hypertension with time also had a significant effect on fasting blood sugar variation over time, meaning that the change in log FBS level differed significantly between patients with and without a history of hypertension. The interaction of comorbidity at baseline with time was likewise significant: the change in fasting blood sugar differed significantly between patients with and without other comorbid diseases at baseline. The rate of reduction in fasting blood sugar level was higher for patients with comorbidity at baseline than for those without. Being a retrospective study, this study has the limitations of secondary data collection: the number of measurements per subject was unequal, and we were unable to obtain some variables, such as income, side effects from other medication, illness and stress, that may have influenced the rate of change in FBS level.
Therefore, more public health and epidemiological research is needed to examine the impact of those predictors on population health, to help people living with diabetes avoid its complications over time, and to identify new risk factors for diabetes.

Conclusion
In this research, we demonstrated that a linear mixed effects model with a random intercept and random slope and an unstructured covariance structure was preferred for estimating the rate of change of the log FBS level experienced by patients over treatment time. With respect to time, the average fasting blood sugar level showed a linear decrease, which was confirmed by the negative estimate of the time effect in the model. The study also showed that 68.7% of the variability in log FBS levels was accounted for by variance among patients.