Exploratory Analysis Using Machine Learning of Predictive Factors for Falls in Persons with Type 2 Diabetes: A Longitudinal Study

We aimed to investigate the status of falls and to identify important risk factors for falls in persons with type 2 diabetes (T2D), including the non-elderly. Participants were 316 persons with T2D. They were assessed for medical history, laboratory data and physical capabilities during hospitalization and given a questionnaire on falls one year after discharge. Two different statistical models, logistic regression and a random forest classifier, were used to investigate important predictors of falls. The response rate to the survey was 72%; of the 226 respondents, there were 129 males and 97 females (median age 62 years). The fall rate during the first year after discharge was 19%. Logistic regression revealed that knee extension strength (β = -0.698, P = 0.002), fasting C-peptide (F-CPR) level (β = 0.492, P = 0.009) and dorsiflexion strength (β = -0.432, P = 0.047) were independent predictors of falls. The random forest classifier placed knee extension strength, grip strength (0.234), F-CPR level and dorsiflexion strength in the top 4 important variables for falls. Lower extremity muscle weakness as well as elevated F-CPR levels and reduced grip strength were shown to be important risk factors for falls in T2D.


Introduction
Approximately one-third of older adults experience episodes of falls each year, and, of the fallers, one-fourth suffer serious injury 1 . Fall-related injuries among older adults often result in a bedridden condition.
Even without a serious injury, falls are independently associated with functional decline in older adults 2 .
As a result, falls impose a substantial economic burden on individuals, society and the healthcare system 3,4 .
Major risk factors for falls are previous falls, balance impairment, decreased muscle strength, visual impairment, specific medications, gait disturbances and cognitive decline 5,6 . Diabetes is also a risk factor for falls 7 . The risk of falls in older adults with diabetes was reported to be 1.5-3 times higher than in those without diabetes 7 .
Although diabetic complications and anti-diabetic medications are associated with many of these major risk factors for falls 8 , only a few studies have undertaken comprehensive analyses of the associations between those risk factors and falls in persons with diabetes [9][10][11] . In addition, the extent to which each risk factor contributed to falls has varied from report to report [9][10][11] . Despite our understanding of the seriousness of falls and the resultant high cost of medical care, the few most critical factors that are strongly predictive of falls in high-risk populations have not been identified, partly because over 400 factors are linked with falls in adult populations 12 . Risk factors for falls are diverse and interact with each other. Not all of those risk factors contribute to the same degree to falls. Moreover, the risk of falls may vary depending on what risk factors are adjusted for. It is important to comprehensively evaluate risk factors for falls and identify those with the highest contribution because interventions should be targeted to groups at high risk for falls to effectively prevent falls. However, as the number of predictors increases, especially with a small sample size of study participants, traditional statistical models such as logistic regression could become unsuitable for identifying important variables as predictors. One of the more appropriate approaches is the nonlinear and nonparametric random forest algorithm, which is one type of machine learning based on the ensemble learning method 13 . In random forest, a large number of decision trees are created using bootstrapped data, where the splitting variable at each node is selected from a randomly chosen subset of predictors. The final prediction is determined by majority voting in the tree ensemble.
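The random forest procedure described above (bootstrapped data, a random subset of predictors per split, majority voting) can be sketched in a few lines. This is a minimal illustration only and not the implementation used in this study: each "tree" is reduced to a single split (a decision stump), the function names fit_stump, fit_forest and predict are hypothetical, and the six data rows are synthetic values invented for the example.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Find the single split (feature, threshold) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        maj = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_sub=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        feats = rng.sample(range(p), n_sub)         # random subset of predictors
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps in the ensemble."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

# Synthetic toy data (NOT study data): two strength-like predictors, label 1 = faller.
rows = [[0.5, 10], [0.6, 12], [0.7, 11], [1.5, 20], [1.6, 22], [1.7, 21]]
labels = [1, 1, 1, 0, 0, 0]
forest = fit_forest(rows, labels)
print(predict(forest, [0.4, 9]), predict(forest, [2.0, 25]))
```

A full random forest grows deep trees and redraws the predictor subset at every node; the stump version keeps only the three ingredients named in the text.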
In this study, we collected data on muscle strength, balance capability, body composition, diabetic microvascular and macrovascular complications, laboratory examinations and medications as risk factors for falls and aimed to identify important predictors of falls in type 2 diabetes using traditional logistic regression and random forest.

Study design and participants
We conducted a questionnaire survey on falls one year after discharge of 316 persons with type 2 diabetes who had been admitted to the University of Tsukuba Hospital for the treatment of diabetes. We also evaluated their physical capabilities from February 2014 to May 2018. Study participants could walk alone and were independent in the activities of daily living. Exclusion criteria included: i) vitreous hemorrhages or detached retina; ii) class II or higher in the New York Heart Association Functional Classification; iii) being treated for malignancies; iv) taking glucocorticoids, Cushing's syndrome or acromegaly; v) post-gastrectomy; vi) unable to walk independently without assistive devices resulting from disorders of the hip or knee joints, paralysis or paresis caused by central nervous system disorders; vii) neuropathy due to causes other than diabetes; and viii) difficulty in understanding instructions. This study was approved by the Ethics Committee of the University of Tsukuba Hospital (H27-31) and conducted according to the Declaration of Helsinki. Written informed consent was obtained from all participants.

Clinical data and laboratory tests
We collected socio-demographic information, medical history, anthropometric data and information on medications that participants took on admission and at discharge. Body composition was measured using bioelectrical impedance analysis (InBody 720, Biospace, Tokyo). Skeletal muscle mass index (SMI) was calculated by dividing the limb skeletal muscle mass (kg) by the square of the height (m²) 37 . Diabetic retinopathy was evaluated by ophthalmologists. Ankle-brachial pressure index (ABI) and brachial-ankle pulse wave velocity (baPWV) were measured (BP-203RPE, Colin Medical Technology, Tokyo, Japan). Patients were classified as having peripheral artery disease (PAD) when the ABI value for either of the lower limbs was less than 0.9. For baPWV, the higher of the values from the two sides was adopted. Cardiac autonomic nervous function was assessed by measuring the coefficient of variation of R-R intervals (CVR-R) at rest. Participants with CVR-R <2.0% were classified as having cardiac autonomic neuropathy 38 .
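The derived variables in this paragraph are simple rules, sketched below for clarity. The function and parameter names are hypothetical and chosen for illustration; only the formulas and cutoffs (limb muscle mass divided by height squared, ABI below 0.9 on either side, the higher baPWV, CVR-R below 2.0%) come from the text.

```python
def skeletal_muscle_index(limb_muscle_kg, height_m):
    """SMI (kg/m^2): limb skeletal muscle mass divided by height squared."""
    return limb_muscle_kg / height_m ** 2

def has_pad(abi_right, abi_left, cutoff=0.9):
    """PAD if the ABI of either lower limb is below the cutoff."""
    return min(abi_right, abi_left) < cutoff

def representative_bapwv(bapwv_right, bapwv_left):
    """The higher of the two sides is adopted."""
    return max(bapwv_right, bapwv_left)

def has_cardiac_autonomic_neuropathy(cvrr_percent, cutoff=2.0):
    """Cardiac autonomic neuropathy if resting CVR-R is below 2.0%."""
    return cvrr_percent < cutoff

# Illustrative inputs only (not study data).
print(skeletal_muscle_index(20.0, 2.0))
print(has_pad(1.10, 0.85), has_cardiac_autonomic_neuropathy(1.9))
```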
Blood samples were collected in the morning after an overnight fast within 3 days after admission.
Plasma glucose and serum total cholesterol, high-density lipoprotein-cholesterol (HDL-C), triglycerides (TG) and creatinine levels were determined using an automated analyzer (Hitachi High-Technologies, Tokyo, Japan). HbA1c was measured by high-performance liquid chromatography (Tosoh, Tokyo, Japan).
Serum LDL-C levels were measured by a homogeneous assay (Sekisui Medical, Tokyo, Japan). Albumin excretion rate (AER) was measured using a turbidimetric immunoassay (Nittobo Medical, Tokyo, Japan).

Follow-up of falls
A fall was defined as "coming in contact with the ground (or floor) from a standing or sitting position with a body part other than the foot in contact with the ground (floor surface) against the patient's intention" 40 . The reliability of a survey of the occurrence of falls over a previous one-year period using the recall method has been confirmed 40,41 . The fall history for the previous year was obtained on hospital admission.
One year after discharge, we mailed study participants a questionnaire asking about the number of falls (never, once or twice or more) experienced during that one-year period.

Diabetic polyneuropathy
Diabetic polyneuropathy (DPN) was diagnosed based on two or more of the following four criteria from The Diabetic Neuropathy Study Group in Japan 42 and the Michigan Neuropathy Screening Instrument 43 : decreased vibration perception using a 128-Hz tuning fork at the bilateral medial malleoli (<10 seconds); loss of tactile sensation using a 10 g monofilament at the bilateral foot; decreased or lost bilateral Achilles jerk reflex; and having numbness, pain, paresthesia or hypoesthesia in the bilateral lower limbs or feet.
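The "two or more of four criteria" rule can be written as a small counting function. This is a simplified sketch: the function name and arguments are hypothetical, and each argument is assumed to already summarize the bilateral finding described in the text.

```python
def has_dpn(vibration_seconds, monofilament_felt, achilles_reflex_normal, has_symptoms):
    """Return True when at least two of the four DPN criteria are met."""
    criteria = [
        vibration_seconds < 10,      # decreased vibration perception (<10 seconds)
        not monofilament_felt,       # loss of tactile sensation (10 g monofilament)
        not achilles_reflex_normal,  # decreased or lost Achilles jerk reflex
        has_symptoms,                # numbness, pain, paresthesia or hypoesthesia
    ]
    return sum(criteria) >= 2

# Illustrative inputs only (not study data).
print(has_dpn(8, True, False, False))
print(has_dpn(12, True, True, True))
```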

Physical activity
Physical activity during hospitalization was measured using an accelerometer (Mediwalk: Terumo Co., Tokyo, Japan). Information was collected on the average number of steps, average moderate-to-vigorous physical activity time (3 metabolic equivalents or more) and energy expenditure through exercise.

Muscle strength
Knee extension strength and knee extension endurance
Measurements of knee extension strength and knee extension endurance were performed on the dominant foot side using a torque machine (Biodex System3: Sakai Medical, Tokyo, Japan). For evaluation of knee extension strength, the participant performed three consecutive knee extensions with maximum effort in isokinetic muscle strength measurement (60°/s), and the maximum torque value (Nm/kg) was used as the representative measurement value. Knee extension endurance was determined by measuring the total work (J) from 20 continuous knee extensions with maximum effort by isokinetic muscle strength measurement (300°/s).

Dorsiflexion strength of the ankle joint
Dorsiflexion strength of the ankle joint was measured using a hand-held dynamometer (Anima Co., Tokyo, Japan). The participant adjusted the height of the chair so that the angle between the knee joint and the ankle joint was 90° while the participant was seated in a chair with the heel placed on the floor and the foot was raised toward the anterior shin. The joint was placed in the maximum dorsiflexion position, and preparations were made for measurement. The examiner applied the attachment of the hand-held dynamometer to the back of the participant's foot and applied maximum pressure in the ankle plantar flexion direction in order to break the participant's maximum ankle dorsiflexion. The average value (kgf) of the left and right sides of the isometric dorsiflexor muscle strength of the ankle joint was used. The test was performed twice on the left and right sides, respectively, and the best results for the left and right feet were averaged.

Toe pinch force
Toe pinch force was measured using a pinch force dynamometer (Checker-kun, Nisshin Sangyo Inc., Saitama, Japan). The participant sat on a chair with arms crossed over the chest. The dynamometer was attached to the foot between the great toe and second toe while the participant remained in the sitting position with the hip and knee joints at 90° of flexion 44 . The test was performed twice on the left and right sides, respectively, and the best results for the left foot and right foot were averaged.

Grip strength
Grip strength of the dominant hand was measured using a Smedley analog grip meter (ST100 T-1780, Toei Light Co., Tokyo, Japan). The maximum value (kgf) was taken as the measured value.

Balance capability
Balance capabilities were assessed by the one-leg standing time with eyes open 45 and the index of postural stability (IPS). IPS was measured using a gravicorder (GP-6000, Anima Co., Tokyo, Japan) as described elsewhere 46 . First, the participants stood in a resting position with the inside of the foot at a distance of 10 cm on the gravicorder to measure the instantaneous fluctuations in the center of pressure (COP) at a sampling frequency of 20 Hz. Then, participants were instructed to incline the body to the front, rear, right and left keeping the body straight and without moving the feet. The instantaneous fluctuations in COP were measured at each position. IPS was calculated as "log [(area of stability limit + area of postural sway) / area of postural sway]". Area of stability limit was calculated as the "front and rear center movement distance between anterior and posterior positions × the distance between right and left positions". Area of postural sway was calculated as "average measurement value in 10 seconds under anterior, posterior, right, left and center positions". The area of postural sway was calculated as the mean sway area of the 5 positions. In addition, IPS was measured while the participant stood with closed eyes on the gravicorder which was covered with foam rubber (AIREX Balance-pad Elite, Airex AG, Switzerland) as modified IPS (mIPS) 46 .
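The IPS formula quoted above can be written out as a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, and the base of the logarithm is not given in the text, so base 10 is used here purely as an assumption and is exposed as a parameter.

```python
import math

def stability_limit_area(front_rear_distance, right_left_distance):
    """Area of stability limit: front-rear distance times right-left distance."""
    return front_rear_distance * right_left_distance

def index_of_postural_stability(limit_area, sway_areas, log=math.log10):
    """IPS = log[(limit area + mean sway area) / mean sway area].

    sway_areas holds the sway area at each of the 5 positions
    (anterior, posterior, right, left, center).
    ASSUMPTION: log base 10; the text only says "log".
    """
    mean_sway = sum(sway_areas) / len(sway_areas)
    return log((limit_area + mean_sway) / mean_sway)

# Illustrative inputs only (not study data).
area = stability_limit_area(11.0, 9.0)
print(index_of_postural_stability(area, [1.0, 1.0, 1.0, 1.0, 1.0]))
```

A larger stability limit relative to sway gives a higher IPS, which matches the interpretation in the Results that lower IPS indicates poorer balance.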

Flexibility
Truncal flexibility was assessed by measuring the finger-floor distance (FFD). Participants standing on a 20 cm high platform flexed the trunk and dropped both arms toward the floor 47 . The length from the top of the stool to the tip of the third finger was measured as the FFD.

Statistical analysis
The sample size for the study was determined by considering the number of cases in previous studies of falls 14 and the number of participants that could be enrolled annually in our study. Study participants were those without missing data.
Continuous variables were checked for normal distribution using the Shapiro-Wilk test and were expressed as mean ± SD or median (25th percentile, 75th percentile) based on distribution. Unpaired t-test and Mann-Whitney's U test were used for continuous variables with normal distribution and those with non-normal distribution, respectively. Categorical variables were presented as numbers (percentage) and analyzed using the chi-squared test. Classification analysis was performed based on the identified covariates that were significantly different between faller and non-faller groups. In addition, we used covariates that were not significantly different between our study groups but have been reported as risks for falls in previous studies 5 48 49 . We applied two types of classification models, logistic regression and a random forest classifier, to investigate relationships between fall risk and clinical parameters from different perspectives. Models were fit with a stepwise variable selection procedure where explanatory covariates in the models are sequentially added or removed by evaluating predefined criteria. In logistic regression, we used the likelihood ratio test for coefficients and considered p <0.05 as its selection criterion. For the random forest classifier, a variable subset with the best estimated predictive performance via 10-fold cross-validation was selected. In order to compare the accuracy of the predictive ability of logistic regression and the random forest classifier, receiver operating characteristic (ROC) analysis was performed for each, and accuracy, sensitivity, specificity and the area under the curve (AUC) were determined.
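For readers less familiar with the four performance measures named above, the sketch below shows how they can be computed from true labels and predicted scores. It is illustrative only: the function names are hypothetical, the 0.5 classification threshold is an assumption, and the five scores are synthetic. AUC is computed with its rank interpretation, the probability that a randomly chosen faller receives a higher score than a randomly chosen non-faller.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity and specificity at a given score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fallers correctly flagged
        "specificity": tn / (tn + fp),  # non-fallers correctly cleared
    }

def auc(y_true, y_score):
    """AUC as the fraction of (faller, non-faller) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic labels and scores (NOT study data): 1 = faller.
y_true = [1, 1, 0, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.2, 0.1]
print(confusion_metrics(y_true, y_score))
print(auc(y_true, y_score))
```

Because accuracy, sensitivity and specificity depend on the chosen threshold while AUC does not, two models can have similar AUCs and still differ in sensitivity and specificity.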
Analyses were performed using SciPy in Python; R (GLM) was used for logistic regression, and imbalanced-learn was used for the random forest classifier. The criterion for determining statistical significance was 5%.

Results
The response rate to the questionnaire was 72% (226/316; 129 men and 97 women; median age 62 years [49, 68]) (Figure 1). There were no missing data. Forty-four of the 226 participants had a fall experience in the year after hospital discharge, for a fall rate of 19%. Fall rates by age group were 17% for those aged <60 years, 20% for those aged 60-69 years and 24% for those aged ≥70 years.
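The two headline percentages follow directly from the counts reported above; the short check below reproduces them. The helper name percent is hypothetical.

```python
def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

response_rate = percent(226, 316)  # respondents / participants surveyed
fall_rate = percent(44, 226)       # fallers / respondents
print(response_rate, fall_rate)
```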
The faller group had a significantly higher rate of females, impaired vibratory sensation, proliferative retinopathy, and history of stroke and higher levels of F-CPR compared with the non-faller group (Table 1). However, age, duration of diabetes, DPN and nephropathy were not significantly different between the two groups (Table 1). Although medications both on admission and at discharge were not significantly different between the faller group and the non-faller group, the proportion of sulfonylurea users tended to be higher in the faller group than in the non-faller group (Table 2). Physical activity during the hospitalization period, body composition and physical capabilities of participants are shown in Table 3.
Physical activities did not differ significantly between the faller group and the non-faller group. Skeletal muscle percentage in the faller group was significantly lower than in the non-faller group. On the other hand, SMI and body fat percentage were not significantly different between the two groups. Variables related to muscle strength including knee extension strength, knee extension endurance, dorsiflexion strength of the ankle joint, toe pinch force and grip strength in the faller group were significantly inferior to those in the non-faller group. In balance capabilities, IPS levels in the faller group were significantly lower than in the non-faller group. Other balance capability-related variables, including mIPS and one-leg standing time, tended to have lower values in the faller group.
We performed logistic regression and a random forest classifier with sex, fasting C-peptide level, impaired vibratory sensation, presence of proliferative retinopathy, past history of stroke, taking sulfonylureas at discharge, skeletal muscle percentage, body fat percentage, knee extension strength, knee extension endurance, dorsiflexion strength, toe pinch force, grip strength and IPS as covariates in order to identify important risk factors for falls. In the logistic regression, knee extension strength, F-CPR level and dorsiflexion strength were independent predictors of falls (Table 4). The random forest classifier showed that knee extension strength, grip strength, F-CPR levels and dorsiflexion strength were the top 4 important variables related to falls (Table 5). In the ROC analyses, AUC, accuracy, sensitivity and specificity of the logistic regression were 0.75, 0.77, 0.52 and 0.82, respectively, and those of the random forest classifier were 0.76, 0.67, 0.71 and 0.66, respectively.

Discussion
In the current study, we prospectively conducted a follow-up questionnaire survey of falls that occurred during the first year after discharge from hospital in persons with type 2 diabetes and analyzed data using machine learning to investigate the relationship between various indices measured during hospitalization and falls after one year. There were three significant findings in the current study. First, 19% of the participants experienced falls within one year after discharge. Second, muscle strength-related variables were the most important predictors for falls according to both logistic regression and the random forest classifier. Third, the F-CPR level was an important predictor of falls, and, to the best of our knowledge, is a novel finding in terms of risk factors for falls.
The annual rate of falls was reported to range from 17-32% in community-dwelling older adults 1 9 14 15 and 22-40% in older adults with diabetes 10 11 . In the current study, 19% of participants experienced falls within one year after discharge, even though our cohort had younger participants with a median age of 62 years compared with previous reports. In addition, the fall rate for those aged <60 years was 17%, suggesting that middle-aged persons with type 2 diabetes have a risk for falls equivalent to older adult non-diabetic individuals. These findings indicate that early measures are needed to prevent falls in patients with type 2 diabetes.
In this study, the faller group had a significantly higher proportion of females, impaired vibratory sensation, proliferative retinopathy and stroke. This group also had significantly higher F-CPR levels and significantly lower values for skeletal muscle percentages, muscle strength-related variables and balance capability-related variables compared with the non-faller group. To investigate the important risk factors for falls, we used logistic regression and random forest 16 . Most of the important variables were shared between these analyses, including knee extension strength, F-CPR level and dorsiflexion strength. In addition, grip strength was the second most important predictor of falls in the random forest analysis. A meta-analysis showed that muscle weakness in the upper or lower extremities has been associated with future falls 17 . Persons with type 2 diabetes, especially those with diabetic polyneuropathy, were shown to have decreased muscle mass and strength in lower limb extremities compared with non-diabetic controls 18 19 20 . Although the current study did not have non-diabetic controls, the participants in this study had an approximately 30% lower knee extension strength than middle-aged obese non-diabetic Japanese in a previous study 21 . Another study also showed that the muscle strength of knee extensors and the ankle plantar flexor in type 2 diabetic participants were about 30% lower than in healthy controls 18 . In a cross-sectional study of aging and muscle function of a general population in Belgium, the mean level of knee extensor muscle strength in the group aged >70 years was about 30% lower in men and 40% lower in women compared with the group aged 50-60 years 22 . These data suggest that faster progression of decreases in lower body muscle strength in diabetic individuals compared with healthy individuals is a major reason for the high prevalence of falls in diabetic individuals.
Unexpectedly, the F-CPR level was an important risk factor for falls. This result is robust because both logistic regression and the random forest classifier showed that F-CPR was among the most important risk factors for falls. Although the results of logistic regression and random forest do not indicate a causal relationship, insulin resistance may be associated with falls. The Third National Health and Nutrition Examination Survey (NHANES III) showed that elevated homeostasis model assessment of insulin resistance (HOMA-IR) was independently associated with lower SMI 23 . Furthermore, it was reported that abdominal circumference and metabolic syndrome, which are associated with insulin resistance, are associated with the risk of falls 24 . However, F-CPR levels were independent of lower limb muscle strength in this study. Since insulin resistance in skeletal muscle has been reported to inhibit skeletal muscle proliferation 25 , high serum CPR may be associated with lower future skeletal muscle mass and skeletal muscle strength. Another possibility regarding the link between falls and CPR is that insulin resistance increases the risk of falls by affecting cognitive function. Insulin resistance was shown to be associated with a cognitive decline in several prospective studies 26 27 28 29 . Decreased executive function, which is impaired in persons with type 2 diabetes from middle age, was associated with the occurrence of falls 30 .
In the current study, balance capability, diabetic polyneuropathy and insulin treatment were less important risk factors for falls than muscle strength and F-CPR. It is not conclusive whether diabetic neuropathy and balance capability are independent risk factors for falls 10 . Because lower limb muscle strength, balance capacity and peripheral neuropathy are interrelated, and because measurement methods and participant backgrounds vary from study to study, it is difficult to assess the extent of the impact of these risk factors on falls in diabetic patients. However, the previous reports that showed a significant association between falls and DPN or balance capability did not assess lower extremity muscle strength and included participants with more severe DPN compared with those that did not show such associations 10,31 32 33 . In addition, the reports that evaluated both balance capability and DPN showed that balance capability was a stronger risk factor for falls than DPN 10 11 . The participants in the current study may have had less severe DPN than those in the study that found that DPN was an independent risk factor for falls. Furthermore, since the median (25th percentile, 75th percentile) age of the current study population was 62 (49, 68) years, which is younger than in previous reports, it is possible that their balancing capabilities were better than those in the previous studies. For these reasons, the influence of DPN and balance capability on falls may have been small.
Insulin treatment has been shown to be an independent risk factor for falls, but in this study, that was not the case. The Health ABC Study reported that insulin-treated individuals with HbA1c ≤6.0% were approximately 4.4 times more likely to fall 10 . Hypoglycemia may be involved in the high incidence of falls in older adult diabetic patients on insulin therapy. In particular, older adults are less aware of hypoglycemia and have a higher blood glucose threshold for impaired consciousness than healthy persons. The participants in this study may have been more likely to be aware of hypoglycemia due to the large number of non-older adult patients compared with previous studies. In addition, the hospitalization in the previous year may have resulted in fewer hypoglycemic incidents due to appropriate education and optimization of treatment.
In the current study, the AUCs of logistic regression and the random forest classifier were comparable. This result was in line with the results of previous studies, which showed that the accuracy of the predictive models did not change when comparing machine learning and traditional statistical methods 34 35 36 . However, in the current study, the random forest classifier newly selected "grip strength" as a covariate, which was different from the results of logistic regression. This suggests that machine learning may have the advantage of discovering new risk factors that are not identified by logistic regression.

Limitations of this study
As for limitations of the study, first, the tracking rate was not as high as desired. The follow-up rate of a cohort survey is usually required to be at least 80% whereas the current follow-up rate was 70%. This may have resulted in a selection bias. Second, since we mailed the questionnaire one year after discharge, recall bias may have been present. A previous study showed that 13% of participants with confirmed falls could not recall having a fall at the end of the study (12 months) [18]. Therefore, we may have underestimated the rate of falls. Third, the follow-up period was limited to one year. Extending the follow-up period might have further increased the rate of falls and uncovered new predictors of falls. Fourth, the number of participants in our study was too small to create a useful predictive model of falls. Future studies with larger sample sizes are needed to confirm our findings and to develop a useful predictive model for falls. Fifth, study participants were hospitalized patients with poor glycemic control. Therefore, it is unclear whether the results can be extrapolated to patients with type 2 diabetes who were attending an outpatient clinic and not recently hospitalized. The results of this study can be verified by including a similar group who only attended an outpatient clinic. Finally, the study included patients on insulin therapy, and fasting serum CPR levels may be underestimated in these patients. Moreover, we did not include non-diabetic individuals in this study. Therefore, it is unclear whether the results of this study, especially that for F-CPR, can be applied to predict falls in non-diabetic individuals.
In conclusion, we investigated the frequency of falls and their risk factors 1 year after discharge from the hospital in patients with type 2 diabetes. The results showed that falls occurred in 19% of the participants, with knee extension strength, F-CPR and dorsiflexion strength selected as predictors of falls according to logistic regression analysis, and knee extension strength, grip strength, F-CPR and dorsiflexion strength identified as important predictors in the random forest analysis. In particular, we newly found that F-CPR is an important predictor of falls in persons with type 2 diabetes. Muscle weakness and the presence of insulin resistance may be strongly associated with falls in type 2 diabetic patients.

Declarations
interpretation of findings. All authors approved the final manuscript. YaS and HSu guarantee the work and have full access to the data. All authors read and approved the final version of the manuscript.

Covariates: sex, fasting C-peptide levels, impaired vibratory sensation, presence of proliferative retinopathy, past history of stroke, taking sulfonylureas at discharge, skeletal muscle percentage, body fat percentage, knee extension strength, knee extension endurance, dorsiflexion strength, toe pinch force, grip strength, index of postural stability, and one-leg standing time.

Figure 1