Predicting the dropout risk after treatment of obesity: Logistic regression analysis and deep neural network analysis

Severely obese patients must follow strict regimens of diet, exercise, and medical therapy. However, such comprehensive weight-loss programs have high dropout rates. In this study, we developed a machine learning prediction model to aid in the early detection of patients at high risk of dropout. A total of 102 severely obese patients were monitored for 3 years to assess their risk of dropout from a comprehensive weight-loss program. The program targeted a 5% weight loss and consisted of three main components: behavioral modification (goal setting and charting weight four times daily), diet, and exercise. A machine learning model was developed to predict dropout risk based on a 1-year dropout event. To extend the prediction ability past 1 year, we plotted 3-year Kaplan-Meier survival curves using the classifications from a deep learning (DL) algorithm and from logistic regression (LR).


Introduction
Obesity has rapidly become a worldwide epidemic, and its prevalence has nearly tripled since 1975 (1). In 2016, more than 1.9 billion adults aged 18 years and older were overweight. Of these, over 650 million were obese (1). Obesity and type 2 diabetes both increase the risk of cardiovascular disease and other comorbidities (2). A J-shaped relationship between body mass index (BMI) and average life expectancy has been demonstrated (3). Therefore, it is important to treat severe obesity, which significantly increases the risk of mortality. A slight weight loss (3%-5% of body weight) can result in clinically meaningful reductions in triglycerides, blood glucose, and glycated hemoglobin, and in a decreased risk of type 2 diabetes. A greater weight loss (>5%) reduces blood pressure, further improves the lipid profile (both low-density and high-density lipoprotein cholesterol), and decreases the need for medications to control blood pressure, blood glucose, and lipids (4).
Although many obese people successfully lose weight by dieting or behavioral therapy (using their preferred methods or one that is popular at the time), most of them subsequently regain the weight that they lost (5,6). Long-term adherence to treatment is essential for obese patients; however, it is not easy to predict the discontinuation of treatment. Therefore, it is vital to control the progress of the severe complications caused by treatment interruption.

In this study, we aimed to develop an accurate prediction model for the risk of subsequent dropout from treatment by patients with severe obesity. All these patients had already completed a comprehensive weight-loss education program without bariatric surgery. We developed a machine learning model to predict dropout and compared it with standard logistic regression by ROC analysis.

Subjects
First, we screened 2,178 consecutive patients who were admitted to the University of Tokyo Hospital between 2009 and 2012. Among them, we selected patients with severe obesity who had a BMI of more than 35 kg/m² and enrolled them in a comprehensive weight loss program (5,6). We excluded patients with severe cardiovascular disease, heart failure, infectious diseases, stroke, or peripheral artery disease, as well as patients with type 1 diabetes, pregnant women, patients with dementia, patients who had orthopedic diseases that could interfere with exercise (walking), perioperative patients, patients taking anti-obesity medications, patients who had undergone bariatric surgery, patients without pertinent data, patients who were transferred to another hospital immediately after discharge, patients who were readmitted, and those under 18 or over 80 years old. The remaining patients were observed for 3 years after discharge to assess subsequent weight gain and dropout. The comprehensive inpatient obesity treatment program targeted a 5% weight loss; its details have been reported previously (7). It consisted of three main components: behavioral modification (goal setting and charting weight four times daily), diet, and exercise. Patients with diabetes received appropriate anti-diabetic therapy together with this weight loss program, and the treatment of diabetes was decided by each attending physician.

Behavioral modification (goal setting and charting weight four times daily)
At admission, all patients were given the goal of achieving 5% weight loss while in the hospital. They also weighed themselves four times daily (immediately after waking, after breakfast, after dinner, and before going to bed) and recorded the data on a weekly graph. A daily record of weight fluctuations reveals irregular intake of food and fluid that reflects dysfunctional eating patterns and other behavioral abnormalities, and it assists in achieving weight loss (5,8,9). Patients were advised to continue recording their weight after discharge. Patients were weighed by using AD-6107NWTM scales (A and D Co. Ltd., Tokyo, Japan).

Diet
A balanced low-calorie diet (20-24 kcal/day/kg of ideal body weight) was provided to the patients in the hospital, consisting of 50%-60% carbohydrate, 20% protein, and 20%-30% fat. Hospital dietitians used food samples and a food exchange table (10) to educate patients about nutritional guidelines. The dietitian initially gave each patient a 1-hour information session, with subsequent 30-minute sessions being held at least twice a week until discharge.
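
As a rough illustration of the calorie prescription described above, the following sketch computes a daily energy range and a macronutrient split. The text does not state how ideal body weight was calculated, so the formula used here (22 kg/m² multiplied by height squared) is an assumption, as are the function names and the specific 55%/20%/25% split chosen from within the reported ranges.

```python
# Sketch only: the ideal-body-weight formula below is an assumption,
# not a detail reported in the study.
ASSUMED_IDEAL_BMI = 22.0  # kg/m^2 (hypothetical choice)


def daily_energy_range(height_m, kcal_per_kg=(20.0, 24.0)):
    """Return (ideal_weight_kg, low_kcal, high_kcal) for a given height."""
    ideal_weight = ASSUMED_IDEAL_BMI * height_m ** 2
    low, high = (k * ideal_weight for k in kcal_per_kg)
    return ideal_weight, low, high


def macronutrient_kcal(total_kcal, carbohydrate=0.55, protein=0.20, fat=0.25):
    """Split a daily energy target into kcal per macronutrient.

    The default shares are illustrative values inside the reported ranges
    (50%-60% carbohydrate, 20% protein, 20%-30% fat).
    """
    if abs(carbohydrate + protein + fat - 1.0) > 1e-9:
        raise ValueError("macronutrient shares must sum to 1")
    return {
        "carbohydrate": total_kcal * carbohydrate,
        "protein": total_kcal * protein,
        "fat": total_kcal * fat,
    }


ideal_weight, low_kcal, high_kcal = daily_energy_range(1.70)
split = macronutrient_kcal(low_kcal)
print(f"ideal weight {ideal_weight:.1f} kg, {low_kcal:.0f}-{high_kcal:.0f} kcal/day")
```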

Exercise
All patients were given a pedometer and were instructed to walk more than 10,000 steps daily (5-7 km) for approximately 1.5 hours. The exercise program was tailored to accommodate health problems (e.g., morbid obesity, hypoglycemia, joint pain, or diabetic retinopathy) and specific needs (e.g., exercise by walking or training on a bicycle ergometer). The target pulse rate and schedule for each exercise session were set.

Outcome measures
The patients attended our hospital outpatient department every 2 months after discharge to continue their weight loss program and for treatment of other diseases (diabetes, dyslipidemia, and hypertension).
The body weight of the patients was measured at each visit. The objective of this study was to assess the dropout from the weight loss program after discharge.
Dropout from the program was defined as missing outpatient appointments. (If the patient presented again within 6 months of the specified appointment, this was not considered as dropout.) Patients were defined as having diabetes if their medical records listed a diagnosis of type 2 diabetes and they were using an oral hypoglycemic agent or insulin. If a 75 g oral glucose tolerance test was performed, a diagnosis of diabetes, impaired glucose tolerance, or impaired fasting glucose was made according to the American Diabetes Association criteria (11). Antidepressant medications were classified as selective serotonin reuptake inhibitors, serotonin and norepinephrine reuptake inhibitors, tricyclic antidepressants, tetracyclic antidepressants, serotonin receptor antagonists and reuptake inhibitors, monoamine oxidase inhibitors, and noradrenergic and specific serotonergic antidepressants. All demographic and clinical data were collected from secure electronic medical records. Nurses or physicians confirmed the accuracy of the body weight measurements of each subject.

Feature engineering
We used binary variables and continuous variables as prediction features. The binary variables included gender, use of oral antidiabetic drugs, use of insulin, use of glucagon-like peptide-1 receptor agonists (GLP-1 RAs), diabetes, hypertension, psychiatric disease, depression, insomnia, and antipsychotic drug use. The continuous variables included sequential body weight data from baseline to day 14, discharge body weight, age, waist circumference, systolic blood pressure, and HbA1c.
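
The dropout definition given under Outcome measures (a missed appointment with no return within 6 months) can be sketched as a simple labeling rule. This is a minimal illustration, not the authors' code: the function name is hypothetical, and approximating "6 months" as 183 days is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "within 6 months" is approximated as 183 days.
GRACE_PERIOD = timedelta(days=183)


def is_dropout(missed_appointment: date, next_visit: Optional[date]) -> bool:
    """Label a missed appointment as dropout unless the patient returned in time.

    missed_appointment: date of the scheduled visit the patient did not attend.
    next_visit: date the patient next presented, or None if they never returned.
    """
    if next_visit is None:
        return True
    return (next_visit - missed_appointment) > GRACE_PERIOD


never_returned = is_dropout(date(2010, 1, 10), None)
returned_early = is_dropout(date(2010, 1, 10), date(2010, 4, 1))
returned_late = is_dropout(date(2010, 1, 10), date(2010, 9, 1))
```

Under this rule a patient who comes back after a short lapse stays in the non-dropout group, which matches the parenthetical exception in the definition.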

Data preparation
Overall, 102 patients (82 non-dropout and 20 dropout) were enrolled in this study.
Because the original sample was imbalanced between the dropout and non-dropout populations, we used the Adaptive Synthetic Sampling (ADASYN) method to deal with the imbalanced data (12). In the balanced sample generated by the ADASYN method, there were 79 dropout and 82 non-dropout events. We randomly assigned 85% of this balanced sample to a training cohort (for model training) and the remaining 15% to a validation cohort (for hyperparameter optimization). Then, we applied the deep

Model building process of deep learning
A machine learning model to predict the dropout rate was developed with deep neural networks. The variables derived from feature engineering (Table 1) were used as predictors. For better performance of deep learning, we performed batch normalization (to mean 0 and variance 1) on the selected variables (features) and set the normalized values as the input layer.
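
To make the oversampling step more concrete, the following is a simplified, pure-Python sketch of the ADASYN idea: minority points that have more majority-class neighbors receive more synthetic samples, each created by interpolating toward a nearby minority point. It is not the implementation used in the study. The toy two-dimensional data, the neighborhood size k, the balance parameter beta, and the random seeds are all assumptions chosen for illustration.

```python
import math
import random


def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Return a list of synthetic minority points (simplified ADASYN sketch)."""
    rng = random.Random(seed)
    labeled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    total_needed = int(round((len(majority) - len(minority)) * beta))

    # r_i: share of majority-class points among the k nearest neighbors of x_i.
    ratios = []
    for x in minority:
        others = sorted(
            ((p, y) for p, y in labeled if p is not x),
            key=lambda item: math.dist(x, item[0]),
        )
        ratios.append(sum(1 for _, y in others[:k] if y == 0) / k)

    norm = sum(ratios)
    if norm == 0:  # no minority point borders the majority class
        weights = [1.0 / len(minority)] * len(minority)
    else:
        weights = [r / norm for r in ratios]

    synthetic = []
    for x, w in zip(minority, weights):
        n_new = int(round(w * total_needed))
        neighbors = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        for _ in range(n_new):
            z = rng.choice(neighbors)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy data with the same class sizes as the study (82 vs. 20); values are made up.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(82)]
minority = [(data_rng.gauss(1.5, 1), data_rng.gauss(1.5, 1)) for _ in range(20)]
synthetic = adasyn(minority, majority)
print(len(minority) + len(synthetic), "minority points after oversampling")
```

Because the per-point counts are rounded, the oversampled minority class usually ends up close to, but not exactly equal to, the majority class size.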

Statistics
The independent t test and χ2 test were used to compare continuous and

Figure 2A shows the area under the receiver operating characteristic curve (ROC) between LR and DL in the prediction of the 1-year dropout. DL showed a much better AUC than the LR method (0.97 vs. 0.77, p for difference < 0.001). In clinical practice, physicians try to identify as many patients at high risk of dropout as possible.
Therefore, we sought an optimal threshold that kept the sensitivity above 80% while yielding the highest specificity. We used this approach to threshold selection rather than traditional methods such as the Youden index (14). Finally, a predicted probability greater than 0.4 was used as the threshold to define a high risk of dropout after one year. The ROC curves for the resulting binary classifications by LR and DL are plotted in Fig. 2B. DL again showed a better AUC than LR (0.86 vs. 0.68, p for difference = 0.001). Figure 3A shows the overall survival curves of the true dropout and non-dropout subgroups. Figure 3B shows the 3-year survival curves of the subgroups classified by LR as predicted 1-year dropout and non-dropout. Similarly, Fig. 3C shows the predicted survival curves categorized by DL. With both LR and DL, the predicted dropout and non-dropout subgroups differed significantly after the first year, and the curves remained separated for 3 years.
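
The AUC comparison and the threshold rule described above (keep sensitivity above 80%, then maximize specificity) can be sketched as follows. This is an illustrative example on made-up labels and probabilities, not the study data. The function names and the 0.01 threshold grid are assumptions.

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_prob, threshold):
    """Classify as high risk when the predicted probability exceeds the threshold."""
    pred = [p > threshold for p in y_prob]
    tp = sum(1 for y, d in zip(y_true, pred) if y == 1 and d)
    fn = sum(1 for y, d in zip(y_true, pred) if y == 1 and not d)
    tn = sum(1 for y, d in zip(y_true, pred) if y == 0 and not d)
    fp = sum(1 for y, d in zip(y_true, pred) if y == 0 and d)
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(y_true, y_prob, min_sensitivity=0.80):
    """Return (threshold, sensitivity, specificity) with the highest specificity
    among grid thresholds whose sensitivity exceeds min_sensitivity."""
    best = None
    for i in range(100):  # assumed grid: 0.00, 0.01, ..., 0.99
        t = i / 100
        sens, spec = sensitivity_specificity(y_true, y_prob, t)
        if sens > min_sensitivity and (best is None or spec > best[2]):
            best = (t, sens, spec)
    return best


# Made-up example: 10 dropouts (1) and 10 non-dropouts (0).
y_true = [1] * 10 + [0] * 10
y_prob = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.55, 0.45, 0.15,
          0.62, 0.42, 0.38, 0.33, 0.28, 0.22, 0.18, 0.12, 0.08, 0.03]

auc = roc_auc(y_true, y_prob)
threshold, sens, spec = pick_threshold(y_true, y_prob)
print(f"AUC={auc:.2f}, threshold={threshold:.2f}, sens={sens:.2f}, spec={spec:.2f}")
```

Unlike the Youden index, which maximizes sensitivity plus specificity jointly, this rule treats the sensitivity floor as a hard constraint, reflecting the clinical priority of not missing high-risk patients.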
Overall, the dropout rate was 0.9 per 100 patients per month. However, the dropout rates were 14.3, 4.1, and 1.9 per 100 patients per month for the true dropout, DL-predicted dropout, and LR-predicted dropout subgroups, respectively. The DL-predicted dropout subgroup showed a significantly higher dropout incidence rate than the LR-predicted dropout subgroup (p = 0.03, data not shown). Meanwhile, the corresponding dropout rates were 0, 0.2, and 0.3 per 100 patients per month for the true non-dropout, DL-predicted non-dropout, and LR-predicted non-dropout subgroups, respectively. In comparison with the true non-dropout subgroup, the LR-predicted and DL-predicted non-dropout subgroups did not show higher dropout incidence rates (p = 0.982 and 0.983, respectively).
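
For readers less familiar with the quantities reported above, the sketch below shows one common way to compute a Kaplan-Meier survival curve and an incidence rate per 100 patient-months from follow-up data. The follow-up times and event flags are invented for illustration, and the function names are hypothetical. The source does not describe the exact computation used.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate: list of (time, survival probability) at event times.

    durations: follow-up in months for each patient.
    events: 1 if the patient dropped out at that time, 0 if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({d for d, e in zip(durations, events) if e})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        dropped = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - dropped / at_risk
        curve.append((t, survival))
    return curve


def incidence_per_100_patient_months(durations, events):
    """Dropout events divided by total follow-up time, scaled to 100 patient-months."""
    return 100 * sum(events) / sum(durations)


# Invented example: five patients followed for up to 36 months.
durations = [3, 6, 12, 36, 36]
events = [1, 1, 0, 0, 0]

curve = kaplan_meier(durations, events)
rate = incidence_per_100_patient_months(durations, events)
print(curve, f"{rate:.2f} per 100 patient-months")
```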

Discussion
This study demonstrates the superiority of our neural network model over logistic regression for predicting dropout from a comprehensive treatment program for severe obesity after discharge from the hospital.
These analyses have notable clinical significance. Firstly, dropout is a critical issue; however, it is challenging to predict using a single parameter, especially in a study with a small sample size. Secondly, the deep learning model showed a higher AUC than logistic regression. The subgroup predicted to drop out by the deep learning model had a higher dropout rate than the subgroup predicted by logistic regression; however, its rate was still lower than that of the true dropout subgroup (p = 0.01). Deep learning and logistic regression yielded predicted non-dropout subgroups with dropout rates similar to that of the true non-dropout subgroup.
There were some limitations to this study. Firstly, DL models tend to overfit. We used two methods (dropout and L1 regularization) to reduce overfitting; however, external validation will be required in future studies. Secondly, the sample size was small, which further underlines the need for external validation.

Conclusions
We demonstrated a higher precision with machine learning than with the standard logistic regression, based on limited sample size and information available during

Consent for publication
Not applicable

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
There was no funding in this study.

Figure 2 Area under the receiver operating characteristic curve (ROC) between LR and DL in the predi...
Figure 3 Survival curves of true dropout and non-dropout subgroups.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. STROBE_checklist_cohort.pdf