Patterns and Related Influencing Factors of Multiple Chronic Diseases Among the Elderly Based on Data Mining in Shanxi, China

Background: The presence of multiple chronic diseases has become a public health issue worldwide. This study aimed to explore the patterns and related influencing factors of multiple chronic diseases among the elderly, so as to provide a scientific reference for health management strategies dedicated to the prevention of and intervention in multiple chronic diseases. Methods: A cross-sectional study using multi-stage random sampling was conducted among 3266 elderly people aged 60 years and above in Shanxi. A chi-square test was used to compare sociodemographic factors. Data mining methods, including association rules and a decision tree, were used to explore the patterns and related influencing factors of multiple chronic diseases in the elderly. Results: The prevalence of multiple chronic diseases was 30.31%. The association rule analysis screened out 10 association patterns. The first five multiple chronic disease patterns were atherosclerosis and hypertension, coronary heart disease and hypertension, diabetes and hypertension, stroke and hypertension, and eye disease and hypertension. The decision tree model selected 5 related influencing factors of multiple chronic diseases (age, smoking, family genetic history, fresh fruit intake frequency, and education level) and generated 9 influence rules. Conclusions: Multiple chronic diseases were prevalent among the elderly, and their patterns mainly consisted of two or three chronic diseases. Regarding the unmodifiable related influencing factors, including older age, a family genetic history, and a low education level, relevant departments should help the elderly to build prevention knowledge and raise awareness of strongly related chronic diseases, so as to reduce the incidence and adverse consequences.


The prevalence and harm of multiple chronic diseases
In recent decades, life expectancy has increased and mortality has declined steadily [1], leading to a high prevalence of chronic diseases worldwide. What is particularly worrying is that many people have more than one chronic disease. Approximately one-third of all adults worldwide are diagnosed with multiple chronic diseases [2]. Multiple chronic diseases are defined as two or more chronic diseases within one person in a specific period of time [3]. Compared with a single chronic disease, multiple chronic diseases increase the treatment difficulty, medical consumption, economic burden, and risk of death of patients, driving intense demand for effective clinical interventions and health management support for these patients [4][5][6][7]. Although the established interventions for the management of chronic diseases have the potential to improve the care of the elderly, they focus on a single disease only [8]. At present, the health management and clinical treatment of elderly patients with multiple chronic diseases have become one of the major challenges faced by medical and health systems worldwide [9].

Strengths and limitations of previous studies
Previous studies have focused on comorbidities of various chronic diseases, including type 2 diabetes, chronic rheumatic and musculoskeletal diseases, and so on [10][11][12]. In addition, Donna et al. [13] proposed a taxonomy to inform care for patients with multiple chronic conditions, which is generalizable across multiple types of comorbidities. Although comorbidities have been the focus of research in the past few decades, multiple chronic diseases have gradually become a research hotspot in recent years. Research on multiple chronic diseases mainly focuses on prevalence, health care utilization and health-related outcomes [14][15][16][17]. Recently, there has been an effort to describe the full patterns of multiple chronic diseases to provide complete information. Noe et al. [18] described multimorbidity patterns in low, middle, and high-income countries. That study, however, was limited to the 12 chronic diseases it included. Moreover, few studies have examined how chronic diseases combine within patterns of multiple chronic diseases, which may be meaningful for improving the management of multiple chronic diseases. Bauer et al. pointed out that the chronic disease burden largely results from a short list of risk factors [19]. To the best of our knowledge, few studies have identified potential factors underlying multiple chronic diseases.

Association Rule and Decision Tree
Data mining techniques can extract potentially interesting knowledge and patterns from large volumes of data. Association rules and decision trees are two such techniques that help researchers extract valuable knowledge from large datasets. The rule mining approach applied here was the Apriori algorithm, which mines valuable patterns in large unordered data as association rules. The other data mining technique applied was the decision tree, which is easy to implement and interpret. The major purpose of this method is to build a predictive model for the target variable [20]. The main advantage of decision tree analysis is that it can transform the complex interactions of various variables into an organized flow chart, which clearly identifies related influencing factors. The CHAID algorithm in the decision tree is based on the self-stratification of dependent variables, and it automatically merges the independent variable categories to maximize their significance. In clinical practice, a stratified tool of related influencing factors is of great significance for better managing multiple chronic diseases. Therefore, this study adopted the CHAID algorithm to define new models for predicting the related influencing factors of multiple chronic diseases in the elderly.

Aim and hypotheses
To the best of our knowledge, few studies have used data mining methods to simultaneously explore the patterns of multiple chronic diseases and their related influencing factors. Therefore, this study aimed to explore the common patterns and related influencing factors of 26 chronic diseases using association rules and a decision tree, so as to provide a scientific reference for health management strategies.
Based on the above, we proposed two hypotheses:
1. The most common pattern of the 26 diseases is a combination of two or more conditions, among which the most prevalent dyads and triads include hypertension, diabetes and coronary heart disease.
2. Age, sex or other factors are the main related factors affecting the presence of multiple chronic diseases.

Sample and participants
The selected survey site was Shanxi Province in Central China. Using the stratified random cluster sampling method, we recruited older adults (aged 60 years and above) from the 11 cities of Shanxi Province. The sampling method was as follows. In the first stage, each district (county) in every city was numbered according to the order of districts (counties) on the government's website. In the second stage, two districts (counties) in each city were selected using a random number table, and then two communities (administrative villages) were drawn from each district (county) in the same way. In the third stage, considering the different scales of each community (administrative village), we selected 1 to 2 residential communities (natural villages) from each community (administrative village). Finally, in the selected residential communities (natural villages), we randomly selected elders who met the criteria of this study.
The inclusion criteria for this study were (1) being 60 years and older; (2) having clear awareness and barrier-free communication skills with the investigator; and (3) volunteering to participate in the survey. Those who had cognitive dysfunction or were unwilling to cooperate were excluded. The study was conducted from June to August 2019 and involved 3266 elders, of whom 3250 completed the questionnaire effectively, giving an effective response rate of 99.51%.
All study procedures were approved by the university ethics committee. All participants were informed of the purpose and procedure of the research upon their recruitment, and assured of their right to refuse to participate. Their anonymity and confidentiality were guaranteed. After signing the consent form, participants were invited by trained investigators to complete the questionnaires.

Instruments
The questionnaire comprised two sections: a self-designed general information section and a section on the prevalence of chronic diseases. The general information section includes (1) socio-demographic characteristics of the research subjects, including living area, gender, age, body mass index (BMI), occupation, education level, monthly family income, marital status and empty nest situation; and (2) lifestyle behaviors, including dietary habits and smoking and drinking status. The section on the prevalence of chronic diseases covers 26 chronic diseases diagnosed by doctors, including obesity, hypertension, diabetes, coronary heart disease, stroke, arrhythmia, atherosclerosis, tuberculosis, respiratory diseases, Parkinson's disease, chronic obstructive pneumonia, sciatica, rheumatism or rheumatoid arthritis, hypothyroidism, hyperthyroidism, gout, osteoporosis, hearing impairment, eye disease, hepatitis, chronic nephritis, tuberculosis, mental illness, dementia, digestive system diseases, and cancer.

Statistical analysis
The data were analyzed with SPSS Version 24.0 statistical software. Descriptive statistical analyses were carried out for basic socio-demographic characteristics. The chi-square test was used to compare differences in the prevalence of multiple chronic diseases across general demographic characteristics among the elderly. A P value < 0.05 was considered statistically significant.
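The chi-square comparison described above can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the study, the table is 2×2 for brevity (the study's outcome has three levels), and the function name `chi_square_statistic` is our own. The sketch computes the plain Pearson statistic without continuity correction, so it may differ from what SPSS reports for small tables.

```python
def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = two groups, columns = (with, without) multiple chronic diseases.
table = [[30, 70], [50, 50]]
statistic, dof = chi_square_statistic(table)

# Critical value of the chi-square distribution for dof = 1 at alpha = 0.05.
CRITICAL_0_05_DOF_1 = 3.841
significant = statistic > CRITICAL_0_05_DOF_1
print(f"chi2 = {statistic:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

A statistic above the critical value corresponds to P < 0.05, the threshold used in this study.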
The Apriori algorithm in SPSS Modeler Version 18.0 software was employed to analyze common patterns of multiple chronic diseases. Association rule analysis involves three key measures: support, confidence, and lift [21]. In this study, the support of A→B was the probability of the simultaneous occurrence of chronic diseases A and B. The confidence was the conditional probability of suffering from chronic disease B given chronic disease A. The lift reflects how much the antecedent A raises the probability of the consequent B compared with the overall probability of B. Therefore, when the lift $L_{A \rightarrow B} > 1$, A→B can be considered a directional association. In this study, we set the minimum conditional support to 3.0%, the minimum rule confidence to 30%, and the maximum number of preceding items to 5.
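To make the three measures concrete, the following minimal sketch computes support, confidence, and lift for a single rule A→B, following the definitions given in this section. The records are hypothetical toy data, not study data, and the function name `rule_metrics` is our own. It does not reproduce the Apriori search or the SPSS Modeler implementation; it only evaluates one candidate rule.

```python
# Hypothetical toy records: each set lists the chronic diseases reported by one person.
records = [
    {"hypertension", "diabetes"},
    {"hypertension", "atherosclerosis"},
    {"hypertension"},
    {"diabetes"},
    set(),
    {"hypertension", "diabetes"},
    set(),
    {"osteoporosis"},
]


def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent in r)
    n_b = sum(1 for r in records if consequent in r)
    n_ab = sum(1 for r in records if antecedent in r and consequent in r)
    support = n_ab / n                           # P(A and B)
    confidence = n_ab / n_a if n_a else 0.0      # P(B | A)
    lift = confidence / (n_b / n) if n_b else 0.0  # P(B | A) / P(B)
    return support, confidence, lift


support, confidence, lift = rule_metrics(records, "diabetes", "hypertension")
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

In this toy example the lift exceeds 1, so under the criterion above the rule diabetes→hypertension would count as a directional association.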
The related influencing factors affecting multiple chronic diseases of the elderly were analyzed with the CHAID decision tree algorithm in SPSS Version 24.0 statistical software. We built a decision tree model with the prevalence of multiple chronic diseases as the dependent variable (0=without chronic disease, 1=having 1 chronic disease, 2=having 2 or more chronic diseases) and the variables that were statistically significant in the univariate analysis as independent variables. The maximum tree depth was set to 3. The minimum sample size of a parent node was set to 100 and the minimum sample size of a child node was set to 50. The decision tree grew from top to bottom.
[...] .17%, respectively. The top ten prevalent chronic diseases were hypertension, diabetes, rheumatism or rheumatoid arthritis, hearing impairment, digestive system diseases, osteoporosis, coronary heart disease, eye diseases, respiratory diseases (bronchitis, emphysema, asthma) and stroke. Table 1 shows the gender differences in the top 10 chronic diseases.

Analysis of association rules for multiple chronic diseases
The most common algorithm for association rule mining is the Apriori algorithm [22]. We employed the Apriori algorithm to obtain association rules between common chronic diseases in this study. Finally, 10 association patterns with strong association strength were selected. All of them were patterns of coexistence of two chronic diseases, and 8 of them involved hypertension (Table 2). The most strongly associated comorbid pair was hypertension and atherosclerosis. Using the network graphics provided in SPSS Modeler 18.0 software, we drew a network graph of the correlation strength between the 11 chronic diseases, which finally included 9 chronic diseases (Figure 1). The thickness of the line between two points reflects the strength of the association between the chronic diseases [23]: the thicker the line, the stronger the correlation.

Univariate analysis of related influencing factors of multiple chronic diseases
The chi-square test showed significant differences in multiple chronic diseases among the elderly by age, gender, education level, living area, family genetic history, empty nest situation, marital status, smoking, drinking, diet, fresh fruit intake frequency, and BMI (P < 0.05). However, there was no statistically significant difference in multiple chronic diseases by family monthly income (P=0.125), regularity of three meals (P=0.524), or fresh vegetable intake frequency (P=0.127).

Results of Decision tree analysis
The decision tree model had a total of 15 nodes, of which 9 were terminal nodes, and a depth of 3 (Fig 2). Five main variables affected the presence of multiple chronic diseases: the first was age, followed by smoking, family genetic history, fresh fruit intake frequency, and education level, each significant at P < 0.05. This result has important practical implications, suggesting that health management aimed at reducing the presence of multiple chronic diseases should target these five factors. The modifiable lifestyle factors affecting the presence of chronic diseases were smoking and fresh fruit intake frequency. The unmodifiable factors were age, family genetic history and education level. Other variables, such as gender, living area, empty nest situation, marital status, drinking, diet, and BMI, did not reach significance at the 0.05 level and were not included in the model.
In this study, the following 9 decision rules, associated with different prevalences of multiple chronic diseases, were extracted from the decision tree model. They are sorted by multiple chronic disease rate from high to low:
Rule 1: If the elderly were aged 70-79 years, had never smoked or were smoking, and had a family genetic history, then the multiple chronic disease rate was 62.7%.
Rule 2: If the elderly were aged ≥80 years and ate fresh fruit less than daily (at least once a week; at least once a month; sometimes; or rarely or never), then the multiple chronic disease rate was 55.6%.
Rule 3: If the elderly were aged 60-69 years and had a family genetic history, then the multiple chronic disease rate was 50.6%.
Rule 4: If the elderly were aged 70-79 years and had quit smoking, then the multiple chronic disease rate was 45.5%.
Rule 5: If the elderly were aged ≥80 years and ate fresh fruit almost every day, then the multiple chronic disease rate was 31.8%.
Rule 6: If the elderly were aged 70-79 years, had no family genetic history, and had never smoked or were smoking, then the multiple chronic disease rate was 31.5%.
Rule 7: If the elderly were aged 60-69 years, had no family genetic history, and had an education level of primary school or no formal education, then the multiple chronic disease rate was 28.1%.
Rule 8: If the elderly were aged 60-69 years, had no family genetic history, and had an education level of junior high school/high school/secondary school/high vocational/junior college, then the multiple chronic disease rate was 20.4%.
Rule 9: If the elderly were aged 60-69 years, had no family genetic history, and had an education level of undergraduate and above, then the multiple chronic disease rate was 6.9%.
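The nine rules above can be read as a nested lookup on age, smoking, family genetic history, fresh fruit intake and education level. The sketch below encodes them as a plain Python function for illustration only. The function name, argument names and category labels (`"never"`, `"current"`, `"former"`, `"primary_or_none"`, `"secondary_to_college"`, `"undergraduate_plus"`) are our own hypothetical encodings, not the variable coding used in the study, and the returned values are simply the node rates reported in the rules, not individual risk predictions.

```python
def multimorbidity_rate(age, smoking, family_history, daily_fruit, education):
    """Return the multiple chronic disease rate (%) of the matching terminal node.

    age: years (>= 60); smoking: "never", "current" or "former";
    family_history, daily_fruit: bool; education: one of the keys below.
    """
    if age < 60:
        raise ValueError("the rules cover only people aged 60 years and above")
    if age >= 80:
        # Rules 5 and 2: split on fresh fruit intake frequency.
        return 31.8 if daily_fruit else 55.6
    if age >= 70:
        if smoking == "former":
            return 45.5  # Rule 4
        # Rules 1 and 6: never or current smokers, split on family genetic history.
        return 62.7 if family_history else 31.5
    # Aged 60-69.
    if family_history:
        return 50.6  # Rule 3
    rates_by_education = {
        "primary_or_none": 28.1,       # Rule 7
        "secondary_to_college": 20.4,  # Rule 8
        "undergraduate_plus": 6.9,     # Rule 9
    }
    return rates_by_education[education]


print(multimorbidity_rate(75, "never", True, False, "primary_or_none"))
```

Writing the tree this way makes it easy to see that each variable below age is only used within one age band, which is why smoking, for example, does not appear in the rules for those aged 60-69.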

Prevalence of multiple chronic diseases
In this study, the prevalence of multiple chronic diseases was 30.31% in the elderly aged 60 and above. In the United States, approximately one-third of the elderly suffer from multiple chronic diseases [24]. The high incidence of multiple chronic diseases among the elderly not only endangers physical health, but also poses a serious threat to mental health [25]. Therefore, it is of great significance to mine the patterns and related influencing factors of multiple chronic diseases from cross-sectional study data.

Patterns and association rules of multiple chronic diseases
This study found that all 10 association rules described the coexistence of two chronic diseases. For example, hypertension was associated with atherosclerosis, coronary heart disease, diabetes, stroke, and eye disease (cataracts, glaucoma). While hypertension was sometimes reported as genetic and of early onset, for others it came on later as a result of another illness [26]. As mentioned above, there was a wide range of associations between various common chronic diseases. This suggests that the elderly may suffer from hypertension at the same time as they suffer from the above diseases. Additionally, this study found association rules between rheumatism or rheumatoid arthritis and osteoporosis.
Rheumatism or rheumatoid arthritis is a chronic disabling disease that is associated with increased localized and generalized osteoporosis [27]. This study also found association rules between coronary heart disease and atherosclerosis, suggesting that elderly people with atherosclerosis should be screened for coronary heart disease as early as possible to reduce its incidence. In health management, staff should strengthen health education on the concept and risk of multiple chronic diseases, and take intervention measures as early as possible to address the challenges of multiple chronic diseases. In clinical practice, medical staff treating patients who have been diagnosed with a chronic disease should advise them to be screened as soon as possible for the chronic diseases strongly associated with it, so that early detection, early diagnosis, and early treatment can be achieved. In other words, knowledge of comprehensive chronic disease patterns should be taken into account when providing care for multimorbid patients.

Related influencing factors of multiple chronic diseases
The decision tree model showed that age was the root node, indicating that age was the primary and most important related influencing factor for multiple chronic diseases among the independent variables. The prevalence of multiple chronic diseases was highest among those aged 80 years and above (47.3%).
In the elderly aged 80 and above, the main related influencing factor selected was fresh fruit intake frequency. The multiple chronic disease rate of the elderly who ate fresh fruit almost daily (31.8%) was significantly lower than that of the elderly who ate fresh fruit less often (55.6%). This suggests that eating fresh fruit every day is a protective factor against multiple chronic diseases, which is consistent with a previous study [28].
Among the elderly aged 70-79 years, the multiple chronic disease rate of those who had quit smoking (45.5%) was higher than that of those who had never smoked or were smoking combined. This study thus found that smoking cessation was associated with multiple chronic diseases in the elderly. It may be that some of the elderly people surveyed quit smoking after they were diagnosed with multiple chronic diseases. We speculate that the occurrence of chronic diseases promotes the adoption of healthy behaviors and lifestyles among the elderly.
The elderly who had never smoked or were smoking were divided into two categories according to family genetic history. The multiple chronic disease rate of those with a family genetic history (62.7%) was almost twice that of those without one (31.5%). This suggests that the elderly population with a family genetic history is a key target for the prevention and treatment of chronic diseases. Some studies have reported an association between family genetic history and the risk of chronic disease [29,30]. Among the elderly aged 60-69 without a family genetic history, the higher the education level, the lower the risk of disease. People with more education are more likely to receive health knowledge and develop a healthy behavioral lifestyle, and thus their rate of multiple chronic diseases is also lower [31,32].
Drinking is a recognized risk factor for chronic diseases [33][34][35]. However, it is worth noting that this study did not find that drinking was associated with multiple chronic diseases in the elderly. The first reason is that this was a cross-sectional study, which cannot prove causality: it captured only the drinking status of the elderly at the time of the survey and could not observe the relationship between drinking and disease over time. The second reason is that females outnumbered males in this study, whereas the drinking population was mostly male and accounted for only 16.4%. The choice of research sample may therefore have led to a weak association between drinking and multiple chronic diseases. On this basis, this study still holds that drinking alcohol is a risk factor for multiple chronic diseases. It is also possible that the occurrence of multiple chronic diseases prompts the elderly to actively change unhealthy behaviors and lifestyles.
The health management department needs to make people aware, through health education and health promotion activities, of the harm that these modifiable lifestyle factors cause for multiple chronic diseases, so that they actively change their behaviors and lifestyles. Regarding the unmodifiable related influencing factors, including age, family genetic history and education level, clinicians should focus on such elderly people in clinical practice and take preventive measures.

Limitations
Although this study has certain value for the prevention and control of multiple chronic diseases in the elderly, it also has limitations. First, cross-sectional studies may introduce information bias when investigating and analyzing behavioral lifestyles. Some subjects had changed their behavior and lifestyle after suffering from chronic diseases, which to a certain extent weakened the observed influence of behavioral lifestyle on multiple chronic diseases. In addition, the prevalence of chronic diseases in this study was self-reported, and thus there may be information bias. On this basis, future work will further explore the associations and related influencing factors of multiple chronic diseases through longitudinal research to obtain more scientific and objective results.

Conclusion
Generally speaking, the challenges of multiple chronic diseases are serious: they threaten the physical and mental health of the elderly and place a significant strain on society. Therefore, identifying the patterns and related influencing factors of multiple chronic diseases by data mining methods is of great significance for addressing these challenges. The most common patterns of the 26 diseases are two-condition and three-condition combinations. The health management department needs to make people aware, through health education and health promotion activities, of the harm that modifiable lifestyle factors cause for multiple chronic diseases, so that they actively change their behaviors and lifestyles. Regarding the unmodifiable related influencing factors, including older age, a family genetic history, and a low education level, relevant departments should help the elderly to build prevention knowledge and raise awareness of strongly related chronic diseases, so as to reduce the incidence and adverse consequences.

List Of Abbreviations
BMI: Body mass index

Declarations

Ethics approval and consent to participate
All study procedures were approved by the Ethics Committee of Shanxi Medical University.

Consent for publication
Not applicable.

Availability of data and materials
Please contact author for data requests.

Competing interests
The authors declare no competing interests. The sponsor had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Figure: Decision tree model of related influencing factors for multiple chronic diseases