3.1 Sociodemographic characteristics
In total, 3250 participants returned valid questionnaires. Of these, 1515 (46.62%) were male and 1735 (53.38%) were female; 1453 (44.71%) lived in urban areas and 1797 (55.29%) in rural areas. Regarding education, 577 (17.75%) had no formal education, 1062 (32.68%) had primary school education, 850 (26.15%) had junior high school education, 474 (14.58%) had high school/secondary school education, 144 (4.43%) had junior college/higher vocational education, and 143 (4.40%) had a bachelor's degree or above. A family genetic history was reported by 166 participants (5.11%). The mean age of the respondents was 69.65 years. Age was grouped into three categories: 1769 participants (54.43%) were aged 60-69 years, 1164 (35.82%) were aged 70-79 years, and 317 (9.75%) were aged 80 years and above.
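As a consistency check on the figures above, the reported percentages can be recomputed from the counts. The snippet below is only an illustrative sketch: the dictionary keys and the `percent` helper are names introduced here, not part of the study's analysis.

```python
# Counts taken from the paragraph above; the key names are illustrative labels.
TOTAL = 3250

education_counts = {
    "no formal education": 577,
    "primary school": 1062,
    "junior high school": 850,
    "high school/secondary school": 474,
    "junior college": 144,
    "bachelor's degree or above": 143,
}

def percent(count, total=TOTAL):
    """Share of the sample as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)

# The education categories should account for every participant.
education_total = sum(education_counts.values())

education_percent = {level: percent(n) for level, n in education_counts.items()}
```

The category counts sum to the full sample, and each recomputed share matches the percentage given in the text.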
3.2 The prevalence of chronic diseases in the elderly
Among the 3250 elderly people surveyed, 1901 (58.49%) had at least one chronic disease and 985 (30.31%) had multiple chronic diseases. Among those with multiple chronic diseases, the number of coexisting conditions ranged from 2 to 9. The coexistence of 2, 3, and 4 chronic diseases was the most common, accounting for 54.11%, 26.19%, and 11.17%, respectively. The ten most prevalent chronic diseases were hypertension, diabetes, rheumatic or rheumatoid arthritis, hearing impairment, digestive system diseases, osteoporosis, coronary heart disease, eye diseases, respiratory diseases (bronchitis, emphysema, asthma), and stroke. Table 1 shows the gender differences in the top 10 chronic diseases.
3.3 Analysis of association rules for multiple chronic diseases
The most common algorithm for association rule mining is the Apriori algorithm [22]. We used the Apriori algorithm to obtain association rules between common chronic diseases in this study. Finally, 10 association patterns with strong association strength were selected. All of them were patterns of two coexisting chronic diseases, and 8 involved hypertension (Table 2). The most strongly associated comorbid pair was hypertension and atherosclerosis. Using the network graph function in SPSS Modeler 18.0, we drew a network graph of the association strength among the 11 chronic diseases; the final graph included 9 chronic diseases (Figure 1). The thickness of the line between two points reflects the strength of the association between the corresponding chronic diseases [23]: the thicker the line, the stronger the association.
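To illustrate what a two-disease association pattern measures, the following minimal sketch computes support, confidence, and lift for disease pairs, using the Apriori idea that only diseases that are frequent on their own can form a frequent pair. It is not the SPSS Modeler procedure used in this study; the function name `pair_rules`, the thresholds, and the toy records are assumptions made for illustration, and the records are not study data.

```python
from itertools import combinations

def pair_rules(records, min_support=0.1, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence, lift) for disease pairs."""
    n = len(records)
    # Count how many participants report each single disease.
    item_count = {}
    for rec in records:
        for disease in rec:
            item_count[disease] = item_count.get(disease, 0) + 1
    # Apriori pruning: a pair can only be frequent if both members are frequent.
    frequent = sorted(d for d, c in item_count.items() if c / n >= min_support)
    rules = []
    for a, b in combinations(frequent, 2):
        both = sum(1 for rec in records if a in rec and b in rec)
        support = both / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for ante, cons in ((a, b), (b, a)):
            confidence = both / item_count[ante]
            lift = confidence / (item_count[cons] / n)
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence, lift))
    # Strongest associations (highest lift) first.
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Hypothetical toy records (one set of diagnoses per participant), NOT study data.
toy_records = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes"},
    {"hypertension", "stroke"},
    {"hypertension"},
    {"diabetes"},
    {"osteoporosis"},
]
toy_rules = pair_rules(toy_records, min_support=0.3, min_confidence=0.6)
```

In this toy example only the rule "diabetes → hypertension" passes both thresholds; the reverse direction is dropped because only half of the hypertensive records also include diabetes.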
3.4 Univariate analysis of related influencing factors of multiple chronic diseases
The chi-squared test showed that the prevalence of multiple chronic diseases among the elderly differed significantly by age, gender, education level, living area, family genetic history, empty nest situation, marital status, smoking, drinking, diet, fresh fruit intake frequency, and BMI (P < 0.05). However, no statistically significant differences were found by family monthly income (P = 0.125), regularity of three meals (P = 0.524), or fresh vegetable intake frequency (P = 0.127).
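For readers unfamiliar with the test, the sketch below shows how a Pearson chi-squared statistic is computed for a contingency table of a factor against multiple chronic disease status. The counts in `toy_table` are hypothetical and are not the study's data; the helper names are introduced here, and the p-value helper covers only 1 and 2 degrees of freedom, where simple closed forms are available.

```python
import math

def chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value(stat, dof):
    """Upper-tail p-value using simple closed forms; this sketch handles dof 1 and 2 only."""
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2))
    if dof == 2:
        return math.exp(-stat / 2)
    raise ValueError("use a statistics library for dof > 2")

# Hypothetical counts, NOT study data.
# Rows: three age groups; columns: (multiple chronic diseases, no multiple chronic diseases).
toy_table = [
    [30, 70],
    [40, 60],
    [50, 50],
]
toy_stat, toy_dof = chi_square(toy_table)
toy_p = p_value(toy_stat, toy_dof)
```

With these made-up counts the test has 2 degrees of freedom and the p-value falls below 0.05, which is the criterion applied to each factor in this section.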
3.5 Results of decision tree analysis
The decision tree model had a total of 15 nodes, of which 9 were terminal nodes, and a depth of 3 (Figure 2). Five variables affected the presence of multiple chronic diseases: age was the first splitting variable, followed by smoking, family genetic history, fresh fruit intake frequency, and education level (all P < 0.05). This result has practical implications: it suggests that health management aimed at reducing multiple chronic diseases should target these five factors. The modifiable lifestyle factors were smoking and fresh fruit intake frequency; the unmodifiable factors were age, family genetic history, and education level. The other variables (gender, living area, empty nest situation, marital status, drinking, diet, and BMI) did not reach significance at the 0.05 level and were not included in the model.
In this study, nine decision rules associated with different prevalence of multiple chronic diseases were extracted from the decision tree model. They are listed below in descending order of multiple chronic disease rate:
Rule 1: If the elderly person was aged 70-79 years, had never smoked or was a current smoker, and had a family genetic history, then the multiple chronic disease rate was 62.7%.
Rule 2: If the elderly person was aged ≥80 years and ate fresh fruit less than daily (at least once a week, at least once a month, occasionally, or rarely/never), then the multiple chronic disease rate was 55.6%.
Rule 3: If the elderly person was aged 60-69 years and had a family genetic history, then the multiple chronic disease rate was 50.6%.
Rule 4: If the elderly person was aged 70-79 years and had quit smoking, then the multiple chronic disease rate was 45.5%.
Rule 5: If the elderly person was aged ≥80 years and ate fresh fruit almost every day, then the multiple chronic disease rate was 31.8%.
Rule 6: If the elderly person was aged 70-79 years, had no family genetic history, and had never smoked or was a current smoker, then the multiple chronic disease rate was 31.5%.
Rule 7: If the elderly person was aged 60-69 years, had no family genetic history, and had primary school education or no formal education, then the multiple chronic disease rate was 28.1%.
Rule 8: If the elderly person was aged 60-69 years, had no family genetic history, and had junior high school, high school/secondary school, or higher vocational/junior college education, then the multiple chronic disease rate was 20.4%.
Rule 9: If the elderly person was aged 60-69 years, had no family genetic history, and had a bachelor's degree or above, then the multiple chronic disease rate was 6.9%.
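The nine rules above correspond to the nine terminal nodes of the tree and can be written as a nested lookup. The sketch below is an illustrative encoding only: the function name, argument names, and category labels (for example "quit" or "bachelor_or_above") are assumptions introduced here, while the returned rates are those reported in the rules.

```python
LOW_EDUCATION = {"none", "primary"}
MIDDLE_EDUCATION = {"junior_high", "high_school", "junior_college"}

def multimorbidity_rate(age, family_history, smoking, daily_fruit, education):
    """Return the reported multiple chronic disease rate (%) of the matching terminal node.

    smoking: "never", "current", or "quit" (hypothetical labels)
    education: "none", "primary", "junior_high", "high_school",
               "junior_college", or "bachelor_or_above" (hypothetical labels)
    """
    if age >= 80:
        # Rules 5 and 2: the split is on fresh fruit intake frequency.
        return 31.8 if daily_fruit else 55.6
    if age >= 70:
        # Rule 4, then Rules 1 and 6: smoking first, then family genetic history.
        if smoking == "quit":
            return 45.5
        return 62.7 if family_history else 31.5
    if age >= 60:
        # Rule 3, then Rules 7, 8, and 9: family genetic history, then education.
        if family_history:
            return 50.6
        if education in LOW_EDUCATION:
            return 28.1
        if education in MIDDLE_EDUCATION:
            return 20.4
        if education == "bachelor_or_above":
            return 6.9
        raise ValueError("unknown education level")
    raise ValueError("rules cover ages 60 and above only")

# Example lookups for three hypothetical respondents.
example_rule_1 = multimorbidity_rate(75, True, "never", True, "primary")
example_rule_2 = multimorbidity_rate(85, False, "quit", False, "none")
example_rule_9 = multimorbidity_rate(65, False, "never", False, "bachelor_or_above")
```

Written this way, the structure of the tree is easy to see: age is the root split, each age group is then split by a different second variable, and only two branches (70-79 years without having quit smoking, and 60-69 years without family genetic history) are split a third time.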