Cardiovascular disease (CVD), a highly heritable trait, is a leading cause of death worldwide. Although the prevalence of CVD is high worldwide, awareness of it remains very low [37]. CVD is considered highly heritable, but micronutrient intake, age, socio-economic condition, and environmental exposure to toxic metals can also substantially raise the risk of CVD [38, 39, 40]. In this study, we sought to identify the risk factors for CVD and to predict it using several machine learning (ML) models. The analysis used the hypertensive patients from the BDHS 2011 and BDHS 2017-18 datasets.
Several recent studies have used ML algorithms to predict cardiovascular disease, which indicates the reliability and feasibility of this approach [41, 42, 43]. Chandralekha & Shenbagavadivu compared supervised and unsupervised ML models and found that Decision Tree achieved the best classification accuracy and precision, at 73% and 91% respectively. Another study, conducted by Arunachalam, found that Support Vector Machine and Multilayer Perceptron achieved the highest accuracy (91.7%). It also identified chest pain type, thalassemia, age, depression, cholesterol, gender, and blood pressure as the most influential factors for CVD.
In this study, statistical analyses such as frequency distributions and the chi-square test were conducted to identify patterns and significant factors. The prevalence of CVD increased slightly from BDHS 2011 to BDHS 2017-18, from 22.3% to 23.1%. Chi-square analysis identified division, work status, age, wealth index, place of residence, and BMI as significant factors. In addition to the statistical analysis, eight ML classifiers were used to predict CVD. Among these, Random Forest achieved the highest accuracy, precision, sensitivity, and F1 score, at 78%, 78%, 74%, and 78% respectively. The features division, age, highest education level, BMI, wealth index, place of residence, and work status were identified as the most important.
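To illustrate the kind of chi-square test of independence referred to above, the following is a minimal sketch for a 2x2 contingency table. The counts are hypothetical and are not taken from the BDHS data, and the function name `chi_square_2x2` is an assumption introduced for illustration. Factors with more than two categories (such as division or wealth index) would need the general test with more degrees of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch only handles 2x2 tables")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square upper tail probability
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts (NOT from BDHS):
# rows = place of residence (urban, rural), columns = (CVD, no CVD)
table = [[30, 70], [50, 50]]
statistic, p_value = chi_square_2x2(table)
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")
```

A factor would be treated as significant when the p-value falls below the chosen threshold (commonly 0.05).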
The highest prevalence of CVD occurred in Dhaka, which might be the result of rapid urbanization, dietary changes, increased consumption of tobacco, limited physical activity, a low level of awareness, and poor detection and control rates [44]. This study has also shown that BMI is a significant factor for CVD, because obesity irritates plaque in the arteries, releases substances into the blood that make plaque rupture, promotes atrial fibrillation, and raises triglyceride levels, which can trigger heart attacks [45]. Moreover, age is an important feature for CVD, since it has been linked to obesity, persistent inflammation, and oxidative stress, which may increase the risk of heart disease [46]. Our study also found that the prevalence of CVD is higher in rural areas, which may be due to the low level of awareness among people and the inadequate quality of health care [47]. Another important risk factor identified by our study is wealth index, which was found to be positively correlated with CVD. This may be because well-off families have easier access to high-calorie food and tend to be less physically active [48].
Work status was also found to be positively correlated with CVD, which suggests that reduced physical activity, high-calorie food intake, and stress may increase the risk of CVD [49]. Moreover, the ML models identified education level as one of the most important factors for CVD. This may be explained by the fact that low education can lead to low awareness and knowledge of healthy lifestyles, and hence to a higher risk of CVD [50].
The findings of this study provide valuable insights and practical implications for addressing cardiovascular disease (CVD). The statistical analysis identified several significant factors associated with CVD, including division, work status, age, wealth index, place of residence, and BMI. These factors can help healthcare professionals and policymakers prioritize interventions and allocate resources effectively.
The study also employed machine learning (ML) models, with Random Forest achieving the highest accuracy, precision, sensitivity, and F1 score for predicting CVD. This suggests that ML models can be utilized as a reliable tool for early detection and risk assessment of CVD. The identified important features, such as division, age, highest education level, BMI, wealth index, place of residence, and work status, can guide the development of targeted interventions. For example, interventions in urban areas like Dhaka, where a higher prevalence of CVD was observed, can address factors like rapid urbanization, dietary changes, increased tobacco consumption, limited physical activity, low awareness, and inadequate detection and control rates.
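For readers less familiar with the evaluation metrics named above, the sketch below shows how accuracy, precision, sensitivity, and F1 score are derived from a binary confusion matrix. The counts are hypothetical and do not reproduce the results of this study, and the function name `classification_metrics` is an assumption introduced for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute binary classification metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives. Assumes at least one predicted positive and at least
    one actual positive, so that no denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)      # share of predicted CVD cases that are real
    sensitivity = tp / (tp + fn)    # share of real CVD cases that are detected
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }

# Hypothetical confusion-matrix counts (NOT from this study).
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity matters most when the model is used for screening, because a missed CVD case (a false negative) is usually costlier than a false alarm.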
Promoting awareness and education about healthy lifestyles, especially among individuals with lower education levels, can help mitigate the risk of CVD. Targeted interventions in rural areas, aiming to improve health infrastructure and increase awareness, can contribute to reducing the burden of CVD in those communities. Addressing the correlation between wealth index and CVD requires strategies to promote healthy eating habits and physical activity among all socioeconomic groups. Workplace interventions focusing on reducing stress and promoting physical activity can also contribute to preventing CVD.
Limitations
A considerable gap in time separates the two BDHS datasets that were combined, which may have influenced the results. The number of respondents relevant to the topic was very limited, so the sample size is small. Fasting plasma glucose (FPG) readings are used to monitor diabetes in BDHS, but they do not constitute a clinical diagnosis of the disease because, according to the WHO, "FPG alone cannot be used to diagnose diabetes, as it fails to diagnose around 30% of cases of previously undiagnosed diabetes." Finally, there is still room for improvement in the method currently used.