Among the twenty selected features in this study, BMI, SBP, TG, Cr, LDL-C, and glucose contributed strongly to hypertension prediction and ranked among the top 10 in feature importance in all three models. Consistent with previous studies, age [25,29,30], BMI [26,29], diabetes status [27], Cr [26], blood pressure [28,30], WC [29], smoking status [29], LDL-C [26,29], HDL-C [26], drinking [29], glucose [30], TC [26,27], exercise [31], salt intake [32], and TG [27] have been identified as predictors in hypertension risk assessment models.
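The consensus criterion above (a feature counts only if it ranks in the top 10 for every model) amounts to a set intersection over per-model rankings. The sketch below is a minimal illustration, assuming each model yields one importance score per feature (for example, gain-based importance for XGBoost, impurity-based importance for random forest, or absolute standardized coefficients for logistic regression); the function names and the toy scores are hypothetical and are not values from this study.

```python
from typing import Dict, Set


def top_k_features(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k features with the highest importance (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:k]}


def consensus_top_k(importances: Dict[str, Dict[str, float]], k: int = 10) -> Set[str]:
    """Features ranked in the top k by every model (hypothetical helper)."""
    per_model = [top_k_features(scores, k) for scores in importances.values()]
    if not per_model:
        return set()
    return set.intersection(*per_model)


if __name__ == "__main__":
    # Hypothetical toy scores with placeholder feature names; NOT results of this study.
    toy = {
        "xgboost": {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.15, "x5": 0.10},
        "random_forest": {"x2": 0.35, "x1": 0.25, "x4": 0.20, "x5": 0.12, "x3": 0.08},
        "logistic_regression": {"x1": 0.40, "x3": 0.22, "x2": 0.18, "x5": 0.12, "x4": 0.08},
    }
    print(sorted(consensus_top_k(toy, k=3)))  # prints ['x1', 'x2']
```

With k = 3 in the toy data, only x1 and x2 appear in all three top-3 sets; in the study the same rule is applied with k = 10.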
However, to the best of our knowledge, urinary protein, urea nitrogen, and EHSA, all of which entered our models, have not been included in previous hypertension risk assessment models.
A study using data from three examinations of the Strong Heart Study explored the risk factors for hypertension with generalized linear models and found that SBP was significantly and positively associated with albuminuria, age, and obesity and negatively associated with smoking. Moreover, participants with more severe albuminuria or older age had higher SBP, whereas DBP was not significantly affected by albuminuria status [33]. This study in American Indians showed that macro- or microalbuminuria is a significant risk factor for hypertension, which may partly explain why urinary protein was selected as a feature in our model. Urinary protein may likewise affect the development of hypertension in Chinese individuals, or at least facilitate their hypertension risk assessment. Furthermore, using multiple logistic regression, Kim et al. reported that high-normal BP was independently and significantly associated with microalbuminuria (odds ratio 1.692, 95% confidence interval 1.097 to 2.611) [34]. These results from a Korean population indicated that, compared with individuals with normal BP, those with high-normal BP have more risk factors for hypertension and cardiovascular disease, such as albuminuria. Because urinary protein was significantly more common in the prehypertensive population than in the normotensive population, it deserves attention in future prediction studies and interventions.
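As a quick consistency check on the odds ratio quoted above, one can assume (this is an assumption; the method used for the interval is not stated here) that it is a Wald interval built on the log-odds scale, OR × exp(±1.96 × SE). Under that assumption, the log of the OR should sit at the midpoint of the log bounds, and the width of the log interval gives the implied standard error. The helper name below is hypothetical.

```python
import math
from typing import Tuple


def log_scale_check(odds_ratio: float, lower: float, upper: float,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Assuming a Wald interval OR * exp(+/- z * SE) on the log-odds scale,
    return (log OR, midpoint of the log bounds, implied SE of log OR)."""
    log_or = math.log(odds_ratio)
    log_lo, log_hi = math.log(lower), math.log(upper)
    midpoint = (log_lo + log_hi) / 2
    se = (log_hi - log_lo) / (2 * z)
    return log_or, midpoint, se


if __name__ == "__main__":
    # Values as reported by Kim et al.: OR 1.692, 95% CI 1.097 to 2.611.
    log_or, mid, se = log_scale_check(1.692, 1.097, 2.611)
    print(f"log OR = {log_or:.3f}, midpoint = {mid:.3f}, implied SE = {se:.3f}")
```

For the reported values, the log OR and the midpoint of the log bounds both come to about 0.526, so the interval is consistent with a symmetric log-scale construction, with an implied SE of roughly 0.221.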
Although urea nitrogen has rarely been included as a predictor in hypertension risk prediction models, it has been reported to be a significant risk factor for hypertension. A case-control study conducted among university staff found that staff with high serum urea levels had a higher risk of hypertension than those with normal urea levels (OR = 1.452), suggesting that urea level is also an important risk factor for hypertension [35]. A similar association has been observed among middle-aged and elderly people. Among middle-aged and older adults in Guangzhou, China, SBP was positively correlated with the concentrations of blood urea nitrogen (r = 0.16424, P = 0.0105) and blood uric acid (r = 0.16023, P = 0.0126), as was DBP (blood urea nitrogen: r = 0.13506, P = 0.0358; blood uric acid: r = 0.16562, P = 0.0099) [36]. Stepwise regression analysis further showed that SBP and DBP remained significantly and positively correlated with the concentrations of blood urea nitrogen and blood uric acid. The role of urea nitrogen, one of the features in our risk assessment model, in the onset and progression of hypertension requires further investigation.
EHSA was also one of the predictors in our model. Kaplan and Camacho demonstrated that the association between perceived health and mortality persisted in multiple logistic analyses controlling for age, sex, physical health status, health practices, social network participation, income, education, health relative to age peers, anomy, morale, depression, and happiness [37]. These results suggest that self-assessed health may serve as a comprehensive reflection of factors that cannot otherwise be measured and as an indication of underlying disease or of an early disease stage. Evidence has shown that psychosocial factors strongly influence health status measures [38]. Zhang et al. found that elderly individuals who rated their health as poor or normal had a significantly higher prevalence of common chronic diseases [39]. To some extent, health self-assessment captures elderly individuals' concept of health and perception of their own health status; it may therefore have predictive value for hypertension risk and deserves more attention in future research and in primary care practice.
Unlike traditional risk assessment methods, our study used ML algorithms for model construction, and XGBoost outperformed both random forest and logistic regression. Logistic regression performs poorly when predictors are strongly correlated, and because it models the log-odds as a linear function of the predictors, its decision boundary is only linear. However, associations between exposure factors and disease are often affected by various confounding factors, which can lead to large bias and low accuracy when the model is fitted by logistic regression. In contrast, XGBoost and random forest are nonparametric algorithms [40] that, unlike logistic regression, do not assume a prespecified functional relationship between the features and the outcome. At each split, a greedy algorithm selects the split that most reduces the impurity of the outcome (for example, its entropy). As a result, once a feature has been selected, the importance of any highly correlated feature drops sharply, because the original feature has already made the informative split and the correlated feature can no longer reduce the impurity effectively. Therefore, XGBoost and random forest are robust to correlated features. XGBoost may have outperformed the other methods because it uses a regularized loss function [41] and combines gradient boosting with decision trees, which preserves the correlations between features during modelling [42].
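The greedy split step and its effect on correlated features can be illustrated with a small entropy-based example. This is a minimal sketch, not the implementation used by XGBoost (which scores splits by a gradient-based, regularized gain) or by any specific random forest library; it uses binary features, Shannon entropy, and hypothetical toy data in which feature "b" is an exact copy of feature "a" (perfect correlation, chosen for clarity) and feature "c" carries different information.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Entropy reduction from splitting the node on a binary (0/1) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_split(features: Dict[str, List[int]], labels: Sequence[int]) -> str:
    """Greedy step: choose the feature with the largest entropy reduction
    (ties go to the first feature in dictionary order)."""
    return max(features, key=lambda name: split_gain(features[name], labels))


if __name__ == "__main__":
    # Hypothetical toy data; "b" duplicates "a", "c" carries other information.
    y = [0, 0, 0, 1, 1, 1, 1, 1]
    feats = {
        "a": [0, 0, 0, 0, 1, 1, 1, 1],
        "b": [0, 0, 0, 0, 1, 1, 1, 1],
        "c": [0, 0, 0, 1, 0, 0, 0, 0],
    }
    root = best_split(feats, y)
    print("root split:", root)  # prints root split: a
    # Re-score the other features inside the left child (samples where a == 0).
    idx = [i for i, v in enumerate(feats[root]) if v == 0]
    child_y = [y[i] for i in idx]
    for name in ("b", "c"):
        child_x = [feats[name][i] for i in idx]
        print(name, round(split_gain(child_x, child_y), 3))  # b 0.0, then c 0.811
```

Once "a" has been used, its duplicate "b" yields no further entropy reduction in the child node, whereas "c" still separates the remaining cases; this is the mechanism the paragraph describes for why tree ensembles are not destabilized by correlated features.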
After hypertension risk assessment, interventions and management that prevent or delay the onset and progression of hypertension are crucial in high-risk populations. Continuous monitoring and management are imperative for high-risk patients. On the one hand, real-time, continuous monitoring can detect problems without delay. On the other hand, early warning signs can alert both general practitioners (GPs) and individuals promptly. For high-risk populations, GPs in primary care should prescribe individualized intervention strategies that target the main risk factors. For instance, lifestyle factors such as exercise, eating habits, and drinking habits can be improved under the guidance of GPs after risk assessment. Evidence suggests that a higher density of parks or playgrounds in residential areas may reduce the risk of hypertension, mainly by fostering exercise habits, which highlights the importance of community-level interventions [43].
This study has several limitations. First, its cross-sectional design precludes causal inference; prospective cohort studies are needed to establish cause-and-effect relationships. Second, the risk assessment model was designed using only variables available in primary care, so variables related to mental health and hereditary factors were not included. Third, several variables, such as age, urinary protein, BMI, and Cr, were measured on only a single occasion, and changes in these variables over time were not considered.