The focus of this study was to explore the construction of simple model tools for predicting HUA. We developed and validated two new, simple HUA prediction models: a nomogram as the parametric model and a classification tree as the non-parametric model. We included four non-invasive physical examination indicators that are easy to obtain clinically, namely age, gender, BMI and hypertension. The results indicated that all four were risk factors for HUA (P < 0.05), and they were included in the models. The final AUC was 0.806 for the nomogram model and 0.802 for the classification tree model. The nomogram model had the higher specificity (63.68%), whereas the classification tree model had the higher sensitivity (88.60%). Each model had its own advantages and disadvantages.
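As an illustration of how the performance measures quoted above are defined, the following minimal Python sketch computes sensitivity, specificity and AUC from binary outcomes and predicted risks. The data are toy values rather than study data, the 0.5 cut-off is an arbitrary assumption, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random non-case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Count pairwise "wins" of cases over non-cases; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (1 = HUA, 0 = no HUA); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auc(y_true, scores))
```

On this toy data the sketch gives a sensitivity and specificity of 0.75 each and an AUC of 0.8125. Moving the cut-off trades sensitivity against specificity, while the AUC does not depend on the cut-off.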
In previous studies, Lee et al. used gender, BMI and PPARγ genes to establish a model to predict the potential risk of HUA. The final model had a sensitivity of 69.3% and a specificity of 83.7%. Although its sensitivity was better than those of the two models in this study, predicting HUA through genetic testing is costly and is not suitable for rural hospitals with outdated medical equipment and widely dispersed populations. Zeng et al. built a HUA prediction model based on dietary information, such as the frequency of eating vegetables, meat and eggs, and obtained a neural network model with an AUC of 0.827, a sensitivity of 75% and a specificity of 86%. Its AUC and sensitivity were both better than those of the two models in this study, but the model structure was more complex, it required more statistical background knowledge from its operators, and it needed more variables. The model was also time consuming to use, and subjects were prone to recall bias when it was applied in the clinic.
The nomogram model is based on logistic regression analysis. It is a traditional medical model that offers a certain degree of accuracy, repeatability and visualization, and it can be used without computer software. It is often used to explore the risk factors that cause diseases and to predict disease incidence from these risk factors. However, it has difficulty fully reflecting the coupling relationships between variables and cannot effectively deal with collinearity. The classification tree model is a non-parametric regression model with no special requirements for the variables it includes, and continuous variables can also enter the model as independent variables. It is a type of machine learning model that can effectively deal with missing values in the independent variables: missing values can either be assigned to the modal category or be treated as a separate category. Its results are also not affected by collinearity among the variables. In addition, it can display the risk factors and the interactions between them effectively, intuitively and hierarchically.
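To make the link between logistic regression and a nomogram concrete, the following Python sketch shows one common way of turning regression coefficients into a points scale: the predictor with the largest effect range spans 0 to 100 points, and the total points map back to a predicted probability. All coefficients, the intercept, the BMI range and the function names are hypothetical values chosen for illustration; they are not the fitted values of the nomogram in this study.

```python
import math

# Hypothetical logistic regression fit; NOT the coefficients from this study.
INTERCEPT = -4.0
PREDICTORS = {
    # name: (coefficient, minimum value, maximum value)
    "male": (2.0, 0, 1),
    "age_18_29": (0.4, 0, 1),
    "bmi": (0.1, 15, 40),
    "hypertension": (0.5, 0, 1),
}


def _lowest(beta, lo, hi):
    """Smallest contribution a predictor can make to the linear predictor."""
    return min(beta * lo, beta * hi)


# The predictor with the widest effect range defines the 0-100 points axis.
MAX_SPAN = max(abs(beta) * (hi - lo) for beta, lo, hi in PREDICTORS.values())
# Linear predictor of a subject who scores 0 points on every axis.
BASELINE = INTERCEPT + sum(_lowest(*spec) for spec in PREDICTORS.values())


def points(name, value):
    """Points read off the nomogram axis of one predictor."""
    beta, lo, hi = PREDICTORS[name]
    return 100.0 * (beta * value - _lowest(beta, lo, hi)) / MAX_SPAN


def probability(total_points):
    """Convert total points back to the predicted probability."""
    lp = BASELINE + total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical subject: hypertensive man aged 30 or older with a BMI of 28.
patient = {"male": 1, "age_18_29": 0, "bmi": 28, "hypertension": 1}
total = sum(points(name, value) for name, value in patient.items())
print(round(total, 1), round(probability(total), 3))
```

Because the points are only a rescaling of the linear predictor, reading the probability from the total points gives the same result as evaluating the logistic equation directly, which is why a printed nomogram can be used without software.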
In recent years, some scholars have successfully applied the classification tree model to predict myocardial infarction, multi-drug resistant tuberculosis, portal vein thrombosis in patients with acute pancreatitis and other internal or surgical diseases, and these applications in medical fields have demonstrated its effectiveness. However, classification trees also have some shortcomings. First, the classification tree does not quantify the individual effect of each factor as clearly as the nomogram model. In this study, logistic regression analysis was used to obtain OR values, which clearly identify the specific risk factors that affect the risk of HUA. In contrast, the classification tree model can only rank the variables by importance and cannot assign a specific effect size to a particular variable. Second, the classification tree model has poor stability when only a small sample is available. Fortunately, the large amount of data in this study allowed the classification tree model to have good stability in its prediction capability.
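The OR values mentioned above follow directly from the logistic regression coefficients: the OR of a predictor is the exponential of its coefficient, and a Wald-type 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. A minimal Python sketch, with a hypothetical coefficient and standard error that are not taken from this study:

```python
import math


def odds_ratio(beta, se, z=1.96):
    """Return (OR, lower, upper) with a Wald-type confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
or_value, lower, upper = odds_ratio(beta=math.log(2), se=0.1)
print(round(or_value, 2), round(lower, 2), round(upper, 2))
```

A coefficient of ln 2 corresponds to an OR of 2, that is, a doubling of the odds of HUA per unit increase in the predictor. A classification tree has no equivalent per-variable quantity.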
In this study, the ROC curves, calibration curves and DCA of the nomogram and classification tree models achieved satisfactory results in both the training and validation sets, and the two models performed similarly. However, some scholars have built HUA prediction models using machine learning algorithms such as support vector machines, decision trees and random forests alongside traditional logistic regression. They found that the machine learning models were superior to the logistic model for HUA prediction, which differs from our results. The reasons may be related to the large number of predictor variables in those models and the existence of collinearity. In this study, the number of variables included was small, so the possibility of collinearity was low, and the variables were confirmed to be closely related to HUA; the performance of the nomogram model was therefore relatively stable. Previous studies have also shown that the prediction accuracy and performance of the nomogram model can be better than those of machine learning models such as artificial neural networks and classification tree models [21, 22]. Therefore, the quality of a model cannot be judged from a single perspective, and the accuracy and stability of a prediction model based on a single algorithm are relatively low, which cannot always meet the clinical decision-making needs of multi-source, high-dimensional medical data. In future applications, more attention should be paid to combining multiple models in order to support the improvement of disease prediction methods.
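For readers less familiar with DCA, the quantity plotted on a decision curve is the net benefit at a given threshold probability: the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold. The Python sketch below uses this standard definition with toy data; the function names are hypothetical and the numbers are not study results.

```python
def net_benefit(y_true, scores, threshold):
    """Net benefit of classifying as high risk when score >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every subject as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data for illustration only (1 = HUA, 0 = no HUA); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, scores, pt), net_benefit_treat_all(y_true, pt))
```

A model is considered useful at a threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.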
In this study, the overall prevalence rate of HUA was 21.9%, with rates of 34.9% and 4.1% in the male and female populations, respectively. The prevalence rate of HUA in the male population was much higher than in the female population, which agreed with previous reports. However, the lower prevalence of HUA in the female population in this study was considered to be related partly to the latest Chinese guidelines for the diagnosis and treatment of HUA. In this study, the multivariate logistic regression analysis and the classification tree model both showed that male gender accounted for the largest share among all risk factors, as was shown previously [24, 25].
The risk of HUA in males is high, and this may be partly due to the capacity of estrogen to regulate the levels of uric acid transporters in the kidney through gene expression, thereby promoting the excretion of uric acid and reducing its production [26, 27]. Consequently, after menopause, the difference in HUA prevalence between males and females gradually decreases. In the classification tree model in this study, the age split node for women was 49.5 years, and the prevalence of HUA increased after 49.5 years, which was roughly consistent with the age of menopause. On the other hand, due to work demands and life pressures, men more often have unhealthy living habits (such as high-purine diets and alcohol consumption), and previous literature has reported that alcohol consumption is a risk factor for HUA. In the nomogram model, the score corresponding to ages between 18 and 29 years was higher. In the classification tree model, the prevalence of HUA in men younger than 28.5 years was higher than that in older men, which was the same as that reported by Jin et al.
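The fractional split ages (49.5 and 28.5 years) are typical of how tree algorithms choose cut points. Assuming a CART-style tree and age recorded in whole years (both are assumptions, since the fitting algorithm is not detailed here), candidate thresholds lie midway between adjacent observed values, and the threshold that minimizes an impurity measure is kept. A minimal Python sketch with toy data and hypothetical function names:

```python
def gini(labels):
    """Gini impurity of a list of binary labels coded 0/1."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best single split."""
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent observed values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Toy data for illustration only: ages in whole years, 1 = HUA; not study data.
ages = [45, 47, 49, 50, 52, 55]
hua = [0, 0, 0, 1, 1, 1]
print(best_split(ages, hua))
```

In this toy example the best threshold is 49.5, the midpoint between the observed ages 49 and 50, which shows how a split value between two whole-year ages can arise.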
In this study, the classification tree model suggested that there was an interaction between gender and BMI. For both men and women, the prevalence of HUA in people with high BMI was higher than that in people with low BMI, and BMI was the second most important influencing factor in the classification tree. Previous epidemiological evidence has shown that obesity is closely related to HUA and that BMI and waist circumference are positively correlated with HUA [31, 32]. This is especially true of visceral fat obesity, which can accelerate the de novo synthesis of phosphoribosyl pyrophosphate from ribose-5-phosphate through the NADP-NADPH metabolic pathway, eventually leading to an increase in uric acid and triggering HUA. Therefore, the results of this study suggest that obese young men and postmenopausal women are key targets for health management aimed at preventing HUA.
This study has several advantages. First, the predictive variables required by the models are non-invasive and easy to evaluate. In addition to saving medical expenses, this limited information can be used to screen for undetected and unsuspected high-risk groups of HUA, which is ideal for large populations with limited health resources, especially in poorer areas. Regions outside China with similarly poor sanitary conditions, such as Africa and India, may also benefit from these models. Second, we combined parametric and non-parametric methods to build two prediction models. In practical applications, this approach makes it possible to select between the algorithms, compare them comprehensively and exploit their complementary advantages according to the characteristics of the available data and the sample size, so as to improve the reliability of the prediction results and expand the scope of application of the models.
However, this study also had several limitations. Our sample originated from the database of a single medical examination center, and hence there was selection bias. It is therefore not possible to determine whether the models are applicable to populations in other regions, and further external validation is required. In addition, as a cross-sectional survey, this study was unable to confirm the causal and temporal relationships between age, gender, BMI, hypertension and HUA, which need to be confirmed by further large, multi-center, prospective studies.