To our knowledge, this is the first prospective study to explore the utility of genetic factors in predicting dyslipidemia in a resource-limited area. The results suggested that participants in higher GRS quartiles had an increasing risk of dyslipidemia onset compared with those in the lowest GRS quartile. Conventional models were then constructed with COX, ANN, RF, and GBM classifiers, and the model with the GBM classifier significantly outperformed the others. More importantly, adding the GRS improved the ability of the conventional models to predict dyslipidemia, implying that genetic factors play a meaningful role in predicting the occurrence of dyslipidemia.
This study examined the association between the genetic factor (GRS) and dyslipidemia by dividing the GRS into quartiles. A previous study divided all participants into 3 groups according to their GRSs for LDL-C, HDL-C, and TG, and the highest GRS groups all presented higher lipid levels than the lowest GRS groups for HDL-C, LDL-C, and TG[23]. Similarly, in this study, a higher GRS was associated with a higher risk of dyslipidemia onset independent of age, family history of diabetes, physical activity, BMI, TG, HDL-C, and LDL-C. Although not every HR was statistically significant, dyslipidemia risk increased with each successive GRS quartile, and a similar trend was observed in the training set and the testing set. These findings indicate a statistically significant increase in dyslipidemia risk with increasing GRS in the rural population.
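The quartile grouping described above can be illustrated with a minimal sketch. It assumes each participant already has a numeric GRS and a binary indicator of incident dyslipidemia. The function names and toy data are hypothetical, and the crude proportions it reports are not the adjusted HRs estimated in this study.

```python
from statistics import quantiles


def assign_quartiles(scores):
    """Return the quartile (1-4) of each score, using sample quartile cut points."""
    q1, q2, q3 = quantiles(scores, n=4)
    groups = []
    for s in scores:
        if s <= q1:
            groups.append(1)
        elif s <= q2:
            groups.append(2)
        elif s <= q3:
            groups.append(3)
        else:
            groups.append(4)
    return groups


def incidence_by_quartile(groups, events):
    """Crude proportion of incident cases within each quartile (no adjustment)."""
    totals = {q: 0 for q in (1, 2, 3, 4)}
    cases = {q: 0 for q in (1, 2, 3, 4)}
    for g, e in zip(groups, events):
        totals[g] += 1
        cases[g] += e
    return {q: cases[q] / totals[q] for q in totals if totals[q] > 0}


# Hypothetical toy data: eight participants with made-up GRS values and outcomes.
toy_grs = [1, 2, 3, 4, 5, 6, 7, 8]
toy_events = [0, 0, 0, 1, 0, 1, 1, 1]
toy_groups = assign_quartiles(toy_grs)
toy_incidence = incidence_by_quartile(toy_groups, toy_events)
```

In practice the lowest quartile would serve as the reference group, and the comparison would be made with a covariate-adjusted Cox model rather than with crude proportions.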
The results showed that the conventional model with the GBM classifier had the best predictive performance. In total, 7 variables were statistically significant in univariate COX regression analysis and were included in the conventional models. Univariate COX regression identified baseline lipid measures, including TG, HDL-C, and LDL-C, as predictors, which is reasonable because current plasma lipoprotein levels are likely to lead to abnormal blood lipids in the future. In addition, the HRs of the predictors in the conventional model were comparable to those reported in other studies[9, 25–29], and the HRs of these 7 variables were consistent with those in previously published studies[16, 18, 19]. Notably, the three serum lipid parameters showed no collinearity. The findings indicated that the GBM classifier predicted the incidence of dyslipidemia better than the other classifiers, which is consistent with a previous study[30]. This may be because the GBM classifier can handle the intricate relationships between the predictors and dyslipidemia.
Given the observed association between GRS and dyslipidemia, the ability of the GRS to predict the occurrence of dyslipidemia was then evaluated. All 4 classifiers (COX, ANN, RF, and GBM) showed that the discrimination and calibration of the prediction model were moderately improved by adding the GRS to the conventional models. In the ANN classifier, the NRI and IDI were not significantly improved by the inclusion of the GRS (P > 0.05), although their values increased slightly. Still, a clearer improvement was observed in the COX, RF, and GBM classifiers. An earlier study[22] showed that, across the transition from childhood to adulthood, GRSs for HDL-C, LDL-C, and TG were valuable in predicting adult lipid levels. Because an abnormal value of any lipid index can be defined as dyslipidemia, the GRS might also be predictive of dyslipidemia, and our results support this. Further, the results suggested that machine learning techniques might predict disease better than the conventional statistical method, which is consistent with the results of other studies[31, 32]. Likewise, the increases in the other statistical measures (Table S3) indicated that the GRS played a relatively important role in dyslipidemia prediction. Overall, the results of this study revealed that the GRS could be an important predictor of the occurrence of dyslipidemia.
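For readers less familiar with the reclassification metrics discussed above, the following is a minimal sketch of how the NRI and IDI can be computed from two sets of predicted risks. It assumes a binary outcome and the category-free (continuous) form of the NRI. The exact NRI variant and any survival-time adjustment used in this study are not restated here, so the function names and toy values are purely illustrative.

```python
def continuous_nri(p_old, p_new, y):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    n_events = sum(y)
    n_nonevents = len(y) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    up_e = down_e = up_n = down_n = 0
    for old, new, outcome in zip(p_old, p_new, y):
        if new > old:
            if outcome:
                up_e += 1
            else:
                up_n += 1
        elif new < old:
            if outcome:
                down_e += 1
            else:
                down_n += 1
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


def idi(p_old, p_new, y):
    """IDI: change in mean predicted risk among events minus the change among non-events."""
    def mean(values):
        return sum(values) / len(values)

    old_e = [p for p, o in zip(p_old, y) if o]
    new_e = [p for p, o in zip(p_new, y) if o]
    old_n = [p for p, o in zip(p_old, y) if not o]
    new_n = [p for p, o in zip(p_new, y) if not o]
    if not old_e or not old_n:
        raise ValueError("need at least one event and one non-event")
    return (mean(new_e) - mean(old_e)) - (mean(new_n) - mean(old_n))


# Hypothetical toy example: the "new" model moves every prediction in the right direction.
toy_y = [1, 1, 0, 0]
toy_old = [0.5, 0.5, 0.5, 0.5]      # e.g. a conventional model
toy_new = [0.75, 0.75, 0.25, 0.25]  # e.g. a conventional model plus GRS
toy_nri = continuous_nri(toy_old, toy_new, toy_y)
toy_idi = idi(toy_old, toy_new, toy_y)
```

Positive values of both metrics indicate that the added predictor shifts predicted risks in the right direction. Whether the shift is statistically significant requires a separate test, as reflected in the P values reported for each classifier.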
A previous study[33] demonstrated that disclosing coronary heart disease risk estimates that included genetic risk information resulted in lower LDL-C levels than disclosing estimates based on conventional risk factors only. Genetic risk information for common diseases could therefore be incorporated into conventional predictive models and used to guide treatment. Considering the influence of lipid levels on CVD[34, 35], it is reasonable to infer that adding the GRS to the prediction model for dyslipidemia might help individuals prevent abnormal blood lipid levels and thus contribute to the prevention of cardiovascular events.
Strengths and limitations: this research clarified the important contribution of genetic information to predicting dyslipidemia in a rural area, suggesting that genetic information can help guide the prevention and treatment of dyslipidemia in clinical practice. To some extent, the research also indicated that machine learning methods might have advantages in constructing disease prediction models. In addition, a cohort design was used to construct the conventional model and to analyze the relationship between genetic factors and dyslipidemia, making the results more convincing. Nevertheless, several limitations should be noted. First, combining the four lipid measurements (TC, TG, LDL-C, and HDL-C) into a single dyslipidemia outcome might mask the contribution of genetic information to each individual lipid index. Even so, genetic information clearly influenced blood lipids, providing a foundation for follow-up studies of genetic factors and lipid levels. Second, although the Brier score used to assess model calibration declined, the decline was not tested for statistical significance. Third, the generalizability of the conclusions is restricted by the lack of external validation. However, 30% of the subjects were randomly selected for internal validation to increase the credibility of the study. Finally, representativeness may be limited because the recruited subjects came only from a rural area in China.
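As an illustration of the Brier score limitation noted above, the sketch below computes the Brier score and a paired bootstrap percentile interval for the difference between two models. This is one possible way to assess such a difference, not a procedure used in this study. It assumes a binary outcome without censoring, and the function names, number of resamples, seed, and toy data are all hypothetical.

```python
import random


def brier_score(y, p):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def bootstrap_brier_difference(y, p_old, p_new, n_boot=1000, seed=0):
    """Percentile 95% interval for Brier(new) - Brier(old), resampling participants in pairs."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        old = [p_old[i] for i in idx]
        new = [p_new[i] for i in idx]
        diffs.append(brier_score(ys, new) - brier_score(ys, old))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical toy data in which the "new" model is closer to the outcome for everyone.
toy_y = [1, 0, 1, 0, 1, 0]
toy_old = [0.5] * 6
toy_new = [0.75, 0.25, 0.75, 0.25, 0.75, 0.25]
toy_ci = bootstrap_brier_difference(toy_y, toy_old, toy_new)
```

An interval that lies entirely below zero would suggest that the lower Brier score of the second model is unlikely to be due to sampling variation alone.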