Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide (1). CAD is also highly prevalent in Iran compared with many other countries. Therefore, developing a national lifestyle-based program to reduce CAD risk factors is essential (2). The most prevalent CAD risk factors are smoking, male gender, age, ethnicity, family history of the disease, high blood pressure, high blood cholesterol, diabetes, poor diet, lack of exercise, obesity, stress and blood vessel inflammation. These factors affect individual patients to different degrees (3).
The gold standard for CAD diagnosis remains invasive coronary angiography, but this procedure carries a risk of serious complications (4). Current diagnostic approaches therefore aim to find an appropriate, safe and non-invasive method of diagnosis (4). Evidence indicates a significant association between CAD and a limited number of dietary factors and dietary patterns (5). Previous association studies have reported dietary intake of minerals such as sodium, potassium, magnesium and zinc as risk factors associated with CAD (6–11). In contrast, many previous studies have shown that vitamin C, vitamin E and selenium interventions do not reduce the risk of CAD (12, 13). Furthermore, a study in China indicated that low-fat, high-fiber intake decreases CAD mortality (14). Regarding protein intake, a follow-up study of health professionals found a significant relationship between total protein intake and increased risk of CAD (15). However, in a review article, Pedersen et al. concluded that there was no significant relationship between protein intake and stroke or coronary heart disease (16).
Hence, applying dietary intake patterns to novel algorithms for predicting CAD remains an important approach to risk stratification (17).
Machine learning is increasingly used to predict healthcare outcomes such as cost, utilization, and health status. In supervised machine learning, the goal is to train an algorithm to learn a mapping from input features to an output. Generally, a machine learning workflow involves the following steps: data preparation, algorithm selection, training, regularization, and evaluation (18). Various machine learning models for coronary artery disease have previously been built and analyzed (19–21). Nevertheless, their performance may vary with the population, lifestyle, and the data and features available. Thus, we believe that by constructing and validating prediction models, it becomes possible to distinguish patients at high risk of the disease from those at low risk. Consequently, a diagnostic model for predicting CAD is necessary.
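The five generic steps listed above can be illustrated with a deliberately tiny, dependency-free sketch. Everything in it is an assumption for illustration: the twelve records are invented (not study data), the 8/4 train/test split and the `min_leaf` parameter are arbitrary choices, and the "algorithm" is a depth-1 decision tree (a decision stump) rather than any model used in this study. Regularization is represented only by a minimum leaf size.

```python
import random

# 1. Data preparation: invented records (age in years, smoker, label 1 = CAD)
raw = [
    (45, "no", 0), (62, "yes", 1), (58, "yes", 1), (39, "no", 0),
    (70, "no", 1), (51, "yes", 0), (66, "yes", 1), (43, "no", 0),
    (55, "no", 0), (68, "yes", 1), (47, "yes", 0), (73, "no", 1),
]
X = [(age, 1 if smoker == "yes" else 0) for age, smoker, _ in raw]  # encode nominal feature as 0/1
y = [label for _, _, label in raw]
random.seed(0)
order = list(range(len(X)))
random.shuffle(order)
train_idx, test_idx = order[:8], order[8:]  # hold out 4 records for evaluation

# 2. Algorithm selection: a depth-1 decision tree ("decision stump")
def majority(labels):
    """Most frequent 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

# 3. Training: exhaustive search over every feature and candidate threshold
def fit_stump(rows, labels, min_leaf=2):
    best = None  # (errors, feature index, threshold, left label, right label)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[j] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[j] > t]
            # 4. Regularization: reject splits leaving fewer than min_leaf records in a leaf
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            left_lab, right_lab = majority(left), majority(right)
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_lab, right_lab)
    return best

def predict(stump, row):
    _, j, t, left_lab, right_lab = stump
    return left_lab if row[j] <= t else right_lab

# 5. Evaluation: accuracy on the held-out records
stump = fit_stump([X[i] for i in train_idx], [y[i] for i in train_idx])
acc = sum(predict(stump, X[i]) == y[i] for i in test_idx) / len(test_idx)
print(f"split on feature {stump[1]} at {stump[2]}; held-out accuracy = {acc:.2f}")
```

Fitted on all twelve invented records, `fit_stump(X, y)` returns `(0, 0, 55, 0, 1)`: a split on age at 55 separates the toy labels perfectly. The exhaustive threshold search used here is the style of split selection whose variable-selection bias QUEST is designed to avoid.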
In the present paper, among the various machine learning methods (artificial neural networks, deep learning, etc.), we employed a well-known technique called the decision tree (DT). A DT model is a graphical model with a tree-like structure. One advantage of DTs is that the resulting model is more interpretable than many other machine learning models.
A DT is a predictive technique that maps observations about a patient to conclusions about the target value, such as the presence of the disease. DTs are among the most important algorithms in machine learning. C5.0, C&R Tree, CHAID, and QUEST (Quick, Unbiased, Efficient Statistical Tree) are commonly used DT algorithms in machine learning modeling.
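The interpretability mentioned above comes from the fact that every root-to-leaf path of a binary tree can be read as an IF ... THEN ... rule. The sketch below shows this with a hand-built tree; the features `age` and `sodium_mg_per_day`, their thresholds, and the class names `Leaf` and `Node` are hypothetical and are not fitted to any data.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    feature: str
    threshold: float
    left: "Tree"   # branch taken when patient[feature] <= threshold
    right: "Tree"  # branch taken when patient[feature] > threshold

Tree = Union[Leaf, Node]

def predict(tree: Tree, patient: dict) -> str:
    """Follow one root-to-leaf path for a single patient."""
    while isinstance(tree, Node):
        tree = tree.left if patient[tree.feature] <= tree.threshold else tree.right
    return tree.label

def rules(tree: Tree, path: tuple = ()):
    """Yield every root-to-leaf path as a readable IF ... THEN ... rule."""
    if isinstance(tree, Leaf):
        yield f"IF {' AND '.join(path) or 'TRUE'} THEN {tree.label}"
        return
    yield from rules(tree.left, path + (f"{tree.feature} <= {tree.threshold}",))
    yield from rules(tree.right, path + (f"{tree.feature} > {tree.threshold}",))

# Hypothetical tree: features and thresholds are illustrative, not fitted to any data
tree = Node("age", 55, Leaf("low risk"),
            Node("sodium_mg_per_day", 3000, Leaf("low risk"), Leaf("high risk")))

for rule in rules(tree):
    print(rule)
print(predict(tree, {"age": 62, "sodium_mg_per_day": 3400}))  # high risk
```

The loop prints three rules, for example `IF age > 55 AND sodium_mg_per_day > 3000 THEN high risk`, which a clinician can check directly; this is much harder to do with the weights of a neural network.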
QUEST is a binary-split decision tree method of machine learning. In QUEST, the association between each input feature and the target is measured with an ANOVA F-test (ordered, i.e. continuous or ordinal, features) or Pearson's chi-square test (nominal features). The feature with the strongest association with the target, that is, the smallest p-value, is chosen to split the node (22). QUEST is computationally faster than other tree algorithms, and its main benefit is that it avoids the variable-selection bias present in other classification tree methods (23).
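The variable-selection rule described above can be sketched as follows. This is a simplified illustration, not the full QUEST algorithm: it implements only the "smallest p-value wins" step and omits refinements of the published method, such as the Bonferroni-adjusted threshold with a Levene's-test fallback and the discriminant-analysis step that chooses the split point. To stay dependency-free it computes p-values from standard series and continued-fraction expansions of the incomplete beta and gamma functions; in practice `scipy.stats.f_oneway` and `scipy.stats.chi2_contingency` would normally be used. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

EPS, FPMIN, MAXIT = 1e-15, 1e-300, 500

def _guard(v):
    return v if abs(v) > FPMIN else FPMIN  # avoid division by zero in Lentz's method

def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, MAXIT + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 / _guard(1.0 + aa * d)
            c = _guard(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < EPS:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def gammaq(a, x):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    pre = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1.0:  # series for P(a, x), then Q = 1 - P
        ap, delta = a, 1.0 / a
        s = delta
        for _ in range(MAXIT):
            ap += 1.0
            delta *= x / ap
            s += delta
            if abs(delta) < abs(s) * EPS:
                break
        return 1.0 - s * pre
    b = x + 1.0 - a  # continued fraction for Q(a, x)
    c, d = 1.0 / FPMIN, 1.0 / b
    h = d
    for i in range(1, MAXIT + 1):
        an = -i * (i - a)
        b += 2.0
        d = 1.0 / _guard(an * d + b)
        c = _guard(b + an / c)
        h *= d * c
        if abs(d * c - 1.0) < EPS:
            break
    return h * pre

def f_sf(f, d1, d2):
    """P(F > f) for an F(d1, d2) random variable."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def anova_p(values, target):
    """One-way ANOVA F-test p-value for an ordered feature across target classes."""
    groups = defaultdict(list)
    for v, t in zip(values, target):
        groups[t].append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {t: sum(g) / len(g) for t, g in groups.items()}
    ssb = sum(len(g) * (means[t] - grand) ** 2 for t, g in groups.items())
    ssw = sum((v - means[t]) ** 2 for t, g in groups.items() for v in g)
    if ssw == 0.0:  # no within-class spread at all
        return 0.0 if ssb > 0.0 else 1.0
    return f_sf((ssb / (k - 1)) / (ssw / (n - k)), k - 1, n - k)

def chi2_p(values, target):
    """Pearson chi-square test of independence p-value for a nominal feature."""
    table, rows, cols, n = Counter(zip(values, target)), Counter(values), Counter(target), len(values)
    stat = sum((table[(r, c)] - rows[r] * cols[c] / n) ** 2 / (rows[r] * cols[c] / n)
               for r in rows for c in cols)
    dof = (len(rows) - 1) * (len(cols) - 1)
    return gammaq(dof / 2.0, stat / 2.0) if dof > 0 else 1.0

def select_split_feature(features, target):
    """features: name -> (kind, values), kind "ordered" or "nominal". Returns (name, p)."""
    pvals = {name: (anova_p if kind == "ordered" else chi2_p)(vals, target)
             for name, (kind, vals) in features.items()}
    best = min(pvals, key=pvals.get)
    return best, pvals[best]

# Invented toy data (not study data)
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {"age": ("ordered", [41, 45, 50, 48, 63, 67, 59, 70]),
            "smoker": ("nominal", ["no", "yes", "no", "no", "yes", "no", "yes", "yes"])}
name, p = select_split_feature(features, target)
print(name, p)
```

On this toy data the chi-square p-value for `smoker` is about 0.157, while the F-test p-value for `age` is below 0.01, so `age` would be chosen to split the node.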
In the current study, QUEST is applied to construct models that identify the importance of factors related to the incidence of CAD and to detect dietary intake as a major CAD risk factor.