According to reports recently published by the Chinese Center for Cardiovascular Diseases, about 330 million people in China suffer from cardiovascular disease (CVD), including 13 million with cerebral stroke, 11 million with coronary heart disease, 8.9 million with heart failure, and 245 million with hypertension. CVD, a class of diseases of the heart and blood vessels, is the leading cause of death worldwide. Angina pectoris, myocardial infarction, hypertension, and cardiac insufficiency are all common, frequently occurring heart diseases. Most people believe that CVD strikes suddenly; in fact, it leaves traces before it occurs. The incidence of CVD is increasing year by year, and its main cause is atherosclerosis (AS). AS is one of the most serious chronic diseases affecting human health, and it is the main pathological basis of ischemic cardiovascular and cerebrovascular diseases such as coronary heart disease, cerebrovascular disease, and thromboembolic disease.
Yi et al. found that smoking increases the prevalence of arteriosclerosis. In 1994, Vanderwal et al. showed that T-cells and mast cells at the site of plaque rupture produce many types of molecules (inflammatory cytokines, proteases, coagulation factors, free radicals, and vasoactive molecules) that can make AS lesions unstable. In the same year, Moreno et al. found that macrophages are markers of unstable atherosclerotic plaque and may play an important role in the pathophysiology of acute coronary syndrome. Studies by the Hansson team and the Amento team showed that these reactions may lead to plaque activation and rupture, thrombosis, and ischemia. Zhu et al. showed that, compared with healthy adults, carotid intima-media thickness (cIMT) in groups at high risk of AS was related to patient age and to the carotid artery; in other words, AS is associated with age. Ross et al. comprehensively described the development of AS and proposed that AS is an inflammatory disease. Several studies [11, 12, 13] regard C-reactive protein, the most representative marker of inflammation, as a highly sensitive indicator of the occurrence and development of AS. Nofer et al. pointed out that HDL3 can reduce the production of IP3, thereby inhibiting thrombin-induced platelet aggregation and, in turn, thrombosis. Other studies [15, 16] showed that inflammation can oxidatively modify LDL, and that the oxidized LDL further promotes inflammation of the arterial intima. In addition to inflammation, hypertension and infection are also important causes of AS. Kranzhofer et al. noted that angiotensin II is elevated in hypertensive patients and stimulates the growth of vascular smooth muscle, thereby promoting AS. Kuvin and Kimmelstiel pointed out that infection can cause AS. Geisel et al. found that, compared with cIMT and the ankle-brachial index (ABI), coronary artery calcification provides the best risk identification, especially in the intermediate-risk group, and that IMT thickening can reflect the presence of early atherosclerosis. Ohkuma et al. showed that brachial-ankle pulse wave velocity was significantly associated with cardiovascular events, independently of traditional risk factors. Kawada et al. used the plaque score (PS) to describe the severity of plaque formation and found that PS can be used to assess the presence of advanced atherosclerosis; they also found that metabolic syndrome is significantly associated with carotid AS. In a study that used ultrafast imaging to determine pulse wave velocity in hypertensive patients and its influencing factors, Li et al. found that the PWV-BS and PWV-ES values of hypertensive patients were significantly increased, so these values can serve as indices of the elastic function of the carotid artery. Bos et al. studied the prevalence and risk factors of intracranial carotid atherosclerosis in the general population and found that it is related to the traditional cardiovascular risk factors, but that the distribution of risk factors differs between men and women, with men at higher risk than women. Mirault et al. found that ultrafast imaging can evaluate the carotid pulse wave propagation velocity and its changes over the cardiac cycle, and that the difference between the early and late pulse wave propagation velocities increases with age. Yang et al. found that age, body mass index, and blood pressure were the main factors affecting ufPWV. Carew et al., studying the role of LDL in the development of atherosclerotic lesions, found that antioxidant LDL antibody has a pathogenic effect on aortic lesions.
AS is strongly related to cholesterol levels, so it can be assessed from the total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C) measured in a blood test. Established AS damage can be observed in B-mode ultrasound images through the following features: bulging of the intima-media, total plaque area, and total plaque volume. Arterial stiffness is a physical property of the arterial vessel itself, so it can be measured by the speed at which the pulse wave travels along the vessel, that is, the pulse wave velocity (PWV). A large number of studies have confirmed that PWV accurately reflects the degree of arterial stiffness. One study of more than 3000 people over 60 years old concluded that there is a statistically significant and strong correlation between arterial stiffness and atherosclerosis. Clinical measurement of arterial stiffness can therefore be used to detect AS. For patients with known or undiagnosed AS, the conventional methods are biochemical, imaging, and physical examination, but these methods are time-consuming and expensive and can also harm patients, imposing both economic costs and health risks.
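As a rough illustration of the physical measurement mentioned above, PWV is commonly computed as the distance between two measurement sites divided by the time the pulse wave takes to travel between them; a stiffer artery yields a higher value. The sketch below assumes this simple two-site definition. The function name and the sample values are hypothetical and are not taken from any of the cited studies.

```python
def pulse_wave_velocity(path_length_m: float, transit_time_s: float) -> float:
    """Return pulse wave velocity in m/s: path length divided by transit time."""
    if path_length_m <= 0 or transit_time_s <= 0:
        raise ValueError("path length and transit time must be positive")
    return path_length_m / transit_time_s


# Hypothetical measurement: the pulse wave covers 0.5 m in 0.0625 s.
pwv = pulse_wave_velocity(0.5, 0.0625)
print(pwv)  # 8.0
```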
A research team at the Medical University of Vienna in Austria found that the APRIL protein can reduce subendothelial lipid deposition and prevent the formation of AS. Yusuf et al. pointed out that most patients with AS are obese. Smoking also aggravates AS, and people under excessive stress are likewise prone to it. Preventive measures therefore include controlling total caloric intake, eating more foods rich in vitamin C and plant protein, avoiding overeating, exercising appropriately, and maintaining a positive and optimistic attitude towards life and work.
For many diseases, researchers now supplement conventional medical diagnosis with machine learning methods for auxiliary diagnosis. Machine learning algorithms have been applied to predict other diseases closely related to atherosclerosis, with good results.
In 2017, Xu et al. applied a logistic regression algorithm to 7360 patients with and without coronary artery disease (CAD). They found seven factors that are closely related to CAD: age, gender, serum creatinine (Scr), smoking, angina, diabetes, and low-density lipoprotein (LDL). They also gave the explicit formula of the fitted model.
The patient's probability of disease is then obtained through the sigmoid function. Xu et al. set the probability threshold to 0.79; the resulting model has a specificity of 0.709 and a sensitivity of 0.658, but its accuracy is very low. The reason is that the authors considered only linear relationships between the target variable and each factor and ignored non-linear ones. In 2017, Tan et al. applied a CNN-LSTM model to data from 6120 CAD and 32000 non-CAD patients. Their network has eight layers in total (two convolution layers, two pooling layers, three LSTM layers, and a final fully connected layer); it extracts features from the patient's ECG signal and then fits the extracted features. The sensitivity, specificity, and accuracy of the model were 0.9985, 0.9984, and 0.9985, respectively. The model generalizes well, but it does not reveal the relationship between the features and the target variable. In 2017, Acharya et al. used a CNN model to fit 2-second ECG segments (model A) and 5-second ECG segments (model B). The model consists of five convolution layers, five pooling layers, and one fully connected layer. Model A has a sensitivity of 0.9372, a specificity of 0.9518, and an accuracy of 0.9495; model B has a sensitivity of 0.9113, a specificity of 0.9588, and an accuracy of 0.9511. This model also fails to reveal the relationship between the features and the target variable. In 2017, Lih et al. used wavelet packet decomposition to process the ECGs of 12308 CAD patients and 3791 healthy people and classified them with a K-Nearest Neighbor (KNN) classifier. The sensitivity of the model is 0.9964, the specificity is 0.9971, and the accuracy is 0.9965. Olaniyi et al. fitted KNN, DT, naive Bayes, WAC, and BPNN to the Cleveland dataset, obtaining accuracies of 0.8567, 0.8435, 0.8231, 0.84, and 0.85, respectively. Alizadehsani et al. used SVM to predict CAD on the new Z-Alizadeh Sani dataset that they extracted in 2016. They fitted the model with four different kernel functions; the most accurate model used the RBF kernel and reached an accuracy of 0.8185. In the following years, researchers built a series of classification models to predict CAD on the Z-Alizadeh Sani dataset. Arabasadi et al. combined a genetic algorithm (GA) with an artificial neural network (ANN) into a hybrid GA-ANN model. The sensitivity of this model is 0.97, the specificity is 0.92, and the accuracy is 0.9385. Using the ANN model alone gives a sensitivity of 0.86, a specificity of 0.83, and an accuracy of 0.8462.
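The studies above are compared by sensitivity, specificity, and accuracy, and the logistic regression model turns a sigmoid probability into a diagnosis through a threshold. The following minimal sketch shows how these pieces fit together. The 0.79 threshold is the one reported for Xu et al.; the coefficients, the two-feature patients, the labels, and the choice of `>=` at the threshold are hypothetical assumptions made only for illustration, since the fitted formula is not reproduced here.

```python
import math


def sigmoid(z: float) -> float:
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the features."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def classify(probability, threshold=0.79):
    """Label a patient positive (1) when the probability reaches the threshold."""
    return 1 if probability >= threshold else 0


def evaluate(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical coefficients and standardised features (not from Xu et al.).
weights, bias = [1.2, 0.8], -0.5
patients = [[2.0, 1.5], [0.1, 0.2], [0.9, 0.9], [-1.0, 0.5]]
y_true = [1, 0, 1, 0]

y_pred = [classify(predict_probability(x, weights, bias)) for x in patients]
print(y_pred)                    # [1, 0, 0, 0]
print(evaluate(y_true, y_pred))  # (0.5, 1.0, 0.75)
```

In this toy data the third patient has a predicted probability of about 0.786, just below the threshold, so a true case is missed. This is the usual trade-off of a high threshold: specificity rises while sensitivity falls.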
This paper aims to apply an improved ensemble learning algorithm to the diagnosis of AS. To the best of our knowledge, it not only applies machine learning to atherosclerosis detection but also proposes, and further improves, a new ensemble model built on strong classifiers. We also compare the results of our model with those of other traditional machine learning methods, such as random forest (RF), eXtreme Gradient Boosting (XGBoost), and support vector machines (SVM). Using AS data, we propose a Weighted-Ensemble model based on strong classifiers. First, following statistical correlation analysis, we filter out the features in the data set that have no influence on the target variable. Second, we feed the remaining features into an RF and screen out the important features according to the Gini index. Finally, we build our model on the selected important features and use the fitted model to predict disease. The main innovations and contributions of this paper are as follows. a) An improved machine learning method is applied to the prediction of AS. b) Factors that have important effects on AS are screened from the data set. c) A new model, the Weighted-Ensemble model based on strong classifiers, is proposed. d) Compared with other machine learning algorithms, the proposed model has higher prediction accuracy and better generalization ability.
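The two screening steps described above can be sketched as follows. This is a simplified, dependency-free illustration under stated assumptions, not the procedure actually used in this paper: Pearson correlation stands in for the correlation analysis, and the Gini impurity decrease of the best single-threshold split stands in for RF importance (a random forest aggregates such decreases over many splits and trees). The feature names, the sample values, and the parameters `min_abs_corr` and `top_k` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_gini_decrease(xs, ys):
    """Largest impurity decrease over all single-threshold splits of one feature."""
    parent, n, best = gini(ys), len(ys), 0.0
    for t in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


def screen_features(columns, target, min_abs_corr=0.1, top_k=2):
    """Step 1: drop weakly correlated features. Step 2: rank the rest by Gini decrease."""
    kept = {name: col for name, col in columns.items()
            if abs(pearson(col, target)) >= min_abs_corr}
    ranked = sorted(kept, key=lambda name: best_gini_decrease(kept[name], target),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical toy data: six patients, binary AS label.
target = [0, 0, 0, 1, 1, 1]
columns = {
    "age": [40, 45, 50, 60, 65, 70],
    "bmi": [22, 30, 24, 29, 23, 31],
    "site": [1, 1, 1, 1, 1, 1],  # constant column, removed in step 1
}
print(screen_features(columns, target))  # ['age', 'bmi']
```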
Moreover, the selected important features reveal the relationship between AS and various factors, so doctors and scientists can make better-informed decisions based on these results.
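To make the idea of a weighted ensemble of strong classifiers concrete, the sketch below combines the predicted probabilities of several already-fitted classifiers by weighted soft voting. The weighting rule (weights proportional to validation accuracy) is an assumption chosen for illustration; the weighting scheme of the proposed model may differ. The model names and all probabilities are hypothetical.

```python
def accuracy_weights(val_probs_by_model, y_val, threshold=0.5):
    """Weight each model by its validation accuracy, normalised to sum to 1."""
    accs = {}
    for name, probs in val_probs_by_model.items():
        preds = [1 if p >= threshold else 0 for p in probs]
        accs[name] = sum(int(p == t) for p, t in zip(preds, y_val)) / len(y_val)
    total = sum(accs.values())
    if total == 0:  # fall back to equal weights if every model scores zero
        return {name: 1.0 / len(accs) for name in accs}
    return {name: acc / total for name, acc in accs.items()}


def weighted_vote(probs_by_model, weights, threshold=0.5):
    """Combine per-model probabilities into weighted probabilities and labels."""
    n = len(next(iter(probs_by_model.values())))
    combined = [sum(weights[m] * probs_by_model[m][i] for m in probs_by_model)
                for i in range(n)]
    labels = [1 if p >= threshold else 0 for p in combined]
    return labels, combined


# Hypothetical validation outputs of three fitted classifiers.
y_val = [1, 0, 1, 0]
val_probs = {
    "model_a": [0.9, 0.2, 0.8, 0.1],  # validation accuracy 1.00
    "model_b": [0.7, 0.4, 0.6, 0.6],  # validation accuracy 0.75
    "model_c": [0.4, 0.3, 0.9, 0.7],  # validation accuracy 0.50
}
weights = accuracy_weights(val_probs, y_val)

# Hypothetical predictions for two new patients.
new_probs = {
    "model_a": [0.80, 0.3],
    "model_b": [0.35, 0.2],
    "model_c": [0.30, 0.6],
}
labels, combined = weighted_vote(new_probs, weights)
print(labels)  # [1, 0]
```

For the first patient, a plain average of the three probabilities is about 0.48, whereas the accuracy-weighted average is about 0.54, so giving more weight to the more accurate classifier changes the decision.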