Machine learning algorithms to automate differentiating cardiac amyloidosis from hypertrophic cardiomyopathy

Cardiac amyloidosis has a poor prognosis, and high mortality and is often misdiagnosed as hypertrophic cardiomyopathy, leading to delayed diagnosis. Machine learning combined with speckle tracking echocardiography was proposed to automate differentiating two conditions. A total of 74 patients with pathologically confirmed monoclonal immunoglobulin light chain cardiac amyloidosis and 64 patients with hypertrophic cardiomyopathy were enrolled from June 2015 to November 2018. Machine learning models utilizing traditional and advanced algorithms were established and determined the most significant predictors. The performance was evaluated by the receiver operating characteristic curve (ROC) and the area under the curve (AUC). With clinical and echocardiography data, all models showed great discriminative performance (AUC > 0.9). Compared with logistic regression (AUC 0.91), machine learning such as support vector machine (AUC 0.95, p = 0.477), random forest (AUC 0.97, p = 0.301) and gradient boosting machine (AUC 0.98, p = 0.230) demonstrated similar capability to distinguish cardiac amyloidosis and hypertrophic cardiomyopathy. With speckle tracking echocardiography, the predictive performance of the voting model was similar to that of LightGBM (AUC was 0.86 for both), while the AUC of XGBoost was slightly lower (AUC 0.84). In fivefold cross-validation, the voting model was more robust globally and superior to the single model in some test sets. Data-driven machine learning had shown admirable performance in differentiating two conditions and could automatically integrate abundant variables to identify the most discriminating predictors without making preassumptions. In the era of big data, automated machine learning will help to identify patients with cardiac amyloidosis and timely and effectively intervene, thus improving the outcome.


Introduction
Cardiac amyloidosis (CA) is a part of systemic amyloidosis, in which misfolded amyloid proteins are deposited outside cardiomyocytes and lead to restrictive pathology of the heart, often denoting a poor outcome [1,2]. In recent years, several new therapies that significantly improve the prognosis of patients with CA have been developed, including bortezomib-based induction and consolidation strategies, autologous stem cell transplantation, immunomodulatory drugs, etc. [3]. Unfortunately, for patients with advanced cardiac involvement, current treatments are still limited.
Moreover, patients with CA could be easily misdiagnosed as hypertrophic cardiomyopathy (HCM) who have similar phenotypes that are difficult to distinguish on routine echocardiography, often leading to delayed diagnosis. However, CA has high mortality and poor prognosis, which makes early detection and differential diagnosis quite important.
Because of the advantages of wide application and superior diastolic function assessment, echocardiography has become the preferred screening method for CA. Advanced two-dimensional speckle tracking echocardiography (2D-STE) and strain, and strain rate imaging have been proven to differentiate CA from other causes of concentric cardiac hypertrophy [4]. Since supersonic inspection always produces lots of imaging data and the variables interact with each other to varying degrees, it is difficult to identify the most discriminative predictors through ordinary statistical analysis. Therefore, more powerful data processing 1 3 approaches are urgently needed to extract and analyze imaging data.
Machine learning (ML) utilizes computer algorithms to seek inherent patterns in datasets with massive variables without making preassumptions. It can learn from established datasets and facilitate the prediction of risk models on new data. In recent years, ML has become an effective means for prediction and intelligent decision-making [5][6][7] and has achieved commendable success in cardiovascular medicine, such as differentiation of constrictive pericarditis from restrictive cardiomyopathy [8], risk prediction of readmission of patients with heart failure [9], diagnosing different arrhythmias [10,11], etc. Given this, we proposed an intelligent identification study of CA and HCM based on ML.

Methods
A case-control study of 138 subjects, including 74 verified CA cases and 64 patients with verified HCM cases were referred to the First Affiliated Hospital of Zhejiang University School of Medicine from June 2015 to November 2018. The type of amyloid of all patients with CA assessed by immune histology was the light chain and patients were eligible for inclusion if they met any of the following criteria: (1) an endomyocardial biopsy confirmed amyloid deposits; (2) a positive non-cardiac biopsy for amyloidosis combined with cardiac magnetic resonance or non-strain-based echocardiography which presented typical characteristic of CA, with relevant clinical history and laboratory findings. The characteristics of CA are consistent with the Expert Consensus Recommendations for Multimodality Imaging [12]. Cardiac involvement of CA was assessed by imaging scans, of which forty patients involved the left ventricle, twenty-seven patients involved the left and right ventricles, one patient implicated two ventricles and the left atrium, and six patients involved all ventricles and atria. We further incorporated 64 patients with HCM as comparator groups whose diagnose were created according to recently published guidelines from the American College of Cardiology/the American Heart Association [13], and they underwent both echocardiography and cardiac magnetic resonance imaging to further assess HCM and exclude other pathologies. Three of them had also genetic analysis and all showed heterozygous mutation. An echocardiographic examination was performed in all patients with HCM who presented unexplained left ventricular asymmetrical hypertrophy with septal wall thickness ≥ 15 mm. In the case of positive family history (such as sudden death, cardiac hypertrophy, etc.), interventricular septal thickness ≥ 13 mm was also enrolled. Subjects with left ventricular ejection fraction < 45%, secondary cardiac hypertrophy caused by severe aortic valve disease, long-term uncontrolled hypertension, or thyroid disease were excluded from the study. Patients were also excluded if the relevant data were not available. The local institutional ethics committee approved the study.

Echocardiographic examination
All echocardiographic studies were conducted on GE Vivid E9 Color Doppler Ultrasound system (GE Medical, Milwaukee, Wisconsin, USA) equipped with a 2-dimension probe M5S with a frequency of 2.0-4.5 MHz and a frame rate of 50-70 frames per second. The grayscale dynamic images of the 4-chamber views, the long axis view of the left ventricle, the 2-chamber views, and the short axis section with 3 consecutive cardiac cycles were obtained and stored on the hard disk. M-mode and tissue Doppler ultrasound were used to collect ultrasonic parameters, which included: the left atrial volume index using an ellipse formula, end-diastolic left ventricular diameter, end-systolic left ventricular diameter, end-diastolic left ventricular volume, end-systolic left ventricular volume, ejection fraction using the biplane Simpson's method in 4-chamber views and 2-chamber views, septal wall thickness, posterior wall thickness. The eccentricity index was calculated as septal wall thickness divided by posterior wall thickness. Relative wall thickness was calculated as the ratio of 2 septal wall thickness divided by enddiastolic left ventricular diameter. The left ventricular mass index was calculated based on the Cube formula. Concentric hypertrophy was diagnosed in patients with relative wall thickness > 0.42 and a left ventricular mass index > 115 g/ m 2 . Diastolic parameters, including peak early (E) and late (A) diastolic mitral inflow velocity and the ratio of E/A, e′, and the ratio of E/e′ ratio were also measured ( Fig. 1).

2D-STE acquisition and analysis
Offline analysis of the video clips was based on Echo PAC Version 201 software (GE Company, Fairfield, Connecticut, USA), running on Windows 10 Version 1709 (Microsoft Corporation, Washington State, USA). Selecting clear dynamic images and using the 4-chamber views, the long axis view of the left ventricle, and the 2-chamber views, the left ventricular endocardial and epicardial myocardium were automatically tracked combined with manually adjusted frame by frame throughout the cardiac cycle, and divided into 16 segments to generate a 'bull's-eye' plot.
The strain data are gathered by time and space parameters. Each cardiac cycle was divided into 17 equal segments (T1-T17), and Tj represented the corresponding time points (j = 1,2…17); Strain measurements were included as follows: longitudinal strain (LS), global longitudinal strain, longitudinal strain velocity, longitudinal strain rate, longitudinal displacement, circumferential strain, global circumferential strain, circumferential strain rate, radial strain, global radial strain, radial strain rate, rotational rate, left ventricular twist, left ventricular twist rate. According to the above methods, 3791 (223 × 17) variables were systematically extracted for each patient (223 are strain-derived variables and 17 are time points). Average time strain-derived variables (223 variables) were used to train the models. Relative apical sparing was calculated as average apical LS divided by the sum of the average basal and mid-LS, septal apical to base ratio as apical septal LS divided by basal septal LS, and ejection fraction strain ratio as ejection fraction divided by global longitudinal strain.

Establishment and assessment of prediction models
Two ML-based prediction models were established respectively: one model was built using clinical characteristics, conventional echocardiography, and 2D-STE data; the other was to build models using only 2D-STE data.

Prediction models using clinical characteristics, conventional echocardiography, and 2D-STE data
We developed prediction models using four approaches: logistic regression, support vector machine, random forest, and XGBoost. These represent the comprehensive analysis from traditional logistic regression to classic ML algorithms (support vector machine, random forest), and then to advanced gradient boosting (XGBoost). To assess the validity of the models, we performed tenfold (or fivefold) crossvalidation by randomly dividing the entire data into 10 (or 5) parts for 10 (or 5) iterations. In each iteration, we selected 7 parts as training data and 3 parts as test sets. We reported average results for each model on 30% of unseen test sets.

Logistic regression
Logistic regression is the most commonly used risk prediction model. First, univariate logistic regression was used to screen out the variables that were meaningful to predict CA. Then, variables with p < 0.1 were enrolled in the multivariate regression analysis for modeling according to the previous research, clinical experience, and the multiple requirements between variables and outcome. In addition, Spearman correlation was used to exclude the influence of collinearity among variables.

Support vector machine
Support vector machine converts data into complex highdimensional space to look for the largest difference margin to realize the differentiation of diseases [14]. We applied linear basis kernel and cost function to build the model and tuning parameters to minimize the error classification.

Random forest
Random forest is a tree-based method, the essence of which is to continuously split variables at discrete cutting points, usually presenting in the form of a tree graph [14].
A separate tree is built from bootstrapped data and variables, and the final model is a collection of many trees.

Gradient boosting
The core idea of gradient boosting is to set up a series of initial models based on the decision tree, which is called base classifiers [15,16]. Subsequently, weaker base classifiers are iterated and adjusted the weights to create a single stronger classifier. Information gain (IG), a technique of feature selection, is defined as a metric of effective classification. It is measured in terms of the entropy reduction of the class, which reflects additional information about the class provided by the variables.

Prediction models of using 2D-STE data
Boosting-based algorithms are increasingly used because they involve the sequential creation of models, with each iteration attempting to correct errors in the previous models. LightGBM and XGBoost are two widely used algorithms. We developed predictive classifiers using 2D-STE data: (1) LightGBM; (2) XGBoost; (3) voting model based on Light-GBM and XGBoost. To evaluate the validity of models, a fivefold cross-validation was performed. We split the dataset into the training set and test set in a 4:1 ratio and reported the performance on the test data.

Statistical analysis
Categorical variables were expressed as the number of cases and percentages and were compared using the chi-square test or Fisher's test. Continuous variables were expressed as mean ± SD. Kolmogorov-Smirnov test was used to determine whether the data were normally distributed. If the data conform to the normal distribution, the independent sample T-test was used for comparison; otherwise, the Mann-Whitney U test was suitable. p < 0.05 was considered statistically significant. Sensitivity, specificity, positive predictive value, negative predictive value, accuracy, receiver operating characteristic curve, and area under the curve (AUC) were used to evaluate the performance of models. DeLong test was used to evaluate whether the AUC in different models was statistically significant. The data analysis was implemented on SPSS 23.0 (Version 23.0), R (Version 4.0.3), and Python (Version 3.7).

Study population
The clinical characteristics of both groups are summarized in Table 1. The age (60.9 ± 9.7 vs 50.3 ± 15.3, p < 0.001), There was no statistical difference in left atrial volume index, e′, global longitudinal strain, septal apical to base ratio, and ejection fraction strain ratio between the two groups, as shown in Table 2.

The models based on 2D-STE data
After training with the tuned hyperparameters, the feature importance of the voting model integrated LightGBM and XGBoost was obtained and ranked. The results indicated that RadStrain3 (the radial strain of the middle ventricular septum) was the most important predictor, followed by LongStrainEpi7 (the longitudinal strain of the anterior wall of the epicardial basement segment) and CirStrainR4 (circumferential strain rate of the posterior wall of the basement segment) ( Table 4).
Among all the three ML algorithms (XGBoost, Light-GBM, and voting model), the discriminant ability of the voting model was similar to LightGBM (AUC of both was 0.86), while the AUC of XGBoost was slightly lower, which was 0.84 (Fig. 3).
In the fivefold cross-validation, the mean AUC of Light-GBM, XGBoost, and voting models were 0.89 ± 0.19, 0.85 ± 0.43, and 0.87 ± 0.30, respectively. The voting model was globally more robust and outperformed to individual model on test sets (Fig. 4).

Discussions
ML combined with 2D-STE to carry out intelligent identification on CA and HCM discovered: that ML had a great performance in the differential diagnosis and could automatically integrate plentiful variables to identify the most discriminative predictors without preassumption.
CA is a rare and complex disease with high mortality and poor prognosis. Although new treatments which significantly improve the outcomes have been developed, the available management with advanced cardiac involvement is still very limited. In addition, clinical confusion about cardiac hypertrophy caused by other causes (e.g., HCM, hypertension, and aortic stenosis) often leads to delayed diagnosis, which prevents patients with CA from receiving an early and effective intervention. Echocardiography has become the preferred screening approach for patients with CA due to its wide application, low risk, low cost, convenience, and superior diastolic function assessment.

3
Therefore, many related studies had been done to distinguish CA from other causes of cardiac hypertrophy.
Cardiac deformation analysis of 2D-STE could reveal early systolic abnormalities. Early reports by Sun et al. [17] suggest that global longitudinal strain, global circumferential strain, and the global radial strain were significantly reduced in patients with advanced CA compared with HCM and hypertensive heart disease, and although there was some overlap between the groups, the three causes of cardiac hypertrophy could be distinguished to a certain extent. Di Bella et al. [18] displayed that the epicardial strain in patients with amyloid transthyretin was significantly lower than that in patients with HCM. Subsequently, Baccouche et al. [19] made use of 3-dimension speckle-tracking echocardiography to identify CA from HCM, presenting that most of the functional parameters in both groups were lower, while those in the CA group were the lowest. The radial strain of CA patients demonstrated the "reverse pattern" from base to apex, suggesting that two conditions could be distinguished based on functional patterns. Similarly, Phelan et al. [4] proposed that the "relative apical sparing" of longitudinal strain could well identify CA. Liu et al. [20] showed that septal apical to base ratio > 2.1 joint with deceleration time < 200 ms helps to differentiate CA from other causes of ventricular hypertrophy. In recent years, Pagourelias et al. [21] proposed that the ejection fraction strain ratio has the best CA differentiation effect (AUC 0.95; 95% CI 0.89-0.98). In the challenging subgroups (maximum wall thickness ≤ 16 mm and LVEF > 55%), ejection fraction strain ratio is still the best predictor of CA. Furthermore, Boldrini et al. [22] developed a scoring-based CA diagnostic model by analyzing morphological, functional, and strain-derived parameters. The results show that centripetal reconstruction and strain-derived parameters have the best diagnostic performance. The multivariate logistic regression model, which included relative wall thickness, E/e′, LS and tricuspid annular plane contraction deviation, had the best diagnostic effect on monoclonal immunoglobulin light chain amyloidosis (AUC 0.90; 95% CI 0.87-0.92).
The complexity of CA assessment has increased in terms of a large amount of data generated by supersonic inspection and increasing clinical variables. Traditional statistical analysis could only explore the relationships among limited variables and achieve a certain degree of predictive performance. However, in the era of big data, it is usually necessary to integrate abundant variables, which is a great challenge for clinicians. Therefore, we presented the study of differentiating CA and HCM based on ML. Combining clinical characteristics, routine echocardiography and 2D-STE data, support vector machine, random forest, and XGBoost manifested a favorable discriminative performance (AUC > 0.9). When based on 2D-STE data solely, the different gradient boosting models still performed well in the identification of CA patients. The voting model was more robust globally and superior to a single algorithm on some test sets. Although the difference in the AUC of ML was not statistically significant compared with the traditional logistic regression model (p > 0.05), it should be pointed out that this study was based on small sample data, and the performance of ML needs to be further discussed on larger data. Previously, Zhang et al. [23] employed ML to achieve automatic echocardiography interpretation. The algorithms can not only implement view recognition, image segmentation, structure, and function quantification but also realize automatic detection of CA, HCM, and pulmonary hypertension, which further reflects the effectiveness of ML. ML, a form of artificial intelligence that eliminates preassumptions, explores the unknown pattern with all useful data to avoid neglecting some important but not yet recognized predictors. Interestingly, ML also automatically identified traditional variables, such as ejection fraction, eccentricity index, and relative apical sparing, which further validates the potential  RadStrain3  1  BasalRotation3  11  LonStrainR10  21  CirStrainR8  31  CirStrainR11  41  LonStrainEpi7  2  LonStrainV8  12  PapiRotation6  22  PapiRotation2  32  GRSPAPI  42  CirStrainR4  3  LonStrainV13  13  LonStrainR1  23  CirStrainR16  33  LonStrainD2  43  LonStrain7  4  LonStrainEndo7  14  RadStrain4  24  GRSAPICAL  34  LonStrainEndo18  44  RadStrain2  5  LonStrainR7  15  PapiRotation3  25  CirStrain16  35  LonStrain13  45  RadStrain7  6  RadStrain8  16  ApicalRotationR4  26  CirStrainR13  36  CirStrainR17  46  LonStrainEpi16  7  LonStrainD9  17  LonStrainV14  27  CirStrainR5  37  LonStrain3  47  CirStrain10  8  RadStrainR12  18  LonStrainD10  28  RadStrain11  38  LonStrainR3  48  BasalRotation1  9  CirStrain4  19  LonStrainEpi15  29  CirStrain15  39  BasalRotation4  49  BasalRotation2  10  LonStrainEpi13  20  LonStrainV7  30  LonStrainD13  40  BasalRotationR3  50 scalability and practicability. In addition, unexpected interactions between several weaker predictors would not be overlooked. ML will not replace traditional statistical analysis, conversely, it provides a supplement and extension [24]. For rapidly growing data, ML explores non-linear patterns and automatically extracts important variables, thus simplifying feature selection and improving prediction, and facilitating the differentiation of similar phenotypic diseases. Also, ML seamlessly incorporates new data to continually update models and promote performance over time. Beyond that, ML possesses efficiency as it runs complex mathematical algorithms, such as gradient boosting, in a few seconds and produces easy-to-understand results with low variability and high accuracy.

Limitations of the study
There are some limitations to this study. First of all, the establishment of ML models was carried out in a small number of samples, and further validation needs to be conducted in a larger dataset. In addition, considering the imbalance of data among different areas, our study was single-center with certain specific population characteristics, further research needs to be trained and verified in multiple centers and regions to improve the generality of the models. Finally, our model was only evaluated in two-dimensional echocardiography with limited time and space, and further studies could include more ultrasonic sections or implement the models by other imaging methods. With the increment of samples, deep learning may improve the prediction of the models.

Conclusions
CA has a poor prognosis, and high mortality and is often misdiagnosed as HCM, leading to delayed diagnosis. If CA can be identified early and provided effective intervention in time, it is beneficial to improve the outcome for patients. We proposed intelligent identification of CA from HCM based on ML using 2D-STE data. The results indicated that the ML models had great discriminative performance, and could automatically integrate vast variables without making any preassumption, so as to identify the most important predictors. In the era of big data, automated ML will help Fig. 3 The ROC curve of different gradient boosting models. Among all the three ML algorithms (XGBoost, LightGBM and voting model), the discriminant ability of voting model was similar to LightGBM (AUC of both were 0.86), while the AUC of XGBoost was slightly lower, which was 0.84. A XGBoost; B LightGBM; C Voting model to identify patients with CA, so that timely and effective intervention can be carried out to improve the prognosis.