Baseline characteristics of the cohorts
The development and validation cohorts included records from 58,056 and 19,743 patients, respectively; their characteristics and laboratory results are shown in Table 1. Compared with patients in the development cohort, those in the validation cohort were significantly older, had more comorbidities, had impaired estimated glomerular filtration rate and alanine aminotransferase, and had lower TnI and higher glucose and low-density lipoprotein cholesterol. The development/validation cohorts comprised 860/191 STEMI, 559/138 NSTEMI, and 109,904/30,432 not-MI ECGs. The LAD and RCA were the most commonly identified IRAs in STEMI. Compared with the not-MI group, patients with STEMI were more likely to be male and overweight, and had more prior coronary artery disease (CAD), higher TnI, and more impaired lipid profiles. Patients with NSTEMI were more likely to be male and older, and had more prior CAD, more comorbidities, higher cardiac biomarkers, and more impaired lipid profiles than those in the not-MI group.
Prediction of STEMI, MI and not-MI
The results of the human-machine competition are summarized in Figure 1. In the competition, which involved 450 ECGs, the AUCs of the DLM were 0.976 for the detection of STEMI and 0.944 for the discrimination of MI from not-MI. The corresponding sensitivities and specificities for STEMI/MI detection were 89.7%/83.7% and 94.6%/95.7%, respectively. By contrast, the sensitivities and specificities for STEMI/MI detection among human experts ranged from 57.7-93.1%/46.5-99.4% and 42.9-95.6%/5.8-97.8%, respectively, which were lower than those of the DLM. We further reweighted the samples according to the class proportions of a hypothetical real-world setting; in that setting, the AUCs of the DLM for the detection of STEMI and MI were 0.995 and 0.916, respectively. No expert outperformed the DLM in this setting. Precision-recall ROC (PRROC) curve analysis supported the feasibility of an automatic ECG screening system: the AUCs of the DLM for STEMI and MI detection were 0.586 and 0.300, respectively, in the hypothetical real world. At an appropriate cutoff point, the DLM achieved 63.2% precision and 50.3% recall; these values were significantly better than those of all participants in the discrimination of STEMI from not-STEMI.
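The gap between the high ROC AUCs and the much lower precision-recall figures in the hypothetical real world is what class prevalence would lead one to expect. As a minimal sketch (not the authors' code), the Python function below derives the expected precision at a given prevalence from a sensitivity and a specificity using Bayes' rule. The function name and the prevalence values are illustrative assumptions; the proportions actually used for reweighting are not stated in this section.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision (positive predictive value) via Bayes' rule."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive predictions at this operating point")
    return true_pos / (true_pos + false_pos)


# Reported STEMI operating point of the DLM in the competition set.
sens, spec = 0.897, 0.946

# Hypothetical prevalences, chosen only for illustration.
for prev in (0.5, 0.1, 0.01):
    ppv = precision_at_prevalence(sens, spec, prev)
    print(f"prevalence={prev:.2f} -> precision={ppv:.3f}")
```

Sensitivity and specificity do not depend on prevalence, but precision does, so a fixed operating point yields fewer true positives per alarm as the condition becomes rarer.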
We analyzed the consistency between the human experts and the DLM and ranked their performance in the human-machine competition (Figure 2). The DLM achieved the best global performance (kappa = 0.645) (Figure 2A). Intriguingly, among the 6 medical students, one (M6) had the best performance (kappa = 0.438) owing to superior not-MI interpretation. Among the 5 residents, one (a resident in the ED) had the poorest performance (kappa = 0.258) owing to overdiagnosis of not-MI ECGs as MI. Most visiting staff detected STEMI relatively well but discriminated poorly between NSTEMI and not-MI. As shown in the heat map, MI detection capacity fell into three clusters: visiting staff, residents, and medical students (Figure 2B).
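For readers unfamiliar with the kappa statistic, the following sketch shows one common form, unweighted Cohen's kappa, computed between a reference labeling and one reader's labeling. Whether the study used exactly this variant is an assumption; the function name and the six toy labels are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy labels for illustration only (not from the study).
reference = ["STEMI", "STEMI", "NSTEMI", "not-MI", "not-MI", "not-MI"]
reader = ["STEMI", "NSTEMI", "NSTEMI", "not-MI", "not-MI", "NSTEMI"]
print(round(cohen_kappa(reference, reader), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class (here, not-MI) dominates.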
Analysis of the infarct-related artery in STEMI
The DLM also achieved the best global performance (kappa = 0.629) for IRA detection in STEMI (Additional file 1: Figure S1). Both the LAD and RCA were easily detected by the DLM and clinicians, whereas the LCx was difficult to interpret. The LMCA was correctly detected only by medical students.
Consistency assessments of MI ECGs
Selected STEMI ECGs from the human-machine competition are shown in Figure 3. A typical STEMI ECG was consistently detected as STEMI with an IRA of the LAD by both the DLM and the clinicians (Figure 3 Case A). A total of 10 ECGs were detected as not-STEMI by the DLM. Five of these ten were correctly recognized as STEMI by the best cardiologists (Figure 3 Case B), and the remainder were misdiagnosed by both the DLM and the best cardiologists (Figure 3 Case C). Conversely, the DLM identified as STEMI some ECGs that expert cardiologists had misdiagnosed (Figure 3 Case D). Among the 138 NSTEMI ECGs in the human-machine competition, 58 were detected as not-MI by the DLM, giving an accuracy of 58.0%, which was worse than the 75.4% accuracy of the best cardiologists. This was due to the more conservative MI diagnostic strategy of the DLM. The DLM's specificity of 96.4% in the 138 not-MI cases was much better than the 82.6% and 64.5% of the two best cardiologists. After adjustment for specificity, the DLM misdiagnosed clearly fewer NSTEMI cases than the cardiologists did (Table 2). Thus, the DLM still offered the best performance in the detection of MI by ECG when standardized against the best cardiologists.
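One way to compare a model with a reader at equal specificity is to move the model's cutoff until its specificity reaches the reader's, then read off the sensitivity at that cutoff. The sketch below illustrates this idea only; the exact adjustment procedure used in the study is not described here, and the function names and toy scores are hypothetical.

```python
def specificity(neg_scores, cutoff):
    """Fraction of negatives scored below the cutoff (predicted negative)."""
    return sum(s < cutoff for s in neg_scores) / len(neg_scores)


def sensitivity(pos_scores, cutoff):
    """Fraction of positives scored at or above the cutoff."""
    return sum(s >= cutoff for s in pos_scores) / len(pos_scores)


def cutoff_for_specificity(pos_scores, neg_scores, target):
    """Lowest candidate cutoff whose specificity reaches the target."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target specificity must be in [0, 1]")
    # Candidate cutoffs: every observed score, plus +inf (predict all negative).
    candidates = sorted(set(pos_scores) | set(neg_scores)) + [float("inf")]
    for cutoff in candidates:
        if specificity(neg_scores, cutoff) >= target:
            return cutoff


# Toy model scores for illustration only (not from the study).
mi_scores = [0.35, 0.80]
not_mi_scores = [0.10, 0.20, 0.30, 0.40]

cutoff = cutoff_for_specificity(mi_scores, not_mi_scores, target=0.75)
print(cutoff, sensitivity(mi_scores, cutoff))
```

Choosing the lowest qualifying cutoff keeps sensitivity as high as possible for the required specificity.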
ECG lead-specific analysis
ECG leads were analyzed individually for the detection of STEMI and MI in the hypothetical real world (Additional file 1: Figure S2). Leads III, V2, aVL, and V3 performed better than the other leads for the detection of STEMI, with AUCs of 0.913, 0.913, 0.911, and 0.908, respectively. For the detection of MI, leads V4, I, and V3 performed better, with AUCs of 0.841, 0.825, and 0.825, respectively. Lead-specific PRROC curves were also analyzed for the detection of MI and STEMI in the hypothetical real world (Additional file 1: Figure S3) and for the IRA of STEMI (Additional file 1: Figure S4). For the detection of STEMI, lead-specific PRROC curve analysis showed the best performance on aVL, with an AUC of 0.300. For the IRA of STEMI, lead-specific PRROC curve analysis showed the best performance for the LAD on V4, V2, and V3, with AUCs of 0.970, 0.955, and 0.953, respectively, and for the RCA on aVL, lead III, and aVF, with AUCs of 0.995, 0.978, and 0.966, respectively.
Logistic regression analysis of MI, STEMI, and NSTEMI
Univariate and multivariate logistic regression analyses in the development cohort revealed that male sex, prior CAD, troponin I, hemoglobin, total cholesterol, and low-density lipoprotein were independent risk factors for MI, STEMI, and NSTEMI (Additional file 1: Figure S5).
Diagnostic value analysis
We evaluated the algorithm's performance in the validation cohort after adjusting for significant patient characteristics, disease histories, and laboratory results to ensure consistency across a wide range of putative confounding variables. For the detection of STEMI, the DLM performed significantly better than troponin I alone, with an AUC of 0.996 and a corresponding sensitivity and specificity of 98.4% and 96.9%, respectively. For the detection of NSTEMI, however, troponin I alone performed significantly better than the DLM. Combining the DLM with the first recorded TnI increased the AUC for the detection of NSTEMI to 0.978, with a corresponding sensitivity and specificity of 91.6% and 96.7%, respectively, which was better than that of the DLM (0.877) or TnI (0.949) alone (Figure 4). The DLM alone was sufficient to detect STEMI, and adding patient characteristics did not significantly improve performance. For NSTEMI, however, troponin I improved diagnostic accuracy, and this improvement exceeded that obtained by combining all the other additional characteristics (Additional file 1: Figure S6).
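The AUC values compared above can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. As a hedged illustration of that interpretation (not the authors' analysis code), the sketch below computes the AUC by direct pairwise comparison. The function name and the toy scores for two markers are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC as P(positive score > negative score); ties count as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores for two hypothetical markers on the same cases (not study data).
marker_a_pos, marker_a_neg = [0.8, 0.3], [0.5, 0.1]
marker_b_pos, marker_b_neg = [0.9, 0.6], [0.5, 0.1]

print("marker A AUC:", auc(marker_a_pos, marker_a_neg))
print("marker B AUC:", auc(marker_b_pos, marker_b_neg))
```

The pairwise loop is quadratic in the number of samples, so for cohorts of the size reported here a rank-based implementation would be preferable in practice.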