Ethics and governance approval
The study was undertaken at the NHS Golden Jubilee Hospital, a tertiary cardiothoracic centre that provides regional CT imaging for a population of over two million patients in the west of Scotland. This study was conducted in accordance with guidance from the UK Policy Framework for Health and Social Care Research. The project was approved by Yorkshire and The Humber – Leeds East Research Ethics Committee (REC Ref 21/YH/0217).
Scan selection
Using the Picture Archiving and Communication System (PACS) at our institution, we identified CT scans acquired between March 2022 and March 2023 that included the heart in the field of view. The clinical indication for each scan was recorded from the electronic care record, together with patient demographics (age and sex). The scan field of view and use of contrast were also recorded. All scans were anonymised and the DICOM files transferred to a remote workstation. Additional scan parameters, including CT reconstruction kernel, x-ray tube voltage, and reconstructed slice thickness, were recorded from the relevant DICOM tags. When multiple scans were available for a single patient, we chose the scan with the thinnest slices (for more accurate ground-truth annotation) and the softest reconstruction kernel (for minimum noise), acquired with the patient supine.
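The series-selection rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code: the `Series` record and `select_series` helper are hypothetical, the tag names follow the DICOM standard (SliceThickness, ConvolutionKernel, PatientPosition), and the kernel softness ranking is an illustrative assumption. The sketch also assumes that slice thickness takes precedence over kernel softness when the two criteria conflict.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical softness ranking (lower = softer) for kernels named in this section;
# the ordering is an illustrative assumption, not the study's rule.
KERNEL_SOFTNESS = {"BR40F": 1, "STANDARD": 1, "LUNG": 3, "BODY SHARP": 3}
UNKNOWN_KERNEL_RANK = 99  # unknown kernels are treated as the sharpest

# DICOM PatientPosition (0018,5100) codes that describe a supine patient.
SUPINE_CODES = {"HFS", "FFS"}


@dataclass
class Series:
    series_uid: str
    slice_thickness_mm: float  # DICOM SliceThickness (0018,0050)
    kernel: str                # DICOM ConvolutionKernel (0018,1210)
    patient_position: str      # DICOM PatientPosition (0018,5100)


def select_series(candidates: Sequence[Series]) -> Optional[Series]:
    """Pick the supine series with the thinnest slices, breaking ties by the softest kernel."""
    supine = [s for s in candidates if s.patient_position.upper() in SUPINE_CODES]
    if not supine:
        return None
    return min(
        supine,
        key=lambda s: (
            s.slice_thickness_mm,
            KERNEL_SOFTNESS.get(s.kernel.upper(), UNKNOWN_KERNEL_RANK),
        ),
    )


# Example: four reconstructions for one (hypothetical) patient.
example = [
    Series("1", 3.0, "Br40f", "HFS"),
    Series("2", 1.0, "Lung", "HFS"),
    Series("3", 1.0, "Br40f", "HFS"),
    Series("4", 0.5, "Br40f", "HFP"),  # prone, so excluded
]
chosen = select_series(example)
```

Series "4" is thinnest but prone, so it is skipped; of the two 1.0 mm supine series, the softer Br40f reconstruction wins.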
Ground Truth Annotations
Following curation, scans were uploaded to a dedicated annotation platform with a custom workflow and image annotation tool, which enabled automatic transition of cases between curator, annotator, reviewer, and quality assurer. Three clinicians (R.H., C.B., R.G.), each with more than two years of experience annotating coronary calcification on CT scans, were trained on both the annotation protocol and the annotation platform using at least 15 datasets each. The remaining scans were then randomly assigned among the three annotators. The most senior annotator (R.G.), with more than ten years of cardiac CT reporting experience, also acted as reviewer. Annotators and reviewer were blinded to radiology reports, patient details such as medical history, scan indication, and scan parameters during the annotation process.
Annotators identified calcium in seven defined regions of interest: the left main stem artery (LMS), left anterior descending artery (LAD), left circumflex artery (Cx), right coronary artery (RCA), aortic valve (AV), aortic root (AR, taken as the region below the sinotubular junction), and the mitral valve annulus (MA). Annotation was performed manually on axial slices using a threshold-assisted brush. No upper threshold was specified; instead, annotators gradually increased the lower threshold until few highlighted pixels remained within the main chambers (atria and ventricles), to minimise annotation of contrast in the vasculature.
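The threshold-assisted brush can be approximated as selecting brushed voxels at or above a lower HU threshold, with no upper threshold. The sketch below also automates the annotators' manual rule of raising the lower threshold until few chamber voxels remain highlighted; in the study this was a visual judgement, so the starting value (130 HU), step size, and tolerated chamber fraction are hypothetical parameters, and the helper names are illustrative.

```python
import numpy as np


def brush_selection(image_hu, brush, lower_hu):
    """Voxels under the brush at or above the lower threshold (no upper threshold)."""
    return brush & (image_hu >= lower_hu)


def raise_lower_threshold(image_hu, chamber_mask, start_hu=130, step_hu=10,
                          max_fraction=0.001, ceiling_hu=1500):
    """Raise the lower threshold until few chamber voxels would still be highlighted."""
    chamber_values = image_hu[chamber_mask]
    threshold = start_hu
    while (threshold < ceiling_hu and chamber_values.size
           and np.mean(chamber_values >= threshold) > max_fraction):
        threshold += step_hu
    return threshold


# Toy volume: a contrast-filled "chamber" at 300 HU and one calcified voxel at 900 HU.
image = np.zeros((4, 4, 4))
chamber = np.zeros(image.shape, dtype=bool)
chamber[:2] = True
image[chamber] = 300.0
image[3, 3, 3] = 900.0
threshold = raise_lower_threshold(image, chamber)
painted = brush_selection(image, np.ones(image.shape, dtype=bool), threshold)
```

In this toy case the threshold climbs just past the chamber's contrast level, so painting the whole volume selects only the calcified voxel.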
Annotators additionally recorded the presence of contrast in the scan (yes/no), whether the scan was of sufficient quality for interpretation, and the presence of any artefacts that may affect scan interpretation, e.g., motion, permanent pacemakers (including biventricular pacemakers), valve replacements, central venous lines, or evidence of previous cardiac surgery (including coronary artery bypass grafting). The senior annotator reviewed a random 40% of the image annotations to confirm consistency. Quality assurance was conducted on all cases by S.S.M. and S.M. to ensure completeness of the data and compliance with the annotation protocol. Where annotations contained overlapping labels for a given voxel, each voxel was assigned a single label using the following order of priority: LMS, LAD, RCA, Cx, AR, AV, MA. This ordering was consistent across all patients.
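The priority-based merge of overlapping labels can be expressed as filling a label map in priority order, so that each voxel keeps the first (highest-priority) label that claims it. A minimal NumPy sketch; the integer label codes and the dictionary-of-masks input format are assumptions made for illustration.

```python
import numpy as np

# Hypothetical integer codes (0 = background), in the order the regions are introduced.
LABEL_CODES = {"LMS": 1, "LAD": 2, "Cx": 3, "RCA": 4, "AV": 5, "AR": 6, "MA": 7}
# Merge priority from the text, highest first.
PRIORITY = ("LMS", "LAD", "RCA", "Cx", "AR", "AV", "MA")


def merge_overlapping(masks, shape):
    """Collapse possibly overlapping binary masks into one label map by priority."""
    merged = np.zeros(shape, dtype=np.uint8)
    for name in PRIORITY:
        mask = masks.get(name)
        if mask is None:
            continue
        unclaimed = mask & (merged == 0)  # only fill voxels no higher-priority label took
        merged[unclaimed] = LABEL_CODES[name]
    return merged


# Three voxels: LAD overlaps RCA on voxel 1 and AR on voxel 0; LAD wins both.
masks = {
    "LAD": np.array([True, True, False]),
    "RCA": np.array([False, True, True]),
    "AR": np.array([True, False, False]),
}
merged = merge_overlapping(masks, shape=(3,))
```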
Scan summary data
An unselected series of 320 CT scans acquired between March 2022 and March 2023 with the heart in the field of view (FOV) was identified, anonymised, and uploaded to the annotation platform. A total of 17 scans were excluded during curation (Fig. 7) because of unsuitable acquisition series, unusable reconstructions, or incomplete coverage of the heart, each of which made manual annotation difficult. The remaining 303 scans were assigned for annotation, of which annotators excluded a further 4 scans due to severe artefact and/or extensive non-coronary calcification (Fig. 8). The reviewer excluded a further two scans: one with situs inversus and one with a biventricular pacemaker lead. Two further scans were excluded during the quality assurance (QA) process because of ambiguity about the presence of contrast and severe artefact, leaving 295 scans available for development and testing. All exclusion criteria were applied before model training and testing. Our final cohort represents a wide variety of scan quality, imaging artefacts, and foreign bodies (Supplementary Fig. 3, Supplementary Table 1). Supplementary Table 2 details the indications for the scans.
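As a quick arithmetic check, the exclusion counts reported in the paragraph above can be chained to confirm the final cohort size. The stage names are shorthand chosen here for illustration.

```python
# Exclusions at each stage, as reported in the paragraph above.
INITIAL_SCANS = 320
EXCLUSIONS = [("curation", 17), ("annotation", 4), ("review", 2), ("quality assurance", 2)]

remaining = INITIAL_SCANS
flow = []
for stage, n_excluded in EXCLUSIONS:
    remaining -= n_excluded
    flow.append((stage, remaining))  # scans left after this stage
```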
Scans were divided into training and validation sets (214 scans) and a testing set (81 scans) (Table 1). Scans were primarily acquired on two scanners, a Canon Medical Systems Aquilion ONE and a Siemens SOMATOM go Top; a single scan was acquired on a GE Medical Systems Revolution EVO. Scans were reconstructed with both soft and sharp kernels, including the soft-tissue Br40f and Standard kernels and the sharper Lung and Body Sharp kernels. This range reflects the variety of acquisitions in which incidental coronary calcification may be encountered.
We used stratified random sampling based on the presence of calcium in each coronary territory to assign patients to the training, validation, and testing splits, ensuring that territory classes were balanced across splits. We first split the dataset into training and testing sets, then repeated the procedure within the training set to create three cross-validation folds, each with 143 training and 71 validation scans; the same 81 scans formed the testing set for every fold.
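A minimal sketch of stratified fold assignment, assuming each scan's stratum is the tuple of calcium-presence flags for the four coronary territories. Scans are grouped by stratum, shuffled, and dealt round-robin into folds, which keeps fold sizes within one scan of each other and spreads every stratum across the folds. The function names are hypothetical, the strata are synthetic, and the study's actual sampling procedure may differ; a hold-out testing split could be drawn from the same grouping before the folds are built.

```python
import random
from collections import Counter, defaultdict

TERRITORIES = ("LMS", "LAD", "Cx", "RCA")


def stratified_folds(strata, n_folds, seed=0):
    """Return a fold index per scan, spreading every stratum across the folds."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, key in enumerate(strata):
        groups[key].append(idx)
    fold_of = [0] * len(strata)
    position = 0
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        for idx in members:
            fold_of[idx] = position % n_folds  # round-robin keeps folds balanced
            position += 1
    return fold_of


def cv_splits(fold_of, n_folds):
    """Yield (train, validation) index lists, one pair per fold."""
    for k in range(n_folds):
        val = [i for i, f in enumerate(fold_of) if f == k]
        train = [i for i, f in enumerate(fold_of) if f != k]
        yield train, val


# Example with synthetic calcium-presence flags for 214 development scans.
rng = random.Random(42)
strata = [tuple(rng.random() < 0.5 for _ in TERRITORIES) for _ in range(214)]
folds = stratified_folds(strata, n_folds=3)
fold_sizes = Counter(folds)
splits = list(cv_splits(folds, n_folds=3))
```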
Presence of contrast was indicated in the radiology reports of 221/295 (75%) scans. To understand the effect of contrast agent on CAC detection, we estimated the level of contrast in the cardiac, vascular, and solid-organ components of each scan. We used the open-source TotalSegmentator tool 56 to automatically segment the spleen, liver, aorta, and main pulmonary artery, and used the median HU value within each region as a surrogate for contrast phase, as follows. Early-phase contrast scans can be identified by higher HU values in the aorta and pulmonary artery than in the spleen and liver (early arterial phase). As contrast diffuses through the vascular system, the liver and spleen parenchyma become brighter, so later-phase contrast scans can be recognised by elevated HU values in these organs as well (portal venous phase).
The variation in brightness of these predefined regions across our scans is shown in Fig. 9. We estimate that the majority of our contrast scans (199/295, 67%) were acquired during the portal venous phase, as they show enhanced attenuation in both the liver and spleen and the aorta and pulmonary artery, compared with the non-contrast baseline (dashed lines, Fig. 9). A small number of scans (8/295, 2%) were reported as contrast scans but showed no enhancement in the organs or arteries considered; we classified these as delayed contrast scans. We classified the remaining contrast scans as early arterial phase (14/295, 4.7%), as they showed enhanced attenuation in the aorta and pulmonary artery relative to non-contrast scans but no marked enhancement in the liver and spleen.
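The phase-assignment heuristic described above can be sketched as comparing each region's median HU with a non-contrast baseline. The region keys, the 50 HU enhancement margin, and the baseline values in the example are illustrative assumptions (they are not read from Fig. 9 and are not TotalSegmentator's class names); region masks are assumed to come from a segmentation run such as TotalSegmentator, and the function names are hypothetical.

```python
import numpy as np

REGIONS = ("aorta", "pulmonary_artery", "liver", "spleen")  # hypothetical keys


def region_medians(image_hu, masks):
    """Median HU inside each region mask."""
    return {name: float(np.median(image_hu[masks[name]])) for name in REGIONS}


def estimate_phase(medians, baseline, margin_hu=50.0):
    """Classify a contrast-reported scan by which regions are enhanced over baseline."""
    def is_enhanced(region):
        return medians[region] > baseline[region] + margin_hu

    vessels = is_enhanced("aorta") and is_enhanced("pulmonary_artery")
    organs = is_enhanced("liver") and is_enhanced("spleen")
    if vessels and organs:
        return "portal venous"
    if vessels:
        return "early arterial"
    if not any(is_enhanced(r) for r in REGIONS):
        return "delayed"  # reported as contrast, but no visible enhancement
    return "unclassified"


# Illustrative non-contrast baseline medians (HU), assumed for this example only.
baseline = {"aorta": 40.0, "pulmonary_artery": 40.0, "liver": 55.0, "spleen": 45.0}
portal = estimate_phase({"aorta": 180, "pulmonary_artery": 170, "liver": 120, "spleen": 125}, baseline)
arterial = estimate_phase({"aorta": 300, "pulmonary_artery": 350, "liver": 60, "spleen": 50}, baseline)
delayed = estimate_phase({"aorta": 45, "pulmonary_artery": 42, "liver": 58, "spleen": 47}, baseline)
```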
Importantly, we observe a wide range of attenuation in the coronary region, which complicates the detection of coronary calcification. The threshold at which calcium can be reliably differentiated from the coronary vessels and surrounding tissue therefore varies from scan to scan. This makes manual annotation difficult and time-consuming, particularly in early arterial-phase contrast scans, where calcium may not be distinguishable from contrast-enhanced vessels. This highlights the value of an automated technique for detecting calcium that is robust to contrast levels.
Manual annotation results
Figure 10 illustrates the distribution of calcification across coronary territories in our cohort. No coronary calcification was found in 90/295 (30%) scans. Of these, 80/295 (27%) showed no evidence of any calcification, including non-coronary calcification, and the remaining 10/295 (3%) had only non-coronary calcification. Of the 205 scans with coronary artery calcification, the most commonly affected artery was the LAD (191/295, 64%), followed by the RCA (135/295, 45%), the circumflex artery (125/295, 42%), and the LMS (102/295, 34%). The aortic root was the most common site of non-coronary calcification (133/295, 45%), followed by the aortic valve (76/295, 25%) and the mitral annulus (58/295, 19%).
To assess annotator agreement, we assigned 18/295 (6%) cases for assessment of inter-observer variability and 30/295 (10%) for intra-observer variability. Annotators were blinded to previous attempts and followed the same annotation procedure for both attempts. For intra-observer assessment, we ensured a gap of at least three months between an annotator's first and second attempts at annotating a case.
Artificial Intelligence Algorithm Development
CT scans acquired for non-coronary indications can have wide FOVs, extending to the neck and pelvis. To remove this unnecessary coverage, we crop the images to the heart region in the axial plane using the TotalSegmentator 56 tool. This substantially improves efficiency and reduces model training time.
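Cropping to the heart can be sketched as taking the bounding box of a binary heart mask (for example, one derived from TotalSegmentator output) plus a safety margin. The (z, y, x) axis order, which axes are cropped, and the margin size are assumptions here, and `crop_to_mask` is a hypothetical helper; returning the slice tuple lets cropped predictions be pasted back into the full volume.

```python
import numpy as np


def crop_to_mask(volume, mask, axes=(0, 1, 2), margin=5):
    """Crop `volume` to the bounding box of `mask` along `axes`, padded by `margin` voxels."""
    coords = np.argwhere(mask)
    box = [slice(0, n) for n in volume.shape]
    if coords.size == 0:  # empty mask: leave the volume uncropped
        return volume, tuple(box)
    for ax in axes:
        lo = max(int(coords[:, ax].min()) - margin, 0)
        hi = min(int(coords[:, ax].max()) + 1 + margin, volume.shape[ax])
        box[ax] = slice(lo, hi)
    box = tuple(box)
    return volume[box], box


# Toy (z, y, x) volume with a box-shaped "heart" mask.
volume = np.zeros((100, 64, 64))
heart = np.zeros(volume.shape, dtype=bool)
heart[40:60, 20:40, 25:45] = True
cropped_z, box_z = crop_to_mask(volume, heart, axes=(0,))  # crop slices only
cropped_all, box_all = crop_to_mask(volume, heart)         # crop all three axes
```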
Our algorithm uses two CNNs in sequence, processing the CT volume in 3D patches. The first-stage CNN produces a segmentation of calcium, including coronary and non-coronary territories. The second-stage CNN refines the output of the first-stage network and classifies each calcium voxel by territory (coronary: LMS, LAD, Cx, RCA; non-coronary: AR, AV, MA). While there is evidence for the prognostic value of calcification in non-coronary structures 57,58, our primary reason for detecting calcification in these regions was to reduce the likelihood of the model confusing non-coronary calcification with CAC. We reduced false-positive predictions by post-processing the model outputs. First, we used an in-house multi-atlas heart segmentation tool to generate heart masks and excluded any voxels predicted outside the heart mask. Second, we used 3D connected-component labelling to discard lesions smaller than 3 mm³; this minimum lesion volume was tuned on the validation sets. Finally, whole-volume CAC segmentations were obtained by stitching adjacent 3D patch outputs.
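The two false-positive filters can be sketched with NumPy and `scipy.ndimage.label`. This sketch assumes 26-connectivity, labels connected components on the union of all calcium classes, and converts the 3 mm³ threshold to voxels using the scan's voxel spacing; the study's exact connectivity and per-class handling are not specified, and the `postprocess` name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def postprocess(pred_labels, heart_mask, spacing_mm, min_volume_mm3=3.0):
    """Zero predictions outside the heart, then drop connected lesions below a volume."""
    out = np.where(heart_mask, pred_labels, 0)
    voxel_mm3 = float(np.prod(spacing_mm))
    # 26-connected components over any calcium class (an assumption, see lead-in).
    components, n = ndimage.label(out > 0, structure=np.ones((3, 3, 3), dtype=bool))
    if n == 0:
        return out
    sizes = np.bincount(components.ravel())  # sizes[0] counts background voxels
    too_small = sizes * voxel_mm3 < min_volume_mm3
    too_small[0] = False
    out[too_small[components]] = 0
    return out


pred = np.zeros((10, 10, 10), dtype=np.uint8)
pred[1:3, 1:3, 1:3] = 2   # 8-voxel lesion (8 mm^3 at 1 mm spacing): kept
pred[7, 7, 7] = 4         # single voxel (1 mm^3): removed as too small
pred[0, 9, 9] = 1         # outside the heart mask: removed
heart = np.ones(pred.shape, dtype=bool)
heart[0, 9, 9] = False
cleaned = postprocess(pred, heart, spacing_mm=(1.0, 1.0, 1.0))
```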
Algorithm Architecture and Training
We used an encoder-decoder U-Net 59 architecture for both the first- and second-stage CNNs (Fig. 12). The encoder layers consist of two convolutional blocks that down-sample the input. The decoder layers consist of two convolutional blocks followed by a transposed convolution, with skip connections from the encoder layers at the corresponding levels. Input CT scans are intensity windowed (window level 200 HU and window width 1200 HU) and then cropped to the coronary region. During training, we randomly sample input patches with and without calcium present, and augment them using random rotation and random adjustment of voxel intensity values. Both models were trained with early stopping and the Adam optimizer with \(\beta_1 = 0.9\), \(\beta_2 = 0.999\), \(\epsilon = 10^{-8}\), and a learning rate of \(3 \times 10^{-3}\).
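The preprocessing and sampling steps can be sketched as follows. Windowing at level 200 HU and width 1200 HU clips intensities to the range -400 to 800 HU; the rescaling to [0, 1], the patch size, the positive-sampling probability, the restriction to 90° in-plane rotations, and the jitter ranges are all hypothetical choices made for illustration, as are the function names.

```python
import numpy as np


def window(volume_hu, level=200.0, width=1200.0):
    """Clip to [level - width/2, level + width/2] HU (here -400 to 800) and scale to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    return (np.clip(volume_hu, lo, hi) - lo) / (hi - lo)


def sample_patch(volume, labels, patch=(32, 64, 64), p_positive=0.5, rng=None):
    """Centre a patch on a random calcium voxel with probability p_positive, else anywhere."""
    rng = np.random.default_rng() if rng is None else rng
    shape, size = np.asarray(volume.shape), np.asarray(patch)
    positives = np.argwhere(labels > 0)
    if len(positives) and rng.random() < p_positive:
        centre = positives[rng.integers(len(positives))]
    else:
        centre = rng.integers(0, shape)  # any voxel in the volume
    start = np.clip(centre - size // 2, 0, np.maximum(shape - size, 0))
    box = tuple(slice(int(s), int(s + n)) for s, n in zip(start, size))
    return volume[box], labels[box]


def augment(image_patch, label_patch, rng, max_scale=0.1, max_shift=0.05):
    """Rotate image and labels together by k*90 degrees in-plane; jitter image intensities only."""
    k = int(rng.integers(4))
    image_patch = np.rot90(image_patch, k, axes=(1, 2))
    label_patch = np.rot90(label_patch, k, axes=(1, 2))
    scale = 1 + rng.uniform(-max_scale, max_scale)
    image_patch = image_patch * scale + rng.uniform(-max_shift, max_shift)
    return image_patch, label_patch


rng = np.random.default_rng(0)
ct = np.full((40, 80, 80), -1000.0)
mask = np.zeros(ct.shape, dtype=np.uint8)
ct[20, 40, 40] = 900.0
mask[20, 40, 40] = 1
img, lbl = sample_patch(window(ct), mask, p_positive=1.0, rng=rng)
img, lbl = augment(img, lbl, rng)
```

Sampling calcium-centred patches with a fixed probability is one simple way to stop sparse positives from being swamped by empty patches.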
Coronary calcium occupies a very small fraction of the voxels in a CT volume. To prevent the model from simply predicting no calcium, we use a combination of Dice and focal loss 60 to down-weight the contribution of background voxels. The Dice loss was weighted by 0.5. For the focal loss, we used \(\gamma = 2\) for both stages. The first-stage model was trained with \(\alpha = 6900\), and the second-stage model with \(\alpha = [1.5, 1800, 1400, 1600, 1400, 1400]\) for the background, LMS, LAD, Cx, RCA, and non-coronary classes, respectively. The per-class weighting terms were calculated from the inverse voxel frequency of each class. At inference, final predictions were generated by training a model on each cross-validation fold and ensembling the best-performing models by voxel-wise probability averaging.
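The loss terms and ensembling can be illustrated with a NumPy forward pass (a training implementation would use a deep-learning framework with automatic differentiation). This sketch assumes the combined loss is 0.5 × Dice loss plus an unweighted focal loss, with the per-class \(\alpha\) applied to the true-class term of the focal loss; how the single first-stage \(\alpha\) maps onto the two classes is not specified here. The function names are hypothetical.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes):
    """Per-class weight = total voxels / voxels of that class (absent classes count as 1)."""
    counts = np.bincount(labels.ravel(), minlength=n_classes).astype(float)
    return labels.size / np.maximum(counts, 1.0)


def focal_loss(probs, onehot, alpha, gamma=2.0, eps=1e-7):
    """Multi-class focal loss averaged over voxels; probs and onehot have shape (C, ...)."""
    p = np.clip(probs, eps, 1.0)
    a = np.asarray(alpha, dtype=float).reshape((-1,) + (1,) * (probs.ndim - 1))
    per_voxel = -(a * (1.0 - p) ** gamma * onehot * np.log(p)).sum(axis=0)
    return float(per_voxel.mean())


def dice_loss(probs, onehot, eps=1e-6):
    """Soft Dice loss averaged over classes."""
    axes = tuple(range(1, probs.ndim))
    inter = (probs * onehot).sum(axis=axes)
    denom = probs.sum(axis=axes) + onehot.sum(axis=axes)
    return float(1.0 - ((2.0 * inter + eps) / (denom + eps)).mean())


def combined_loss(probs, onehot, alpha, gamma=2.0, dice_weight=0.5):
    return dice_weight * dice_loss(probs, onehot) + focal_loss(probs, onehot, alpha, gamma)


def ensemble(prob_maps):
    """Average per-fold probability maps voxel-wise, then take the most probable class."""
    mean = np.mean(np.stack(prob_maps), axis=0)
    return mean.argmax(axis=0), mean


onehot = np.eye(3)[[0, 1, 2, 1]].T  # 3 classes x 4 voxels
perfect = combined_loss(onehot.astype(float), onehot, alpha=[1.0, 2.0, 2.0])
uniform = combined_loss(np.full((3, 4), 1 / 3), onehot, alpha=[1.0, 2.0, 2.0])
```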
Statistics
All analyses were performed in Python (version 3.9.15) using numpy (version 1.23.5). Bland-Altman 61 mean differences and 95% limits of agreement were used to evaluate per-vessel and total coronary agreement between annotators, and between the automated model and annotators. Spearman's rank correlation or Mann-Whitney U tests were used, as appropriate, to compare continuous data using the scipy package (version 1.9.3). Cohen's kappa, also computed with the scipy package, was used to evaluate agreement in the classification of patients into the five volume risk categories. Cohen's kappa was interpreted as: < 0.00, poor; 0.00–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.00, almost perfect agreement 62. Volume score classification confusion matrices were created using scikit-learn (version 1.1.3). We report the voxel-based F1 score and positive predictive value for all coronary labels, averaged over all voxels in a given scan. Voxel-based metrics were calculated using the MONAI package (version 1.1.0). Intraclass correlation coefficients were calculated using the Pingouin package (version 0.5.3). All statistical tests were two-sided, and statistical significance was defined as p < 0.05.
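Two of the agreement statistics can be sketched directly in NumPy: Bland-Altman bias with 95% limits of agreement (mean difference ± 1.96 SD of the differences) and unweighted Cohen's kappa, with the interpretation bands listed above. This is an illustrative sketch, not the analysis code; using the unweighted form of kappa and assigning values that fall between the listed bands (e.g. 0.205) to the lower band are assumptions.

```python
import numpy as np

KAPPA_BANDS = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
               (0.80, "substantial"), (1.00, "almost perfect")]


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cohen_kappa(r1, r2, n_categories):
    """Unweighted Cohen's kappa for two raters' integer category labels."""
    cm = np.zeros((n_categories, n_categories))
    for i, j in zip(r1, r2):
        cm[i, j] += 1
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    if p_expected == 1.0:
        return float("nan")  # undefined when both raters use one category only
    return float((p_observed - p_expected) / (1.0 - p_expected))


def interpret_kappa(kappa):
    if kappa < 0:
        return "poor"
    for upper, label in KAPPA_BANDS:
        if kappa <= upper:
            return label
    return "almost perfect"


bias, lower, upper = bland_altman([10, 20, 30, 40], [8, 21, 27, 40])
kappa_chance = cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_categories=2)
kappa_perfect = cohen_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], n_categories=5)
```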