2.1 Multi-stage Neural Network Design
We present a multi-stage neural network for 3D reconstruction of coronary trees. The input to the network is two or three binarized angiograms for a given coronary tree, with a Euclidean distance transform applied to create a smooth field that implicitly encodes vessel radii (see Fig. 1). In this work, binarized angiograms were directly created by our synthetic data generator (Section 2.2). For clinical angiograms, we have previously developed a vessel segmentation neural network known as AngioNet [19] which can convert the desired clinical images into binarized angiograms. Since no angle or distance information is provided to the neural network as an input, image calibration or parameter correction algorithms are not required.
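To illustrate how a distance transform turns a binary vessel mask into a smooth field that encodes radius, the following minimal sketch computes a brute-force Euclidean distance transform with NumPy. The function name and the toy mask are hypothetical and this is not necessarily the implementation used in this work; in practice a library routine such as `scipy.ndimage.distance_transform_edt` would normally be preferred.

```python
import numpy as np

def euclidean_distance_transform(mask: np.ndarray) -> np.ndarray:
    """Distance from each foreground pixel to the nearest background pixel.

    Brute-force O(F * B) version, intended only for small illustrative masks.
    """
    mask = mask.astype(bool)
    foreground = np.argwhere(mask)    # (F, 2) pixel coordinates, row-major order
    background = np.argwhere(~mask)   # (B, 2) pixel coordinates
    field = np.zeros(mask.shape, dtype=float)
    if len(foreground) == 0 or len(background) == 0:
        return field
    # Pairwise distances between every foreground and background pixel.
    diffs = foreground[:, None, :] - background[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)          # (F, B)
    # Boolean indexing also runs in row-major order, matching `foreground`.
    field[mask] = dists.min(axis=1)
    return field

# Toy "vessel": a horizontal band 5 pixels wide in a 9 x 12 image.
toy_mask = np.zeros((9, 12), dtype=np.uint8)
toy_mask[2:7, :] = 1
edt = euclidean_distance_transform(toy_mask)
```

In this toy example the field peaks along the band's midline (value 3 for a 5-pixel-wide band) and falls to 1 at its edge, which is how the transformed image implicitly carries the local vessel radius.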
The multi-stage neural network was designed to reconstruct both vessel centerlines (stage 1) and radii (stage 2) for each branch in the coronary tree. The centerline and radius stages both employed a convolutional neural network backbone to learn relevant features of the coronary tree from the input images. In this work, we chose to use ResNet101 [20] as the backbone network. While the convolutional layers can learn image-based features relevant to the vessel geometry, a multilayer perceptron (MLP) is better suited to solve the regression problems of identifying the 3D coordinates of the vessel centerline and corresponding radii. Therefore, we replaced the final layer of the backbone network with separate MLPs for the centerline and radius stages, as follows.
In the centerline stage, the final fully connected layer of the backbone network was replaced by an MLP with ReLU activation and batch normalization between layers. The MLP was composed of 4 hidden layers, where the first 3 layers had 1024 neurons and the last layer had 512 neurons. The output of the centerline MLP was a linear layer with \(M\times N\times 3\) units, containing the 3D coordinates of \(N\) centerline points for each of the \(M\) branches in the binarized angiogram. This output vector was reshaped into an \(M\times N\times 3\) array before computing the loss.
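The layer sizes of the centerline head can be made concrete with a small NumPy forward pass. This is a sketch under stated assumptions rather than our training code: the weights are random, \(N = 50\) is a hypothetical value, \(M = 4\) corresponds to the four right coronary branches described in Section 2.2, the ordering linear, batch normalization, ReLU is assumed, and batch normalization is simplified to per-batch standardization without learned scale and shift. The input width of 2048 is the pooled feature size of a standard ResNet101.

```python
import numpy as np

M, N = 4, 50                      # branches; points per branch (N is hypothetical)
FEATURES = 2048                   # pooled feature width of a standard ResNet101
HIDDEN = [1024, 1024, 1024, 512]  # hidden layer sizes of the centerline MLP

def init_head(rng, in_dim, hidden, out_dim):
    """Random (weight, bias) pairs for each linear layer (illustrative only)."""
    dims = [in_dim, *hidden, out_dim]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def batch_norm(x, eps=1e-5):
    """Simplified batch normalization: standardize each unit over the batch."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def centerline_head(features, params):
    """Map backbone features (B, FEATURES) to centerlines (B, M, N, 3)."""
    h = features
    for weight, bias in params[:-1]:
        # Assumed ordering: linear -> batch norm -> ReLU.
        h = np.maximum(batch_norm(h @ weight + bias), 0.0)
    weight, bias = params[-1]
    flat = h @ weight + bias              # (B, M * N * 3) linear output
    return flat.reshape(-1, M, N, 3)      # reshape before computing the loss

rng = np.random.default_rng(0)
params = init_head(rng, FEATURES, HIDDEN, M * N * 3)
features = rng.normal(size=(8, FEATURES))   # batch size 8, as in training
centerlines = centerline_head(features, params)
print(centerlines.shape)                    # (8, 4, 50, 3)
```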
Meanwhile, the radius stage replaced the final layer of the backbone with a separate MLP for each branch, for a total of \(M\) MLPs. The radius MLPs were composed of 3 hidden layers with 128 neurons each, with batch normalization and ReLU activation between each hidden layer. The output of each MLP was a vector of radii of dimension \(N\). The MLP for each branch was trained separately to improve the network’s ability to capture sudden reductions in vessel radii at regions of stenosis. Without this step, stenoses are likely to be overlooked since they make up a small portion of the points in the coronary tree.
The outputs of both stages were concatenated to form an \(M\times N\times 4\) matrix, where the last dimension encodes the 3D centerline coordinate and radius for each point as \((x,y,z,r)\).
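As a shape-level illustration of this concatenation, the sketch below stacks the per-branch radius vectors and appends them to the centerline coordinates. The function name, the random inputs, and \(N = 50\) are hypothetical.

```python
import numpy as np

def assemble_tree(centerlines, radii_per_branch):
    """Combine stage outputs into an (M, N, 4) array of (x, y, z, r) points.

    centerlines:      array of shape (M, N, 3) from the centerline stage
    radii_per_branch: list of M arrays of shape (N,), one per radius MLP
    """
    radii = np.stack(radii_per_branch, axis=0)            # (M, N)
    return np.concatenate([centerlines, radii[..., None]], axis=-1)

M, N = 4, 50   # N is a hypothetical number of points per branch
rng = np.random.default_rng(0)
centerline_output = rng.normal(size=(M, N, 3))
radius_outputs = [rng.uniform(0.5, 2.0, size=N) for _ in range(M)]
tree = assemble_tree(centerline_output, radius_outputs)
print(tree.shape)   # (4, 50, 4)
```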
The radius stage was trained using the mean squared error as the loss function:
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2} \tag{1}$$
where \(y\) is the ground truth and \(\widehat{y}\) is the neural network prediction. The centerline stage was trained using the same loss function with an additional length regularization term:
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2}+\lambda\sum\left(S_{y}-S_{\widehat{y}}\right) \tag{2}$$
where \(\lambda\) is the regularization rate, \({S}_{y}\) is the arclength of the ground truth branches, and \({S}_{\widehat{y}}\) is the arclength of the predicted branches. This vessel length regularization term was included because vessel length is a key determinant of the pressure gradient through a vessel, which is in turn an important indicator of disease severity.
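The length-regularized loss can be sketched in NumPy as follows. Arclength is approximated here as the summed length of the polyline segments between consecutive centerline points, which is an assumption about the discretization. The absolute value around the length difference is also an assumption, made so that the penalty is non-negative; Eq. (2) is written without one. The function names and the value of `lam` are hypothetical.

```python
import numpy as np

def arclength(branches):
    """Polyline length of each branch; branches has shape (M, N, 3)."""
    segments = np.diff(branches, axis=-2)                  # (M, N - 1, 3)
    return np.linalg.norm(segments, axis=-1).sum(axis=-1)  # (M,)

def centerline_loss(pred, target, lam):
    """Mean squared error plus a per-branch length penalty weighted by lam."""
    mse = np.mean((target - pred) ** 2)
    # Assumption: absolute difference, summed over branches.
    length_penalty = np.sum(np.abs(arclength(target) - arclength(pred)))
    return float(mse + lam * length_penalty)

# Toy check: one straight branch of length 4, and a prediction twice as long.
truth = np.zeros((1, 5, 3))
truth[0, :, 0] = np.arange(5.0)
stretched = 2.0 * truth
loss_value = centerline_loss(stretched, truth, lam=0.5)
```

In the toy check the prediction has the right shape but twice the length, so both terms contribute: the mean squared error is 2.0 and the length penalty is 0.5 × 4.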
An Adam optimizer with a learning rate of 5e-4 and weight decay (L2 regularization) was used to train both stages. Batch size was set to 8 and the multi-stage network was trained for 300 epochs. For the radius MLPs, the backbone was frozen after the initial 300 epochs and each branch MLP was trained for an additional 50 epochs. We did not observe a notable improvement in accuracy when retraining the backbone network for the centerline and radius tasks.
We now present the single-stage counterpart to our multi-stage network for comparison. The single-stage network architecture was composed of the backbone network and a single MLP which outputs an \(M\times N\times 4\) matrix containing both the centerlines and their radii (Fig. 1). The loss function of the single-stage network was a weighted MSE loss:
$$\frac{1}{n}\sum_{i=1}^{n}\left[\left(y_{i}-\widehat{y}_{i}\right)^{2}+\mu\left(r_{i}-\widehat{r}_{i}\right)^{2}\right] \tag{3}$$
Here, \(y\) and \(\widehat{y}\) represent the ground truth and predicted centerline coordinates while \(r\) and \(\widehat{r}\) represent the radii along the centerlines. The weighting parameter \(\mu\) was chosen such that the centerline and radius terms were of the same order of magnitude. The single-stage network was trained using the same hyperparameters as the multi-stage network to make a fair comparison: Adam optimizer with a learning rate of 5e-4, L2 regularization, and a batch size of 8.
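A minimal NumPy sketch of the weighted loss is given below, assuming that each index \(i\) refers to one centerline point, that the coordinate term is the squared Euclidean distance between predicted and true points, and that predictions are stored as \((x,y,z,r)\) along the last axis. The helper `balance_mu` is a hypothetical heuristic for bringing the two terms to the same order of magnitude; it is not a description of how \(\mu\) was actually selected.

```python
import numpy as np

def single_stage_loss(pred, target, mu):
    """Weighted MSE over (x, y, z, r) arrays of shape (..., 4)."""
    coord_term = np.sum((target[..., :3] - pred[..., :3]) ** 2, axis=-1)
    radius_term = (target[..., 3] - pred[..., 3]) ** 2
    return float(np.mean(coord_term + mu * radius_term))

def balance_mu(pred, target):
    """Hypothetical heuristic: ratio of mean coordinate error to mean radius error."""
    coord_term = np.sum((target[..., :3] - pred[..., :3]) ** 2, axis=-1)
    radius_term = (target[..., 3] - pred[..., 3]) ** 2
    return float(coord_term.mean() / radius_term.mean())

# Toy example: coordinate errors of 1 and radius errors of 0.1 at two points.
target_points = np.zeros((1, 2, 4))
pred_points = np.zeros((1, 2, 4))
pred_points[..., 0] = 1.0
pred_points[..., 3] = 0.1
mu_estimate = balance_mu(pred_points, target_points)
weighted_loss = single_stage_loss(pred_points, target_points, mu_estimate)
```

With these toy errors the unweighted radius term is two orders of magnitude smaller than the coordinate term, which shows why a weight on the radius term is needed for it to influence training.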
2.2 Synthetic Dataset Generation
To train the proposed multi-stage neural network, we require hundreds or thousands of ground truth 3D coronary trees and their corresponding segmented 2D angiograms. In practice, this means that we must identify thousands of patients with both 3D CTA data and 2D X-ray angiograms, which is typically not feasible in single-center studies such as ours. Another challenge of using clinical image data as input is that the coronaries deform in each frame of an X-ray angiography series due to the contraction of the heart. This necessitates temporal registration of frames from multiple angiographic series in order to create a valid set of input images for 3D coronary tree reconstruction. To obtain a sufficiently large dataset and eliminate external sources of error such as temporal registration, we devised a method to generate a dataset consisting of 5,000 static 3D coronary tree geometries and their corresponding sets of 2D projections. While we have used synthetic data to train and validate our 3D reconstruction network, the use of synthetic projection images as input does not preclude future clinical application. In the future, a segmentation algorithm or neural network such as AngioNet [19] could be used to convert clinical angiograms obtained during routine patient care into a suitable input for our network.
This work focuses on 3D reconstruction of the right coronary tree because the large anatomical variation in the left coronary arteries [21] makes reconstruction more challenging. The data generation method includes two steps: 1) a synthetic 3D coronary tree generator, and 2) a projection algorithm to create sets of segmented angiograms. A brief description of these steps is provided next.
Step 1: Synthetic 3D coronary tree generator: Fig. 2 provides an overview of the coronary tree generator. A distribution of patient-specific centerlines was obtained from 10 CTAs for the four main branches of the right coronary tree, namely the right main coronary artery (RCA), sino-atrial node branch (SA), acute marginal branch (AM), and posterior-descending artery (PDA). The posterolateral ventricular branch (PLV) is implicitly included as part of the RCA, which bifurcates into the PDA and PLV branches. These data were used to identify a distribution of control points and their standard deviations in 3D space for each branch of the coronary tree (see Fig. 2A). From this distribution, new vessels can be generated via uniform random sampling. A linearly tapering radius was assigned to each branch, and stenoses with a Gaussian profile were randomly introduced (Fig. 2B). Branches were combined into a tree and augmented with random rotation, shear, and/or warping (Fig. 2C). This algorithm was refined through repeated iterations with a board-certified interventional cardiologist to generate realistic trees. Further details of the clinical and mathematical assumptions used to inform synthetic data generation are given in Appendix A.
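The branch-level steps of the generator can be sketched as below. This is a simplified illustration rather than the generator itself: the sampling window of one standard deviation around each mean control point, the piecewise-linear interpolation between control points, the multiplicative form of the stenosis, and all numerical values are assumptions, and every function name is hypothetical. The assumptions actually used are those described in Appendix A.

```python
import numpy as np

def sample_control_points(rng, mean, std):
    """Uniformly sample each control point within mean +/- std (assumed window)."""
    return rng.uniform(mean - std, mean + std)

def interpolate_centerline(control_points, n_points):
    """Piecewise-linear centerline through the control points, shape (n_points, 3)."""
    t_ctrl = np.linspace(0.0, 1.0, len(control_points))
    t = np.linspace(0.0, 1.0, n_points)
    return np.stack(
        [np.interp(t, t_ctrl, control_points[:, k]) for k in range(3)], axis=1)

def radius_profile(n_points, r_start, r_end,
                   stenosis_pos=None, severity=0.0, width=0.05):
    """Linearly tapering radius with an optional Gaussian-shaped stenosis.

    stenosis_pos and width are fractions of the branch length; severity is the
    fractional radius reduction at the centre of the stenosis.
    """
    s = np.linspace(0.0, 1.0, n_points)
    radius = r_start + (r_end - r_start) * s
    if stenosis_pos is not None:
        dip = severity * np.exp(-0.5 * ((s - stenosis_pos) / width) ** 2)
        radius = radius * (1.0 - dip)
    return radius

rng = np.random.default_rng(0)
# Hypothetical mean control points (mm) and standard deviations for one branch.
mean_points = np.array([[0.0, 0.0, 0.0], [10.0, 5.0, 0.0],
                        [20.0, 5.0, 10.0], [30.0, 0.0, 15.0]])
std_points = np.full_like(mean_points, 2.0)
control = sample_control_points(rng, mean_points, std_points)
centerline = interpolate_centerline(control, n_points=101)
healthy = radius_profile(101, r_start=2.0, r_end=1.0)
stenosed = radius_profile(101, r_start=2.0, r_end=1.0,
                          stenosis_pos=0.5, severity=0.5)
```

In this example the stenosed branch has half the healthy radius at its midpoint and matches the healthy taper far from the lesion.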
Step 2: Projection algorithm: Cone-beam projections of each coronary tree were generated from 5 views to mimic the X-ray angiogram acquisition process. Image acquisition angles were randomly sampled from 20-degree windows around commonly used clinical values. Out of the 5 views, 3 were randomly chosen for training (Fig. 3). A Euclidean distance transform was applied to the projection images, which were then input into the neural network. The data generated in this fashion were split into 4,500 coronary trees and their corresponding projections for training and 500 for validation (a 90/10 training/validation split).
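The projection and splitting steps can be illustrated with an idealized point-source model. The sketch below is written under several assumptions that are not taken from this work: the source-to-detector and source-to-isocenter distances, the nominal view angles, the axis conventions of the rotation, and the reading of a 20-degree window as ±10 degrees around each nominal angle are all hypothetical placeholders. It projects centerline points only and does not rasterize vessel lumens into images.

```python
import numpy as np

SID_MM = 1000.0   # hypothetical source-to-detector distance
SOD_MM = 750.0    # hypothetical source-to-isocenter distance
# Hypothetical nominal (primary, secondary) angles in degrees for 5 views.
NOMINAL_VIEWS = [(-30.0, 0.0), (30.0, 0.0), (0.0, 30.0), (0.0, -30.0), (45.0, 20.0)]

def rotation(primary_deg, secondary_deg):
    """Rotation about the y-axis (primary) followed by the x-axis (secondary)."""
    a, b = np.deg2rad([primary_deg, secondary_deg])
    ry = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(b), -np.sin(b)],
                   [0.0, np.sin(b), np.cos(b)]])
    return rx @ ry

def cone_beam_project(points, primary_deg, secondary_deg):
    """Perspective-project (K, 3) points in mm onto the detector plane, (K, 2)."""
    p = points @ rotation(primary_deg, secondary_deg).T
    # Source sits SOD_MM behind the isocenter along z; magnification varies with depth.
    magnification = SID_MM / (SOD_MM + p[:, 2])
    return p[:, :2] * magnification[:, None]

def sample_views(rng, nominal_views, window_deg=20.0, n_keep=3):
    """Jitter each nominal view within its window and keep n_keep views at random."""
    nominal = np.asarray(nominal_views, dtype=float)
    jitter = rng.uniform(-window_deg / 2.0, window_deg / 2.0, size=nominal.shape)
    keep = rng.choice(len(nominal), size=n_keep, replace=False)
    return (nominal + jitter)[keep]

rng = np.random.default_rng(0)
views = sample_views(rng, NOMINAL_VIEWS)
points = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
projections = [cone_beam_project(points, p, s) for p, s in views]

# 90/10 split of 5,000 synthetic trees into training and validation indices.
order = rng.permutation(5000)
train_idx, val_idx = order[:4500], order[4500:]
```

Because magnification depends on the depth of each point along the beam, the same vessel segment appears at a different scale in each view, which is the property that distinguishes cone-beam from parallel projection.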