2.1 Principle of the CardiacField
The core of the CardiacField is an implicit neural representation network optimized by a physics-informed loss function, which maps an input 3D coordinate (\(x,y,z\)) of the cardiac volume to its corresponding intensity value. Through self-supervised training on a sequence of 2D echocardiographic images, easily collected in the clinic with a commodity 2DE probe, a continuous and smooth 3D cardiac volume can be reconstructed by querying the trained network at the coordinates of a denser grid (Fig. 1a). The specific steps of the CardiacField are summarized as follows.
First, we construct the implicit neural representation of a digitalized 3D heart using a multilayer perceptron (MLP) with a multiresolution hash-table to build the mapping function between coordinates and associated intensities. The MLP is a fully connected deep network that can approximate any continuous function19. The multiresolution hash-table, which has been verified to perform well in characterizing high-frequency components in signal representation19, enables the MLP to better represent realistic spatial details in the four chambers, e.g., trabecular muscles.
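The multiresolution hash encoding can be sketched in NumPy as follows. This is a minimal illustration, not our actual configuration: the feature tables are random and untrained, a nearest-vertex lookup replaces the trilinear interpolation used in practice19, and all sizes (`n_levels`, `table_size`, `feat_dim`, `base_res`) are illustrative assumptions.

```python
import numpy as np

# Spatial-hash primes from the multiresolution hash encoding of
# Mueller et al.19: integer cell coordinates are multiplied by large
# primes and XORed to index the feature table.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(xyz, n_levels=4, table_size=2**14, feat_dim=2,
                base_res=16, growth=2.0, rng=None):
    """Concatenate per-level hash-table features for points in [0, 1]^3.
    Sketch only: random untrained tables, nearest-vertex lookup."""
    if rng is None:
        rng = np.random.default_rng(0)
    feats = []
    for lvl in range(n_levels):
        table = rng.normal(scale=1e-2, size=(table_size, feat_dim))
        res = int(base_res * growth ** lvl)
        grid = np.minimum(np.floor(xyz * res), res - 1).astype(np.uint64)
        h = grid * PRIMES                      # wraps modulo 2^64 on purpose
        idx = (h[:, 0] ^ h[:, 1] ^ h[:, 2]) % np.uint64(table_size)
        feats.append(table[idx])
    return np.concatenate(feats, axis=1)       # shape (N, n_levels * feat_dim)

pts = np.random.default_rng(1).random((5, 3))  # five query coordinates
enc = hash_encode(pts)                         # per-point MLP input features
print(enc.shape)                               # (5, 8)
```

The concatenated per-level features then serve as the input to the MLP, whose fine levels carry the high-frequency detail.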
Second, we reconstruct the 3D cardiac volume using only a series of 2D echocardiographic images. Since each 2D echocardiographic image represents a certain cross-sectional view of the 3D cardiac volume, the physical imaging process of a 2D echocardiographic image can be modeled as virtually slicing a 2D plane from the 3D volume (i.e., 2D-Slicing). We describe the slicing position and angle, which correspond to the location and orientation of the 2DE probe, by a positional parameter \(\mathcal{B}\). Using a roughly estimated \(\mathcal{B}\) as the initial input20, virtual 2D echocardiographic images can be generated by applying the aforementioned ‘2D-Slicing’ to the CardiacField. We measure the physics-informed loss as the difference between the virtually-sliced and corresponding real-captured 2D echocardiographic images, and use it to jointly train the parameters of the MLP, the multiresolution hash-table, and \(\mathcal{B}\) through back-propagation. Through this joint optimization, our method can accurately retrieve the positional parameters \(\mathcal{B}\), eliminating the need for additional trackers to collect them. More details of the CardiacField are explained in the Methods.
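The ‘2D-Slicing’ forward model and its data term can be sketched as follows. For brevity the heart is held here as a discrete voxel array with nearest-neighbor sampling (the actual method queries the continuous implicit network instead), and the \(64\times 64\) plane size and the test pose are illustrative assumptions.

```python
import numpy as np

def slice_plane(volume, R, t, size=64):
    """'2D-Slicing': sample a 2D image from a 3D volume along the plane
    obtained by rotating the local z=0 plane with R and shifting it by t
    (both in voxel units). Nearest-neighbor sampling keeps the sketch
    short; the actual method queries the continuous implicit network."""
    u = np.linspace(-size / 2, size / 2, size)
    uu, vv = np.meshgrid(u, u, indexing="ij")
    local = np.stack([uu, vv, np.zeros_like(uu)], axis=-1)  # (size, size, 3)
    world = local @ R.T + t                                 # rotate + translate
    ijk = np.clip(np.rint(world).astype(int), 0, np.array(volume.shape) - 1)
    return volume[ijk[..., 0], ijk[..., 1], ijk[..., 2]]

def slicing_loss(volume, R, t, real_image):
    """Physics-informed data term: mean squared difference between the
    virtually sliced image and the real-captured one."""
    virtual = slice_plane(volume, R, t, real_image.shape[0])
    return np.mean((virtual - real_image) ** 2)

vol = np.random.default_rng(0).random((128, 128, 128))  # stand-in 3D heart
R, t = np.eye(3), np.array([64.0, 64.0, 64.0])          # one probe pose
img = slice_plane(vol, R, t)                            # the "real" image
print(slicing_loss(vol, R, t, img))                     # 0.0 at the true pose
```

In training, the gradient of this loss flows back into both the volume representation and the pose parameters \(\mathcal{B}\), which is what allows the poses to be refined without trackers.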
Our method does not rely on large-scale datasets for network model training; a patient-specific CardiacField can be reconstructed from about 200 2D echocardiographic images collected directly through a simple apical ring scan in the clinic (Fig. 1b and 1c). Once the CardiacField is reconstructed, we can easily integrate it with existing deep learning models for further cardiac function assessment. In this work, we realize accurate and automatic segmentation of the LV and RV areas by borrowing the segmentation model proposed in EchoNet15, whose performance and generalization have been cross-verified using 10,030 echocardiogram videos from different healthcare systems. Specifically, in Fig. 1e, we uniformly sample the 3D cardiac volume to generate about 20–30 3-\(mm\)-thick 2D slices parallel to the apical four-chamber view and use the above segmentation model to delineate the LV and RV regions and calculate the cavity area of each slice. By accumulating the cavity areas of all slices, we obtain the volume precisely and thus accurately estimate EF values. Details of the segmentation model and EF calculation are further described in the Methods.
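The slice-accumulation (method-of-discs) volume and the resulting EF can be sketched as follows; the circular masks, pixel area, slice count, and radii are toy assumptions standing in for the per-slice EchoNet15 segmentation outputs.

```python
import numpy as np

def chamber_volume(slice_masks, thickness_mm=3.0, pixel_area_mm2=1.0):
    """Method-of-discs volume: accumulate the segmented cavity area of
    each parallel slice times the slice thickness (here 3 mm)."""
    areas_mm2 = np.array([m.sum() * pixel_area_mm2 for m in slice_masks])
    return areas_mm2.sum() * thickness_mm          # mm^3

def ejection_fraction(edv, esv):
    """EF = (EDV - ESV) / EDV, expressed in percent."""
    return 100.0 * (edv - esv) / edv

# Toy circular masks standing in for the per-slice LV segmentations.
yy, xx = np.mgrid[:64, :64]
disk = lambda r: (yy - 32) ** 2 + (xx - 32) ** 2 <= r ** 2
edv = chamber_volume([disk(20)] * 25)   # 25 slices at end-diastole
esv = chamber_volume([disk(14)] * 25)   # smaller cavity at end-systole
print(round(ejection_fraction(edv, esv), 1))
```

Because the areas are summed over many thin slices, no geometric assumption (such as a prolate-ellipsoid LV) is needed, which is what distinguishes this 3D computation from the 2D formulas discussed below.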
2.2 Evaluation of Model Performance
To quantitatively evaluate the performance of the proposed CardiacField, we build the first dataset containing paired 2D/3D echocardiographic images, LV/RV volumes, and EFs (acquired by a PHILIPS EPIQ 7C machine with an S5-1 2DE probe and an X5-1 3DE probe). The ground-truth 3D echocardiographic images are directly recorded using the 3DE probe. After manual calibration by experienced sonographers, the corresponding LV/RV volumes and EFs are calculated by the 3D ultrasound machine. The 2D echocardiographic images are captured by rotating the 2DE probe 360 degrees around the apex of the heart. The sequence of 2D echocardiographic images of each patient is synchronized based on concurrently recorded electrocardiogram (ECG) signals and then used to train the CardiacField, as shown in Fig. 1b.
We use CardiacField to reconstruct the 3D heart from real-captured 2D echocardiographic images, successfully recovering the realistic 3D cardiac structure, as shown in Fig. 2a. Compared with the 2DE probe, the 3DE probe suffers from several physical limitations. For instance, the electrical impedance mismatch of the transducers21, 22 and the interference of the narrow intercostal space with the emitted ultrasound23 both lead to undesired artifacts and inferior resolution. Leveraging the high-resolution images provided by the 2DE probe, our CardiacField can reconstruct 3D cardiac volumes with more spatial detail than those rendered by a 3DE probe. In Fig. 2b, we compare the same cross-sectional view (apical four-chamber view) between CardiacField-reconstructed hearts and 3DE-probe-captured ones for 10 independent patients. Specifically, in Fig. 2b1-2b5, the CardiacField exhibits enhanced resolution and better contrast than the 3DE probe. In Fig. 2b6, the left atrium (LA) imaged by the 3DE probe is filled with noise, whereas our method recovers detailed structures of the LA. The same holds in Fig. 2b7-2b9, where our method is superior to the 3DE probe in imaging the RV and right atrium (RA) areas. In Fig. 2b10, our method clearly distinguishes the patient’s implanted artificial cardiac pacemaker (labeled by yellow arrows), which cannot be observed in the image produced by the 3DE probe. We further reconstruct a dynamic 3D heart within one cardiac cycle using the CardiacField and compare it with that acquired by the 3DE probe (see Supplementary Video 1). Snapshots of the 3D heart within one cardiac cycle are shown in Fig. 3.
2.3 Volume Segmentation and Analysis of LVEF and RVEF
The EF describes the fractional change between the end-diastolic volume (EDV) and the end-systolic volume (ESV) of a cardiac chamber and is one of the most significant metrics for quantifying cardiac function24. LVEF represents the systolic function of the left ventricle and is widely used in cardiovascular disease evaluation, but it suffers from measurement variability and inaccuracy16, 25, 26. RVEF quantifies right-heart function and also provides crucial value in predicting major adverse cardiovascular events27. Calculating the LV volume from 2D echocardiographic images is moderately inaccurate, since the calculation relies on the assumption that the LV shape is close to a prolate ellipsoid16. Furthermore, due to the complex geometry of the RV, calculating its volume from 2D echocardiographic images is not recommended16, 17. Although more accurate volumes and EFs of the LV and RV can be calculated using a 3D ultrasound machine, doing so requires extensive calibration by experienced sonographers because of the inferior resolution16.
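Written out, the EF used throughout this section is

```latex
\mathrm{EF}=\frac{\mathrm{EDV}-\mathrm{ESV}}{\mathrm{EDV}}\times 100\%
```

so its accuracy is governed entirely by how well the two chamber volumes can be measured.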
Based on the high-fidelity 3D heart reconstructed by the CardiacField, we use a classical segmentation method15 to automatically delineate the LV/RV volumes (details can be found in the Methods) and calculate the EFs of 56 patients; the results are shown in Fig. 2c and 2d. According to the guidelines of the American Society of Echocardiography and the European Association of Cardiovascular Imaging16, we use the EFs obtained by the 3D ultrasound machine (after calibration by experienced sonographers) as the ground truth. We also compare our results with EchoNet15, the first 2D echocardiogram video-based end-to-end deep learning model for estimating LVEF, which has demonstrated more stable LVEF prediction than standard clinical examination (by experienced sonographers based on 2D images). In Fig. 2c, the LVEF predicted by our method has a mean absolute error (MAE) of 2.8% and a root mean squared error (RMSE) of 3.8%, while the prediction by EchoNet15 has an MAE of 4.2% and an RMSE of 5.5%. Our method thus offers about 33% and 31% reductions in MAE and RMSE over EchoNet15, respectively, indicating more accurate LVEF prediction. Furthermore, we calculate the RVEF, which is not available from EchoNet15, with an MAE of 3.6% and an RMSE of 4.7% (Fig. 2d). As evidenced by the small errors of LVEF and RVEF, our method can provide reliable assessment for further cardiac diagnosis and treatment. The comparisons of volume segmentation and area calculation are presented in Supplementary Table 1.
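The two reported error metrics are the standard ones; a minimal sketch with hypothetical EF values (not taken from our dataset):

```python
import numpy as np

def mae(pred, truth):
    """Mean absolute error, in the same units as the inputs (EF %)."""
    return np.mean(np.abs(np.asarray(pred) - np.asarray(truth)))

def rmse(pred, truth):
    """Root mean squared error; penalizes large EF deviations more."""
    return np.sqrt(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2))

# Hypothetical LVEF predictions vs. 3D-machine ground truth (percent).
pred  = [58.0, 61.5, 45.2, 70.1]
truth = [60.0, 60.0, 48.0, 69.0]
print(round(mae(pred, truth), 2), round(rmse(pred, truth), 2))  # 1.85 1.96
```

RMSE always upper-bounds MAE, so quoting both exposes whether the error is driven by a few outlier patients or spread evenly.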
2.4 Generalization to Different Ultrasound Machines
To evaluate the reliability of the CardiacField on different ultrasound machines, we recruit another five volunteers to conduct generalization experiments using two widely used but different 2D ultrasound machines, i.e., PHILIPS and SIEMENS. PHILIPS is the anchor machine used primarily, and SIEMENS is used for generalization verification. We apply the CardiacField to reconstruct the 3D heart from the 2D echocardiographic images captured by each machine and then calculate the LVEF and RVEF of the same person from the two reconstructions, as shown in Fig. 2e. The LVEF predicted by our method on PHILIPS has an MAE of 1.7% and an RMSE of 1.9%, while the prediction on SIEMENS has an MAE of 1.9% and an RMSE of 2.2%. Additionally, the RVEF prediction on PHILIPS has an MAE of 1.1% and an RMSE of 1.4%, while the prediction on SIEMENS has an MAE of 1.6% and an RMSE of 2.1%. The variations of LVEF and RVEF across machines are less than 1%. Since an EF error within 5% between different measurements is generally considered clinically acceptable26, 28, this experiment verifies the reliability of the CardiacField for LV/RV EF measurements on different machines. The volume comparison is presented in Supplementary Table 2.
2.5 Analysis of 3D Reconstruction
In this section, we evaluate the efficiency of the CardiacField in positional parameter \(\mathcal{B}\) estimation, 3D heart reconstruction, and the ‘continuous-slicing’ functionality (generating continuous cross-sectional views of the 3D heart). The ground-truth data include a real 3D heart captured by the 3DE probe and 90 different positional parameters \({\mathcal{B}}_{0}\). We then slice 90 2D echocardiographic images from the 3D heart using \({\mathcal{B}}_{0}\) to form the ‘Input Views’ (Fig. 4c). Note that \({\mathcal{B}}_{0}\) is unknown to the CardiacField during reconstruction.
For accurate estimation of the positional parameter \(\mathcal{B}\), we first use PlaneInVol20 for an initial estimate and then feed the 90 ‘Input Views’ into CardiacField for joint 3D heart reconstruction and positional parameter optimization. Since \(\mathcal{B}\) can be decomposed into a rotation matrix and a translation vector (details can be found in the Methods), we evaluate accuracy by comparing the rotation angles and translation distances optimized by CardiacField against the ground truth \({\mathcal{B}}_{0}\). The CardiacField accurately recovers \(\mathcal{B}\), with average errors of only \(0.067\pm 0.024\) degrees in rotation angle and \(0.210\pm 0.143\) \(cm\) in translation distance over 56 hearts, corresponding to about 81% and 67% error reductions, respectively, relative to the initial positional parameters. Additionally, we visualize the initialized, refined, and ground-truth positional parameter trajectories of one heart in Fig. 4a.
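These pose errors can be computed as the geodesic angle between the estimated and ground-truth rotation matrices and the Euclidean distance between the translation vectors; a minimal NumPy sketch with a hypothetical pose (a 0.1-degree rotation about z):

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two rotation matrices, in degrees:
    arccos((trace(R_est R_gt^T) - 1) / 2)."""
    cos = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_cm(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))

# Hypothetical estimate: rotated 0.1 degree about z, offset by 0.2 cm.
a = np.radians(0.1)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
print(round(rotation_error_deg(Rz, np.eye(3)), 3))              # 0.1
print(round(translation_error_cm([0, 0, 0.2], [0, 0, 0]), 3))   # 0.2
```

Averaging these two scalars over all views and hearts yields the error figures reported above.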
With accurate positional parameter estimation, CardiacField recovers a high-fidelity 3D heart (Fig. 4b). For comparison, we reuse the 90 sliced ‘Input Views’ and the CardiacField-refined positional parameters to reconstruct another 3D heart via a conventional interpolation method (i.e., interpolating the intensity at each voxel by a weighted average of the 20 nearest pixels). Quantitative measurements are evaluated against the real 3D heart captured by the 3DE probe. Compared with the interpolation method, which achieves a Peak Signal-to-Noise Ratio (PSNR) of \(21.572\pm 1.923\)dB (Fig. 4b), the CardiacField achieves a PSNR of \(27.452\pm 2.013\)dB, demonstrating that the CardiacField reconstructs a 3D heart much closer to the real-captured one.
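PSNR here follows its standard definition; a minimal sketch on synthetic data (the volume size and noise level are arbitrary assumptions, intensities normalized to [0, 1]):

```python
import numpy as np

def psnr(recon, reference, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB between a reconstruction and a
    reference volume; higher means closer to the reference."""
    mse = np.mean((recon - reference) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Synthetic stand-ins: a reference volume and a mildly corrupted copy.
rng = np.random.default_rng(0)
ref = rng.random((32, 32, 32))
noisy = np.clip(ref + rng.normal(scale=0.05, size=ref.shape), 0.0, 1.0)
print(round(psnr(noisy, ref), 1))
```

A ~6 dB gap, as between the interpolation baseline and CardiacField, corresponds to roughly a fourfold reduction in mean squared error.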
We further evaluate the ‘continuous-slicing’ functionality of the CardiacField. For comparison, we generate a total of 160 ‘New Views’ along the long axis and short axis of the heart from the CardiacField and from the 3D heart reconstructed by the conventional interpolation method, respectively (‘Long-Axis Views’ and ‘Short-Axis Views’ in Fig. 4c). Images sliced from the real-captured 3D heart at the same positions as these ‘New Views’ serve as the ground truth. As shown in Fig. 4b, the interpolation method yields PSNRs of \(22.559\pm 1.231\)dB along the long axis and \(23.844\pm 3.767\)dB along the short axis, while the CardiacField achieves PSNRs of \(29.724\pm 1.453\)dB and \(26.514\pm 3.381\)dB, respectively. As further verified in Fig. 4c, the CardiacField avoids interpolation artifacts (highlighted by yellow arrows), exhibiting superior ‘continuous-slicing’ functionality.