2.1 Principle of the CardiacField
The core of the CardiacField is an implicit neural representation network optimized by a physics-informed loss function, which maps an input 3D coordinate (\(x,y,z\)) of the cardiac volume to its corresponding intensity value. Through self-supervised training on a sequence of 2D echocardiographic images, easily collected in the clinic with a commodity 2DE probe, a continuous and smooth 3D cardiac volume can be reconstructed by querying the trained network at the coordinates of a denser grid (Fig. 1a). The specific steps of the CardiacField are summarized as follows.
First, we construct the implicit neural representation of a digitalized 3D heart using a multilayer perceptron (MLP) with a multiresolution hash-table to build the mapping function between coordinates and associated intensities. The MLP is a fully connected deep network that can approximate any continuous function19. The multiresolution hash-table, which has been verified to perform well in characterizing high-frequency components in signal representation19, enables the MLP to better represent realistic spatial details in the four chambers, e.g., trabecular muscles.
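The multiresolution hash encoding can be sketched in NumPy as follows. This is a minimal illustration, not our actual configuration: the feature tables are random and untrained, a nearest-vertex lookup replaces the trilinear interpolation used in practice19, and all sizes (`n_levels`, `table_size`, `feat_dim`, `base_res`) are illustrative assumptions.

```python
import numpy as np

# Spatial-hash primes from the multiresolution hash encoding of
# Mueller et al.19: integer cell coordinates are multiplied by large
# primes and XORed to index the feature table.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(xyz, n_levels=4, table_size=2**14, feat_dim=2,
                base_res=16, growth=2.0, rng=None):
    """Concatenate per-level hash-table features for points in [0, 1]^3.
    Sketch only: random untrained tables, nearest-vertex lookup."""
    if rng is None:
        rng = np.random.default_rng(0)
    feats = []
    for lvl in range(n_levels):
        table = rng.normal(scale=1e-2, size=(table_size, feat_dim))
        res = int(base_res * growth ** lvl)
        grid = np.minimum(np.floor(xyz * res), res - 1).astype(np.uint64)
        h = grid * PRIMES                      # wraps modulo 2^64 on purpose
        idx = (h[:, 0] ^ h[:, 1] ^ h[:, 2]) % np.uint64(table_size)
        feats.append(table[idx])
    return np.concatenate(feats, axis=1)       # shape (N, n_levels * feat_dim)

pts = np.random.default_rng(1).random((5, 3))  # five query coordinates
enc = hash_encode(pts)                         # per-point MLP input features
print(enc.shape)                               # (5, 8)
```

The concatenated per-level features then serve as the input to the MLP, whose fine levels carry the high-frequency detail.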
Second, we reconstruct the 3D cardiac volume using only a series of 2D echocardiographic images. Since each 2D echocardiographic image represents a certain cross-sectional view of the 3D cardiac volume, the physical imaging process of a 2D echocardiographic image can be modeled as virtually slicing a 2D plane from the 3D volume (i.e., 2D-Slicing). We describe the slicing position and angle, which correspond to the location and orientation of the 2DE probe, by a positional parameter \(\mathcal{B}\). Using a roughly estimated \(\mathcal{B}\) as the initial input20, virtual 2D echocardiographic images can be generated by applying the aforementioned ‘2D-Slicing’ to the CardiacField. We measure the physics-informed loss as the difference between the virtually-sliced and corresponding real-captured 2D echocardiographic images, and use it to jointly train the parameters of the MLP, the multiresolution hash-table, and \(\mathcal{B}\) through back-propagation. Through this joint optimization, our method can accurately retrieve the positional parameters \(\mathcal{B}\), eliminating the need for additional trackers to collect them. More details of the CardiacField are explained in the Methods.
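The ‘2D-Slicing’ forward model and its data term can be sketched as follows. For brevity the heart is held here as a discrete voxel array with nearest-neighbor sampling (the actual method queries the continuous implicit network instead), and the \(64\times 64\) plane size and the test pose are illustrative assumptions.

```python
import numpy as np

def slice_plane(volume, R, t, size=64):
    """'2D-Slicing': sample a 2D image from a 3D volume along the plane
    obtained by rotating the local z=0 plane with R and shifting it by t
    (both in voxel units). Nearest-neighbor sampling keeps the sketch
    short; the actual method queries the continuous implicit network."""
    u = np.linspace(-size / 2, size / 2, size)
    uu, vv = np.meshgrid(u, u, indexing="ij")
    local = np.stack([uu, vv, np.zeros_like(uu)], axis=-1)  # (size, size, 3)
    world = local @ R.T + t                                 # rotate + translate
    ijk = np.clip(np.rint(world).astype(int), 0, np.array(volume.shape) - 1)
    return volume[ijk[..., 0], ijk[..., 1], ijk[..., 2]]

def slicing_loss(volume, R, t, real_image):
    """Physics-informed data term: mean squared difference between the
    virtually sliced image and the real-captured one."""
    virtual = slice_plane(volume, R, t, real_image.shape[0])
    return np.mean((virtual - real_image) ** 2)

vol = np.random.default_rng(0).random((128, 128, 128))  # stand-in 3D heart
R, t = np.eye(3), np.array([64.0, 64.0, 64.0])          # one probe pose
img = slice_plane(vol, R, t)                            # the "real" image
print(slicing_loss(vol, R, t, img))                     # 0.0 at the true pose
```

In training, the gradient of this loss flows back into both the volume representation and the pose parameters \(\mathcal{B}\), which is what allows the poses to be refined without trackers.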
Our method does not rely on large-scale datasets for network model training; a patient-specific CardiacField can be reconstructed from about 200 2D echocardiographic images collected directly through a simple apical ring scan in the clinic (Fig. 1b and 1c). Once the CardiacField is reconstructed, we can easily integrate it with existing deep learning models for further cardiac function assessment. In this work, we realize accurate and automatic segmentation of the LV and RV areas by borrowing the segmentation model proposed in EchoNet15, whose performance and generalization have been cross-verified using 10,030 echocardiogram videos from different healthcare systems. Specifically, in Fig. 1e, we uniformly sample the 3D cardiac volume to generate about 20–30 3-\(mm\)-thick 2D slices parallel to the apical four-chamber view and use the above segmentation model to delineate the LV and RV regions and calculate the cavity area of each slice. By accumulating the cavity areas of all slices, we obtain the volume precisely and thus accurately estimate EF values. Details of the segmentation model and EF calculation are further described in the Methods.
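The slice-accumulation (method-of-discs) volume and the resulting EF can be sketched as follows; the circular masks, pixel area, slice count, and radii are toy assumptions standing in for the per-slice EchoNet15 segmentation outputs.

```python
import numpy as np

def chamber_volume(slice_masks, thickness_mm=3.0, pixel_area_mm2=1.0):
    """Method-of-discs volume: accumulate the segmented cavity area of
    each parallel slice times the slice thickness (here 3 mm)."""
    areas_mm2 = np.array([m.sum() * pixel_area_mm2 for m in slice_masks])
    return areas_mm2.sum() * thickness_mm          # mm^3

def ejection_fraction(edv, esv):
    """EF = (EDV - ESV) / EDV, expressed in percent."""
    return 100.0 * (edv - esv) / edv

# Toy circular masks standing in for the per-slice LV segmentations.
yy, xx = np.mgrid[:64, :64]
disk = lambda r: (yy - 32) ** 2 + (xx - 32) ** 2 <= r ** 2
edv = chamber_volume([disk(20)] * 25)   # 25 slices at end-diastole
esv = chamber_volume([disk(14)] * 25)   # smaller cavity at end-systole
print(round(ejection_fraction(edv, esv), 1))
```

Because the areas are summed over many thin slices, no geometric assumption (such as a prolate-ellipsoid LV) is needed, which is what distinguishes this 3D computation from the 2D formulas discussed below.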
2.2 Evaluation of Model Performance
To quantitatively evaluate the performance of the proposed CardiacField, we build the first dataset containing paired 2D/3D echocardiographic images, LV/RV volumes, and EFs (acquired by a PHILIPS EPIQ 7C machine with an S5-1 2DE probe and an X5-1 3DE probe). The ground-truth 3D echocardiographic images are directly recorded using the 3DE probe. After manual calibration by experienced sonographers, the corresponding LV/RV volumes and EFs are calculated by the 3D ultrasound machine. The 2D echocardiographic images are captured by rotating the 2DE probe 360 degrees around the apex of the heart. The sequence of 2D echocardiographic images of each patient is synchronized based on concurrently recorded electrocardiogram (ECG) signals and then used to train the CardiacField, as shown in Fig. 1b.
We use CardiacField to reconstruct the 3D heart from real-captured 2D echocardiographic images, successfully recovering the realistic 3D cardiac structure, as shown in Fig. 2a. Compared with the 2DE probe, the 3DE probe suffers from several physical limitations. For instance, the electrical impedance mismatch of the transducers21, 22 and the interference of the narrow intercostal space with the emitted ultrasound23 both lead to undesired artifacts and inferior resolution. Leveraging the high-resolution images provided by the 2DE probe, our CardiacField can reconstruct 3D cardiac volumes with more spatial detail than those rendered by a 3DE probe. In Fig. 2b, we compare the same cross-sectional view (apical four-chamber view) between CardiacField-reconstructed hearts and 3DE-probe-captured ones for 10 independent patients. Specifically, in Fig. 2b1-2b5, the CardiacField exhibits enhanced resolution and better contrast than the 3DE probe. In Fig. 2b6, the left atrium (LA) imaged by the 3DE probe is filled with noise, whereas our method recovers detailed structures of the LA. The same holds in Fig. 2b7-2b9, where our method is superior to the 3DE probe in imaging the RV and right atrium (RA) areas. In Fig. 2b10, our method clearly distinguishes the patient’s implanted artificial cardiac pacemaker (labeled by yellow arrows), which cannot be observed in the image produced by the 3DE probe. We further reconstruct a dynamic 3D heart within one cardiac cycle using the CardiacField and compare it with that acquired by the 3DE probe (see Supplementary Video 1). Snapshots of the 3D heart within one cardiac cycle are shown in Fig. 3.
2.3 Volume Segmentation and Analysis of LVEF and RVEF
The EF describes the fractional change between the end-diastolic volume (EDV) and the end-systolic volume (ESV) of a cardiac chamber and is one of the most significant metrics for quantifying cardiac function24. LVEF represents the systolic function of the left ventricle and is widely used in cardiovascular disease evaluation, but it suffers from measurement variability and inaccuracy16, 25, 26. RVEF quantifies right-heart function and also provides crucial value in predicting major adverse cardiovascular events27. Calculating the LV volume from 2D echocardiographic images is moderately inaccurate, since the calculation relies on the assumption that the LV shape is close to a prolate ellipsoid16. Furthermore, due to the complex geometry of the RV, calculating its volume from 2D echocardiographic images is not recommended16, 17. Although more accurate volumes and EFs of the LV and RV can be calculated using a 3D ultrasound machine, doing so requires extensive calibration by experienced sonographers because of the inferior resolution16.
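Written out, the EF used throughout this section is

```latex
\mathrm{EF}=\frac{\mathrm{EDV}-\mathrm{ESV}}{\mathrm{EDV}}\times 100\%
```

so its accuracy is governed entirely by how well the two chamber volumes can be measured.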
Based on the high-fidelity 3D heart reconstructed by the CardiacField, we use a classical segmentation method15 to automatically delineate the LV/RV volumes (details can be found in the Methods) and calculate the EFs of 56 patients; the results are shown in Fig. 2c and 2d. According to the guidelines of the American Society of Echocardiography and the European Association of Cardiovascular Imaging16, we use the EFs obtained by the 3D ultrasound machine (after calibration by experienced sonographers) as the ground truth. We also compare our results with EchoNet15, the first 2D echocardiogram video-based end-to-end deep learning model for estimating LVEF, which has demonstrated more stable LVEF prediction than standard clinical examination (by experienced sonographers based on 2D images). In Fig. 2c, the LVEF predicted by our method has a mean absolute error (MAE) of 2.8% and a root mean squared error (RMSE) of 3.8%, while the prediction by EchoNet15 has an MAE of 4.2% and an RMSE of 5.5%. Our method thus offers about 33% and 31% reductions in MAE and RMSE over EchoNet15, respectively, indicating more accurate LVEF prediction. Furthermore, we calculate the RVEF, which is not available from EchoNet15, with an MAE of 3.6% and an RMSE of 4.7% (Fig. 2d). As evidenced by the small errors of LVEF and RVEF, our method can provide reliable assessment for further cardiac diagnosis and treatment. The comparisons of volume segmentation and area calculation are presented in Supplementary Table 1.
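The two reported error metrics are the standard ones; a minimal sketch with hypothetical EF values (not taken from our dataset):

```python
import numpy as np

def mae(pred, truth):
    """Mean absolute error, in the same units as the inputs (EF %)."""
    return np.mean(np.abs(np.asarray(pred) - np.asarray(truth)))

def rmse(pred, truth):
    """Root mean squared error; penalizes large EF deviations more."""
    return np.sqrt(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2))

# Hypothetical LVEF predictions vs. 3D-machine ground truth (percent).
pred  = [58.0, 61.5, 45.2, 70.1]
truth = [60.0, 60.0, 48.0, 69.0]
print(round(mae(pred, truth), 2), round(rmse(pred, truth), 2))  # 1.85 1.96
```

RMSE always upper-bounds MAE, so quoting both exposes whether the error is driven by a few outlier patients or spread evenly.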
2.4 Generalization to Different Ultrasound Machines
To evaluate the reliability of the CardiacField on different ultrasound machines, we recruit another five volunteers to conduct generalization experiments using two widely used but different 2D ultrasound machines, i.e., PHILIPS and SIEMENS. PHILIPS is the anchor machine used primarily, and SIEMENS is used for generalization verification. We apply the CardiacField to reconstruct the 3D heart from the 2D echocardiographic images captured by each machine and then calculate the LVEF and RVEF of the same person from the two reconstructions, as shown in Fig. 2e. The LVEF predicted by our method on PHILIPS has an MAE of 1.7% and an RMSE of 1.9%, while the prediction on SIEMENS has an MAE of 1.9% and an RMSE of 2.2%. Additionally, the RVEF prediction on PHILIPS has an MAE of 1.1% and an RMSE of 1.4%, while the prediction on SIEMENS has an MAE of 1.6% and an RMSE of 2.1%. The variations of LVEF and RVEF across machines are less than 1%. Since an EF error within 5% between different measurements is generally considered clinically acceptable26, 28, this experiment verifies the reliability of the CardiacField for LV/RV EF measurements on different machines. The volume comparison is presented in Supplementary Table 2.
2.5 Analysis of 3D Reconstruction
In this section, we evaluate the efficiency of the CardiacField in positional parameter \(\mathcal{B}\) estimation, 3D heart reconstruction, and the ‘continuous-slicing’ functionality (generating continuous cross-sectional views of the 3D heart). The ground-truth data include a real 3D heart captured by the 3DE probe and 90 different positional parameters \({\mathcal{B}}_{0}\). We then slice 90 2D echocardiographic images from the 3D heart using \({\mathcal{B}}_{0}\) to form the ‘Input Views’ (Fig. 4c). Note that \({\mathcal{B}}_{0}\) is unknown to the CardiacField during reconstruction.
For accurate estimation of the positional parameter \(\mathcal{B}\), we first use PlaneInVol20 for an initial estimate and then feed the 90 ‘Input Views’ into CardiacField for joint 3D heart reconstruction and positional parameter optimization. Since \(\mathcal{B}\) can be decomposed into a rotation matrix and a translation vector (details can be found in the Methods), we evaluate accuracy by comparing the rotation angles and translation distances optimized by CardiacField against the ground truth \({\mathcal{B}}_{0}\). The CardiacField accurately recovers \(\mathcal{B}\), with average errors of only \(0.067\pm 0.024\) degrees in rotation angle and \(0.210\pm 0.143\) \(cm\) in translation distance over 56 hearts, corresponding to about 81% and 67% error reductions, respectively, relative to the initial positional parameters. Additionally, we visualize the initialized, refined, and ground-truth positional parameter trajectories of one heart in Fig. 4a.
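These pose errors can be computed as the geodesic angle between the estimated and ground-truth rotation matrices and the Euclidean distance between the translation vectors; a minimal NumPy sketch with a hypothetical pose (a 0.1-degree rotation about z):

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two rotation matrices, in degrees:
    arccos((trace(R_est R_gt^T) - 1) / 2)."""
    cos = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_cm(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))

# Hypothetical estimate: rotated 0.1 degree about z, offset by 0.2 cm.
a = np.radians(0.1)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
print(round(rotation_error_deg(Rz, np.eye(3)), 3))              # 0.1
print(round(translation_error_cm([0, 0, 0.2], [0, 0, 0]), 3))   # 0.2
```

Averaging these two scalars over all views and hearts yields the error figures reported above.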
With accurate positional parameter estimation, CardiacField recovers a high-fidelity 3D heart (Fig. 4b). For comparison, we reuse the 90 sliced ‘Input Views’ and the CardiacField-refined positional parameters to reconstruct another 3D heart via a conventional interpolation method (i.e., interpolating the intensity at each voxel by a weighted average of the 20 nearest pixels). Quantitative measurements are evaluated against the real 3D heart captured by the 3DE probe. Compared with the interpolation method, which achieves a Peak Signal-to-Noise Ratio (PSNR) of \(21.572\pm 1.923\)dB (Fig. 4b), the CardiacField achieves a PSNR of \(27.452\pm 2.013\)dB, demonstrating that the CardiacField reconstructs a 3D heart much closer to the real-captured one.
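PSNR here follows its standard definition; a minimal sketch on synthetic data (the volume size and noise level are arbitrary assumptions, intensities normalized to [0, 1]):

```python
import numpy as np

def psnr(recon, reference, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB between a reconstruction and a
    reference volume; higher means closer to the reference."""
    mse = np.mean((recon - reference) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Synthetic stand-ins: a reference volume and a mildly corrupted copy.
rng = np.random.default_rng(0)
ref = rng.random((32, 32, 32))
noisy = np.clip(ref + rng.normal(scale=0.05, size=ref.shape), 0.0, 1.0)
print(round(psnr(noisy, ref), 1))
```

A ~6 dB gap, as between the interpolation baseline and CardiacField, corresponds to roughly a fourfold reduction in mean squared error.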
We further evaluate the ‘continuous-slicing’ functionality of the CardiacField. For comparison, we generate a total of 160 ‘New Views’ along the long axis and short axis of the heart from the CardiacField and from the 3D heart reconstructed by the conventional interpolation method, respectively (‘Long-Axis Views’ and ‘Short-Axis Views’ in Fig. 4c). Images sliced from the real-captured 3D heart at the same positions as these ‘New Views’ serve as the ground truth. As shown in Fig. 4b, the interpolation method yields PSNRs of \(22.559\pm 1.231\)dB along the long axis and \(23.844\pm 3.767\)dB along the short axis, while the CardiacField achieves PSNRs of \(29.724\pm 1.453\)dB and \(26.514\pm 3.381\)dB, respectively. As further verified in Fig. 4c, the CardiacField avoids interpolation artifacts (highlighted by yellow arrows), exhibiting superior ‘continuous-slicing’ functionality.