Non-rigid image registration of 4D-MRI data for improved target delineation of moving tumors in radiotherapy

Background: To increase the image quality of end-expiratory and end-inspiratory phases of retrospective respiratory self-gated 4D MRI data sets using non-rigid image registration for improved target delineation of moving tumors. Methods: End-expiratory and end-inspiratory phases of volunteer and patient 4D MRI data sets are used as targets for non-rigid image registration of all other phases using two different registration schemes: in the first, all phases are registered directly onto the target (dir-Reg), while in the second, next neighbors are successively registered until the target is reached (nn-Reg). Resulting data sets are quantitatively compared using diaphragm and tumor sharpness and the coefficient of variation of regions of interest (ROIs) in the lung, liver, and heart. Qualitative assessment of the patient data regarding noise level, tumor delineation, and overall image quality was performed by blinded reading based on a four-point Likert scale. Results: The median coefficient of variation was lower for both registration schemes compared to the target. Median dir-Reg coefficient of variation of all ROIs was 5.6 % lower for expiration and 7.0 % lower for inspiration compared with nn-Reg. Statistically significant differences between the two schemes were found in all comparisons. Median sharpness in inspiration is lower compared to expiration sharpness in all cases. Registered data sets were rated better compared to the targets in all categories. Over all categories, mean expiration scores were 2.92 ± 0.18 for the target, 3.19 ± 0.22 for nn-Reg and 3.56 ± 0.14 for dir-Reg and mean inspiration scores were 2.25 ± 0.12 for the target, 2.72 ± 0.04 for nn-Reg and 3.78 ± 0.04 for dir-Reg. Conclusions: In this work, end-expiratory and end-inspiratory phases of 4D MRI data sets are used as targets for non-rigid image registration of all other phases. It is qualitatively and quantitatively shown that image quality of the targets can be significantly enhanced, leading to improved target delineation of moving tumors.

Background
Precise information regarding the extent of respiratory induced motion of tumors and surrounding organs at risk (OAR) plays a crucial role in radiotherapy treatment (RT) and planning. Highly conformal treatment techniques require an exact target and OAR delineation in treatment planning. Treatment planning concepts using the mid-ventilation and internal-target volume concept [1,2] are based on the extent of tumor motion between expiration and inspiration. Four-dimensional (4D) imaging is therefore required to provide necessary information about the individual respiration-associated motion patterns [3,4]. In this context, 4D computed tomography (CT) is commonly used as the basis for treatment planning [5]. External [6] or internal [7,8] respiratory surrogates can be acquired simultaneously during free-breathing CT imaging to retrospectively sort the acquired data into different respiratory phases (bins) and hence to avoid motion artifacts [9,10]. Magnetic resonance imaging (MRI) is an alternative method to acquire 4D data sets. MRI is characterized by its superior soft-tissue contrast which is important for the identification of lesions e.g. within the prostate, the liver or the pancreas [11][12][13].
Free-breathing MRI, where the MR data itself serves as an internal respiratory surrogate ('self-gating'), can be used to retrospectively reconstruct 4D data sets with reduced motion artifacts. In contrast to 4D CT, no additional devices are necessary. Radial [14,15] and Cartesian [16][17][18] acquisition techniques using 'self-gating' have been proposed. These techniques commonly make use of compressed sensing or parallel imaging reconstructions to compensate for the inherently long acquisition times caused by the large number of phase encoding steps that have to be acquired in 4D MRI. This allows for clinically acceptable examination times and image quality in terms of achievable resolution and signal-to-noise ratio (SNR).
The acquired respiratory signal is divided into several gating windows (bins) which represent different respiratory phases. Usually, 10 different respiratory phases from inspiration to expiration are reconstructed [19]. Small bins are characterized by good motion artifact reduction (blurring and ghosting) but also reduced SNR or even under-sampling artifacts due to missing lines after gating [20,21]. In contrast, larger bins lead to increased SNR but come along with increased motion artifacts.
Artifacts resulting from motion and under-sampling as well as low SNR degrade the diagnostic value of the 4D MRI data set. This is particularly important in target volume definition, which is often based on the identification of the maximal tumor displacement between end-inspiration and end-expiration. Therefore, sufficient motion compensation and at the same time high SNR have to be achieved in these phases to allow for high delineation accuracy.
In this work, end-expiration and end-inspiration are used as target volumes for non-rigid image registration for the reconstruction of 4D data sets acquired under free-breathing conditions using respiratory self-gating. The resulting data sets are compared quantitatively and qualitatively to assess the diagnostic value of two registration schemes.

Data Acquisition and Reconstruction
A 1.5 T Siemens scanner (Magnetom Avanto, Siemens Healthineers, Erlangen, Germany) was used for all experiments. A six-channel body array in combination with a spine array was used for signal reception. Data were acquired under free-breathing conditions with a recently proposed 3D Cartesian FLASH sequence, characterized by a non-uniform order of phase encoding steps [16]. The central k-space signal was used as a navigator signal for retrospective respiratory self-gating [21]. Data were sorted into ten different gating windows (bins), representing different respiratory phases from end-expiration to end-inspiration, using equal bin intervals. A 16-core Intel Xeon CPU system with 125 GB RAM and MATLAB (MathWorks, Natick, Massachusetts) was used for data reconstruction and registration.
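The retrospective sorting step can be illustrated with a minimal Python sketch. It assumes that "equal bin intervals" refers to equal-width amplitude intervals of the navigator signal, which the text does not state explicitly. The function name `sort_into_bins` and the synthetic navigator are hypothetical and are not taken from the implementation used in this work.

```python
import numpy as np

def sort_into_bins(navigator, n_bins=10):
    """Assign each navigator sample (one per acquired readout) to a respiratory bin.

    Assumption: bins are equal-width amplitude intervals between the minimum
    and the maximum of the self-gating signal.
    """
    navigator = np.asarray(navigator, dtype=float)
    edges = np.linspace(navigator.min(), navigator.max(), n_bins + 1)
    # Only the interior edges are used so that the extreme values fall into
    # the first and the last bin instead of an extra overflow bin.
    return np.digitize(navigator, edges[1:-1])

# Synthetic navigator (illustrative only): 60 s of breathing with a 4 s period.
t = np.linspace(0.0, 60.0, 3000)
navigator = np.cos(2.0 * np.pi * t / 4.0)

bin_index = sort_into_bins(navigator)                  # values 0 .. 9
lines_per_bin = np.bincount(bin_index, minlength=10)   # samples accepted per bin
```

For this idealized sinusoidal signal, the two extreme bins collect more samples than the central bins because the signal dwells near its turning points; the number of accepted lines per bin is what drives the trade-off between SNR and under-sampling discussed in the Background.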
Missing k-space lines after the gating process were reconstructed using an iterative parallel imaging algorithm (SPIRiT) [22].
The following imaging parameters were used: echo time TE = 1.

Data Registration
An open-source software package implemented in MATLAB was used for non-rigid registration, based on the demons algorithm [23,24]. Gaussian smoothing with a sigma of 4 was applied to the velocity field in each iteration for regularization. End-expiratory and end-inspiratory phases of the reconstructed 4D MRI data sets served as target image sets. All other phases, with a total number of reconstructed phases n, were registered onto the corresponding target image set using two different registration schemes, referred to as direct (dir-Reg) and next-neighbour (nn-Reg) registration. The two registration schemes are illustrated in Fig. 1 with the end-expiratory phase as target and n = 5:
nn-Reg (right): A chain of alternating registration and averaging steps is used. Phase 5 is first registered onto the adjacent phase 4. The resulting phase 54 and phase 4 are then averaged before this data set is registered onto phase 3 and averaged again. In the last registration step, data set 5432 is registered onto the target and then averaged once more. Averaging before each registration step leads to equally contributing phases.
dir-Reg (left): Each phase is registered directly onto the target, resulting in data sets 51, 41, 31 and 21, which are then averaged.
Both registration schemes lead to a data set that preserves the anatomy of the target owing to registration and contains the signal intensity of the complete 4D data set owing to averaging.
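The control flow of the two schemes can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB implementation used in this work: `register(moving, fixed)` is a hypothetical callable that returns the moving image warped onto the fixed image, `shift_register` is a toy 1D stand-in (best circular shift found by cross-correlation) and not a demons registration, the target is assumed to be included in the dir-Reg average, and a running weighted mean is assumed for nn-Reg so that all phases contribute equally.

```python
import numpy as np

def dir_reg(phases, register):
    """Direct scheme: warp every phase onto the target (phases[0]), then average."""
    target = phases[0]
    warped = [register(moving, target) for moving in phases[1:]]
    # Assumption: the target itself is part of the average.
    return np.mean([target] + warped, axis=0)

def nn_reg(phases, register):
    """Next-neighbour scheme: register and average along the chain n -> ... -> 1."""
    accumulated = phases[-1]
    for count, fixed in enumerate(reversed(phases[:-1]), start=1):
        warped = register(accumulated, fixed)
        # `accumulated` already represents `count` phases; it is weighted
        # accordingly so that every phase contributes equally (assumed weighting).
        accumulated = (count * warped + fixed) / (count + 1)
    return accumulated

def shift_register(moving, fixed):
    """Toy stand-in for non-rigid registration: best circular shift in 1D."""
    spectrum = np.fft.fft(fixed) * np.conj(np.fft.fft(moving))
    shift = int(np.argmax(np.fft.ifft(spectrum).real))
    return np.roll(moving, shift)

# Five 1D "phases": the same profile displaced by 3 samples per phase.
x = np.arange(64)
target = np.exp(-0.5 * ((x - 20) / 3.0) ** 2)
phases = [np.roll(target, 3 * k) for k in range(5)]

result_dir = dir_reg(phases, shift_register)
result_nn = nn_reg(phases, shift_register)
```

In this noise-free toy case both schemes reproduce the target exactly. Differences between the schemes only appear when the individual registrations are imperfect, because nn-Reg passes the result of each step on to the next one.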

Quantitative data evaluation
Regions of interest (ROIs) were drawn in the liver, the lung and the heart for calculation of the coefficient of variation (CV) as an image quality measure [17], $\mathrm{CV} = \sigma_{\mathrm{ROI}} / \mu_{\mathrm{ROI}}$, with the standard deviation $\sigma_{\mathrm{ROI}}$ and the mean $\mu_{\mathrm{ROI}}$ of the ROI. The CV describes local image homogeneity. It was used to compare the registration schemes regarding artifacts introduced by registration.
The mean CV of ROIs, drawn in three adjacent slices for each organ, was used for evaluation.
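As a minimal sketch, the CV evaluation could be written in Python as shown below. The function names are hypothetical, the use of the population standard deviation (`ddof=0`) is an assumption, and the small array only serves to check the arithmetic; it is not study data.

```python
import numpy as np

def coefficient_of_variation(image, mask):
    """CV = standard deviation / mean of the pixel values inside a ROI."""
    values = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    # Assumption: population standard deviation (ddof=0).
    return values.std() / values.mean()

def mean_cv(slices, masks):
    """Mean CV over ROIs drawn in several (e.g. three adjacent) slices."""
    return float(np.mean([coefficient_of_variation(s, m)
                          for s, m in zip(slices, masks)]))

def normalized_cv(cv_registered, cv_target):
    """CV of a registered data set relative to the CV of its target."""
    return cv_registered / cv_target

# Arithmetic check on a tiny synthetic ROI (mean 5, standard deviation 2).
roi = np.array([[2.0, 4.0, 4.0, 4.0], [5.0, 5.0, 7.0, 9.0]])
mask = np.ones_like(roi, dtype=bool)
cv = coefficient_of_variation(roi, mask)
```

A normalized CV below 1 then indicates a more homogeneous ROI in the registered data set than in the target.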
Image sharpness was calculated using 10 manually defined intensity profiles across edges at the lung-liver interface and tumor boundaries. Each profile is individually fitted with an error function [25]. The full width at half maximum (FWHM) of the underlying Gaussian function is then calculated and used as a measure of image sharpness, expressed in pixels. A small FWHM means high sharpness. Figure 2 shows three exemplary edge profiles and the error function fits on the left.
The calculated FWHM values with the corresponding Gaussian functions are displayed on the right. CV and sharpness were calculated for the end-expiratory and end-inspiratory phases of the target and of the registered data sets dir-Reg and nn-Reg for volunteer and patient measurements. CV values were normalized to the corresponding target sets.
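The sharpness measure can be sketched in Python with SciPy as follows. The parametrization of the edge model, the initial guesses and the function names are assumptions made for illustration; the relation FWHM = 2·sqrt(2·ln 2)·σ of a Gaussian is standard. The synthetic profile is not study data.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def edge_model(x, offset, amplitude, center, sigma):
    """Ideal step edge blurred by a Gaussian of width sigma (error function)."""
    return offset + 0.5 * amplitude * (1.0 + erf((x - center) / (np.sqrt(2.0) * sigma)))

def edge_fwhm(profile):
    """Fit an error function to an edge profile and return the FWHM in pixels."""
    profile = np.asarray(profile, dtype=float)
    x = np.arange(profile.size, dtype=float)
    # Initial guesses: intensity levels from the profile ends, edge position
    # at the steepest intensity change, width of one pixel.
    center0 = x[np.argmax(np.abs(np.gradient(profile)))]
    p0 = [profile[0], profile[-1] - profile[0], center0, 1.0]
    popt, _ = curve_fit(edge_model, x, profile, p0=p0)
    sigma = abs(popt[3])
    # FWHM of the Gaussian underlying the fitted error function.
    return 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma

# Synthetic, noise-free edge with sigma = 2 pixels.
x = np.arange(31, dtype=float)
profile = edge_model(x, 10.0, 50.0, 15.3, 2.0)
fwhm = edge_fwhm(profile)
```

Taking the absolute value of the fitted width makes the result independent of whether the profile runs from dark to bright or from bright to dark.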
The Wilcoxon matched-pairs signed-rank test was used for comparison between the three data sets. Differences were considered statistically significant for p < 0.05. Additionally, the total reconstruction and registration times were assessed for all patient measurements to evaluate clinical suitability.
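A paired comparison of this kind could be run in Python with `scipy.stats.wilcoxon`, as sketched below. The numbers are purely illustrative placeholders and are not measurements from this study; the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon

# Illustrative paired values only (NOT data from this study): normalized CV
# of the same ten ROIs after nn-Reg and after dir-Reg.
cv_nn = np.array([0.90, 0.85, 0.92, 0.88, 0.95, 0.80, 0.87, 0.91, 0.84, 0.93])
cv_dir = cv_nn - 0.01 * np.arange(1, 11)

# Two-sided Wilcoxon signed-rank test on the paired differences.
statistic, p_value = wilcoxon(cv_dir, cv_nn)
significant = p_value < 0.05
```

Because the test operates on the ranks of the paired differences, it does not require the CV or sharpness values to be normally distributed.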

Qualitative data evaluation
Image scoring of the patient measurements was performed independently by two radiation oncologists and one radiologist, all with at least 5 years of experience in reading MR images. Readers were blinded to clinical information and to the applied reconstruction algorithm. Corresponding target, dir-Reg, and nn-Reg data sets were presented simultaneously to the reader for each patient and respiratory phase (end-expiration and end-inspiration). The order of the three data sets, patients and respiratory phases was randomized. Criteria for scoring were: (1) noise level (e.g. in the liver or the lung), (2) sharpness (identification and delineation of tumors, sharpness of the diaphragm and of small liver and lung vessels) and (3) overall image quality. A four-point Likert scale was used for scoring: 1, poor; 2, fair; 3, good; 4, very good.

Results
shows strongly reduced blurring but also decreased signal intensity. Figure 3 shows the boxplots of the normalized CV for the different registration schemes. Volunteer and patient data sets were used for evaluation. The median CV of all ROIs decreased for both registration schemes in all cases compared to the target. The median CVs of the dir-Reg scheme were lower, and statistically significant differences (p < 0.05) between the CVs of the two schemes (marked with stars) were found in all cases. The average CV of all ROIs was 5.6 ± 5.3% lower for expiration and 7.0 ± 4.2% lower for inspiration when comparing dir-Reg with nn-Reg.
The results of the sharpness measurements are displayed in Fig. 4. Median expiration sharpness is higher (smaller FWHM) compared with corresponding inspiration sharpness in all cases. Target diaphragm sharpness is the highest in case of expiration whereas dir-Reg sharpness is the highest in case of inspiration.
Tumor sharpness is higher than diaphragm sharpness in all cases. The nn-Reg scheme shows the largest interquartile ranges in all comparisons. Table 1 shows the results of the Wilcoxon signed-rank test used to compare sharpness between the data sets. Differences were not statistically significant in some comparisons and breathing states. Figure 5 shows a qualitative comparison between coronal slices of the registered and target image sets of three volunteers.
Zoomed regions (depicted in the right images) are shown to emphasize differences in sharpness and signal intensity.
Mean scores of the blinded reading of the patient data sets are displayed in Fig. 6.
The registered data received better scores than the targets in all categories.
Dir-Reg is rated better than nn-Reg in all categories except structure delineation in expiration. Inspiration target and nn-Reg images are rated much lower than the corresponding expiration images, whereas dir-Reg scores are better in inspiration.
Over all categories, mean expiration scores were 2.92 ± 0.18 for the target, 3.19 ± 0.22 for nn-Reg and 3.56 ± 0.14 for dir-Reg and mean inspiration scores 2.25 ± 0.12 for the target, 2.72 ± 0.04 for nn-Reg and 3.78 ± 0.04 for dir-Reg. The patient images in Fig. 7 exemplarily visualize these findings. The artifacts in the lower patient images were caused by metal clips. Dir-Reg images show a more homogeneous signal compared with nn-Reg and especially with the target. In the case of inspiration, the image quality of the target and nn-Reg images strongly decreases compared to dir-Reg images.

Discussion
Four-dimensional imaging is essential in target volume and organ at risk delineation in radiotherapy treatment planning for moving targets. The total extent of tumor motion between expiration and inspiration is required in some treatment planning concepts [1,2]. Two different non-rigid image registration schemes are used to increase the signal intensity of end-expiratory and end-inspiratory phases while preserving the motion artifact reduction. To this end, all phases are registered onto either end-expiration or end-inspiration serving as targets. The results are compared amongst the schemes and the corresponding targets in terms of quantitative evaluation of image homogeneity (coefficient of variation) and sharpness as well as qualitative expert reader rating. The CV was used for evaluation because SNR measurements using signal and noise regions are not reliable if multichannel and parallel imaging reconstructions are used [17,26].
Both registration schemes used in this work lead to lower median CV values (higher homogeneity) compared to the targets (Fig. 3). Median CV values were lower for dir-Reg in all comparisons because, in the case of nn-Reg, artifacts are propagated from one phase to the adjacent phase. As a result, averaging suppresses the artifacts less efficiently. Boldea et al. [27] used similar registration schemes with 4D-CT data to calculate patient-specific tumor motion and hysteresis. Accuracy and consistency were evaluated globally, and the differences between the schemes were found to be not statistically significant. However, they state that differences may occur in a local evaluation because various sources of error [28] affect the two registration schemes differently. In contrast, the local evaluation of CV in this study showed significant differences between the registration schemes.
The image quality of inspiration is usually lower compared to expiration when equal bin sizes are used, because resting times in end-inspiratory phases are shorter and the position is not as reproducible as in end-expiratory phases (see respiration curve in Fig. 1). This leads to increased motion artifacts, noise and under-sampling artifacts because less data is accepted within the gating process [29,30]. As a result, random intestinal peristaltic and heart motion, which cannot be gated, are also more pronounced owing to the reduced averaging effect [31,32]. These factors influence sharpness and CV evaluation as illustrated in Figs. 1, 4 and 5. Median inspiration sharpness is lower in all comparisons than the corresponding expiration sharpness (Fig. 4). Moreover, median target sharpness can even be improved by registration in the case of inspiration. The sharpness of structures strongly depends on the proximity to the diaphragm. Consequently, tumor sharpness is much higher than diaphragm sharpness in this study because most of the tumors are located further away from the diaphragm. Statistically significant differences could be found in most of the comparisons (Table 1).
The qualitative rating results of the patient measurements confirm the findings discussed above (Fig. 6). Expiration target images are rated better in all categories compared to inspiration. Targets were rated lower than both registration schemes.
The dir-Reg scheme received the best noise and overall image quality scores. Structure scores of dir-Reg and nn-Reg were almost the same for expiration, but dir-Reg scored higher in inspiration. Dir-Reg inspiration scores are even better than the corresponding expiration scores. This may be explained by the simultaneous reading used in this work: the poor image quality of the inspiration target image could influence the perception of the benefit in image quality.
Buerger et al. used a golden-radial phase encoding acquisition with respiratory self-gating to reconstruct a high-quality reference image and various more highly under-sampled phase-resolved images [33]. Non-rigid registrations are subsequently applied between the reference and all other under-sampled phases, leading to high-quality 4D images. In contrast to that work, our focus was to improve the image quality of the end-expiration and end-inspiration phases. All of the acquired data is used for image quality enhancement of each of these two phases. However, the method proposed by Buerger et al. could additionally be applied to exploit the same acquired data.
4D MRI reconstruction and registration times vary owing to individual breathing, which leads to different under-sampling patterns that determine parallel imaging performance, to initial image quality, and to the different volume sizes used in the registration process. Average 4D MRI reconstruction and registration times in this work are currently too long, because the MRI data sets should be available on the same day as the 4D CT to assist in the delineation process.
However, almost the whole volume in cranio-caudal and anterior-posterior direction was used for quantitative and qualitative evaluation in this work. Focusing on the lesions will decrease the volume and hence registration time. The increasing use of image registration in different applications in the context of MR guided radiotherapy [33][34][35][36], registration algorithms capable of parallel computing [37] and the use of different parallel imaging reconstructions will lead to further time reductions [38].
The physician should decide whether registration is necessary in the case of expiration, and whether nn-Reg should be preferred because of its shorter reconstruction times when its image quality is not expected to be significantly lower than that of dir-Reg. Whether these approaches will lead to sufficient time reductions to integrate the proposed method into clinical daily routine is currently being investigated. At the same time, the geometric fidelity of the sequence protocol will be examined to guarantee the reliable use of the images in the radiotherapy workflow.

Conclusion
In this work, end-expiratory and end-inspiratory phases of retrospective respiratory self-gated 4D MRI data sets of volunteers and patients were used as targets for non-rigid image registration. All other phases of the 4D MRI were registered onto these targets using two different registration schemes. It was shown that image quality of the target images can be significantly increased while motion artifact reduction is preserved, allowing for improved lesion detection.

Declarations
Ethics approval and consent to participate
The study was reviewed and approved by the Ethics Committee of the University of Wuerzburg. Reference number: 179/11. Written informed consent was obtained from all participants before study inclusion.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
All authors declare that they have no competing interests.

Figure 1 Schematic illustration of the two registration schemes with the end-expiratory phase as target
Figure 3 Boxplots of the normalized CV for the two registration schemes for expiration and inspiration
Figure 5 Comparison between exemplary coronal slices of the registered and target image sets
Figure 6 Mean and standard deviation of the readers' scoring derived from the patient measurements
Figure 7 Exemplary transversal slices of the registered and target image sets for each patient