Combination of convolutional networks with a learning based ensemble strategy for hippocampus subfield segmentation

Background
Accurate segmentation of hippocampal subfields from magnetic resonance (MR) brain images is an important step in studying brain disorders, including epilepsy, Alzheimer's disease (AD) and Parkinson's disease. However, it is a difficult task because of the low signal contrast and the small size of the structures. Many advanced convolutional networks have been proposed and have achieved state-of-the-art performance in various applications. To take advantage of these networks, in this paper we propose a learning-based ensemble strategy that integrates the results of different convolutional networks for hippocampus subfield segmentation. The ensemble strategy is itself implemented as a convolutional network. We validated the proposed method on a publicly available dataset.


Results
The experimental results show that the proposed ensemble strategy significantly improves on the performance of each single convolutional network and outperforms the state-of-the-art hippocampus subfield segmentation method.

Conclusion
The proposed ensemble strategy is effective for combining multiple different convolutional networks in hippocampus subfield segmentation.

Background
The hippocampus is a bilateral brain structure involved in many brain disorders, such as epilepsy, Alzheimer's disease (AD) and Parkinson's disease 1. It consists of several histologically and functionally specialized subfields: the subiculum (SUB), the cornu ammonis sectors (CA) 1-3, and the dentate gyrus (DG) 2. Studies have shown that different diseases affect different subfields. This suggests that hippocampal subfields may provide more precise information for early disease diagnosis than the whole hippocampus 3.
To characterize subfields for studying brain disorders, it is critically important to segment them accurately from magnetic resonance (MR) images. Manual segmentation is often treated as the gold standard. However, it is time-consuming and tedious, and it is subject to inter- and intra-observer variability. Automatic segmentation of hippocampal subfields is therefore desirable, especially for large-scale disease studies. Among automatic image segmentation methods, multi-atlas-based image segmentation (MAIS) methods have been applied to both hippocampal segmentation and hippocampal subfield segmentation [4][5][6][7]. In MAIS, all atlas images are first registered to a target image. The corresponding atlas label maps are then warped into the target image space and fused to obtain the final segmentation of the target image 8,9. However, MAIS methods usually require accurate nonlinear registration between each atlas image and the target image, which is very time-consuming.
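The fusion step of MAIS can be made concrete with a small example. The sketch below applies per-voxel majority voting, the simplest label-fusion rule, to atlas label maps that are assumed to be already warped into the target space. It is an illustration only: the function name `majority_vote` is hypothetical, and the fusion methods cited in 8,9 are not necessarily majority voting.

```python
import numpy as np

def majority_vote(warped_labels, num_labels):
    """Fuse K warped atlas label maps by per-voxel majority voting.

    warped_labels: integer array of shape (K, D, H, W), one label map per atlas,
    assumed to be already warped into the target image space (registration is
    not shown). Ties go to the smaller label index because np.argmax returns
    the first maximum.
    """
    votes = np.stack([(warped_labels == l).sum(axis=0) for l in range(num_labels)])
    return votes.argmax(axis=0)

# Three toy 1x1x3 "atlases" voting on a 3-voxel target
atlases = np.array([[[[0, 1, 2]]],
                    [[[0, 1, 1]]],
                    [[[1, 1, 2]]]])
print(majority_vote(atlases, num_labels=3))  # [[[0 1 2]]]
```

The registration step is what makes MAIS slow, and this sketch assumes it has already been done.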
In recent years, deep neural networks have been widely applied to medical image segmentation without the need for image registration or hand-crafted features 10,11. For example, Unet 12 was proposed for biomedical image segmentation; it uses skip connections to link the contracting path with the symmetric expanding path. In 13, a volumetric convolutional network with mixed residual connections was proposed for prostate segmentation. In 14, V-net was proposed for volumetric medical image segmentation, also using residual connections. In 15, pooling-free fully convolutional networks with dense skip connections were proposed for semantic segmentation and applied to brain tumor segmentation. In 16, deep voxelwise residual networks were proposed for brain segmentation from 3D MR images. In 17, a fully convolutional network was proposed for fast and accurate segmentation of neuroanatomy. In 18, generative adversarial networks were used for hippocampal subfield segmentation in brain MR images. In 19, a dilated dense U-net was proposed for infant hippocampus subfield segmentation. However, with so many candidate segmentation networks available, it is often difficult to select a suitable one for a specific task.
To ease network selection and to take advantage of multiple advanced convolutional networks, in this paper we propose a learning-based ensemble strategy. It combines the results of multiple convolutional networks to obtain a more accurate segmentation than any single network achieves. Specifically, we adopt three segmentation networks: 1) a Unet-like structure (Unet-like) 12, 2) a Unet-like structure with residual connections (ResUnet) 20, and 3) a fully convolutional network with residual connections (ResFCN) 21. We validate the proposed method for hippocampal subfield segmentation on a publicly available dataset. The experimental results demonstrate that the proposed learning-based ensemble strategy significantly improves on the performance of each single convolutional network. It also outperforms the state-of-the-art hippocampal subfield segmentation method 4.

Dataset and Pre-processing
The proposed method was validated on a publicly available dataset (https://www.nitrc.org/projects/mni-hisub25) 22. The dataset contains 25 subjects with manual labels for the hippocampal subfields (CA1-3, CA4/DG and SUB). To facilitate processing, we cropped each image with a bounding box that removes the non-hippocampal region while remaining large enough to cover the whole hippocampus. We then applied histogram matching to the cropped images so that brain tissues have similar intensity levels across all subjects 23. To make the most of the limited data, we flipped each training image left-right, which doubles the number of training subjects.
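The three pre-processing steps above can be sketched with NumPy as follows. This is not the authors' pipeline; several details are assumptions:

- The bounding-box coordinates and the choice of reference image for histogram matching are made up.
- Histogram matching is done by the common empirical-CDF (quantile) mapping.
- Axis 0 is assumed to be the left-right axis.

The label map is flipped together with the image so that the labels stay aligned.

```python
import numpy as np

def crop(volume, box):
    """Crop a 3D volume with a bounding box given as a tuple of three slices."""
    return volume[box]

def match_histogram(source, reference):
    """Map source intensities so that their empirical CDF matches the reference's."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    matched = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched[src_idx].reshape(source.shape)

def flip_left_right(image, label, axis=0):
    """Mirror an image and its label map; `axis` is assumed to be the L-R axis."""
    return np.flip(image, axis=axis).copy(), np.flip(label, axis=axis).copy()

# Hypothetical usage on random data (volume sizes and box coordinates are made up)
rng = np.random.default_rng(0)
image = rng.normal(100.0, 20.0, size=(64, 64, 64))
reference = rng.normal(50.0, 10.0, size=(40, 40, 40))
label = rng.integers(0, 4, size=(64, 64, 64))
box = (slice(10, 50), slice(12, 52), slice(8, 48))

cropped_img, cropped_lbl = crop(image, box), crop(label, box)
normalized = match_histogram(cropped_img, reference)
flipped_img, flipped_lbl = flip_left_right(normalized, cropped_lbl)
print(normalized.shape, flipped_img.shape)  # (40, 40, 40) (40, 40, 40)
```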

Experimental Details
Five-fold cross-validation was used in the experiments. In each fold, 20 subjects were used for training: 15 for training Unet-like, ResUnet and ResFCN, and 5 for training FuseNet. The remaining 5 subjects were used for testing. Because the number of training subjects is limited, each network was fed randomly extracted patches from the training subjects rather than whole images. The patch size was set to 32 × 32 × 32. Since both T1w and T2w images are available, we concatenated the extracted T1w and T2w patches into a two-channel input for Unet-like, ResUnet and ResFCN. The cross-entropy loss was used for each network.
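The data split and patch sampling described above can be sketched as follows. The fold assignment (a seeded random permutation) and uniform sampling of patch positions are assumptions; the text only says that patches were extracted randomly. `five_fold_splits` and `sample_patches` are hypothetical helper names.

```python
import numpy as np

def five_fold_splits(num_subjects=25, seed=0):
    """Yield (base_train, fuse_train, test) subject indices for each of 5 folds."""
    order = np.random.default_rng(seed).permutation(num_subjects)
    folds = np.array_split(order, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        # 15 subjects train Unet-like/ResUnet/ResFCN, the other 5 train FuseNet
        yield train[:15], train[15:], test

def sample_patches(t1w, t2w, label, num_patches, size=32, rng=None):
    """Uniformly sample size^3 patches; T1w and T2w become two input channels."""
    if rng is None:
        rng = np.random.default_rng()
    D, H, W = label.shape
    images, labels = [], []
    for _ in range(num_patches):
        z, y, x = (int(rng.integers(0, n - size + 1)) for n in (D, H, W))
        sl = (slice(z, z + size), slice(y, y + size), slice(x, x + size))
        images.append(np.stack([t1w[sl], t2w[sl]]))  # (2, size, size, size)
        labels.append(label[sl])
    return np.stack(images), np.stack(labels)

# Hypothetical usage with random volumes of an assumed (cropped) size
rng = np.random.default_rng(0)
t1w, t2w = rng.random((48, 48, 48)), rng.random((48, 48, 48))
label = rng.integers(0, 4, size=(48, 48, 48))
patches, targets = sample_patches(t1w, t2w, label, num_patches=15, rng=rng)
print(patches.shape, targets.shape)  # (15, 2, 32, 32, 32) (15, 32, 32, 32)
```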
The networks were trained with the Adam optimizer using a mini-batch size of 15 and were implemented in Caffe 24. For Unet-like, ResUnet and ResFCN, the learning rate was initially set to 0.0001 and was multiplied by 0.1 every 10,000 iterations. For FuseNet, the learning rate was initially set to 0.0001 and was multiplied by 0.25 every 4,000 iterations. We used a weight decay of 0.0005 and a momentum of 0.9 for all networks.

Fig. 2 shows the hippocampal subfield segmentations of a randomly selected subject, obtained manually and by the different networks. As Fig. 2 illustrates, the ensemble effectively corrects local segmentation errors (indicated by black circles) made by each single network.
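The learning-rate schedules described above are step decays: the rate is multiplied by a constant factor after a fixed number of iterations. The sketch below mirrors the formula of Caffe's "step" policy, base_lr × gamma^floor(iter / stepsize), using the values stated in the text. `step_lr` is a hypothetical helper name.

```python
def step_lr(iteration, base_lr, gamma, step_size):
    """Step decay: the learning rate is multiplied by gamma every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)

# Base networks (Unet-like, ResUnet, ResFCN): 1e-4, multiplied by 0.1 every 10,000 iterations
base_lrs = [step_lr(it, 1e-4, 0.1, 10_000) for it in (0, 9_999, 10_000, 25_000)]
# FuseNet: 1e-4, multiplied by 0.25 every 4,000 iterations
fusenet_lrs = [step_lr(it, 1e-4, 0.25, 4_000) for it in (0, 4_000, 8_000)]
print(base_lrs)
print(fusenet_lrs)
```

FuseNet's schedule decays more gently (×0.25) but more often (every 4,000 iterations) than that of the base networks.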

Results
We also compare the proposed method with a state-of-the-art hippocampal subfield segmentation method, HIPS 4. The results are listed in Table 2. Note that, for a fair comparison, we used the published results of HIPS as reported in 4. Table 2 shows that the proposed FuseNet outperforms HIPS, especially for the CA4/DG subfield, which is the most difficult to segment.

Discussion
The proposed learning-based combination strategy was inspired by multi-atlas label fusion, which fuses the warped label maps from multiple atlases. The proposed method instead fuses the results of multiple 3D deep convolutional networks. Unlike multi-atlas segmentation methods, it does not need nonlinear registration and is therefore much faster at test time. A recent work proposed three Ensemble-Nets for hippocampal segmentation 25. Their ensemble learning combines label decisions from multiple 2D views in order to capture the complementary information in the label probabilities produced by their 2D networks. By contrast, our ensemble strategy addresses the difficulty of network model selection: it fuses the results of different 3D networks with an additional convolutional network.
The results also show that ResUnet outperforms Unet-like and ResFCN. This indicates that both the long-skip connection and the short-skip connection (residual connection) are useful 26. The long-skip connection combines high-resolution features from the contracting path with upsampled features, which supplements the information lost during pooling.
The residual connection can promote information propagation, accelerate convergence, and improve performance.

Conclusion
In this paper, we have proposed a learning-based ensemble strategy to combine the results of three convolutional networks (Unet-like, ResUnet and ResFCN) for hippocampal subfield segmentation.

Methods
In this section, we first introduce three different convolutional networks, namely Unet-like, ResUnet and ResFCN. We then present a learning-based ensemble strategy to combine the results of these three networks.

Unet-like Structure
The Unet-like structure is shown in Fig. 3. It consists of a contracting path and an expansive path. The contracting path repeatedly applies two 3 × 3 × 3 convolutions followed by a 2 × 2 × 2 max pooling operation with stride 2. Correspondingly, the expansive path repeatedly applies a 4 × 4 × 4 deconvolution with stride 2 followed by two 3 × 3 × 3 convolutions. Each convolution is followed by batch normalization and a rectified linear unit (ReLU). Unlike the original Unet, it uses padded convolutions to preserve the spatial dimensions. In addition, long-skip connections add the feature maps in the contracting path to the corresponding same-resolution feature maps in the expansive path. Compared with conventional concatenation, this element-wise summation reduces the number of feature channels.
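Element-wise summation over a long-skip connection only works if the two feature maps have the same size. That requires every padded convolution to preserve the size and every stride-2 deconvolution to exactly undo a pooling. The sketch below checks this for one spatial axis of a 32 × 32 × 32 patch, using the standard output-size formulas. Fig. 3 is not reproduced here, so three details are assumptions:

- padding 1 for the 3 × 3 × 3 convolutions;
- padding 1 for the 4 × 4 × 4 deconvolution;
- a depth of three pooling levels.

The helper names are hypothetical.

```python
def conv_out(n, kernel, stride=1, pad=0):
    """Output length along one axis of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

def deconv_out(n, kernel, stride=1, pad=0):
    """Output length along one axis of a transposed convolution (deconvolution)."""
    return (n - 1) * stride - 2 * pad + kernel

def trace_unet_like(size=32, levels=3):
    """Trace one spatial axis through a Unet-like net with `levels` poolings."""
    n, skips = size, []
    for _ in range(levels):
        n = conv_out(conv_out(n, 3, pad=1), 3, pad=1)  # two padded 3x3x3 convs keep the size
        skips.append(n)
        n = conv_out(n, 2, stride=2)                   # 2x2x2 max pooling, stride 2
    for skip in reversed(skips):
        n = deconv_out(n, 4, stride=2, pad=1)          # 4x4x4 deconvolution, stride 2
        assert n == skip, "element-wise summation needs matching sizes"
        n = conv_out(conv_out(n, 3, pad=1), 3, pad=1)
    return n

def merged_channels(skip_channels, up_channels, mode="sum"):
    """Channels after merging a long-skip feature map with an upsampled one."""
    if mode == "sum":
        assert skip_channels == up_channels, "summation needs equal channel counts"
        return up_channels
    return skip_channels + up_channels  # concatenation, as in the original Unet

print(trace_unet_like(32, levels=3))                                          # 32
print(merged_channels(64, 64, "sum"), merged_channels(64, 64, "concat"))      # 64 128
```

The channel comparison shows why summation keeps the feature dimension lower: merging two 64-channel maps yields 64 channels instead of 128.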

Residual Connection
Residual connections were introduced in the deep residual network 27 and have been shown to promote information propagation and accelerate convergence. To exploit this advantage, we wrap every two convolutional layers of the Unet-like structure with a residual connection, which yields the residual Unet-like structure (ResUnet). To investigate the influence of long-skip connections in residual networks, we also build a fully convolutional structure with residual connections (ResFCN) by removing the long-skip connections from ResUnet.
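A residual connection around two convolutional layers computes y = x + F(x), where F is the two-layer branch. The NumPy sketch below shows this for a single-channel volume. Several simplifications are assumptions and not taken from the text:

- batch normalization and multiple channels are omitted;
- the final ReLU is placed after the addition, as in the original ResNet block.

ResFCN would use the same block but drop the long-skip summations. The helper names are hypothetical.

```python
import numpy as np

def conv3d_same(x, kernel):
    """Single-channel 3x3x3 convolution (cross-correlation) with zero padding 1."""
    D, H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros(x.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            for k in range(3):
                out += kernel[i, j, k] * xp[i:i + D, j:j + H, k:k + W]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, k1, k2):
    """y = relu(x + F(x)) with F = conv -> ReLU -> conv (BN omitted for brevity)."""
    return relu(x + conv3d_same(relu(conv3d_same(x, k1)), k2))

rng = np.random.default_rng(0)
x = rng.random((8, 8, 8))
k1, k2 = rng.normal(size=(3, 3, 3)), rng.normal(size=(3, 3, 3))
y = residual_block(x, k1, k2)
print(y.shape)  # (8, 8, 8)
```

If both kernels are zero, the branch F vanishes and the block reduces to relu(x). This identity path is what lets information and gradients pass through deep stacks of such blocks.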

Learning based Ensemble Strategy
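To take full advantage of these advanced networks, we propose a learning-based ensemble strategy. It combines the predictions $P_{n,l}$ of the individual networks using trainable weighting coefficients $w_{n,l}$, where $n \in \{1, \dots, N\}$ and $l \in \{1, \dots, L\}$. Here $P_{n,l}$ is the probability map for the $l$-th hippocampal subfield estimated by the $n$-th network, $N$ is the number of networks and $L$ is the number of labels. The fused map for the $l$-th subfield is

$$\hat{P}_l = \sum_{n=1}^{N} w_{n,l} P_{n,l}, \qquad (1)$$

The weights are learned by minimizing the cross-entropy loss between the fused maps and the manual segmentation $G$:

$$\min_{\{w_{n,l}\}} \; \mathcal{L}_{\mathrm{CE}}\Big(\Big\{\sum_{n=1}^{N} w_{n,l} P_{n,l}\Big\}_{l=1}^{L},\; G\Big). \qquad (2)$$

The optimization problem (2) can be viewed as a neural network with 1 × 1 × 1 convolutions, which we name FuseNet. The architecture of FuseNet is shown in Fig. 4. In FuseNet, the probability maps for each subfield obtained from the different networks are concatenated and then fused with a 1 × 1 × 1 convolution.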
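The fusion can be sketched in NumPy. Here `P[n, l]` is the probability map of network n for label l and `w[n, l]` its trainable weight. For each label, the fused score is a weighted sum over networks, which is what a 1 × 1 × 1 convolution over that label's concatenated maps computes. The weights are fitted by minimizing a voxel-wise softmax cross-entropy.

This is a sketch under stated assumptions, not the authors' FuseNet:

- spatial dimensions are flattened into V voxels;
- the convolution has no bias;
- training uses plain gradient descent from uniform weights instead of Adam;
- the data are synthetic.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(P, w):
    """Per-label weighted sum over networks.

    P: (N, L, V) probability maps of N networks for L labels over V voxels.
    w: (N, L) weights, i.e. a bias-free 1x1x1 convolution that combines,
       for each label l, only the N maps P[:, l].
    """
    return np.einsum("nl,nlv->lv", w, P)

def loss_and_grad(P, w, labels):
    """Mean voxel-wise softmax cross-entropy and its gradient with respect to w."""
    L, V = P.shape[1], P.shape[2]
    s = softmax(fuse(P, w), axis=0)               # (L, V)
    onehot = np.eye(L)[labels].T                  # (L, V)
    loss = -np.mean(np.sum(onehot * np.log(s + 1e-12), axis=0))
    grad = np.einsum("lv,nlv->nl", s - onehot, P) / V
    return loss, grad

def fit_fusenet(P, labels, lr=0.5, iters=300):
    """Plain gradient descent (not Adam), starting from simple averaging."""
    N, L = P.shape[:2]
    w = np.full((N, L), 1.0 / N)
    for _ in range(iters):
        _, grad = loss_and_grad(P, w, labels)
        w -= lr * grad
    return w

# Synthetic example: one "good" network close to the labels, one random network
rng = np.random.default_rng(0)
L, V = 4, 512
labels = rng.integers(0, L, size=V)
good = softmax(3.0 * np.eye(L)[labels].T + rng.normal(size=(L, V)))
noisy = softmax(rng.normal(size=(L, V)))
P = np.stack([good, noisy])                       # (N=2, L, V)

w = fit_fusenet(P, labels)
print(loss_and_grad(P, np.full((2, L), 0.5), labels)[0], loss_and_grad(P, w, labels)[0])
print(w)
```

After fitting, the weights of the informative network grow relative to those of the random one, and the loss falls below that of plain averaging. Learned weighting is what distinguishes this approach from a fixed average of the network outputs.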

Evaluation
We evaluated the segmentation results with the Dice coefficient (Dice), which measures the relative volumetric overlap between the automated segmentation and the manual segmentation 28. Let $A$ denote the manual segmentation, $B$ the automated segmentation, and $V(X)$ the volume of a segmentation $X$. Dice is then defined as:

$$\mathrm{Dice} = \frac{2\,V(A \cap B)}{V(A) + V(B)}.$$
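For label maps on a common voxel grid, the Dice formula above can be computed per subfield by counting voxels. With uniform voxel spacing, voxel counts are proportional to volumes, so the ratio is unchanged. The sketch below is a minimal NumPy version. Returning 1.0 when a label is absent from both segmentations is an assumed convention, and `dice` is a hypothetical helper name.

```python
import numpy as np

def dice(manual, automated, label):
    """Dice overlap for one label: 2 V(A ∩ B) / (V(A) + V(B)), volumes as voxel counts."""
    a = manual == label
    b = automated == label
    denom = a.sum() + b.sum()
    if denom == 0:          # label absent from both: treated as perfect agreement (assumption)
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)

manual = np.array([[0, 1, 1], [2, 2, 0]])
automated = np.array([[0, 1, 2], [2, 2, 1]])
print({l: dice(manual, automated, l) for l in (1, 2)})  # {1: 0.5, 2: 0.8}
```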