A. Dataset
In this study, we used two datasets. The data for training the segmentation model were obtained from the 2019 Medical Segmentation Decathlon Challenge and consist of scans from 105 patients and 90 healthy subjects. The 3D structural MRI data were acquired with a 3D T1-weighted Magnetization Prepared Rapid Gradient Echo Imaging (MPRAGE) sequence (TI/TR/TE, 860/8.0/3.7 ms; 170 sagittal slices; voxel size, 1.0 mm³), all on the same machine. Tracing of the head, body, and tail of the hippocampus has been performed on the entire dataset [39]. The data used to test the pipeline end to end were acquired from Mount Sinai Medical Center (MSMC), Miami, Florida, as part of the data for the 1Florida Alzheimer’s Disease Research Center (ADRC). We first apply an MR-based skull-stripping technique to extract the brain from each MRI scan. The hippocampus is then segmented in each brain image separately. The volumetric results from this second dataset are compared with those obtained using FreeSurfer 6.0.
B. Evaluation Metrics
To quantitatively evaluate and compare the performance of the proposed method, four standard metrics were used. The mean Dice Similarity Coefficient (DSC) measures the overlap between the ground truth mask \({A}_{g}\) and the predicted mask \({A}_{p}\).
$$DSC=\frac{1}{n}\sum _{i=1}^{n}\frac{2\left|{A}_{{p}_{i}}\cap {A}_{{g}_{i}}\right|}{\left|{A}_{{p}_{i}}\right|+\left|{A}_{{g}_{i}}\right|} \left(1\right)$$
The mean Jaccard Similarity Coefficient (JSC) is used to compare the similarity between \({A}_{g}\) and \({A}_{p}\).
$$JSC=\frac{1}{n}\sum _{i=1}^{n}\frac{\left|{A}_{{p}_{i}}\cap {A}_{{g}_{i}}\right|}{\left|{A}_{{p}_{i}}\cup {A}_{{g}_{i}}\right|} \left(2\right)$$
The Precision Index is the fraction of the predicted mask \({A}_{p}\) that overlaps the ground truth mask \({A}_{g}\), while the Recall Index is the fraction of the ground truth mask \({A}_{g}\) that is recovered by the prediction.
$$Precision\hspace{0.25em}Index=\frac{1}{n}\sum _{i=1}^{n}\frac{\left|{A}_{{p}_{i}}\cap {A}_{{g}_{i}}\right|}{\left|{A}_{{p}_{i}}\right|} \left(3\right)$$
$$Recall\hspace{0.25em}Index=\frac{1}{n}\sum _{i=1}^{n}\frac{\left|{A}_{{p}_{i}}\cap {A}_{{g}_{i}}\right|}{\left|{A}_{{g}_{i}}\right|} \left(4\right)$$
All these metrics are calculated per sample, and their means over the test dataset are reported. A good segmentation method should produce a high value for all of them.
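As a concrete illustration, the per-sample versions of these four overlap metrics can be computed directly from binary masks with NumPy. This is an illustrative sketch using the standard definitions (the function name is ours, not from the paper):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Per-sample overlap metrics for boolean masks `pred` (prediction)
    and `gt` (ground truth) of identical shape. Averaging the returned
    values over all test samples gives the reported mean metrics."""
    inter = np.logical_and(pred, gt).sum()       # |pred ∩ gt|
    union = np.logical_or(pred, gt).sum()        # |pred ∪ gt|
    dsc = 2.0 * inter / (pred.sum() + gt.sum())  # Dice similarity
    jsc = inter / union                          # Jaccard similarity
    precision = inter / pred.sum()               # overlap over prediction
    recall = inter / gt.sum()                    # overlap over ground truth
    return dsc, jsc, precision, recall
```

Note that the Dice and Jaccard scores are monotonically related (DSC = 2·JSC / (1 + JSC)), so they rank methods identically, but both are commonly reported.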
C. Loss function
Class imbalance between object and background remains a major issue in segmentation tasks, especially in neuroimaging. Since a small ROI is usually suppressed through max-pooling layers, solutions based on optimizing the cross-entropy loss function are often unsatisfactory. To overcome this issue, besides the localization step, a mixed focal Dice loss function has been adopted for model training. Focal loss was first introduced to address the class imbalance faced by cross-entropy loss: it down-weights the contribution of easy examples, which in turn enables learning from harder examples. Since our segmentation problem is class-imbalanced, we adopt a weighted combination of a modified focal loss \({L}_{mF}\) and a modified focal Dice loss \({L}_{mFD}\), as below:
$${L}_{MF}=\lambda {L}_{mF}+\left(1-\lambda \right){L}_{mFD} \left(5\right)$$
$${L}_{mFD}=\sum _{c=1}^{C}{\left(1-mD\right)}^{\frac{1}{\gamma }} \left(6\right)$$
where \(\lambda \in \left[0,1\right]\) defines the relative weights of the two components of the loss function, \(\gamma\) is the focal parameter, \(C\) is the number of classes, and \(mD\) is the modified Dice score computed for each class \(c\). The \({L}_{mF}\) term in (5) is defined as in (7):
$${L}_{mF}=\alpha {\left(1-{p}_{t}\right)}^{\gamma }\cdot {L}_{mCE} \left(7\right)$$
The \({L}_{mCE}\) term is computed using Eq. (8):
$${L}_{mCE}=-\frac{1}{N}\sum _{i=1}^{N}\left[\beta {t}_{i}\mathrm{log}\left({p}_{i}\right)+\left(1-\beta \right)\left(1-{t}_{i}\right)\mathrm{log}\left(1-{p}_{i}\right)\right] \left(8\right)$$
$${p}_{t}=\left\{\begin{array}{ll}p& \text{if y = 1}\\ 1-p& \text{if y = 0}\end{array}\right.$$
The term \({t}_{i}\) refers to the Tversky index, an asymmetric similarity measure closely related to the Dice score that enables optimization for output imbalance by tuning the weights assigned to false positives and false negatives. Details on the calculation of \({t}_{i}\) are provided in [40]. The \(\alpha\) term, in the range [0, 1], controls the relative contribution of the Dice and cross-entropy terms to the loss, and \(\beta\) controls the relative weights assigned to false positives and false negatives. A value of \(\beta >\frac{1}{2}\) penalizes false negative predictions more than false positives.
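To make the structure of Eqs. (5)–(8) concrete, the following sketch evaluates the combined loss on flattened probability maps. It is illustrative only: the "modified" terms of [40] are simplified here to their standard forms (plain class-balanced cross-entropy, a single foreground class, and the soft Dice score in place of \(mD\)), and the function name and defaults are our assumptions:

```python
import numpy as np

def focal_dice_loss(p, t, lam=0.5, gamma=2.0, alpha=0.25, beta=0.7, eps=1e-7):
    """Sketch of the combined loss L_MF in Eq. (5) for one sample.
    `p`: predicted foreground probabilities, `t`: binary targets
    (both flat numpy arrays). Simplified relative to [40]."""
    p = np.clip(p, eps, 1 - eps)
    # Class-balanced cross entropy, cf. Eq. (8); beta > 0.5 penalizes
    # false negatives (missed foreground) more heavily.
    l_ce = -np.mean(beta * t * np.log(p)
                    + (1 - beta) * (1 - t) * np.log(1 - p))
    # Focal modulation, cf. Eq. (7): down-weight easy examples
    # (those with high probability of the true class).
    p_t = np.mean(t * p + (1 - t) * (1 - p))
    l_focal = alpha * (1 - p_t) ** gamma * l_ce
    # Focal Dice term, cf. Eq. (6), single foreground class (C = 1).
    dice = 2 * np.sum(p * t) / (np.sum(p) + np.sum(t) + eps)
    l_fd = (1 - dice) ** (1 / gamma)
    # Weighted combination, Eq. (5).
    return lam * l_focal + (1 - lam) * l_fd
```

With this form, a near-perfect prediction yields a small positive loss, while a poor prediction is penalized through both the focal cross-entropy and the Dice terms.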
D. Network Architecture Design
In this section, we introduce a novel framework for hippocampus segmentation. This framework is composed of two modules: 1) hippocampus localization, and 2) hippocampus segmentation. In the first module, a heuristic model estimates the hippocampus location in the brain and produces a cropped area surrounding the hippocampal tissue. In the second module, the cropped area is passed through a segmentation model.
The design of the first module is inspired by [30], [31], [37]. It is a heuristic algorithm that first performs 3D skull stripping to extract the brain volume. Based on the ratio of the acquired brain volume to the average volume over the training set, it estimates the relative distance from the first slice of the brain to the first slice in which the hippocampus appears, in each of the coronal, sagittal, and axial views, to obtain a rough estimate of the hippocampus location in the 3D MRI.
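A minimal sketch of this relative-distance heuristic is given below, assuming a simple linear scaling of reference slice offsets by the cube root of the volume ratio. All names, the scaling rule, and the per-axis crop parameters are our assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def estimate_hippocampus_crop(brain, ref_volume, ref_offsets, crop_size):
    """Hypothetical sketch of the localization heuristic.
    `brain`: binary 3D brain mask after skull stripping.
    `ref_volume`: average brain volume (voxels) over the training set.
    `ref_offsets`: reference slice offsets, per axis, from the first
    brain slice to the first hippocampus slice (training-set average).
    `crop_size`: size of the cropped region along each axis."""
    # Linear scale factor from the volume ratio (volume ~ length^3).
    scale = (brain.sum() / ref_volume) ** (1.0 / 3.0)
    crop = []
    for axis, (off, size) in enumerate(zip(ref_offsets, crop_size)):
        # First slice along this axis that contains brain tissue.
        profile = brain.any(axis=tuple(a for a in range(3) if a != axis))
        first = int(np.argmax(profile))
        start = first + int(round(off * scale))  # scaled relative distance
        crop.append((start, start + size))
    return crop  # [(start, stop), ...] per axis, bounding the hippocampus
```

The returned per-axis ranges define the cropped subvolume that is passed to the segmentation module.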
In the second module, the cropped area surrounding the hippocampus is fed into a transformer-based segmentation model. The segmentation algorithm is inspired by the architecture design of TransUnet proposed by Chen et al. [41]. This design integrates the Vision Transformer with the UNet model and has shown promising segmentation results on abdominal CT scans, but it had not been explored for 3D brain MRI segmentation. The latter is an even more challenging problem given the difficulty of delineating different regions of the brain in 3D MRI.
The hippocampus segmentation problem falls into the category of imbalanced segmentation, as the proportion of the region of interest is much smaller than that of the background. To improve the original implementation of the model for this imbalanced case, a combined loss function based on the focal loss approach has been adopted. Data augmentation (random rotation and flipping) is applied before feeding the data points into the segmentation model. The proposed pipeline is depicted in Fig. 1.
The UNet architecture, proposed in 2015 by Ronneberger et al. [42], has been one of the most dominant methods in medical image segmentation and has since served as a building block of many other segmentation models. In contrast to object detection, which draws a bounding box around the subject and assigns it a label, image segmentation produces a fine binary map over the image that classifies each pixel, separating the background from the object or region of interest. The UNet architecture is composed of two main paths: an encoding path and a decoding path.
The encoding path is made up of multiple convolution layers, each followed by a max-pooling layer. Through this path, the model learns spatially relevant contextual information. A reverse decoding path adds precise localization to yield a final segmentation of the same size as the input image.
To improve the UNet architecture, in 2018 Zhou et al. redesigned the encoder and decoder paths and the skip connections of the original UNet to introduce UNet++. The pathways in UNet++ are composed of a series of nested dense connections that reduce the semantic gap between the encoder's and decoder's feature maps. These strengthened connections in UNet++ have shown considerable improvement in segmentation tasks [43].
The attention mechanism was first proposed for natural language processing tasks and has more recently expanded to image processing and computer vision. It draws from human vision: once we know the context in which an object appears in a scene, we look for the same context when searching for that object in the future. Multiple research efforts have improved their designs by adding an attention mechanism in conjunction with convolution layers; one of them is MANet. While many of the proposed UNet variants are based on multi-scale feature fusion, MANet suggests a new attention-based model.
While several studies have exploited the attention mechanism for image classification, a fully transformer-based model, the Vision Transformer, was proposed by the Google Research team in late 2020. Its architecture closely follows the original transformer model proposed for Natural Language Processing (NLP): it processes a sequence of image patches like NLP tokens for image classification tasks. The Vision Transformer has shown promising results compared to state-of-the-art CNN models when trained on a large dataset for long enough, while requiring substantially fewer computational resources to train [44].
This transformer-based architecture was designed for image classification using an encoding module: for each processed image, the model predicts a label. To make it applicable to more complex tasks such as object detection and image segmentation, some architectural modifications are essential. To apply the model to a segmentation task, Chen et al. coupled the transformer-based encoder with a decoding module inspired by UNet. They also applied the transformer encoding to the feature maps extracted from the third layer of a ResNet50 network, a design they selected after failing to obtain compelling results with the original architecture, which directly tokenized the input image.
This CNN-Transformer hybrid design performs better than pure transformer encoding, as it allows the network to exploit high-resolution CNN feature maps in the reconstruction path. The reconstruction path consists of several up-sampling units. The first reconstruction unit takes the output of the transformer encoder, up-samples it, and concatenates the resulting feature map with the feature map of the last CNN layer of the ResNet in the corresponding encoding path, incorporating multi-scale information into the model. The outcome passes through a 3×3 convolutional layer and a Rectified Linear Unit (ReLU) to form the input for the next reconstruction unit. The same process is applied twice more to the output of each layer until the decoder reconstructs the segmentation map at the original input resolution.
A segmentation head is added at the end of the reconstruction path, which classifies each pixel into its corresponding class and recovers the segmentation mask at the same resolution as the input image. The network architecture adopted from TransUnet is depicted in Fig. 2. To ensure better performance on an imbalanced segmentation task, we have changed the original loss function to a combined focal loss inspired by [40].
In our design, unlike in the original Vision Transformer model, the image is passed through a CNN model to generate a rich feature map. Furthermore, the first few intermediate feature maps in the ResNet module are also kept, helping reconstruction in the up-sampling path. The final feature map, which has a 2D shape, is split into fixed-size 1×1 patches. The patches are flattened and linearly projected to a new latent space. To retain positional information, a position embedding is added to each patch separately as input to the transformer encoder unit. Each transformer layer consists of a layer norm and a Multi-Head Attention (MHA) unit; in this model, 12 transformer units are stacked on top of each other. The final feature map is bi-linearly up-sampled and concatenated with the corresponding feature map in the down-sampling path. Each up-sampling block consists of a 2× up-sampling operator, a 3×3 convolution layer, and a ReLU function.
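The tokenisation step above can be sketched as a shape walk-through. The concrete sizes here are assumptions for illustration (a 256×256 input reduced by the ResNet to a 16×16 feature map with 1024 channels, and a hidden size of 768 as in the base Vision Transformer); random arrays stand in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)

feat = rng.standard_normal((16, 16, 1024))   # final CNN feature map (H', W', C)
tokens = feat.reshape(-1, 1024)              # 1x1 patches -> 256 token vectors
W = rng.standard_normal((1024, 768)) * 0.02  # linear patch projection (learned)
pos = rng.standard_normal((256, 768)) * 0.02 # position embedding (learned)
x = tokens @ W + pos                         # sequence fed to the 12 stacked
                                             # transformer encoder units
```

Because the patches are 1×1 on the already-downsampled feature map, the sequence length stays modest (256 tokens here), which keeps the self-attention cost manageable.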
Considering the enormous success of feature-extractor networks (ResNet, SENet, EfficientNet, Xception) in many segmentation tasks, we integrated them into our pipeline and studied them extensively when paired with three distinct segmentation models (UNet, UNet++, MANet). We evaluated their performance on highly imbalanced segmentation tasks with specific challenges such as convoluted structures like the hippocampus and low-contrast margins between different brain regions.