Demographics
This study had two components. First, we developed and evaluated a deep learning algorithm using separate standard MRI sequences in a diverse cohort of adults (spanning a range of ages and conditions, as detailed below) with the intent of providing a generalizable segmentation algorithm. Second, we applied the method to healthy adults across the lifespan to provide an exemplar for choroid plexus volumes.
Adult participants (n = 50 for model training; n = 98 for subsequent healthy control lifespan analysis) provided informed, written consent in accordance with the Vanderbilt University Institutional Review Board (IRB) and the Declaration of Helsinki and its amendments. All participants were enrolled between February 2020 and July 2023. The brain atrophies with advancing age and in various neurological disorders, and this atrophy is accompanied by ventricular enlargement and, often, associated choroid plexus enlargement7–8. To make the proposed method as generalizable as possible, we deliberately enrolled a heterogeneous cohort of persons across the adult lifespan for algorithm training and development. Participants included healthy controls and patients with mild cognitive impairment (MCI), Alzheimer’s disease, Parkinson’s disease, and Huntington’s disease. Inclusion criteria for healthy control participants consisted of no history of cerebrovascular disease, anemia, psychosis, or neurological disorder including but not limited to prior overt stroke, sickle cell anemia, schizophrenia, bipolar disorder, Alzheimer’s disease, Parkinson’s disease, or multiple sclerosis. The presence of non-specific white matter lesions was not an exclusion criterion for healthy controls, as these lesions become more prevalent with aging and we sought a generalizable and representative cohort. Diagnosis of Alzheimer’s disease, MCI, Parkinson’s disease, and Huntington’s disease was made by a board-certified neurologist (DOC; experience = 15 years).
Image acquisition
All participants underwent non-contrast MRI at 3 Tesla with body coil radiofrequency transmission and 32-channel SENSE-array reception on a Philips Ingenia system (Philips Healthcare, Best, The Netherlands). Anatomical images consisted of: (i) 3-D T1-weighted MPRAGE (TR = 8.1 ms; TE = 3.7 ms; field of view = 256 × 180 × 150 mm³; number of slices = 150; spatial resolution = 1.0 × 1.0 × 1.0 mm³; duration = 4 minutes 32 seconds), (ii) 2-D T2-weighted fluid-attenuated-inversion-recovery (FLAIR) turbo-spin-echo (TR = 11,000 ms; TE = 120 ms; TI = 2800 ms; field of view = 230 × 184 × 144 mm³; number of slices = 29; spatial resolution = 0.57 × 0.57 × 4.0 mm³; duration = 1 minute 39 seconds), and (iii) 3-D T2-weighted turbo-spin-echo (TR = 2500 ms; TE = 331 ms; field of view = 250 × 250 × 189 mm³; number of slices = 242; spatial resolution = 0.78 × 0.78 × 0.78 mm³; duration = 4 minutes 8 seconds).
Manual segmentation of the choroid plexus
Data utilized for manual segmentation of the choroid plexus consisted of 3-D T1-weighted, 2-D axial T2-FLAIR-weighted, and 3-D T2-weighted MRI from 50 enrolled subjects. Ground truth choroid plexus segmentation was performed manually with final approval from a board-certified neuroradiologist (CDM; experience = 9 years). The manual delineation protocol was defined as follows. First, 2-D axial T2-FLAIR and 3-D T2-weighted images were co-registered to 3-D T1-weighted images using linear registration tools from the Advanced Normalization Tools (ANTs) software package15. Second, the rater used the co-registered multi-modal data to delineate the choroid plexus using FMRIB Software Library (FSL)16. Manual segmentations focused on the choroid plexus in the atria of the lateral ventricles. To limit biasing of the deep learning method, the rater took care to exclude partial volumes from nearby anatomical structures, including the thalamus and periventricular white matter.
Automatic choroid plexus segmentation
Automatic choroid plexus segmentations were generated via a fully convolutional neural network model.
The machine learning model was designed following a U-NET architecture17. This architecture was chosen because of its proven success in medical image segmentation algorithms and consisted of an encoding and a decoding step. The encoding step was composed of three blocks, each composed of two layers. The number of filters was set to 64 for the first block and doubled at each block thereafter. Each layer consisted of a 3-D convolution (kernel size = 3x3x3 voxels, stride = 1, and padding = 1), a batch normalization, and a rectified linear unit. Feature maps from each block were downsampled using a maximum pooling operation (kernel size = 2x2x2 voxels). The decoding step followed the same architecture, with each block dividing the number of filters by 2. Up-sampling between each decoding block was performed with a 3-D transposed convolution (kernel size = 2x2x2 voxels, stride = 2x2x2 voxels, and no padding). The final segmentation map was obtained using a block composed of a 3-D convolution operator (kernel size = 1x1x1 voxels, stride = 1x1x1 voxels, and no padding) followed by a hyperbolic tangent as activation function. The model was trained on three separate datasets to compare performance across different MRI sequences (i.e., T1-weighted, T2-weighted, and T2-FLAIR-weighted images). All images were registered non-linearly with ANTs software to the International Consortium for Brain Mapping-Montreal Neurological Institute (ICBM-MNI) 152 T1-weighted template18. Non-linear registration was utilized in this step in order to reduce morphological variability of the lateral ventricles and thus increase the inter-subject similarity of the choroid plexus appearance.
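The architecture described above could be realized in PyTorch roughly as follows. This is a minimal sketch rather than the authors' code: the class name `UNet3D`, the helper `conv_block`, the single input and output channel, the skip connections by concatenation, and the treatment of the third encoding block as the bottleneck (which leaves two decoding blocks) are assumptions that the text does not spell out.

```python
import torch
from torch import nn


def conv_block(in_ch, out_ch):
    """Two layers, each Conv3d(3x3x3, stride 1, padding 1) -> BatchNorm3d -> ReLU."""
    layers = []
    for i in range(2):
        layers += [
            nn.Conv3d(in_ch if i == 0 else out_ch, out_ch,
                      kernel_size=3, stride=1, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        ]
    return nn.Sequential(*layers)


class UNet3D(nn.Module):
    """Hypothetical 3-D U-Net: three encoding blocks (f, 2f, 4f filters)."""

    def __init__(self, in_channels=1, out_channels=1, base_filters=64):
        super().__init__()
        f1, f2, f3 = base_filters, base_filters * 2, base_filters * 4
        # Encoding step: filters double at each block.
        self.enc1 = conv_block(in_channels, f1)
        self.enc2 = conv_block(f1, f2)
        self.enc3 = conv_block(f2, f3)
        self.pool = nn.MaxPool3d(kernel_size=2)
        # Decoding step: transposed convolutions halve the filter count.
        self.up2 = nn.ConvTranspose3d(f3, f2, kernel_size=2, stride=2)
        self.dec2 = conv_block(f2 * 2, f2)  # input = upsampled + skip (assumed)
        self.up1 = nn.ConvTranspose3d(f2, f1, kernel_size=2, stride=2)
        self.dec1 = conv_block(f1 * 2, f1)
        # Final 1x1x1 convolution, followed by tanh in forward().
        self.head = nn.Conv3d(f1, out_channels, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.tanh(self.head(d1))
```

With the stated 64 base filters and 64x64x64 patches a forward pass is memory-intensive, so `base_filters` is exposed here mainly to allow quick shape checks on small inputs whose edge lengths are divisible by 4.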
Implementation details
A patch-based approach for training of the machine learning model was employed. Patches of 64x64x64 voxels were extracted from the MNI-registered images, and these patches were centered on random voxels from the choroid plexus probabilistic atlas that was generated from the average of the manual choroid plexus segmentations included in the training data set. In total, 41 overlapping patches from each participant were used to train the model. During the training phase, random flipping along the longitudinal fissure was implemented to increase the training sample size further. The network was trained using an Adam optimizer19 with a learning rate set to 10⁻⁴. A generalized Dice loss function was used to train the network20. Lastly, the segmentation mask patches were pieced back together in MNI space and transformed back to the native T1-weighted space using the inverse transformation and nearest-neighbor interpolation. An overview of the processing pipeline with a diagram of the 3-D U-NET architecture is shown in Fig. 1.
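The patch sampling and flipping steps can be illustrated with NumPy. The sketch below is not the authors' implementation: the function names are hypothetical, centers are drawn from all voxels with non-zero atlas probability and clipped so that each patch stays inside the volume, and axis 0 is taken to be the left-right axis. None of these details are stated in the text.

```python
import numpy as np

PATCH = 64  # patch edge length in voxels, as stated in the text


def sample_patch_centers(atlas, n_patches, rng, patch=PATCH):
    """Draw random patch centers among voxels where the atlas is non-zero."""
    half = patch // 2
    candidates = np.argwhere(atlas > 0)
    # Assumption: shift centers inward so every patch fits in the volume.
    upper = np.array(atlas.shape) - (patch - half)
    candidates = np.clip(candidates, half, upper)
    picks = rng.choice(len(candidates), size=n_patches, replace=True)
    return candidates[picks]


def extract_patch(volume, center, patch=PATCH):
    """Cut a cubic patch of edge length `patch` around `center`."""
    half = patch // 2
    slices = tuple(slice(c - half, c - half + patch) for c in center)
    return volume[slices]


def random_flip(image, label, rng, axis=0, p=0.5):
    """Mirror image and label together with probability p.

    Assumption: `axis` is the left-right axis, so the flip is across
    the mid-sagittal plane (the longitudinal fissure).
    """
    if rng.random() < p:
        return np.flip(image, axis=axis).copy(), np.flip(label, axis=axis).copy()
    return image, label
```

A call such as `sample_patch_centers(atlas, 41, np.random.default_rng(0))` would give the 41 centers per participant mentioned above; because the centers cluster around the atlas, the resulting patches overlap.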
For comparison to available algorithms, choroid plexus segmentation masks were generated using FreeSurfer’s standard segmentation procedure from T1-weighted MRI in all training subjects21–22. Briefly, input images were skull-stripped and intensity corrected, and then FreeSurfer’s aseg atlas was used to generate left and right choroid plexus labels. These labels were inverse transformed back to each subject’s native T1 space and combined to form one choroid plexus mask per subject. These masks were then compared to ground truth manual segmentations for statistical analysis.
Lastly, for the lifespan volumetry analysis, T1, T2, and T2-FLAIR images were separately preprocessed as described previously, and the deep learning model for each corresponding MRI sequence was utilized to generate choroid plexus segmentations for all enrolled participants. From these segmentations in each participant’s native imaging space, the choroid plexus volume was calculated in cm³.
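The volume calculation amounts to counting segmented voxels and scaling by the voxel size. A minimal sketch follows, assuming a binary mask array and voxel dimensions given in millimetres; the function name `mask_volume_cm3` is illustrative and not taken from the study's code.

```python
import numpy as np


def mask_volume_cm3(mask, voxel_size_mm):
    """Volume of a binary mask in cm^3, given voxel dimensions in mm."""
    voxel_mm3 = float(np.prod(voxel_size_mm))  # volume of one voxel in mm^3
    n_voxels = int(np.count_nonzero(mask))
    return n_voxels * voxel_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3
```

For example, a mask of 2000 voxels at 1.0 x 1.0 x 1.0 mm resolution corresponds to 2.0 cm³.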
Statistical analyses
To evaluate the accuracy of the choroid plexus segmentation, we investigated how the model performed when trained with different sets of MRI sequences (i.e., T1-weighted, T2-weighted, and T2-FLAIR images). For each MRI sequence, a 5-fold cross-validation scheme was implemented with 30 participants utilized in model training, 10 participants utilized in model validation, and 10 participants utilized in model evaluation. Pseudo-randomization was used to ensure the same participant groups across each modality-based model.
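One way to obtain a 30/10/10 split in each of five folds, with identical groups for every modality, is to shuffle the participant list once with a fixed seed and rotate the folds. The sketch below is an assumption about how this could be done, not a description of the study's code; in particular, using the fold after the test fold as the validation set is a choice made here for illustration.

```python
import random


def five_fold_splits(participant_ids, n_folds=5, seed=0):
    """Return a list of {'train', 'val', 'test'} splits, one per fold."""
    ids = sorted(participant_ids)
    # A fixed seed gives the same grouping for every modality-based model.
    random.Random(seed).shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val_k = (k + 1) % n_folds  # assumed rotation of the validation fold
        train = [p for j, fold in enumerate(folds)
                 if j not in (k, val_k) for p in fold]
        splits.append({"train": train, "val": folds[val_k], "test": folds[k]})
    return splits
```

With 50 participants, each fold holds 10, so every split has 30 training, 10 validation, and 10 evaluation participants, and each participant appears in the evaluation set exactly once.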
To verify the accuracy of the machine learning and FreeSurfer outputs, standard comparison metrics between the ground truth segmentation and each automated output were calculated. The Dice-Sørensen coefficient, 95% Hausdorff distance, and area under the curve (AUC) were calculated for each iteration of cross-validation and averaged across the iterations to produce representative metrics for each modality-based model. These metrics were then compared to those of FreeSurfer using two-tailed Wilcoxon tests.
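For reference, the Dice-Sørensen coefficient and a simple 95% Hausdorff distance can be computed from two binary masks as sketched below. Definitions of the 95% Hausdorff distance vary between toolkits; this version uses all foreground voxels rather than surface voxels only and takes the larger of the two directed 95th percentiles, which is an assumption and may differ from the implementation used in the study. The brute-force pairwise distances are only practical for small structures.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice-Sorensen coefficient: 2|A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def hausdorff_95(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile Hausdorff distance in the units of `spacing`."""
    a = np.argwhere(np.asarray(pred, dtype=bool)) * np.asarray(spacing)
    b = np.argwhere(np.asarray(truth, dtype=bool)) * np.asarray(spacing)
    if len(a) == 0 or len(b) == 0:
        return float("nan")  # undefined when either mask is empty
    # Pairwise Euclidean distances between every voxel of A and of B.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # nearest truth voxel for each predicted voxel
    b_to_a = d.min(axis=0)  # nearest predicted voxel for each truth voxel
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))
```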
As an exploratory analysis, we evaluated these performance metrics as a function of participant lateral ventricular volume to better understand how the machine learning models perform in different anatomical environments. Lateral ventricular volume was calculated from each participant’s T1-weighted MRI using the AssemblyNet software package23. Generalized linear models were utilized to separately regress each performance metric against the lateral ventricular volume of the model-testing participants.
The intraclass correlation coefficient was calculated (i) among the choroid plexus volumes produced for each healthy control participant by the three deep learning models and (ii) between each model’s choroid plexus volumes for the training participants and the corresponding ground truth choroid plexus volumes. Descriptive statistics were presented as Bland-Altman plots.
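The quantities drawn on a Bland-Altman plot are the mean difference between two measurements (the bias) and the limits of agreement around it. A minimal sketch, assuming the conventional limits of bias ± 1.96 standard deviations of the differences; the function name and return structure are illustrative only.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Bias and 95% limits of agreement for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": [(x + y) / 2 for x, y in zip(a, b)],  # x-axis of the plot
        "diffs": diffs,                                # y-axis of the plot
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }
```

Here `a` could hold the model-derived volumes and `b` the manual ground truth volumes for the same participants.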
For the lifespan component of this study, a generalized linear model was utilized to regress the choroid plexus volume from each modality model against participants’ age. Sex was included as a covariate to account for the previously reported sex dependence of choroid plexus volume4, and total intracranial volume calculated from AssemblyNet was included as an additional covariate. The McFadden R² values were calculated for each regression model.
The machine learning algorithm was implemented using the PyTorch Python library24, and pre-processing, post-processing, and statistical analyses were implemented in Matlab25. All statistical analyses were implemented using the R software package26. All p-values were corrected for multiple comparisons using the false discovery rate27. The significance criterion was defined as p < 0.05.
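The text does not name the specific false discovery rate procedure. Assuming the commonly used Benjamini-Hochberg step-up method, adjusted p-values can be computed as in the sketch below; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant when its adjusted p-value is below 0.05, matching the criterion stated above.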