Image Data
Data in this study were acquired from the UK Biobank study (UKBB) in the United Kingdom and from the German National Cohort study (NAKO) in Germany; both studies obtained written informed consent from all subjects and approved our data analysis. Research involving human participants was performed in accordance with the Declaration of Helsinki in both studies. The analysis of anonymized data from these studies was approved by the ethics committee of the Medical Faculty of the University of Tübingen.
Cohort characteristics and imaging protocols differ between these two large-scale population studies. The UKBB study aims to image 100,000 healthy UK participants between 40 and 69 years of age; MRI examinations are performed on 1.5 T clinical MRI scanners (Magnetom Aera, Siemens Healthineers, Erlangen, Germany). The NAKO study enrolled 30,000 participants between 20 and 69 years of age from the general German population for the MRI part of the study; 3 T clinical MRI scanners (Magnetom Skyra, Siemens Healthineers, Erlangen, Germany) are used in the NAKO.
In both studies, as part of an extensive imaging protocol, whole-body T1-weighted dual-echo gradient echo (GRE) sequences are acquired with the parameters given in Table 3. These generate four MRI contrasts (Dixon contrasts), namely water, fat, in-phase (IP) and out-of-phase (OP). A total of 20,000 participants (10,000 subjects from each study) were available for our analysis. Notably, the spatial resolution is markedly higher in the NAKO, mainly due to the higher magnetic field strength of the deployed MRI scanners.
Table 3. Acquisition parameters of the whole-body T1-weighted dual-echo GRE (Dixon) sequences in the UKBB and NAKO.
Parameter | UKBB | NAKO
Magnetic field strength [T] | 1.5 | 3
Matrix size | 224 x 156–224 x 174 | 240 x 320
Pixel size [mm x mm] | 2.23 x 2.23 | 1.2 x 1.2
Slice thickness [mm] | 3–4.5 | 3
Echo times [ms] | 2.39 / 4.77 | 1.23 / 2.46
Repetition time [ms] | 6.69 | 4.36
Flip angle [°] | 10 | 9
Bandwidth [Hz/px] | 440 | 680
Data Pre-processing and Automated Organ Segmentation
The source data for this study consisted of whole-body T1-weighted images from 6 acquisition stations for the UKBB (neck to knee) and 4 acquisition stations for the NAKO (neck to upper thigh), stored in the DICOM format. Following the approach described in 5, these images were converted into four 3D image files per subject, each corresponding to one of the four Dixon contrasts, and stored in the NIfTI format. As part of this process, a composing step stitching together the individual MRI acquisition blocks was performed on the UKBB data using publicly available in-house software (https://github.com/biomedia-mira/stitching) 17, whereas the NAKO data were available as pre-composed whole-body data sets.
For automated segmentation of the liver, the spleen, the left and right kidneys as well as the pancreas, publicly available, pre-trained deep learning-based models for abdominal organ segmentation described in 5 were used (code: https://github.com/BioMedIA/UKBB-GNC-Abdominal-Segmentation, trained models: https://gitlab.com/turkaykart/ukbb-gnc-abdominal-segmentation). These models, based on a standardized U-Net architecture (nnU-Net) detailed in 4, were previously trained on UKBB and NAKO training data comprising 400 manually labeled image volumes and were extensively validated in a previous study 5. All four Dixon image contrasts were given as model inputs. The UKBB model was deployed on a GPU workstation equipped with two Titan RTX GPUs (NVIDIA, Santa Clara, USA), whereas the NAKO model was deployed on a dedicated GPU server using two Tesla V100 GPUs (NVIDIA, Santa Clara, USA).
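As an illustration of how the four Dixon contrasts can be presented to such a model, the following sketch stages per-subject NIfTI files according to nnU-Net's multi-channel file naming convention; the directory layout, file names and channel order are assumptions and are not taken from the linked repositories.

```python
from pathlib import Path
import shutil

# Assumed per-subject layout: <subject_id>_water.nii.gz, <subject_id>_fat.nii.gz,
# <subject_id>_in.nii.gz, <subject_id>_opp.nii.gz (channel order is an assumption).
CONTRAST_ORDER = ["water", "fat", "in", "opp"]

def stage_subject_for_inference(subject_dir: Path, out_dir: Path, subject_id: str) -> None:
    """Copy the four Dixon contrasts into nnU-Net's channel naming convention
    (<case>_0000.nii.gz ... <case>_0003.nii.gz) so that all four contrasts
    are passed to the model as separate input channels."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for channel, contrast in enumerate(CONTRAST_ORDER):
        src = subject_dir / f"{subject_id}_{contrast}.nii.gz"
        dst = out_dir / f"{subject_id}_{channel:04d}.nii.gz"
        shutil.copy(src, dst)

# Inference would then be run with the pre-trained models from the linked
# repositories, e.g. via nnU-Net's command-line interface (the task name
# depends on the downloaded model):
#   nnUNet_predict -i <staged_dir> -o <output_dir> -t <task_name> -m 3d_fullres
```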
For this study, the first 10,000 complete data sets on which inference was technically successful were drawn from each of the UKBB and NAKO data pools.
Visual Quality Control (QC)
For visual quality control, a QC tool with an interactive graphical user interface (GUI), called SegQC, was developed in Python (code available at: https://github.com/BioMedIA/UKBB-GNC-QC-Tool). It enables efficient and scalable visual assessment of segmentation quality by an expert and is specifically designed for population imaging studies, with flexible caching and indexing to simplify the QC process (Fig. 4). Similar assistive tools aimed at expediting assessments and improving reproducibility have already shown the benefits of such interactive quality control in neuroimaging 18–20.
SegQC’s graphical user interface was created with the open-source application programming interface (API) Streamlit (https://github.com/streamlit/streamlit), which enables the development of a web app accessible through a web browser. In addition to handling large datasets, SegQC offers features such as different viewing orientations (coronal, sagittal and axial), overlay of segmentation masks, generation of maximum intensity projections (MIP) and storage of quality ratings as a CSV file. The tool allows the user to assess organ segmentation quality slice by slice in different views and/or to visualize different organ segmentations on one screen using MIPs. In addition, the user can adjust the granularity of the QC rating options through a configuration file. Furthermore, one can navigate subjects consecutively, select subjects by ID and flag interesting subjects for later re-assessment.
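The following is a minimal sketch of how such a Streamlit-based QC viewer can be structured; it is not the actual SegQC implementation, and the file layout, rating options and overlay rendering are assumptions.

```python
import csv
from pathlib import Path

import nibabel as nib
import numpy as np
import streamlit as st

DATA_DIR = Path("data")          # assumed: <subject_id>_water.nii.gz and <subject_id>_seg.nii.gz
RATINGS_CSV = Path("ratings.csv")
RATINGS = ["no error", "error, easily correctable", "error, not easily correctable"]

@st.cache_data  # cache loaded volumes so re-rendering a subject is fast
def load_volume(path: str) -> np.ndarray:
    return np.asanyarray(nib.load(path).dataobj)

subject_ids = sorted(p.name.replace("_water.nii.gz", "") for p in DATA_DIR.glob("*_water.nii.gz"))
subject_id = st.sidebar.selectbox("Subject", subject_ids)
orientation = st.sidebar.radio("Orientation", ["coronal", "sagittal", "axial"])
use_mip = st.sidebar.checkbox("Maximum intensity projection", value=True)

image = load_volume(str(DATA_DIR / f"{subject_id}_water.nii.gz"))
mask = load_volume(str(DATA_DIR / f"{subject_id}_seg.nii.gz"))

axis = {"sagittal": 0, "coronal": 1, "axial": 2}[orientation]
if use_mip:
    img_slice, mask_slice = image.max(axis=axis), mask.max(axis=axis)
else:
    idx = st.slider("Slice", 0, image.shape[axis] - 1, image.shape[axis] // 2)
    img_slice, mask_slice = np.take(image, idx, axis=axis), np.take(mask, idx, axis=axis)

# Simple overlay: normalise the image to [0, 1] and tint segmented voxels red.
norm = (img_slice - img_slice.min()) / (np.ptp(img_slice) + 1e-8)
overlay = np.stack([norm, norm, norm], axis=-1)
overlay[mask_slice > 0, 0] = 1.0
st.image(np.rot90(overlay), caption=f"{subject_id} ({orientation})", clamp=True)

rating = st.radio("Segmentation quality", RATINGS)
if st.button("Save rating"):
    with RATINGS_CSV.open("a", newline="") as f:
        csv.writer(f).writerow([subject_id, orientation, rating])
    st.success("Rating saved")
```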
Using SegQC, visual segmentation quality analysis of all 20,000 data sets was performed by an expert radiologist (SG; 11 years of experience). The overall aim of this process was to identify data sets with relevant segmentation errors and to assess whether these errors were easily correctable.
Organ segmentations were defined as “without error” (“error-free”, “no error”) if no segmentation error was visually perceivable on coronal, axial or sagittal MIP images and the segmentation mask consisted of a single connected component (examples given in Fig. 5).
Accordingly, automated segmentations for each organ were defined as “erroneous” or “with error” if segmentation errors were visually perceivable on coronal, axial or sagittal MIP images (as an only partly segmented organ or as a segmentation mask exceeding the organ boundaries) or if the segmentation mask consisted of more than one connected component for a single organ (Fig. 6).
Relevant segmentation errors due to multiple connected components were regarded as “easily correctable” if the largest connected component of the segmentation mask corresponded to the target organ and showed no relevant error (Fig. 6). By this definition, an easily correctable erroneous organ segmentation can be corrected by discarding the smaller connected components and retaining the largest one. All other erroneous segmentations are referred to as not easily correctable.
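The connected-component part of these criteria can be checked automatically; the sketch below (assuming binary organ masks as NumPy arrays) counts the components of a mask and reports how dominant the largest one is, whereas the visual part of the assessment remains with the rater.

```python
import numpy as np
from scipy import ndimage

def component_summary(mask: np.ndarray) -> dict:
    """Count connected components of a binary organ mask and report the
    fraction of foreground voxels contained in the largest component."""
    labeled, n_components = ndimage.label(mask > 0)
    if n_components == 0:
        return {"n_components": 0, "largest_fraction": 0.0}
    sizes = np.bincount(labeled.ravel())[1:]  # component sizes, background label 0 excluded
    return {
        "n_components": int(n_components),                 # > 1 triggers the "with error" criterion
        "largest_fraction": float(sizes.max() / sizes.sum()),
    }
```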
In addition, the presence of composing artifacts between adjacent MRI acquisition stations was visually assessed and recorded. As the whole-body MRI data were acquired in several acquisition blocks with subsequent composing into a single 3D image volume (see above), spatial inconsistencies between adjacent blocks can cause image artifacts. Composing artifacts were defined as inconsistencies between two adjacent MRI acquisition stations that resulted in missing or duplicated anatomical structures (e.g., due to different respiratory states along the diaphragm). If target organs were partially missing or duplicated due to composing artifacts, the segmentations were considered erroneous even if they technically corresponded to the respective organs, since they did not correctly describe the actual organ anatomy (Fig. 6).
In a small number of datasets (92/10,000 UKBB data sets and 12/10,000 NAKO data sets), severe image acquisition errors were observed (such as fat/water swaps and MR signal alterations) resulting in severely altered image properties and relevant segmentation errors in all organs. These data sets were discarded from further statistical analysis.
Segmentation Error Correction
As described above, an easily correctable error was, by definition, present if the automated organ segmentation mask consisted of multiple connected components and the largest of these components corresponded to an error-free segmentation of the target organ. For such segmentations, correction was performed by retaining only the largest connected component, which generated the final segmentation map. In contrast to applying this post-processing step without prior qualitative visual assessment, this procedure ensured that the corrected segmentation masks did not contain relevant errors.
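A minimal sketch of this correction step, assuming a binary organ mask stored as a NumPy array, is given below; the decision that the largest component is error-free is made beforehand during visual QC.

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask: np.ndarray) -> np.ndarray:
    """Retain only the largest connected component of a binary organ mask."""
    labeled, n_components = ndimage.label(mask > 0)
    if n_components <= 1:
        return mask.copy()                      # nothing to correct
    sizes = np.bincount(labeled.ravel())[1:]    # component sizes, background excluded
    largest_label = int(np.argmax(sizes)) + 1   # labels start at 1
    return (labeled == largest_label).astype(mask.dtype)
```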
Extraction of Organ Phenotypes
Three image-derived features were extracted from the composed images and their corresponding segmentation maps per organ and subject: organ volume, organ surface area and maximum 3D organ diameter. Feature extraction was implemented in Python using the PyRadiomics 21 package.
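A sketch of this extraction with PyRadiomics is shown below; the file names are placeholders, and the use of the mesh-based volume feature is an assumption (PyRadiomics also provides a voxel-count-based volume).

```python
from radiomics import featureextractor

# Restrict extraction to the shape feature class, which contains volume,
# surface area and maximum 3D diameter.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("shape")

# Placeholder paths: one image volume and one binary organ mask in NIfTI format.
features = extractor.execute("subject01_water.nii.gz", "subject01_liver_mask.nii.gz")
volume = features["original_shape_MeshVolume"]                 # organ volume [mm^3]
surface_area = features["original_shape_SurfaceArea"]          # organ surface area [mm^2]
max_diameter = features["original_shape_Maximum3DDiameter"]    # maximum 3D diameter [mm]
```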
Statistical Analysis of QC Results
Statistical analysis consisted of several stages. We first extracted the epidemiological parameters, namely age, sex, weight and height, from the UKBB and NAKO meta-data for all 20,000 subjects. For the UKBB cohort, age was calculated from the MRI examination date (data field ID 53), year of birth (data field ID 34) and month of birth (data field ID 52). Sex, weight and height were taken from data field IDs 31, 21002 (if missing, 12143) and 50 (if missing, 12144), respectively. For the NAKO cohort, age was drawn from the field df100_age, sex from the field df100_sex, height from the field anthro_groe, anthro_groe_eigen or anthro_groe_man (in this order, depending on data availability) and weight from anthro_gew, anthro_gew_eigen or anthro_gew_man (in this order, depending on data availability).
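A sketch of this parameter assembly for the UKBB part could look as follows; a flat CSV export with field-instance column naming and the imaging-visit instance index are assumptions.

```python
import pandas as pd

# Assumed flat CSV export with columns named <field>-<instance>.<array>,
# where instance 2 is taken to be the imaging visit (assumption).
ukbb = pd.read_csv("ukbb_metadata.csv", parse_dates=["53-2.0"])  # field 53 = MRI examination date

# Age at MRI from examination date (field 53) and year (34) / month (52) of birth.
birth_date = pd.to_datetime(
    pd.DataFrame({"year": ukbb["34-0.0"], "month": ukbb["52-0.0"], "day": 1})
)
ukbb["age"] = (ukbb["53-2.0"] - birth_date).dt.days / 365.25

ukbb["sex"] = ukbb["31-0.0"]
ukbb["weight"] = ukbb["21002-2.0"].fillna(ukbb["12143-2.0"])   # fallback field if missing
ukbb["height"] = ukbb["50-2.0"].fillna(ukbb["12144-2.0"])      # fallback field if missing
ukbb["bmi"] = ukbb["weight"] / (ukbb["height"] / 100.0) ** 2   # weight [kg], height [cm]
```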
For further analysis, we first calculated the frequency of segmentation errors per participant and organ for each cohort. To identify associations between segmentation quality ratings and epidemiological and imaging factors, multivariable logistic regression was performed for each organ individually. In this analysis, the binary dependent variable was the presence of segmentation errors (yes / no) per organ and the independent variables were age, sex, BMI, data source (UKBB / NAKO) and the presence of composing artifacts (yes / no). With multivariable logistic regression, we calculated odds ratios per organ error category and the associated p-values. The odds ratios for continuous variables (age and BMI) were scaled using the standard deviation (SD) to reflect the odds per SD change in that variable.
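A sketch of one such per-organ model fit with statsmodels is shown below; the DataFrame `df` (one row per subject) and its column names are assumptions.

```python
import numpy as np
import statsmodels.formula.api as smf

def fit_error_model(df, organ: str):
    """Multivariable logistic regression for one organ; the dependent variable
    <organ>_error is assumed to be coded as 0 (no error) / 1 (error)."""
    data = df.copy()
    # Divide continuous predictors by their SD so that the resulting odds
    # ratios reflect the odds per SD change in that variable.
    data["age_sd"] = data["age"] / data["age"].std()
    data["bmi_sd"] = data["bmi"] / data["bmi"].std()

    model = smf.logit(
        f"{organ}_error ~ age_sd + C(sex) + bmi_sd + C(cohort) + C(composing_artifact)",
        data=data,
    ).fit(disp=False)

    odds_ratios = np.exp(model.params)   # exponentiated coefficients = odds ratios
    return odds_ratios, model.pvalues

# Example usage: odds ratios and p-values for liver segmentation errors.
# or_liver, p_liver = fit_error_model(df, "liver")
```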
In addition, we compared the distributions of the extracted shape features (volume, surface area, maximum 3D diameter) between segmentation masks with and without errors. For masks with easily correctable errors, we also compared feature distributions before and after correction (by choosing the largest connected component).
Statistical analyses were performed using the statsmodels library 22 in Python.