MRI Data Sets
This study was approved by Institutional Review Board (IRB) in all the participating sites. All retrospective subject data were obtained with a waiver of consent under IRB approval. The data were anonymized before being shared.
Data sets for segmentation training (Data set A & B)
Training and validation of the proposed lumbar spine semantic segmentation method was carried out by performing an institutional review board–approved retrospective analysis of lumbar spine images from 286 subjects who underwent MR imaging in the Longhua Hospital, Shanghai University of TCM between January 1, 2019, to December 31, 2020. Among these, there’re 223 subjects using a 1.5-T MRI unit (MAGNETOM Aera XJ, SIEMENS) and 63 subjects using another 1.5-T MRI unit (MAGNETOM Avanto, SIEMENS), which were trained two separate segmentation networks for different resolution of 512*512 (Data set A) and 320*320 (Data set B). Mid-sagittal T2 images of different resolution were exported from Data set A and Data set B respectively, being randomly allocated into each training set or test set (Fig.1). All images in the segmentation data set were labeled by LabelMe (version 3.3.6, CSAIL, Massachusetts Institute of Technology) 21. Based on the structural features mentioned in the modified Pfirrmann grading system, the segmentation area of 14 parts, included 5 vertebral bodies (L1-L5), 5 lumbar IVDs (L1/L2-L5/S1), sacrum (S1), pre-iliac fat area, cerebrospinal fluid area in the spinal canal, and background as Fig. 2a.
Data set for quantitative analysis (Data set C)
The proposed LSSN was used to extracted 1051 lumbar spine images as Data set C in four hospitals around China, including Longhua Hospital, Shanghai University of TCM, Guangdong Provincial Hospital of Chinese Medicine, Shenzhen Pingle Orthopedics Hospital, and Dongzhimen Hospital, Beijing University of Chinese Medicine between January 1, 2019, and March 30, 2021. The imaging parameters of all sites are summarized in Table 1.
Table 1 Imaging Parameters for the MRI Sequences in the 4 Sites
Site
|
City
|
Strength of the Magnet
|
Company
|
Model
|
Coil
|
Longhua Hospital, Shanghai University of TCM
|
Shanghai
|
1.5-Tesla
|
SIEMENS
|
MAGNETOM Aera XJ
|
18-channel Spine Tim 4G coil
|
Guangdong Provincial Hospital of Chinese Medicine
|
Guangzhou
|
3-Tesla
|
SIEMENS
|
TIM Systems
|
32-channel Spine Tim coil
|
Shenzhen Pingle Orthopedics Hospital
|
Shenzhen
|
1.5-Tesla
|
SIEMENS
|
MAGNETOM Essenza
|
8-channel quadrature body coil
|
Dongzhimen Hospital, Beijing University of Chinese Medicine
|
Beijing
|
1.5-Tesla
|
SIEMENS
|
MAGNETOM Amira
|
24-channel quadrature body coil
|
A research team, composed of a 4-year radiology resident (DW Kong), two 8-year orthopedic resident (J Chen, XF Ma), two 4-year orthopedic resident (YL Sun, YP Lin) and a 2-year orthopedic resident (MC Yin), discussed together for the final segment and Pfirrmann grade for each MR image.
Lumbar Spine Segmentation from MR Images
Convolutional Neural Network (CNN) Training
The critical component of LSSN is an improved deeplabv3+ segmentation network22 with backbone ResNet-10123, called BianqueNet. The BianqueNet was built on the basis of deep feature extraction to extract richer semantic information and denser features. An illustration of this semantic segmentation network is shown in Fig. 2. The entire network consists of a swin transform skip connection (ST-SC) module and a deep feature extraction (DFE) module. Swin Transform is a hierarchical transform calculated by shifting the window, which has the advantages of high efficiency and low complexity 24. The skip connections structure designed in this study uses two successive Swin-Transformer blocks, with 1*1 convolutional layers in parallel at the same time, and finally the two output features are spliced. Through the pyramid pooling module, feature information of different depths through pooling operations of different scales can be obtained. By repeating check with feature map of 4096 channels multi-scale information 25, the network can achieve efficient features extraction with a dense semantic feature map of 256 channels.
32-depth BMP images were exported from raw MRI to train the LSSN as input. In the upsampling phase, a modified upsampling operation with a deconvolution decoder was used to recover more detailed features of the segmentation target. In the feature extraction phase, the feature maps of different resolutions were obtained by down-sampling and output to the ST-SC module, which splices images and extracts features from different resolutions. According to feature pyramid26, feature maps with low-resolution and high-resolution were integrated to extract more semantic and spatial information, in which a 3*3 double convolutional layer was used for the fused feature map to improve the feature. Finally, a double upsampling operation was performed to obtain a dense prediction image.
Weighted Dice Loss Function
A weighted dice loss function as below was proposed to enhance segmentation performance by estimating difficlties in difference images with typical or atypical structure, which ensured consistent in segmentation:
This formula was used in the output of the softmax layer, where the is the probability of voxel (target) and is the probability of voxel (non-target). So was for and . represents different segmentation areas, represents the total number of channels, which is taken as 14. represent the weight of different segmentation channels. According to the experimental analysis results, channels weight was set to 0.9, 0.8 and 1 for vertebral body, IVD and the other respectively, which may achieve the best segmentation performance.
For avoiding that the subsequent feature extraction operations are affected, corrosion and expansion operations were used to remove the burrs (Fig. 2b).
Lumbar IVD Quantitative Analysis
Parameters Calculation based on Pfirrmann Grading System
Based on previous studies 18, 25–27, some extraction and calculation methods were modified with histogram features of IVD. signal intensity difference (∆SI) was obtained to quantify the blurring degree of boundary between NP and AF, which indicating water content in IVD. Average disc height (DH), disc height index (DHI), and disc height-to-diameter ratio (DHR) were obained to quantify structural collapse in IVD degeneration. Specific calculation methods for each parameter are described in the Supplement File 1.
Versatility Test for Images with Different Origins
IVD parameters extracted by LSSN in mid-sagittal lumbar MR images with different resolutions were compared with each other. In the data set B, 46 images with resolution of 320*320 were randomly selected to be segmented and quantified by model B. Meanwhile, these images were adjusted to 512*512 for segmentation and quantitation by model A. IVD parameters extracted from these two models were used for versatility test. If IVD parameters from LSSN shows good consistency under different origins of imaging, LSSN will be used into a larger population (Data set C) with different machines and models to establish degenerative ranges of IVD parameters in different people.
Quantitation for IVD Degeneration
Relationship between IVD parameters (including ∆SI, DH, DHI, and HDR) and demographic information (including gender, age, and segment) and correlation between IVD parameters and IVD degeneration (based on the modified Pfirrmann grading system) were analyzed respectively in Data set C. Based on these results, a baseline of IVD degeneration in larger population was established, which may indicate a qualitative IVD degeneration in different population with accordance to the modified Pfirrmann grading system. Details in quantitative protocols were shown in the Supplement File 2.
Statistical analysis
The intraclass correlation coefficient (ICC) was used to analyze the consistency between the IVD parameters calculated using the original resolution (320*320) from the data set B and the adjusted resolution (512*512). The macroF1-score and the Kendall correlation coefficient were used to test sensitivity and specificity in IVD degeneration grading performance among deep learning methods and 3 residents in the two data sets with different resolution according to the modified Pfirrmann grading system. An absolute value of r of 0-0.4 was considered as weak correlation, 0.4-0.6 as moderate correlation, and greater than 0.6 as strong correlation.
Spearman rank correlation coefficient between IVD signal intensity and grading score has been calculated via SPSS (version 26, IBM, USA). Multiple regression analysis was performed on IVD quantitative parameters (, DH, DHI, HDR) and baseline information (including gender, age, segment) to describe some characters in larger population via Stata (version 15.1, USA).