Study population
The patient information and CT scan data were collected from April 2018 to July 2021 at our institution. All data were desensitized before using.
The inclusion criteria were:
-
underwent both vertebral fracture assessment (VFA) in dual-emission X-ray absorptiometry (DXA) and LDCT examination in chest combined with upper abdomen, with scan range from the apical lung to the lower edge of liver;
-
underwent relevant tests if a differential diagnosis is needed.
The exclusion criteria were:
-
with postoperative metal or bone cement implant;
-
with the target vertebrae not completely covered in CT examination;
-
Osteoporosis caused by secondary factors, such as renal failure, hyperparathyroidism and diabetes;
-
vertebral compression fractures caused by other reasons as trauma or tumors.
Finally, 500 individuals (mean age 66.03 ± 9.71 years, range 39–90 years) were enrolled in this study, including 270 normal cases and 230 OVCF cases. 500 individuals were randomly divided into a training set, a validation set, and a test set in the ratio of 6:2:2.
All 500 LDCT scans were manually annotated by an experienced radiologist for the contours of target vertebrae and were used to develop the 3D deep learning-based system for the automatic localization and segmentation of vertebral bodies.
Diagnosis Of Ovcf
Based on the Genant visual semi-quantitative method [22], using the lateral radiograph of DXA, vertebrae were graded on visual inspection and without direct vertebral measurement as normal (grade 0), mildly deformed (grade 1, approximately 20–25% reduction in anterior, middle, and/or posterior height and a reduction of area 10–20%), moderately deformed (grade 2, approximately 25–40% reduction in any height and a reduction in area 20–40%) and severely deformed (grade 3, approximately 40% reduction in any height and area).
LDCT was used in our study to identify the integrity of the circumferential wall of the vertebrae, the presence of a bone mass protruding from the posterior edge of the vertebral body into the spinal canal, and the extent of spinal canal involvement. Magnetic resonance imaging (MRI) and Laboratory tests were used to differentiate osteoporotic fractures from pathological fractures caused by bone tumours and other diseases by showing both the bone and the surrounding soft tissue lesions [23–24].
The overall OVCF diagnostic process is shown in Fig. 1.
Image Acquisition And Reconstruction
All individuals underwent plain CT scans on a Revolution CT scanner (GE Healthcare) from the apical lung to the lower edge of liver using the following scan parameters: 120 kVp, SmartmA (10-130mA), gantry rotation time of 0.5 sec per rotation, helical pitch of 0.992:1 to obtain a preset noise index of 16. The SmartmA is an automatic exposure control system that uses z-axis and angular tube current modulation to customize the dose to the size and shape of an individual. Before scanning, the user defines the desired image quality (noise index). A higher noise index is related with a lower milliampere second and, as a result, a lower radiation dosage. The average volume CT dose index (CTDIvol) for the patient population in our study was low at 6.54 ± 2.53 mGy.
Images were reconstructed with 512×512 reconstruction matrix and 1.25-mm slice thickness and interval using a standard reconstruction kernel.
Development Of The Ovcf Detecting System
The development of this automated OVCF detecting system consisted of three main stages. First, train a localization model to identify and locate the target vertebrae in the region for coarse segmentation. Second, train a segmentation model to finely segment the target vertebrae in the localized image. Finally, train a DenseNet based classification model for detecting OVCF on target vertebrae, the output of which is the diagnostic result and the OVCF probability in target area.
Vertebra Localization
Image preprocessing
This model needs to convert axial CT images into maximum intensity projection (MIP) image, which can be better localized to the location of Target vertebraes (T12 L1 L2).
The image preprocessing consisted of the following three steps. First, select the maximum density projection map formed by the rays perpendicular to the coronal plane direction, where the horizontal coordinates correspond to the coordinates of the X-axis in the original CT data and the vertical coordinates correspond to the coordinates of the Z-axis. Second, use the labeling tool to label the three vertebrae in the generated maximum density projection map as a whole. Third, divide data into training set, validation set and test set.
Development of localization model
The localization model was mainly to obtain the upper and lower coordinates of the target vertebrae in the Z-axis direction in the original CT data. The object detection algorithm chosen was the Faster R-CNN, which mainly consists of Fast R-CNN and RPN [25]. RPN mainly provides region proposal information, while Fast R-CNN is mainly responsible for extracting input image feature-maps, receiving input image features from RPN input and unifying the map size and outputting the class of the target object as well as the position information. Figure.2 shows the specific structure of the Faster R-CNN model.
Vertebra Segmentation
Image preprocessing
All 500 LDCT scans were manually annotated by an experienced radiologist for the contours of target vertebrae to develop the segmentation model. The location of the three vertebrae in the cropped CT data are obtained by the localizing model, where the X-, Y- and Z-axes ranged from [180,330], [200,420] and [ZL-5,ZU+5] respectively, where the ZL and ZU roots are based on the upper and lower positions of the three vertebral bodies in the Z-axis direction provided by the localizing model. All cropped data were processed using a window level of 0 HU and window width of 400 HU. All CT values in the data were scaled to [0, 1] using the min-max normalization method. The size of the input segmentation model was set to 150 x 220 x 120. If the size of the cropped data was smaller than this set size, then the input condition was met by up- and down-complementing by 0.
Development of segmentation model
The segmentation model chosen was the 3D AnatomyNet model, a variant of U-Net as can be seen from the structure picture. The advantage of 3D AnatomyNet is to improve the ability to segment small regions, and the interdependencies between channels can be adaptively modelling and calibrating, better able to extract effective features and increase the representational power of the network. The structure of AnatomyNet is shown in Fig. 3.
-
Divide the data into a training set, a validation set, and a test set in the ratio of 6:2:2.
-
Train the training and validation sets using the AnatomyNet model and use the model with the highest average Dice coefficient of the evaluation metrics on the validation set as the final model.
-
Input the test set data into the trained segmentation model and check the average Dice coefficient to test the generalization ability of the model.
-
Input the training set, validation set, and test set into the trained model and output the Dice coefficients and segmentation results for each data to check whether there are multiple bones or mis-segmented vertebrae.
Ovcf Detection
In this study, a 3D DenseNet model for the detection of osteoporotic vertebral compression fractures was constructed. The uniformly sized images obtained from the segmentation model were randomly divided into a training set, a validation set, and a test set in a ratio of 6:2:2 and used as the dataset for the input of osteoporotic vertebral compression fracture classification model.
As shown in Figure.4, this detection model was a 121-layer 3D-Densenet-BC with the following structure of the network in order: Initial convolutional layer, Max-pooling layer, Dense block 1, Transition layer 1, Dense block 2, Transition layer 2, Dense block 3, Transition layer 3, Dense block 4, Global average pooling layer, and Fully-connected layer.
The initial convolutional layer comprises convolution of a kernel size of 7×7×7, stride of 2 and padding of 3, followed by a Batch Normalization layer and a ReLU activation function. The Max-pooling layer comprises pooling of a kernel size of 3 x 3 x 3, stride of 2 and padding of 1. The bottleneck layer is used in dense block 1 to dense block 4, with the number of bottleneck layers in dense block 1 to dense block 4 being 6, 12, 24 and 16, respectively. Each bottleneck layer consisted of a BN layer, a ReLU layer and a convolutional layer. Each transition layer connects two contiguous dense blocks, and a BN layer, a ReLU layer, a convolution layer, and an average pooling layer. The convolution layer comprises convolution of a kernel size of 1×1×1 and stride of 1. The pooling layer comprises pooling of a kernel size of 2×2×2 and stride of 2. We set compression factor = 0.5, growth rate = 32 in our experiments.
Global average pooling was performed after dense block 4, specifying the output feature map size as 1 x 1 x 1.The softmax layer was a softmax function applied to the output of the Fully-connected layer of the network to predict the class probability.
A threshold of 0.5 was set for our experiments, meaning that when the network predicted an OVCF category probability greater than 0.5, this sample was predicted as an OVCF sample.
The \({\alpha }\)-balanced variant of the focal loss is chosen as the loss function, which plays the role of balancing hard and easy samples, with the following expressions:
$$\text{F}\text{L}\left({\text{p}}_{t}\right)=-{\alpha }_{t}{\left(1-{p}_{t}\right)}^{\gamma }log\left({p}_{t}\right)$$
Our experiment set the loss function parameters \({\alpha }_{t}\)=0.6, \(\gamma =2\), initial learning rate of 3e-5, used the MultiStepLR decay strategy, milestones=[16,33], gamma = 0.1, and used Adam as the default optimizer. After 50 epochs of iterative training, the model parameters were saved for each epoch and the best result was selected as the final model parameters.
Statistical analysis
SPSS 22.0 software (IBM SPSS Statistics for Windows) was used for statistical analysis. The Dice coefficient, which effectively quantifies the spatial overlap between segmentation and ground truth, was used to evaluate segmentation performance. The area under the curve (AUC) for receiver operating characteristic (ROC) analysis was calculated to assess the diagnostic performance of the developed system for OVCF, using clinical diagnosis as the reference standard. We also evaluated the performance of this detecting system through sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).