In the current study, we developed a deep learning model based on YOLOv5x to detect the L5 vertebra. The model achieved an mAP_0.5 of 98.2%. In the failure analysis, 42 cases (93.3%) in the validation dataset were correctly identified, whereas 3 (6.7%) were incorrectly detected. Considering that an accuracy of ≥ 90% in a deep learning classification model is regarded as outstanding, our L5 vertebra detection model appears to perform very well [17].
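The percentages above follow directly from the failure-analysis counts (42 correct, 3 incorrect). The short sketch below reproduces that arithmetic and, purely as an illustration, attaches a Wilson score interval to the proportion. The interval is not part of the reported analysis; the function names `accuracy` and `wilson_interval` and the choice of z = 1.96 are assumptions made for this example only.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of correctly identified cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval. This interval is an
    illustrative addition and was not reported in the study.
    """
    p = accuracy(correct, total)
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


# Counts taken from the failure analysis: 42 correct, 3 incorrect.
correct, incorrect = 42, 3
total = correct + incorrect

acc = accuracy(correct, total)
low, high = wilson_interval(correct, total)

print(f"accuracy = {acc:.1%}")
print(f"approx. 95% Wilson interval = [{low:.1%}, {high:.1%}]")
```

With only 45 validation cases the interval is fairly wide, which is consistent with the sample-size limitation acknowledged later in this discussion.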
We aimed to confirm the exact location of L5 using deep learning. Accurate determination of the lumbar spine segment is of great importance [18]. Lumbar spine pathology and its related clinical symptoms are significantly associated with specific lumbar nerve roots, which can be localized to a lumbar spine level [19]. Thus, selecting the wrong lumbar spine level makes it likely that the wrong lumbar nerve root will be identified, with the result that an incorrect lumbar segment is treated during interventional or surgical procedures [20]. Historically, several methods for accurately numbering a lumbar spine level have been reported, including methods based on the 12th rib, the iliolumbar ligament, the kidney level, and the craniocaudal level on whole-spine radiographs. However, these conventional methods are not accurate [21–23]; therefore, spine physicians have endeavored to develop more accurate methods for numbering lumbar spine segments.
In particular, accurate selection of the L5 level is important in the clinical setting, as most lumbar pathologies requiring intervention or surgery are located at L4–5 and L5–S1 [24]. However, identifying the L5 vertebra using conventional methods is difficult for several reasons. L5 is easily miscounted when sacralization or lumbarization is present. In cases of spondylolisthesis, the lumbar vertebrae are easily misnumbered when only the anteroposterior radiograph is examined. Additionally, in cases of severe osteoporosis, spinal or pelvic deformity, or congenital spinal pathology, accurate selection of L5 is more difficult than usual. When spinal computed tomography (CT) or magnetic resonance imaging (MRI) is performed together with lumbar spinal radiography, determining the L5 vertebra is not particularly difficult. However, in clinical practice, spinal radiography is typically performed without spinal CT or MRI. Therefore, clinicians may incorrectly determine the L5 vertebra during spinal surgery or intervention.
Regarding the determination of the L5 vertebra, many studies have examined patients with lumbosacral transitional vertebrae [22, 23]. There are two main methods for determining the location of the L5 vertebra. The first method identifies the L5 vertebra on the basis of characteristic vertebrae or structures. By counting the vertebral bodies down from C2, clinicians can accurately identify L5. Moreover, identifying the T12 vertebra as the vertebra to which the last rib is attached reveals the L1 vertebra immediately below it, and the fourth vertebra below L1 is L5. The L5 vertebra can also be identified by its transverse process on the Ferguson view. However, whole-spine radiography is not performed in typical clinical situations. In addition, the method based on initially identifying T12 is less accurate when the last rib is dysplastic, and the method using the Ferguson view has a sensitivity of 76–84%. The second method determines the L5 vertebra by identifying the iliac crest tangent sign in the coronal planes of MRI scans [21–23]. However, this method has a sensitivity of only 81%. Compared with these classical methods of counting vertebrae, our deep learning model shows high accuracy (93.3%) in detecting the L5 vertebra on anteroposterior lumbar spine radiographs. A deep learning model is characterized by a multilayer structure with multiple hidden layers and offers a greater learning capacity than a traditional shallow learning model [8]. We believe that our deep learning model extracts valuable features that differentiate the L5 vertebra from other vertebral levels.
The present study has several limitations. First, it included a relatively small amount of imaging data. Second, we used a dataset from a single hospital. Because image acquisition methods and resolutions may differ between hospitals, acquiring multicenter data may be important. Third, we did not compare the accuracy of our model with that of human readers. Fourth, only anteroposterior radiographs were used as input data for model development; the accuracy might be further improved if additional radiographs (e.g., lateral radiographs) were included as input data. Finally, we could not find a way to resolve the problems identified in the failure analysis. To increase the accuracy and versatility of our deep learning model and to measure its performance more accurately, further studies with larger samples and an enhanced study design are necessary.