This was a retrospective study. The study was approved by the Institutional Review Board of the National Rehabilitation Center for Children with Disabilities (2014-17), and all methods were carried out in accordance with their guidelines and regulations. Written informed consent was obtained from the parents or the legal guardians of all children.
・ Data collection
Patients younger than 12 months who were suspected of having DDH-related hip dislocation or subluxation and who had undergone anteroposterior (AP) hip radiography between June 2009 and November 2021 were selected retrospectively. For patients with multiple hip radiographs, the earliest image was selected. All patients were examined by the same experienced pediatric hip specialist for DDH screening. DDH was diagnosed on the basis of physical examination findings, AP hip radiographs, and, if necessary, hip ultrasonography images. A positive DDH diagnosis on hip radiographs was based on the following criteria: a) lateralization of the epiphyseal ossification center, b) interruption of the Shenton line,17 c) widened teardrop distance compared to that on the other side,18 d) delayed femoral head ossification compared to that on the other side, e) high acetabular index (>30°), and f) dulled edge of the acetabulum. The Graf method was used for hip ultrasonography.19 The International Hip Dysplasia Institute (IHDI) classification was used to quantify DDH severity because it does not rely on the presence of the ossification center of the femoral head and can be applied to patients of all ages.20
In principle, patients with IHDI grade 2 or worse on radiography and/or Graf type 2c or worse on hip ultrasonography were assigned to the DDH group because they required careful observation.19
・ Data preparation
The original images were 1430 × 1140 pixels in size. They were padded to a square (1430 × 1430 pixels) by adding black regions to the top and bottom and then resized to 864 × 864 pixels. Approximately 15% of the normal images were randomly assigned, in equal numbers, to the validation and test datasets. Approximately 15% of the DDH images were assigned in the same way, with DDH severity (IHDI classification) balanced between the two datasets.
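The padding and resizing arithmetic can be checked with a short sketch. It assumes that the stated 1430 × 1140 size is width × height (consistent with padding at the top and bottom) and that the black padding is split evenly between the two edges. The function names are illustrative and are not taken from the study's code.

```python
def square_pad_geometry(width, height, target):
    """Return (pad_top, pad_bottom, scale) for padding a landscape image
    to a square and resizing it to target x target pixels."""
    if height > width:
        raise ValueError("sketch only handles images at least as wide as tall")
    missing = width - height      # rows of black pixels to add in total
    pad_top = missing // 2        # assumption: padding split evenly
    pad_bottom = missing - pad_top
    scale = target / width        # uniform scale, aspect ratio preserved
    return pad_top, pad_bottom, scale


def map_point(x, y, pad_top, scale):
    """Map a pixel coordinate in the original image to the resized square image."""
    return x * scale, (y + pad_top) * scale


pad_top, pad_bottom, scale = square_pad_geometry(1430, 1140, 864)
print(pad_top, pad_bottom, round(scale, 4))  # 145 145 0.6042
```

Under these assumptions, 145 rows are added above and below each image, and every coordinate is scaled by about 0.604, so anatomical proportions are preserved.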
For the training dataset, all images were augmented by horizontal flipping. In addition, to avoid overfitting, the DDH images were augmented by rotations of 10° and -10°.
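Because augmentation applies only to the training dataset, the held-out images have to be set aside first. The following is a minimal sketch of such a split, not the study's actual procedure. It assumes that about 15% of each stratum (normal, or one IHDI grade) is held out in total and divided as evenly as possible between validation and test. The function name, stratum labels, and image identifiers are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(items, holdout_fraction=0.15, seed=0):
    """Split (image_id, stratum) pairs into train/val/test lists.

    Within each stratum, about `holdout_fraction` of the images are held out
    and divided as evenly as possible between validation and test.
    """
    by_stratum = defaultdict(list)
    for image_id, stratum in items:
        by_stratum[stratum].append(image_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    split = {"train": [], "val": [], "test": []}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        n_holdout = round(len(ids) * holdout_fraction)
        n_val = n_holdout // 2
        split["val"].extend(ids[:n_val])
        split["test"].extend(ids[n_val:n_holdout])
        split["train"].extend(ids[n_holdout:])
    return split


# Toy identifiers only; these are not the study's data.
toy = [(f"normal_{i}", "Normal") for i in range(100)]
toy += [(f"ddh_{i}", "IHDI-2") for i in range(20)]
parts = stratified_holdout(toy)
print({name: len(ids) for name, ids in parts.items()})
```

Splitting within each stratum keeps the severity mix of the validation and test datasets close to that of the full dataset, which is the property the text describes.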
・ Image annotation
Image annotation was performed with LabelImg version 1.8.1.21 Object bounding boxes were drawn according to the following criteria: a) the inner boundary was drawn in anatomical regions deeper than the deepest region of the acetabulum, b) the outer boundary was drawn to include the greater trochanter, c) the upper boundary was drawn to include the acetabulum and the ossification center of the femoral head, and d) the lower boundary was drawn to include the lesser trochanter (Figure 1). Normal hips were labeled as “Normal” and DDH hips were labeled as “DDH”.
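LabelImg can save annotations in the YOLO text format: one line per box, holding a class index followed by the box center and size, each normalized by the image width or height. The text does not state which export format was used, so the sketch below is an illustration under that assumption. The class index order and the example box coordinates are hypothetical.

```python
CLASS_IDS = {"Normal": 0, "DDH": 1}  # hypothetical index order


def to_yolo_line(label, x_min, y_min, x_max, y_max, img_w=864, img_h=864):
    """Convert a pixel-coordinate box to a YOLO-format annotation line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{CLASS_IDS[label]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(to_yolo_line("DDH", 216, 432, 432, 648))
# 1 0.375000 0.625000 0.250000 0.250000
```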
・ Deep learning algorithm
Transfer learning was performed using YOLOv5. Transfer learning is a technique in which a model pretrained on a large dataset is adapted to an application of interest that has only a small dataset.22 Transfer learning can therefore reduce the need for large datasets. YOLOv5 is the latest product in the YOLO series. YOLOv5 contains four different models: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The main difference between these models is the number of feature extraction modules. YOLOv5s has the smallest modules and the fewest parameters, and YOLOv5x has the largest modules and the most parameters.15 All four models were used in the present study, and their results were compared. For transfer learning, the first 10 layers of each YOLOv5 model were frozen, and the remaining layers were retrained with our new datasets.
A learning rate of 0.01, mini-batch size of 32, and 100 epochs were used for the training.
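One way to reproduce this configuration is with the `train.py` script of the public ultralytics/yolov5 repository. The text does not say whether that script was used, so this is a sketch under that assumption. The dataset file name `ddh.yaml` is hypothetical. In that repository the initial learning rate is not a command-line flag but is read from a hyperparameter YAML file (key `lr0`), so it is left out of the argument list.

```python
MODELS = ["yolov5s", "yolov5m", "yolov5l", "yolov5x"]


def train_command(model, data_yaml="ddh.yaml"):
    """Build an argument list for YOLOv5's train.py (flags assumed from the public repo)."""
    return [
        "python", "train.py",
        "--weights", f"{model}.pt",  # pretrained checkpoint to fine-tune
        "--data", data_yaml,         # hypothetical dataset definition file
        "--img", "864",              # input size from the data preparation step
        "--batch-size", "32",
        "--epochs", "100",
        "--freeze", "10",            # keep the first 10 layers fixed
    ]


# One command per model size, so the four results can be compared.
for model in MODELS:
    print(" ".join(train_command(model)))
```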
The analyses were performed using Python 3.7.12 (Python Software Foundation, Wilmington, DE, U.S.). The trained models could detect hips in AP view radiography images and label them as either “Normal” or “DDH”, with confidence scores (Figure 2).
The test set was evaluated using the trained models, with a 0.5 confidence score threshold.
If both “Normal” and “DDH” labels were assigned to the same hip, the hip evaluation was considered invalid (Figure 3). Sensitivity, specificity, positive predictive value, and negative predictive value were calculated for each trained model.
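The evaluation rule and the four metrics can be sketched as follows, with DDH as the positive class. The text does not say how invalid hips enter the calculation, so this sketch excludes them and counts them separately. It likewise excludes hips with no detection above the threshold and treats a score equal to the threshold as passing. Those choices and the function names are assumptions, not the study's documented procedure.

```python
def classify_hip(detections, threshold=0.5):
    """Reduce one hip's detections [(label, confidence), ...] to a single call."""
    labels = {label for label, conf in detections if conf >= threshold}
    if labels == {"Normal", "DDH"}:
        return "invalid"  # both labels on the same hip
    if not labels:
        return "none"     # nothing at or above the threshold
    return labels.pop()


def diagnostic_metrics(truths, calls):
    """Compute sensitivity, specificity, PPV and NPV with DDH as the positive class.

    Assumption: hips whose call is "invalid" or "none" are excluded and
    reported separately under the "excluded" key.
    """
    tp = fp = tn = fn = excluded = 0
    for truth, call in zip(truths, calls):
        if call not in ("Normal", "DDH"):
            excluded += 1
        elif truth == "DDH":
            tp += call == "DDH"
            fn += call == "Normal"
        else:
            tn += call == "Normal"
            fp += call == "DDH"

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "excluded": excluded,
    }
```

A caller would apply `classify_hip` to each hip's detections and pass the resulting calls, together with the reference diagnoses, to `diagnostic_metrics`.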