Radiographic Bone Texture Analysis Using Deep Learning Models for Early Rheumatoid Arthritis Diagnosis

Yun-Ju Huang (Chang Gung Memorial Hospital (CGMH); https://orcid.org/0000-0002-6226-6635), Miao Shun (PAII labs, Bethesda, USA), Kang Zheng (PAII labs, Bethesda, USA), Le Lu (PAII labs, Bethesda, USA), Yuhang Lu (PAII labs, Bethesda, USA), Chihung Lin (Chang Gung Memorial Hospital), Chang-Fu Kuo (zandis@gmail.com; Division of Rheumatology, Allergy and Immunology, Chang Gung Memorial Hospital, Taoyuan, Taiwan; School of Medicine, Chang Gung University, Taoyuan, Taiwan; https://orcid.org/0000-0002-9770-5730)


Conclusion
Fully automated quantitative assessment of periarticular texture by deep learning models can help the classification of early RA.

Background
Rheumatoid arthritis (RA) is an autoimmune disease characterized by symmetric polyarthritis at peripheral small joints, especially the proximal interphalangeal, metacarpal phalangeal, and radiocarpal joints. The progression of RA, including bone structural and textural changes, can be assessed via conventional radiographs, computed tomography (CT), magnetic resonance imaging (MRI), or densitometry. (1)(2)(3) Conventional radiography is an inexpensive and reproducible technique that can assist in RA screening, diagnosis, evaluation, and monitoring by indicating joint space narrowing, erosions, and periarticular bone microarchitecture (radiographically present as texture) changes such as osteoporosis.(4) However, it is challenging to assess periarticular texture changes using plain radiographs quantitatively; this poses a problem because the extent of periarticular osteoporosis could be an early indication of RA.
Fractal analysis is one of the techniques used to determine bone texture characteristics from radiographs. A fractal dimension is a measure of the space-filling capacity of a pattern and can be used to indicate irregular pattern complexity with self-similarity at different scales. (5) A particular type of fractal analysis is fractal signature analysis (FSA), which is a computerized textural analysis method used to measure vertical and horizontal trabeculae based on the fractal dimensions of the bone structure over a range of trabecular widths. (6-8) FSA has been previously applied for bone architecture measurements and disease progression using radiographs in cases of knee osteoarthritis, (9)(10)(11)(12)(13)(14) osteoporosis treatment response after administration of risedronate, (15) hand osteoarthritis, (16) and hip osteoarthritis. (17) Furthermore, differences in fractal signatures in RA radiographs among three types of bone conditions, namely normal, osteopenic, and eroded bone, have also been assessed. (18) While these previous studies indicate that bone disease classification and disease progression assessment can be performed by examining textures radiographically using techniques such as FSA, the present techniques are based on fixed descriptors for texture features. They are not capable of 'learning' latent features in radiology films that may indicate disease classification or progression.
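To make the notion of a fractal dimension concrete, the following is a minimal sketch of a box-counting estimate on a binary image. Box counting is only one common estimator and is not the FSA procedure used in the cited studies; the function names, box sizes, and test patterns are illustrative assumptions.

```python
import math

def box_count(grid, size):
    """Count boxes of side `size` that contain at least one foreground pixel."""
    n = len(grid)  # assumes a square, non-empty binary grid
    count = 0
    for r in range(0, n, size):
        for c in range(0, n, size):
            if any(grid[i][j]
                   for i in range(r, min(r + size, n))
                   for j in range(c, min(c + size, n))):
                count += 1
    return count

def box_counting_dimension(grid, sizes=(1, 2, 4, 8, 16)):
    """Least-squares slope of log N(s) against log(1/s).

    Assumes the grid has at least one foreground pixel, so N(s) > 0.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(grid, s)) for s in sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical test patterns: a filled square and a single horizontal line.
n = 64
filled = [[1] * n for _ in range(n)]
line = [[1 if i == n // 2 else 0 for _ in range(n)] for i in range(n)]

dim_filled = box_counting_dimension(filled)  # space-filling pattern, close to 2
dim_line = box_counting_dimension(line)      # one-dimensional pattern, close to 1
```

A space-filling pattern yields a dimension near 2 and a thin line yields a dimension near 1; trabecular patterns would be expected to fall between these extremes.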
Deep learning methods such as those based on multilayer convolutional neural networks (CNNs) are robust alternatives for various image analysis tasks, including image classification and segmentation. (19,20) CNNs are capable of automatically learning and extracting hidden structural and textural bone features from radiographs to classify them and quantify their features, which are often not apparent to the human eye, such as those of periarticular osteoporosis and trabecular abnormalities. Therefore, we hypothesized that deep learning algorithms would be capable of identifying textural feature changes in periarticular regions of the phalanges, which could indicate signs of early RA. In addition, these textural features may be used to diagnose early RA using conventional radiographic images of the hand by dividing patients into different risk groups.

Patient characteristics and study design
In this study, we developed a deep learning-based image processing model to automatically detect and segment distal metacarpal bones as regions of interest (ROIs) in plain radiography images of both hands; the extracted radiographic features were used to classify the images for early RA. Our proposed model was trained, tested, and validated using data recorded at Chang Gung Memorial Hospital, Taiwan. In particular, digital anterior-posterior radiographs of bilateral hands from early RA and non-RA patients aged 18 years or older were retrospectively collected to form the primary dataset. The radiographs of RA were collected within one year of the initial diagnosis of RA, which was based on the 2010 European League Against Rheumatism / American College of Rheumatology (EULAR-ACR) classification criteria for RA. (21) The RA diagnosis was confirmed by two rheumatologists after a thorough chart review. This study was approved by the Institutional Review Board of Chang Gung Memorial Hospital, Taiwan. The requirement for signed informed consent was waived because the data used in this study were derived from partial hand radiographs obtained from de-identified digitized patient data to prevent any confidentiality concerns.

Datasets for CNN training and testing
Our CNN model for early RA classification was trained using a random set of 3,740 radiographs obtained from 892 RA and 1236 non-RA patients, which represent 80% of the primary dataset; this random set was further partitioned into training (80%) and validation (20%) datasets. It is noteworthy that multiple hand radiographs from the same patient were considered as independent radiographs in the training set. The final trained model was then tested using the remaining 20% of the primary dataset, consisting of 905 radiographs from 228 RA and 272 non-RA patients, as the test dataset. The digital radiographs included in our primary dataset were obtained at 50 kVp using the same radiography system (Fujifilm Healthcare); these radiographs were greyscale and had resolutions ranging from 1192 × 1536 to 3015 × 2505 pixels.

Segmentation of ROIs
For the image pre-processing, model training, and validation tasks in our study, we used the high-performance computing systems available at the Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital, Taiwan. Deep learning algorithms were used to segment the distal third of metacarpal bones and analyze the corresponding radiographic textural features from the radiographs. A curve-graph convolutional network (GCN) was trained for fully automated segmentation of the second, third, and fourth metacarpal bone images. The specific AI methodology for the GCN-based automated anatomical tissue segmentation approach used in this study has been described in a previous work (arXiv:2007.03052v2 [cs.CV], Accepted: MICCAI 2020). In summary, a novel GCN-based contour transformer network (CTN), which is a one-shot anatomy segmentor with a naturally built-in human-in-the-loop mechanism, was used to segment the ROIs in the radiographs by learning a contour evolution behaviour process. The CTN was trained to fit a contour to the required object boundary by learning from one labelled image exemplar; this network takes the image exemplar and an unlabelled image as inputs, and then detects contours in the unlabelled image with features similar to those in the image exemplar. Three losses were considered to ensure that the CTN was 'one-shot' trainable. This segmentation model was then connected to a classification model to realize a fully automatic process for RA classification.
The set of segmented images was augmented via random rotation (-180° to +180°) and intensity jittering (brightness: -0.2 to +0.2; contrast: -0.2 to +0.2). Subsequently, the obtained images were resized to 192 × 192 pixels before texture feature extraction. The deep texture encoding network (Deep-TEN) was the base architecture used to generate textural feature vectors in our study. (22) Finally, the texture feature vectors were used for RA classification of radiographs.
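The augmentation and resizing steps can be sketched in plain Python as follows. This is an illustrative approximation, not the pipeline used in the study: the text does not state how brightness and contrast offsets are applied, so an additive brightness shift and a contrast scaling about the image mean are assumed here, along with nearest-neighbour interpolation and zero fill for rotated corners. All function names and the toy ROI are hypothetical.

```python
import math
import random

def rotate_nearest(img, angle_deg):
    """Rotate an image about its centre (nearest neighbour, zero fill)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: look up the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = int(round(cx + cos_a * dx + sin_a * dy))
            sy = int(round(cy - sin_a * dx + cos_a * dy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out

def jitter_intensity(img, brightness, contrast):
    """Assumed semantics: scale contrast about the mean, add brightness, clip to [0, 1]."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    return [[min(1.0, max(0.0, (v - mean) * (1.0 + contrast) + mean + brightness))
             for v in row] for row in img]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to out_h x out_w."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def augment(img, rng, size=192):
    """Random rotation and intensity jitter in the stated ranges, then resize."""
    angle = rng.uniform(-180.0, 180.0)
    brightness = rng.uniform(-0.2, 0.2)
    contrast = rng.uniform(-0.2, 0.2)
    out = rotate_nearest(img, angle)
    out = jitter_intensity(out, brightness, contrast)
    return resize_nearest(out, size, size)

# Hypothetical 40 x 48 greyscale ROI with intensities in [0, 1].
rng = random.Random(0)
roi = [[((x + y) % 32) / 31.0 for x in range(48)] for y in range(40)]
augmented = augment(roi, rng)
```

In practice, an image library with interpolating transforms would replace these nearest-neighbour loops; the sketch only fixes the order of operations and the parameter ranges given in the text.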

Algorithm and training of proposed RA classification models
We developed a deep learning algorithm based on the Deep-TEN model to extract bone textural features from hand radiographs. The proposed algorithm is based on a multilayer CNN with parameters that are structured as a hierarchy of layers. In general, a CNN image classification model scans an image to extract and aggregate structural and textural features from it. With a large amount of data, such a model can learn the essential features necessary to fit and identify ROIs for a problem, which, in our case, is the classification of radiography images for RA.
Deep learning models can extract texture representations using a pre-trained generic CNN model (such as the ResNet-18 or ResNet-50 models) considering both texture and structure or specific models considering texture alone. (23) The Deep-TEN model used as the base architecture in our proposed algorithm is a texture-specific model that includes a novel encoding layer on top of the convolutional layers of the generic ResNet-18 model. (24) Therefore, the Deep-TEN model is a specialized model that can detect and extract image texture features with superior performance, and is thus especially useful for material and texture recognition. (22) Because the features extracted by the Deep-TEN model are learnable, the proposed model is dynamic and does not rely on any fixed feature set. Our proposed model architecture is shown in Fig. 1. The vectors generated by the proposed model represent the orderless textural features; however, the structural features are excluded from these extracted representations. Separate models were trained for the second, third, and fourth distal metacarpal bones, and a final ensemble model was developed using the five trained models by averaging their outputs for the three metacarpal bones in an input image. Furthermore, we trained a ResNet-50 model to classify the radiographs for RA using the extracted ROIs for comparison with our proposed model; in this case, aside from the textural features, the structural changes in the images were also considered for RA classification. In previous works, the ResNet model has been shown to be useful for RA diagnosis, either using clinical information (25) or using diffuse optical tomography images. (26) Both Deep-TEN and ResNet-50 models take the ROIs as input and provide a continuous RA risk probability value between zero and one as an output. Patients were divided into groups of low, moderate, and high RA risk based on this output value.
The original training radiographs were split for subject-stratified 5-fold cross-validation (FC). A final ensemble model was created from the corresponding five trained models by averaging their outputs for the three metacarpal bones in the input image.
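The 'orderless' property of the encoding layer can be illustrated with a small sketch of residual encoding with soft assignment, following the formulation described for Deep-TEN (22): each local descriptor is softly assigned to a set of codewords, and the weighted residuals are summed per codeword. The descriptor values, codewords, and smoothing factors below are hypothetical; in the actual network, the codewords and smoothing factors are learned and the descriptors come from convolutional feature maps.

```python
import math

def encode(descriptors, codewords, smoothing):
    """Residual encoding: e_k = sum_i a_ik * (x_i - c_k),
    with a_ik = softmax_k(-s_k * ||x_i - c_k||^2)."""
    K, D = len(codewords), len(codewords[0])
    encoded = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        residuals = [[xi - ci for xi, ci in zip(x, c)] for c in codewords]
        logits = [-s * sum(r * r for r in res)
                  for s, res in zip(smoothing, residuals)]
        m = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        for k in range(K):
            a = weights[k] / total
            for d in range(D):
                encoded[k][d] += a * residuals[k][d]
    return encoded

# Hypothetical 2-D descriptors and two codewords.
descriptors = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.5], [0.9, 0.9]]
codewords = [[0.0, 1.0], [1.0, 0.0]]
smoothing = [1.0, 1.0]

e_original = encode(descriptors, codewords, smoothing)
e_shuffled = encode(list(reversed(descriptors)), codewords, smoothing)
```

Because the encoding sums over descriptors, reordering them (that is, moving texture patches around in the image) leaves the output unchanged up to floating-point rounding, which is why spatial structure such as bone contour is not represented.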
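A subject-level 5-fold split and the ensemble averaging step might look like the following sketch. It assumes that each patient has a single RA or non-RA label, that all radiographs of a patient are kept in the same fold, and that folds are balanced by label; the record layout and function names are hypothetical and do not come from the study code.

```python
import random
from collections import defaultdict

def subject_folds(records, n_folds=5, seed=0):
    """Map each patient to a fold index.

    records: list of (radiograph_id, patient_id, label).
    Patients are shuffled within each label and dealt round-robin,
    so every fold keeps a similar RA / non-RA mix.
    """
    by_label = defaultdict(set)
    for _, patient, label in records:
        by_label[label].add(patient)
    rng = random.Random(seed)
    fold_of = {}
    for label in sorted(by_label):
        patients = sorted(by_label[label])
        rng.shuffle(patients)
        for i, patient in enumerate(patients):
            fold_of[patient] = i % n_folds
    return fold_of

def ensemble_score(scores_per_model):
    """Average the scores of all fold models over the three metacarpal ROIs.

    scores_per_model: one list per fold model, each holding that model's
    scores for the second, third, and fourth metacarpal bones.
    """
    flat = [s for model_scores in scores_per_model for s in model_scores]
    return sum(flat) / len(flat)

# Hypothetical data: 20 patients with two radiographs each, alternating labels.
records = [(f"img{i}", f"p{i // 2}", (i // 2) % 2) for i in range(40)]
fold_of = subject_folds(records)

# Radiographs used for validation when fold 0 is held out.
val_fold0 = [rid for rid, pid, _ in records if fold_of[pid] == 0]

# Hypothetical outputs of five fold models for one input image.
score = ensemble_score([[0.2, 0.4, 0.6]] * 5)
```

Assigning folds per patient rather than per radiograph keeps images of the same hand from appearing on both sides of a training/validation boundary.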

Evaluation of the proposed model
The performance of our proposed model for RA classification of hand radiographs was evaluated using the test dataset. The receiver operating characteristic (ROC) curve was used to visualize the performance of the classification model for RA prediction, and the area under the ROC curve (AUROC) was used to indicate model performance, where a value of '1' indicates perfect prediction and a value of 0.5 or less indicates that the model has no class separation ability. Separate ROC curves were obtained for the Deep-TEN and ResNet-50 models, and the corresponding AUROCs and 95% confidence intervals (CIs) were also estimated. (27)

Statistical analysis
Summary statistics for patients with and without RA were compiled and compared. The performances of the Deep-TEN and ResNet-50 models were compared with the obtained RA classification results; in addition, metrics such as sensitivity, specificity, and positive predictive value were calculated. Differences were considered to be significant if there was a two-tailed P value of less than 0.05. Multivariate logistic regression was used to assess the association between the RA risk groups and RA diagnosis, and the odds ratios (ORs) and 95% CIs for RA were calculated with adjustments for age and sex. The image processing, deep learning model building, and training were based on the Python programming language with the PyTorch deep learning framework. All statistical analyses were conducted using the SAS program, version 9.4 (SAS Institute Inc., Cary, NC, USA).
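As a small illustration of the AUROC used in the evaluation, the sketch below computes it as the probability that a randomly chosen RA radiograph receives a higher score than a randomly chosen non-RA radiograph, counting ties as one half. The labels and scores are hypothetical, and the confidence interval method of (27) is not reproduced here.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for RA, 0 for non-RA. Ties between scores count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auroc(labels, scores)  # 3 of 4 pairs are ranked correctly
```

The pairwise form is quadratic in the number of radiographs, which is acceptable for a sketch; rank-based implementations give the same value more efficiently.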

Patient characteristics
In this study, we acquired de-identified digitized medical data of 1119 RA patients, which were then split into the training/validation (n = 891) and test (n = 228) datasets such that both sets had patient data with similar age and sex distributions. The patient characteristics of all patients (RA and non-RA) are listed in Table 1. Furthermore, the median disease duration (interquartile range (IQR)) was 35 (14, 294) and 21 (14, 49) days in the training/validation and testing sets, respectively.

Performance comparison between the Deep-TEN and ResNet-50 models
The Deep-TEN model achieved an AUROC of 0.69 (95% CI: 0.64-0.74) for RA classification based on textural features obtained from patient radiographs; this performance was similar to that of the ResNet-50 model, which had an AUROC of 0.73 (95% CI: 0.69-0.77). Figure 2 shows the ROC curves of the Deep-TEN and ResNet-50 models for RA classification; from these curves, it can be observed that the Deep-TEN model, which uses only textural features for classification, is capable of classifying patient radiographs for early RA with a performance similar to that of the ResNet-50 model, which considers both textural and structural features. Using Youden's index, the cut-offs for the texture score for RA classification were obtained as 0.43 and 0.45 for the Deep-TEN and ResNet-50 models, respectively. The sensitivity, specificity, and positive predictive value of a high texture score to classify early RA were 0.67, 0.62, and 0.64 for the Deep-TEN model and 0.68, 0.67, and 0.67 for the ResNet-50 model.
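The cut-off selection can be sketched as follows: Youden's index J = sensitivity + specificity - 1 is evaluated at each candidate threshold, and the threshold with the largest J is taken as the cut-off, after which sensitivity, specificity, and positive predictive value are read from the confusion counts. The labels and scores are hypothetical, and treating a score at or above the threshold as positive is an assumption.

```python
def confusion(labels, scores, threshold):
    """Return (tp, fp, tn, fn), calling a score >= threshold positive."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    return tp, fp, tn, fn

def youden_cutoff(labels, scores):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(labels, scores, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical labels (1 = RA) and texture scores.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.4, 0.5, 0.7, 0.9]

cutoff, j = youden_cutoff(labels, scores)
tp, fp, tn, fn = confusion(labels, scores, cutoff)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
```

Note that the positive predictive value depends on the proportion of RA cases in the evaluated sample, so it does not transfer directly to populations with a different prevalence.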

Texture risk group and RA prediction
High mean texture scores were associated with RA, with age- and sex-adjusted ORs (95% CI) of 3.42 (2.59-4.50) and 4.30 (3.26-5.69) obtained using the Deep-TEN and ResNet-50 models, respectively (see Table 2). Based on the results listed in Table 2, it can be deduced that the sex of patients did not have any significant effect on the models' RA classification performance. Further, we partitioned the predicted texture score into tertiles in order to differentiate the patients into three risk groups for RA. Table 3 lists the mean texture scores for RA risk in the three different risk categories. Using the Deep-TEN model, the moderate and high RA risk groups had age- and sex-adjusted ORs (95% CIs) of 2.48 (1.78-3.47) and 4.39 (3.11-6.20), respectively, compared with the low RA risk group. Similarly, using the ResNet-50 model, the age- and sex-adjusted ORs (95% CI) for RA were 2.17 (1.55-3.04) and 6.91 (4.83-9.90) in the moderate and high RA risk groups, respectively, compared with the low RA risk group.
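The risk grouping and odds ratio calculation can be illustrated with a simplified sketch. The ORs reported above are adjusted for age and sex through multivariate logistic regression; the code below instead computes an unadjusted OR from a 2 × 2 table with a Wald 95% CI, purely to show the arithmetic. The scores and counts are hypothetical and are not taken from Tables 2 or 3.

```python
import math

def tertile_groups(scores):
    """Assign 0 (low), 1 (moderate), or 2 (high) by rank, giving near-equal groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = min(2, rank * 3 // len(scores))
    return groups

def odds_ratio(a, b, c, d):
    """Unadjusted OR with a Wald 95% CI.

    a, b: RA and non-RA counts in the risk group of interest.
    c, d: RA and non-RA counts in the reference (low risk) group.
    All four counts must be non-zero.
    """
    or_value = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - 1.96 * se)
    upper = math.exp(math.log(or_value) + 1.96 * se)
    return or_value, lower, upper

# Hypothetical texture scores for nine patients.
groups = tertile_groups([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6])

# Hypothetical counts: 60 RA / 40 non-RA in the high group,
# 30 RA / 70 non-RA in the low group.
or_value, lower, upper = odds_ratio(60, 40, 30, 70)  # OR = 3.5
```

An adjusted analysis would fit a logistic regression with the risk group, age, and sex as covariates and exponentiate the group coefficients.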

Discussion
In this study, we demonstrated that radiographic textural features of distal metacarpal bones could indicate early signs of RA. Because of the complexity of the high-dimensional textural features in radiographs, simple mathematical operations such as FSA cannot be used to describe them. In contrast, deep learning methods can provide an overall insight into the complex textural bone properties and yield risk scores based on them, thereby enabling the classification of early RA and the stratification of patients into different risk groups. Thus, deep learning methods can be used for automatic reporting of RA risk based on plain radiographs; this risk information could then be incorporated into standard clinical risk analysis for early RA prediction.
We compared two deep learning models, namely the Deep-TEN and ResNet-50 models, for RA classification. Based on our results, the performance of both models is similar in terms of binary classification into RA and non-RA radiographs. However, the primary difference between the two models is that the Deep-TEN model only takes into account the textural information from radiographs for RA prediction, while the ResNet-50 model considers both their textural and structural features. For example, bone erosions resulting in a change in bone contour are not considered by the Deep-TEN model because they are structural feature changes. Therefore, the ResNet-50 model performs slightly better at identifying patients at high risk of RA. In contrast, the Deep-TEN model is better at separating the patients into three risk groups for RA based on changes in the texture, thereby forming a more homogeneous risk continuum. Hence, the selection of a deep learning model for RA prediction in clinical settings would depend on clinical needs.
The 1987 ACR classification criteria for RA (28) define erosion or unequivocal bony decalcification (periarticular osteoporosis) in hand and wrist posteroanterior radiographs as one of the radiographic features relevant to RA diagnosis. Periarticular osteoporosis, which is a bone textural feature, is an osseous morphologic indication that is observed before the occurrence of bone erosions and joint space narrowing. (29,30) Early periarticular osteoporosis, which is characterized by the loss of trabecular size and reduction in the number of metaphyseal regions, is difficult to detect and quantify via traditional hand radiography; therefore, X-ray radiogrammetry, (31) CT, (32) and MRI (33) have been applied to detect periarticular osteoporosis in previous studies. However, the application of these approaches in clinical settings is hampered by their high costs. In the 2010 EULAR-ACR classification criteria for RA, (21) information on RA diagnoses based on clinical features such as joint involvement or symptom duration as well as using laboratory tests for anti-citrullinated peptide antibodies, rheumatoid factor, C-reactive protein, and erythrocyte sedimentation rate were included. Radiographic bone texture changes were not emphasized as in the previous 1987 ACR criteria (28) because early indications of bone erosion and periarticular osteoporosis were difficult to assess objectively from plain radiographs, and this could have led to delayed RA diagnosis. Traditionally, conventional radiography was considered to be less sensitive to early indications of RA. Nevertheless, in recent times, with the assistance of machine learning techniques, as we have observed in our study, conventional radiography could perhaps be useful for early RA classification.
In many clinical situations, the automatic evaluation of radiographs using deep learning will be of great medical value, because such a system could potentially support RA diagnosis as a screening tool for RA in both general clinics and specialized hospitals. Furthermore, our proposed CNN model can estimate the bone texture score and predict RA from radiographs within one second per image, which is considerably faster than analyses by human clinicians. Thus, our proposed model could save time and be used as a diagnostic tool in countries where the number of available rheumatologists or radiologists is low. Furthermore, it can be used by family physicians to refer their patients to RA specialists based on the diagnostic predictions by the model. Moreover, because this is a computerized model, intraobserver and interobserver variabilities can be avoided if it is applied in clinical trials related to RA research.
Compared with our current work, previous attempts to use CNNs for the interpretation of hand radiography images of RA patients did not consider the distinctive textural or structural changes that occur in the joints of RA patients. Despite the advantages of our proposed CNN-based approach for the detection of early RA indications, our study has the following limitations. First, we only analyzed the texture of the second, third, and fourth distal metacarpal bones for RA classification of radiographs. Thus, further investigation is required to confirm whether the inclusion of radiographic images of other parts of the hand as input to the proposed CNN model would increase its RA risk classification performance. Second, the training data used in the current study are from patients with early RA (for most patients, RA was diagnosed less than a year prior to the study). Thus, later temporal changes in the bone texture or structure due to RA as the disease progresses were not considered in our work. Third, the complexity of the proposed deep learning model with millions of parameters prevents a straightforward interpretation of the results by human doctors and clinicians.

Conclusions
In this study, we proposed a deep learning model that can detect bone texture changes related to RA from hand radiographs, which, when coupled with automatic joint detection and segmentation, can help the classification of early RA.

Figure: Flow of machine learning. The first model (above) depicts our proposed CNN model, which produces a texture descriptor; it comprises the first two blocks of the ResNet-18 model, followed by a Deep-TEN encoder. The second model (below) depicts a CNN model with the ResNet-50 network as the backbone architecture, which produces a dense feature map, followed by a GCN that detects the boundary of the metacarpal bones. Subsequently, a texture score was assigned to each subchondral region detected by the GCN.