In this study, we developed a two-stage deep-learning method based on orbital 99mTc-DTPA SPECT/CT images that was highly accurate in distinguishing between the active and inactive phases of GO (AUC = 0.89). Furthermore, our segmentation module extracted the EOMs from orbital CT well (IOU = 0.82).
99mTc-DTPA SPECT/CT imaging for classification of GO
Several previous studies have demonstrated that 99mTc-DTPA SPECT and SPECT/CT can provide an accurate assessment of orbital disease activity in GO. Szumowski et al. concluded that SPECT/CT had a high sensitivity of 93% and specificity of 89% for diagnosing GO [6]. For the classification of disease activity, the uptake ratio (UR) derived from SPECT images is the most frequently used functional parameter [19]. We previously reported that the UR of the EOMs correlated well with CAS (R = 0.77, P < 0.01) [20]. However, the semi-quantitative UR varies with the choice of reference region used as the denominator, which may limit the recognition of more detailed pathologic information within tissues and subsequently reduce staging performance. On the other hand, in the active stage of GO, infiltration by inflammatory cells and increased proteoglycan content result in edematous swelling of the orbital soft tissues and enlargement of the EOMs. EOM involvement is therefore one of the key features of GO. The volume of the EOMs has been proven to be a reliable indicator for staging and for judging therapeutic efficacy in GO [21]. However, measuring it is not an easy task: it requires special computer analysis, including application software and hardware, as well as radiologists' valuable time [22]. Additionally, EOM enlargement can occur in different phases, so it may be challenging to stage GO using morphologic parameters alone. Thus, a fully automatic method that assesses the EOMs by combining morphologic and functional features from hybrid SPECT/CT is highly desirable.
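To make the dependence on the reference region concrete, the following minimal sketch computes an uptake ratio as the mean count in a target region divided by the mean count in a reference region. The function name `uptake_ratio`, the count values, and the two reference regions are hypothetical and chosen only for illustration; they are not taken from the study or from any specific UR protocol.

```python
from statistics import mean

def uptake_ratio(target_counts, reference_counts):
    """Semi-quantitative uptake ratio: mean target count / mean reference count."""
    ref = mean(reference_counts)
    if ref <= 0:
        raise ValueError("reference region must have a positive mean count")
    return mean(target_counts) / ref

# Hypothetical voxel counts (illustrative only, not study data).
eom_counts = [120, 135, 128, 141]   # voxels inside an EOM region
reference_a = [60, 62, 58, 60]      # one candidate reference region
reference_b = [80, 82, 78, 80]      # another candidate reference region

# The same EOM counts give different ratios for different denominators.
ur_a = uptake_ratio(eom_counts, reference_a)
ur_b = uptake_ratio(eom_counts, reference_b)
print(f"UR with reference A: {ur_a:.2f}")
print(f"UR with reference B: {ur_b:.2f}")
```

With identical target counts, the two ratios differ only because the denominator changes, which is the source of variability described above.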
Machine learning for GO diagnosis and activity assessment
A number of studies have demonstrated that ML is capable of automating the screening and diagnosis of various ocular disorders, such as cataracts, diabetic retinopathy, glaucoma, age-related macular degeneration, and retinopathy of prematurity [8]. Nevertheless, ML-based techniques using orbital imaging have rarely been investigated for GO diagnosis and activity assessment. Hu et al. [9] used an ML model to classify GO disease activity using MRI. They considered a group of 60 patients with active GO and 40 patients with inactive GO. The study collected variations of the magnetization transfer ratio, signal intensity ratio (SIR), and apparent diffusion coefficient (ADC) of the EOMs for each eye from MRI images. Their ML-based model performed better for disease activity differentiation and CAS prediction than the model merely combining SIRs and ADCs (AUC, 0.93 vs. 0.90; R = 0.70 vs. 0.67). However, this study enrolled a small number of patients, and only pre-planned features were used in the statistical ML model. Song et al. [15] built a deep learning model for screening GO, using 784 (normal: 625, TAO: 168) orbital CT images to train the model and 114 and 227 orbital CT images as the validation and test sets, respectively. This study achieved good results in the task of screening GO (accuracy, 87%; sensitivity, 88%; specificity, 85%). However, the sample of GO patients enrolled in this study was small, and the activity of GO was not staged. Chen et al. [10] proposed an algorithm based on a deep convolutional neural network (DCNN) for detecting GO activity. This algorithm was developed using 160 orbital MRI images of GO patients (50 active, 110 inactive), 80% of which were used for training and validation and 20% for testing. The accuracy, precision, sensitivity, specificity, and F1 score of the resulting best model were 85.5 ± 1.8%, 64.0 ± 3.3%, 82.1 ± 7.1%, 86.5 ± 4.0%, and 0.72 ± 0.04, respectively.
Notably, the test set of this study comprised only 32 GO patients (7 active, 25 inactive) and was unbalanced, which may have resulted in overfitting of the model.
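The metrics quoted for these studies all follow from the four cells of a binary confusion matrix. As a minimal sketch, the counts below are hypothetical values chosen only to match a 7 active / 25 inactive split; they are not the results reported by Chen et al. The example illustrates how, with few positive cases, a handful of false positives lowers precision even when sensitivity and specificity look high.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of active cases detected
    specificity = tn / (tn + fp)   # fraction of inactive cases correctly rejected
    precision = tp / (tp + fp)     # fraction of predicted-active cases that are active
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts: 7 active (tp + fn) and 25 inactive (tn + fp) patients.
metrics = binary_metrics(tp=6, fp=3, tn=22, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case, moving a single patient between cells shifts sensitivity by about 14 percentage points, which is why small, unbalanced test sets give unstable estimates.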
Strengths of our study
Our study proposed a deep learning-based method using SPECT/CT images. Compared with the two above-mentioned studies using ML for GO, our deep learning model utilized both anatomic and functional features. Moreover, our method has two important modules: EOM segmentation and disease activity classification. The EOM mask produced by the segmentation module was added to the input of the classification module. Notably, the segmentation method used in this study was improved over our previous algorithm [14] by mirror-flipping the orbital CT of both eyes, which improved the IOU for the SR (0.79 ± 0.03 vs. 0.74 ± 0.05).
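Two operations mentioned in this paragraph can be illustrated briefly: mirror-flipping an orbital image left-to-right (the same flip has to be applied to its mask) and scoring a predicted mask against a reference mask with IOU. The sketch below is an assumption about how such steps could be written with NumPy, not the implementation used in the study; the function names `mirror_lr` and `iou`, the 4 x 4 array size, and the toy masks are hypothetical.

```python
import numpy as np

def mirror_lr(array):
    """Mirror an image or mask left-to-right along its last (column) axis."""
    return np.flip(array, axis=-1)

def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(np.logical_and(pred, true).sum() / union)

# Toy reference and predicted masks (illustrative only).
true_mask = np.zeros((4, 4), dtype=bool)
true_mask[1:3, 0:2] = True          # 4 pixels in columns 0-1
pred_mask = np.zeros((4, 4), dtype=bool)
pred_mask[1:3, 1:3] = True          # 4 pixels in columns 1-2

score = iou(pred_mask, true_mask)   # 2 shared pixels over a union of 6
print(f"IOU: {score:.3f}")

# Flipping moves the reference mask from the left columns to the right ones.
flipped_true = mirror_lr(true_mask)
print(flipped_true.astype(int))
```

Because the flip is applied to image and mask alike, the IOU between a flipped prediction and a flipped reference is the same as before flipping.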
In terms of classification, the three-channel input model (SPECT, CT, and EOM mask) achieved the best accuracy of 86.10%. By comparison, the accuracy was only 60.88% for the model without EOM masks, 75.13% for the model combining CT with EOM masks, and 79.79% for the model combining SPECT with EOM masks, suggesting that a three-channel architecture can improve diagnostic sensitivity and specificity for staging GO activity. Moreover, this three-channel input model achieved higher sensitivity, precision, and F1 score than the DCNN-based architecture using MRI by Chen et al. [10] (84.6% vs. 82.1%, 83.4% vs. 64.0%, and 0.83 vs. 0.72, respectively).
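The three-channel input can be pictured as stacking co-registered SPECT, CT, and EOM-mask arrays along a channel axis. The following is a minimal sketch under that assumption; the min-max normalization, the channel order, the 64 x 64 slice size, the random placeholder data, and the function names are all hypothetical and are not taken from the study's pipeline.

```python
import numpy as np

def min_max(x):
    """Scale an array to [0, 1]; a constant array maps to zeros."""
    x = np.asarray(x, dtype=np.float32)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

def build_input(spect, ct, mask, use_mask=True):
    """Stack co-registered arrays into a (channels, H, W) classifier input."""
    if not (np.shape(spect) == np.shape(ct) == np.shape(mask)):
        raise ValueError("SPECT, CT, and mask must share the same shape")
    channels = [min_max(spect), min_max(ct)]
    if use_mask:
        channels.append(np.asarray(mask, dtype=np.float32))
    return np.stack(channels, axis=0)

# Random placeholder slices standing in for real, co-registered images.
rng = np.random.default_rng(0)
spect = rng.random((64, 64))
ct = rng.random((64, 64))
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1              # hypothetical EOM region

x3 = build_input(spect, ct, mask)                  # SPECT + CT + mask
x2 = build_input(spect, ct, mask, use_mask=False)  # ablation without the mask
print(x3.shape, x2.shape)
```

The `use_mask` switch mirrors the kind of ablation compared above, where the same classifier is fed inputs with and without the mask channel.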
Moreover, the GO-Net model was readily interpretable. The true-positive case in Fig. 3(a) shows good agreement between the model-focused region and the actual lesion region. In particular, the inferior and medial rectus muscles attracted more attention than the other rectus muscles. In the true-negative case [Fig. 3(b)], by contrast, the four EOMs received the same attention because of the negative findings on SPECT/CT. These findings indicate that GO-Net can automatically focus on the pathological changes in the four EOMs and give them different levels of attention for staging. Notably, the false-positive result in Fig. 3(c) was caused by physiological uptake in the adjacent nasal sinuses (Supplementary Fig. 4). Moreover, we found that CAS and SPECT/CT findings were not always consistent, as shown in Fig. 3(d): according to CAS (CAS = 4), this case should be diagnosed as active GO, whereas no significant morphological or functional changes were found on SPECT/CT imaging, resulting in a negative result. Although previous studies have shown a significant positive correlation between CAS and DTPA uptake [5], some individuals with CAS > 3 had low DTPA uptake and did not respond favorably to anti-inflammatory treatment [22]. In line with these results, the Grad-CAM derived from our model failed to identify any regions of suspicious lesions in this case, further indicating that CAS has limitations for GO staging.
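The attention maps discussed here come from Grad-CAM, which weights each feature map of a convolutional layer by the spatial average of the gradient of the class score with respect to that map, sums the weighted maps, and keeps only the positive values. The sketch below reproduces that arithmetic with NumPy on random placeholder arrays; in practice the activations and gradients would be extracted from the trained network, and the layer size, array shapes, and function name here are hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps and their gradients, both (K, H, W)."""
    activations = np.asarray(activations, dtype=np.float64)
    gradients = np.asarray(gradients, dtype=np.float64)
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must share the same shape")
    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))                  # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)       # shape (H, W)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                 # scale to [0, 1]

# Placeholder tensors standing in for a real layer with 16 feature maps.
rng = np.random.default_rng(0)
activations = rng.random((16, 8, 8))
gradients = rng.standard_normal((16, 8, 8))

heatmap = grad_cam(activations, gradients)
print(heatmap.shape)
```

The resulting low-resolution map is usually upsampled to the input size and overlaid on the image, which is how model-focused regions can be compared with lesion regions.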
Limitations
This study has several limitations. First, our patient population was from a single medical center. Although SPECT/CT enables more accurate and objective staging of GO activity, we were unable to obtain sufficiently large samples from other medical centers because of cost and technical issues, which may lead to selection bias. Second, the proposed model makes its prediction based on SPECT/CT scans only. In practice, clinicians also rely on other clinical assessments and follow-up to make the final decision. We believe that our model could achieve better results if other clinical assessments were included in its training. In addition, this study is limited to the activity staging of GO; diagnosis and prognosis should be incorporated to build an improved intelligent medical system.