Development and Validation of a Deep Learning Model for Preoperative Screening of Myasthenia Gravis in Patients with Thymoma based on CT Images

Objectives: Thymoma-associated myasthenia gravis (TAMG) is the most common paraneoplastic syndrome of thymoma. Screening for TAMG before thymoma resection is required to avoid severe perioperative complications, especially respiratory failure. Herein, we developed a 3D DenseNet deep learning (DL) model based on preoperative computed tomography (CT) to detect TAMG in thymoma patients. Methods: A large cohort of 230 thymoma patients was enrolled. A total of 182 thymoma patients (81 with TAMG, 101 without TAMG) were used for training and model building, and 48 cases from another hospital were used for external validation. A 3D-DenseNet-DL model and five machine learning models based on radiomics features were built to detect TAMG in thymoma patients. A comprehensive analysis integrating the 3D-DenseNet-DL model with general CT image features, named the 3D-DenseNet-DL-based multi-model, was also performed to establish a more effective prediction model. Results: In a detailed comparison of prediction efficacy, the 3D-DenseNet-DL model effectively identified TAMG patients, with a mean area under the ROC curve (AUC), accuracy, sensitivity and specificity of 0.734, 0.724, 0.787 and 0.672, respectively. The effectiveness of the 3D-DenseNet-DL-based multi-model was further improved, as evidenced by the following metrics: AUC 0.766, accuracy 0.790, sensitivity 0.739 and specificity 0.801. External verification results confirmed the feasibility of this DL-based multi-model, with AUC 0.730, accuracy 0.732, sensitivity 0.700 and specificity 0.690. Conclusions: Our 3D-DenseNet-DL model can effectively detect TAMG in patients with thymoma based on preoperative CT images. This model may serve as a non-invasive screening method or as a supplement to the conventional diagnostic criteria for identifying TAMG.


Introduction
Thymoma is the most common neoplasm of the anterior mediastinum in adults and is often associated with various autoimmune paraneoplastic syndromes (PNS) [1]. Thymoma-associated myasthenia gravis (TAMG) is the most common PNS, accounting for 30-50% of all PNS [2]. TAMG is an autoimmune disease involving antibodies against the postsynaptic nicotinic acetylcholine receptors (AChR) at neuromuscular junctions, resulting in variable weakness of voluntary muscles [3]. Patients with TAMG can experience severe cardiopulmonary complications [4,5]. One of the most severe complications of TAMG is postoperative myasthenic crisis, which can worsen rapidly, leading to respiratory failure and even death [6]. Moreover, the incidence of postoperative myasthenic crisis is high (ranging from 11.5% to 18.2%), and patients with crisis have a high mortality rate [7][8][9]. According to the NCCN clinical guidelines for thymomas, all patients suspected of having thymomas (even those without symptoms) should be carefully evaluated for the presence of TAMG before any surgical procedure in order to avoid respiratory failure during the operation [2,10,11]. However, muscle weakness is a symptom common to many other diseases, which contributes to frequently missed or delayed diagnosis of MG in patients experiencing mild weakness or in individuals with weakness restricted to only a few muscles [12]. In addition, although the current criteria for diagnosing MG (including immunological, electrophysiological, and pharmacological approaches) have improved [13], a simple, noninvasive method for preoperative screening of TAMG, especially for patients with inconspicuous or atypical symptoms, would have important clinical significance.
A puzzling but interesting characteristic of MG is that most patients have histopathological abnormalities of the thymus, such as hyperplastic thymus and thymoma [14]. Thymomas are now stratified into six entities (types A, AB, B1, B2, B3, and TC (carcinoma)) on the basis of the morphology of epithelial cells and the lymphocyte-to-epithelial cell ratio [15]. TAMG is a specific subtype of MG that is closely related to the pathological subtype of thymoma: TAMG is more common in type B (B1, B2 and B3) than in type A and AB thymomas and is absent in TC [14,16]. In thymoma, the correlation of imaging features or quantitative texture analysis with grading and staging has been widely analyzed [17,18]. In recent years, deep learning (DL) and radiomics in the medical imaging field have been studied intensively to explore the potential of using various medical images as diagnostic, predictive, or prognostic information for human diseases, including the possibility of identifying tumor pathological subtypes, tumor phenotypes and gene-protein signatures [19,20]. Therefore, with rapid advances in machine learning algorithms, it may be possible to determine the status of TAMG in patients with different pathological subtypes of thymoma using imaging data from routine preoperative CT scans of thymoma.
Here, we designed this study to explore the effectiveness of a 3D-DenseNet-DL model and five radiomics models as predictive methods for detecting TAMG based on preoperative chest CT images. The final optimal model, named the 3D-DenseNet-DL based multi-model, which integrates general CT image features, was ultimately established to detect MG in thymoma patients.

Patients
For this study, 182 patients diagnosed with thymoma who had undergone thymectomy at the First Affiliated Hospital of Sun Yat-sen University from Jan 1st, 2011 to Jun 31st, 2018 were included for analysis and model building (Table 1). Another 48 thymoma patients admitted to the Sun Yat-sen Memorial Hospital of Sun Yat-sen University from Jan 1st, 2017 to Mar 31st, 2019 were used as the external validation cohort (Table 1). All cases had undergone enhanced preoperative CT examination and had been clearly staged based on pathological examination and clinical manifestation. All patients in our study were evaluated by neurologists before operation to determine the status of myasthenia gravis (MG) syndrome or other autoimmune diseases. This project was approved by the Ethics Committee and Institutional Review Board of Sun Yat-sen University. Informed consent was waived due to the retrospective nature of this study.

CT imaging Characteristics and Scan Protocol
Enhanced chest CT images were acquired within one week prior to operation. Imaging features were carefully evaluated on a PACS reading workstation by two experienced radiologists specializing in chest CT imaging who were blinded to the MG status of the patients. The CT imaging characteristics evaluated (Table 2) were: maximum diameter (3D maximum diameter); degree of enhancement (increment of enhanced CT value, HU); enhancement (homogeneous or heterogeneous); necrosis/cystic component (divided into 0%-25%, 26%-50%, 51%-75% and 75%-100% according to its volume percentage); shape (round or oval, lobulated, irregular); contours (smooth or irregular); and the presence of calcification, adjacent organ invasion, effusion (pleural/pericardial), and lymphadenopathy. All preoperative enhanced chest CT images were obtained with a 64-row multidetector CT scanner (Aquilion 64; Toshiba Medical, Tokyo, Japan). Scan parameters: X-ray tube voltage of 120 kVp; maximum of 500 mA with automatic tube current modulation.
Axial thin-section CT images of the whole lung were reconstructed with a section thickness and spacing of 1.0 mm. Iopromide (300 mg I/ml, Schering Pharmaceutical Ltd) at 80-100 ml per patient was injected at a flow rate of 3-4 ml/s for the contrast-enhanced scanning protocol.
Note: † Data are mean ± standard deviation; NA, not applicable.
Machine Learning

Datasets
Thymomas on CT images were segmented manually using the annotation tool ITK-SNAP (www.itksnap.org) [21], a free software package widely used for medical image annotation and labeling. The output from ITK-SNAP consists of NIfTI files containing the mask information of the thymoma for each sequence of CT images. We then used the mask information to extract the area of the thymoma, namely the region of interest (ROI) (Figure S1). For feature extraction in radiomics analysis, the segmented thymoma was used directly. For deep learning modeling, a further preprocessing step was designed to prepare the segmented data for the convolutional neural network.

Radiomic analysis procedure
Radiomics analysis involved three steps: feature extraction, feature selection and machine learning. First, feature extraction was performed to convert raw images into structured data with radiomics information that could be processed by machine learning algorithms. Then, several methods were applied to select high-quality features based on variance or regression. Finally, the data with the selected features were used as inputs for several mainstream machine learning algorithms to train and test the model.

Radiomic features
The radiomic features were extracted using open source PyRadiomics software

Radiomic feature extraction
Feature selection was conducted to select a subset of the extracted features for use in model building. The aims of this step were to reduce the dimensionality of the features, simplify the model and enhance generalization by reducing overfitting. A multi-level selection approach was adopted, which applied three algorithms in the following order: the variance threshold method, the k-best method, and the least absolute shrinkage and selection operator (LASSO). The variance-based method was applied first to select features with variance larger than a threshold (threshold = 0.1 in this study; data were normalized to a range of -1 to 1). Then, the top k features (k = 300 in this paper) were selected by ANOVA F-value between each feature and the label. Finally, LASSO with five-fold cross-validation was used to automatically select the more effective features (Figure S2).
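The first two stages of this selection cascade can be sketched in a few lines. The following is a minimal illustration in plain Python, not the code used in the study: the function names and the toy data are hypothetical, the threshold of 0.1 and the [-1, 1] scaling follow the text, and the final LASSO stage with five-fold cross-validation is left out for brevity (in practice a library implementation would normally be used for all three stages).

```python
from statistics import mean, pvariance

def minmax_scale(column, lo=-1.0, hi=1.0):
    """Rescale one feature column to [lo, hi]; constant columns map to lo."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        return [lo] * len(column)
    return [lo + (hi - lo) * (v - cmin) / (cmax - cmin) for v in column]

def anova_f(column, labels):
    """One-way ANOVA F-value between a feature column and the class labels."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    grand = mean(column)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    df_between = len(groups) - 1
    df_within = len(column) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def select_features(columns, labels, var_threshold=0.1, k=300):
    """Stage 1: variance threshold. Stage 2: keep the k highest F-values."""
    scaled = [minmax_scale(c) for c in columns]
    kept = [i for i, c in enumerate(scaled) if pvariance(c) > var_threshold]
    ranked = sorted(kept, key=lambda i: anova_f(scaled[i], labels), reverse=True)
    return ranked[:k]

# Toy data (hypothetical): four features for eight patients, label 1 = TAMG.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [5, 5, 5, 5, 5, 5, 5, 5],      # constant: removed by the variance threshold
    [0, 1, 0, 1, 9, 10, 9, 10],    # separates the two classes well
    [0, 10, 0, 10, 0, 10, 0, 10],  # varies, but identically in both classes
    [0, 0, 0, 0, 0, 0, 0, 10],     # weak class difference
]
selected = select_features(columns, labels, var_threshold=0.1, k=2)
print(selected)  # indices of the retained features, best first
```

The returned indices would then be passed to the LASSO stage, which shrinks the coefficients of uninformative features to zero and keeps the remainder.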

Radiomics model building
The performance of radiomics analysis was evaluated using five popular machine learning algorithms:

3D-DenseNet
DenseNet [23] is a type of convolutional neural network (CNN). DenseNet consists of four dense blocks, as shown in the schematic diagram, with dense connections between the layers within each dense block. We chose DenseNet as the base model in this study because of its advantages. First, DenseNet can reduce overfitting. Second, DenseNet is computationally efficient, as it requires less than half of the parameters of ResNet. Although DenseNet was first designed for two-dimensional images, our study targeted 3D CT sequences. As most medical images are three-dimensional, we designed a 3D-DenseNet in which the kernel of each convolutional and pooling layer was modified to its 3D version. In the proposed 3D DenseNet model, the rectified linear unit (ReLU) was used as the activation function in each layer, and the softmax function was applied in the last layer of our network to obtain the probability for each sample (Fig. 1 and Table S1). Batch normalization was applied before each activation layer. The loss function of our model was binary cross-entropy, which was optimized using Adam with a mini-batch size of 16.
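To make the dense connectivity and the output layer more concrete, the sketch below traces how the number of feature maps and the volume size might evolve through four dense blocks, and shows the softmax and binary cross-entropy computations for a single sample. It is only an illustration: the growth rate, the number of layers per block and the initial channel count are borrowed from the 2D DenseNet-121 layout and are assumptions, not the settings of Table S1; the initial convolution and pooling stem is omitted; and the halving of channels and volume size in each transition layer is likewise assumed.

```python
from math import exp, log

def trace_dense_net(input_shape, init_channels, growth_rate, layers_per_block):
    """Follow channel count and volume size through dense blocks and transitions."""
    channels, shape = init_channels, tuple(input_shape)
    trace = []
    for index, n_layers in enumerate(layers_per_block, start=1):
        # Dense connectivity: every layer adds growth_rate feature maps that
        # are concatenated with all feature maps produced before it.
        channels += n_layers * growth_rate
        trace.append((f"dense_block_{index}", channels, shape))
        if index < len(layers_per_block):
            # Transition (assumed): 1x1x1 convolution halves the channels,
            # 2x2x2 pooling halves each spatial dimension.
            channels //= 2
            shape = tuple(side // 2 for side in shape)
            trace.append((f"transition_{index}", channels, shape))
    return trace

def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(p_positive, label, eps=1e-12):
    """Loss for one sample; label is 1 (TAMG) or 0 (no TAMG)."""
    p = min(max(p_positive, eps), 1.0 - eps)
    return -(label * log(p) + (1 - label) * log(1.0 - p))

# Hypothetical configuration (DenseNet-121 style), not taken from Table S1.
trace = trace_dense_net((160, 160, 64), init_channels=64, growth_rate=32,
                        layers_per_block=(6, 12, 24, 16))
for name, channels, shape in trace:
    print(name, channels, shape)

probabilities = softmax([0.3, 1.2])  # two made-up logits: [no TAMG, TAMG]
print(probabilities, binary_cross_entropy(probabilities[1], label=1))
```

Because each layer only adds a small, fixed number of feature maps and reuses all earlier ones, the parameter count stays low, which is the efficiency argument made above.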

Training Process Optimization
Two kinds of data augmentation were applied during the training stage of deep learning (Figure S3) to avoid overfitting. First, random cropping was implemented by randomly placing the segmented thymoma image in the fixed cube with shape. Second, a fixed window center (WC) and window width (WW) of 300 were applied to input images with original CT values. A random change was applied to the training data, with the WW value varying from -10 to 10 and the WC value from -5 to 5. Transfer learning from a pretrained model was also applied, which accelerated training significantly compared with other initialization methods (such as Xavier).

Associations between General CT Image Characteristics and status of TAMG
Ten common variables were used to describe the CT imaging features of the thymomas included in this study (Table 2). Statistically significant differences between the two groups were found in necrosis/cystic component rate (P = 0.029), contours (smooth/irregular, P = 0.030), shape (P = 0.027), adjacent organ invasion (P < 0.001), pleural/pericardial effusion (P = 0.028) and lymphadenopathy (P = 0.030). In general, thymoma patients with TAMG tended to have less enhancement heterogeneity, a less lobulated shape and a lower rate of adjacent organ invasion.

Detection of TAMG by Radiomics analysis and 3D DenseNet DL model
For the radiomics analysis and deep learning (DL) analysis, a total of 1390 radiomic features were extracted from the routine contrast-enhanced chest CT image data. After applying the variance threshold, k-best and LASSO methods, 499, 300, and 16 features remained, respectively.
The 16 features finally selected are listed in Table S2. To examine the relationships between features, correlation analysis using the Pearson method was applied and a heatmap was constructed for visualization (Figure S4).
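As a brief illustration of the correlation analysis, the sketch below computes a Pearson correlation matrix of the kind that underlies such a heatmap. It is written in plain Python under stated assumptions: the feature names and values are made up, and the plotting step is omitted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant feature has zero spread and would raise ZeroDivisionError;
    # such features should be removed beforehand.
    return covariance / (spread_x * spread_y)

def correlation_matrix(features):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names}
            for a in names}

# Hypothetical feature values for five patients.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with a
    "feature_c": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
}
matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

Pairs with coefficients close to 1 or -1 carry largely redundant information, which is what the heatmap is meant to reveal at a glance.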

Building of the 3D-DenseNet-DL based multi-model for TAMG detection
In the multivariable logistic regression analysis, only the shape of the thymoma (P = 0.031), the invasion rate of adjacent organs (P = 0.001) and the DL score (P < 0.001) qualified as independent predictive factors (Table 3).
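For readers less familiar with logistic regression, the following sketch shows how such a model turns a set of predictors into a probability and how a coefficient relates to an odds ratio. The coefficient values, intercept and predictor names are purely hypothetical and are not the fitted values of this study.

```python
from math import exp

def logistic_probability(intercept, coefficients, predictors):
    """Probability from a logistic model: sigmoid of the linear predictor."""
    z = intercept + sum(coefficients[name] * predictors[name]
                        for name in coefficients)
    return 1.0 / (1.0 + exp(-z))

def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase of a predictor."""
    return exp(coefficient)

# Hypothetical coefficients, NOT estimates from the study.
intercept = -1.0
coefficients = {
    "dl_score": 3.0,            # output of the deep learning model (0 to 1)
    "irregular_shape": -0.8,    # 1 if the shape is irregular, else 0
    "adjacent_invasion": -1.2,  # 1 if adjacent organs are invaded, else 0
}
patient = {"dl_score": 0.7, "irregular_shape": 0, "adjacent_invasion": 1}

probability = logistic_probability(intercept, coefficients, patient)
print(round(probability, 3))
print({name: round(odds_ratio(c), 3) for name, c in coefficients.items()})
```

An odds ratio above 1 means the predictor raises the odds of the outcome, and one below 1 means it lowers them.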
To optimize the effectiveness of the TAMG-detecting model, we further built the 3D-DenseNet-DL based multi-model (DL plus two general CT features). In ROC curve analysis, the AUCs of the DL model, the general CT features model (shape and invasion rate of adjacent organs) and the 3D-DenseNet-DL based multi-model were 0.740, 0.677 and 0.766, respectively (Fig. 3A and B), suggesting that the 3D-DenseNet-DL based multi-model performed better at detecting TAMG in thymoma patients.
Note: OR, odds ratio; CI, confidence interval; DL, deep learning; #, P value calculated by multivariable logistic regression analysis adjusted for age and gender; &, unadjusted P value; *, P < 0.05 was considered statistically significant.
The external validation of 3D-DenseNet-DL based multi-model

Discussion
In this study, we proposed and validated a non-invasive method based on routine preoperative CT imaging of thymoma, referred to as the "3D DenseNet deep learning (DL) based multi-model", to detect TAMG before operation. With this model, we successfully identified most TAMG patients in the training set (n = 182, AUC of 0.766), and further verified its reliability and efficacy in an external validation set (n = 48, AUC of 0.730). These results suggest that our 3D-DenseNet-DL based multi-model is an effective and non-invasive method for screening TAMG in patients with thymoma. To our knowledge, this is the first study on the diagnosis of TAMG in thymoma patients using machine learning based on CT imaging data.
Currently, there are three accepted approaches used by neurologists to confirm MG: immunological, electrophysiological, and pharmacological. The immunological assay for serum AChR-binding antibodies is considered the most reliable approach to diagnosing MG [24,25]. AChR antibodies are found in nearly all TAMG patients, but the false-positive rate is also high [14]. Repetitive nerve stimulation (RNS) [26] and single-fiber electromyography (SFEMG) [27] are widely used for electrophysiological confirmation. However, SFEMG may not confirm the presence of MG unless weak muscles are tested, and the reliability of the results is highly dependent on the experience of the technician [13]. Pharmacological confirmation has long been used for the diagnosis of MG [28]. However, the reported false-positive results [29] and the possible occurrence of potentially lethal vagal bradycardia following Tensilon injection [30], particularly in elderly persons, greatly limit its clinical application for MG confirmation. Therefore, although current diagnostic criteria are widely used for the final diagnosis of MG, other methods may be used as a supplement for the initial screening or diagnosis of MG. Our 3D-DenseNet-DL based multi-model is a candidate, and the favorable results indicate that this model can be considered a complementary method to the conventional diagnostic criteria, especially for screening TAMG before thoracic surgery. Considering efficacy, safety, minimal invasiveness and economic cost, we propose a clinical flow chart for preoperative screening of MG: a combination of clinical symptoms, serum AChR antibody and the image-based DL method (Figure S5). This flow chart may be important for clinical management and preoperative risk assessment of the disease.
An increasing number of studies have evaluated the potential relationship between imaging and biological features of solid tumors [31], such as glioblastoma [32] and rectal and lung adenocarcinoma [33,34]. For thymoma, the most common primary neoplasm of the mediastinum, the prediction of histology and stage by radiographic criteria has been addressed in several previous reports. CT findings such as smooth contours [35], calcification [35,36] and heterogeneous attenuation [36,37] were interpreted as being of value in differentiating the various histologic subtypes of thymomas. Recently, Angelo Iannarelli and colleagues [17] found a relationship between radiomics parameters and the histology and grading of thymic tumors. More importantly, their study also demonstrated that MG syndrome was significantly associated with some parameters in quantitative texture analysis (QTA) [17], which provided an incentive for further evaluation of the value of radiographic analysis in detecting MG syndrome in thymoma patients. Unfortunately, their study only included 16 patients (7 patients with TAMG). We therefore proposed a DL model based on preoperative CT imaging for screening TAMG in a large cohort of thymoma patients (230 cases, 95 with TAMG). Moreover, our results further confirmed the superior reliability and efficacy of the developed 3D-DenseNet-DL model compared with the five radiomics-based methods. These results also highlight the importance of radiographic analysis as a diagnostic tool, from the accurate characterization of the lesion itself to the detection of paraneoplastic syndromes, which is a great stride in the application of AI in the medical field.
However, despite its satisfactory outcomes, this study has some limitations. First, given the retrospective nature of this analysis, selection bias was unavoidable. Second, patients were not stratified into more detailed clinical status categories because of the limited sample size. Third, the status of serum AChR-binding antibodies is important for TAMG diagnosis, but the absence of this information in certain cases restricted further analysis. Therefore, a prospective, multi-center clinical trial with a larger cohort is needed to further confirm and optimize the screening model for MG patients.
In conclusion, with a large sample for modeling and an independent cohort for external validation, we first developed a 3D-

Figure 1
An illustration of the architecture of our 3D DenseNet deep learning model. Images with dimensions of 160×160×64 pixels are fed into the network, followed by multiple convolution and pooling operations, resulting in a probability prediction for MG. In each dense block, features from different levels are concatenated using skip connections. The dimension is halved after each transition layer.

Figure 2
Results of radiomics analysis and the 3D DenseNet deep learning model for detecting MG in a cohort of 182 thymoma patients. The performance of the five machine learning models and the 3D-DenseNet-DL model was compared using area under the ROC curve (AUC) (A), accuracy (B), sensitivity (C) and specificity (D). The 3D DenseNet deep learning model showed similar results in AUC and specificity, but relatively better results in accuracy and sensitivity, compared with the five radiomics analysis models (E). "RF", "LR" and "DL" refer to "Random Forest", "Logistic Regression", and "Deep Learning", respectively; "AUC", "ACC", "SN" and "SP" refer to the metrics area under the ROC curve, accuracy, sensitivity and specificity, respectively.
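For reference, the four metrics named in this caption can be computed from predicted probabilities as in the minimal sketch below. The labels and scores are made up, the 0.5 decision threshold is an assumption, and the AUC is obtained from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case) rather than by integrating a plotted curve.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy (ACC), sensitivity (SN) and specificity (SP) at a threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "ACC": (tp + tn) / len(y_true),
        "SN": tp / (tp + fn),  # fraction of TAMG cases that are detected
        "SP": tn / (tn + fp),  # fraction of non-TAMG cases correctly cleared
    }

def roc_auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for label, s in zip(y_true, y_score) if label == 1]
    negatives = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = TAMG) and model scores for six patients.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, round(auc, 3))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric can move in different directions between models.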