Patients
This retrospective study was approved by the ethics committee of the institutional review board. We retrieved 3060 cross-sectional images spanning three modalities: plain CT (1020 slices), PET (1020 slices), and fused PET/CT (1020 slices). Each modality contained 510 slices attributed to lymphomas and 510 slices attributed to solid tumors, drawn from 211 patients (153 patients with lymphomas and 116 patients with solid tumors) (Table 1) who underwent 18F-FDG-PET/CT scanning before clinical intervention. Each image contained exactly one enlarged cervical lymph node. All eligible plain CT, PET, and PET/CT images were randomly divided into training (350 lymphomas and 350 solid tumors; 70%), validation (80 lymphomas and 80 solid tumors; 15%), and test (80 lymphomas and 80 solid tumors; 15%) cohorts. All eligible patients received PET/CT examinations before individual therapy between January 2014 and June 2018. For patients with malignant lymphomas, diagnoses were established by fine-needle aspiration cytology (FNAC) or biopsy of peripheral lymph nodes; a patient was diagnosed with lymphoma when histological and immunohistochemical evidence on the examined nodes was positive, regardless of the primary site. In some cases, samples of cervical nodes yielded no positive findings. For patients with solid tumors, all eligible subjects were confirmed histologically from the primary lesions; only a few patients underwent cervical lymph node biopsy or FNAC. When an enlarged cervical lymph node was accompanied by evidence of a primary solid tumor, we gave preference to a diagnosis of solid tumor metastasis.
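The random 70%/15%/15% partition described above can be sketched as follows. This is a minimal illustration in Python, not the study's actual data-handling code; the slice identifiers and random seed are placeholders:

```python
import random

def split_slices(slice_ids, seed=0):
    """Randomly partition one class's 510 slices into 350/80/80
    (training/validation/test), matching the counts quoted in the text."""
    ids = list(slice_ids)
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    rng.shuffle(ids)
    return ids[:350], ids[350:430], ids[430:510]

# Applied separately to the 510 lymphoma slices and the 510 solid-tumor
# slices, so every cohort stays balanced between the two classes.
lymphoma_train, lymphoma_val, lymphoma_test = split_slices(range(510))
```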
We performed standard whole-body 18F-FDG PET/CT using a Gemini GXL PET/CT scanner (Philips, Amsterdam, Netherlands). Patients fasted for at least 6 hours before the examination. The blood glucose level was measured immediately before administration of 18F-FDG, and scanning proceeded only when the level was below 150 mg/dL. Approximately 5 MBq of 18F-FDG per kilogram of body weight (up to 550 MBq) was administered intravenously, and patients received low-dose CT scanning (40 mA, 120 kVp) after resting in a quiet, dark environment for approximately 60 minutes. After the initial low-dose CT, emission images were acquired from the top of the skull to the middle of the thigh, with an acquisition time of 2 minutes per bed position in 3D mode. PET images were reconstructed iteratively with CT-based attenuation correction. In this study, we concentrated on the patients' cervical region as a proof of concept (Figure 1).
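The weight-based dosing rule above (approximately 5 MBq/kg, capped at 550 MBq) amounts to a simple calculation; the function name below is ours, for illustration only:

```python
def fdg_dose_mbq(body_weight_kg, mbq_per_kg=5.0, max_mbq=550.0):
    """Approximate administered 18F-FDG activity: ~5 MBq per kg of
    body weight, capped at 550 MBq (the protocol quoted in the text)."""
    return min(body_weight_kg * mbq_per_kg, max_mbq)
```

For example, a 70 kg patient receives roughly 350 MBq, while the cap applies from 110 kg upward.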
Image preprocessing
All CT, PET, and fused PET/CT images were reviewed by one experienced radiologist and one nuclear medicine physician, who independently delineated the axial regions of enlarged cervical lymph node lesions with rectangular masks that served as input to the final DL-CNN. To limit noise from surrounding organ tissue and blood vessels, the area around the lymph node lesions was restricted to at most 15% of each masked region. In addition, to keep image appearance consistent for recognition, all CT images were transformed into final masked images under a fixed CT window width [200~300 Hounsfield units (HU)] and window level (20~30 HU), and were resized to 150 × 150 pixels to standardize the distance scale while avoiding distortion. All PET images were intensity-normalized to standardized uptake values (SUVs) using the injected activity and body weight. Real-time data augmentation was applied during training, exposing the deep learning models to multiple views of the existing data through random transformations of the limited samples, thereby improving model generalization and reducing overfitting.
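The two intensity normalizations above can be illustrated in isolation. This is a minimal NumPy sketch under our own assumptions (the window level and width are taken from the midpoints of the ranges quoted in the text, and the SUV formula is the standard body-weight form, assuming tissue density of about 1 g/mL):

```python
import numpy as np

def window_ct(hu_image, level=25.0, width=250.0):
    """Clip a CT slice (in Hounsfield units) to a display window and
    rescale the result to [0, 1] for network input."""
    lo, hi = level - width / 2.0, level + width / 2.0
    clipped = np.clip(hu_image, lo, hi)
    return (clipped - lo) / (hi - lo)

def to_suv(activity_bq_per_ml, injected_dose_bq, body_weight_g):
    """Body-weight SUV: tissue activity concentration divided by
    injected dose per gram of body weight."""
    return activity_bq_per_ml / (injected_dose_bq / body_weight_g)
```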
Model development and evaluation
The development of the DL-CNN comprised three sections: CT, PET, and fused PET/CT images. Models were developed on each type of image by extracting salient features from the axial scans. To optimize the diagnostic performance of the DL-CNN, eight algorithms were constructed to classify the histological components of cervical lymph nodes, built on models pretrained on the ImageNet database: VGG16, Xception, VGG19, InceptionV3, InceptionResNetV2, DenseNet121, DenseNet169, and DenseNet201 [16-22]. Because this is a binary classification problem, each network ended with a dense layer of size 1 with a sigmoid activation. The RMSprop optimizer was applied with binary cross-entropy as the loss in the compilation step. In addition, activation maps generated by class activation mapping (CAM) for the best-performing model among the eight algorithms on the test dataset were used to evaluate the regions of interest in PET and PET/CT images for further clinical review. CAM links the 2D grid of an input image to a specific output class, computing a score for every location in the input image and indicating how important each location is with respect to the class under consideration. In this study, we performed Grad-CAM to interpret the DL-CNN: given an input image, the output feature map of a convolution layer is taken, and every channel in that feature map is weighted by the gradient of the class score with respect to that channel [23]. This workflow is presented concisely in Fig. 1.
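The Grad-CAM weighting described above can be shown in isolation. A minimal NumPy sketch of the core computation, assuming the convolutional feature map and the class-score gradients have already been extracted from the network (both arrays here are synthetic placeholders):

```python
import numpy as np

def grad_cam(feature_map, gradients):
    """feature_map, gradients: arrays of shape (H, W, C).
    Returns an (H, W) class-activation heatmap in [0, 1]."""
    # Channel weights: the gradients averaged over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))               # shape (C,)
    # Weighted sum of channels, then ReLU to keep only positive evidence.
    cam = np.maximum((feature_map * weights).sum(axis=-1), 0.0)
    # Normalize for overlay display (guarding against an all-zero map).
    return cam / cam.max() if cam.max() > 0 else cam
```

In practice the resulting heatmap is upsampled to the input-image size and overlaid on the PET or PET/CT slice for clinical review.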
A total of twenty-four DL-CNNs (eight algorithms for each of the three kinds of images) were trained on the same training sample using the deep learning package Keras (http://keras.io/). During fitting, the batch size was set to 30 and the number of epochs to 15~20, training until no further obvious fluctuation of the loss function was observed after tuning the models on the validation cohort (Figure 2). The performance of the eight algorithms was measured by the accuracy and loss on the training and validation sets, and by the accuracy, loss, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) on the test set. Receiver operating characteristic (ROC) curves were generated with the pROC package [35] (https://cran.r-project.org/web/packages/pROC). The whole DL-CNN workflow, including development, validation, testing, and visualization, was performed in R statistical software 3.6.1 (https://www.r-project.org/).
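The test-set metrics listed above follow directly from the confusion-matrix counts. A minimal sketch of the standard definitions (the counts in the example are illustrative, not the study's results):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from
    confusion-matrix counts (tp/fp/tn/fn: true/false positives/negatives)."""
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# e.g. 70 true positives, 10 false positives, 70 true negatives,
# 10 false negatives on a 160-slice test cohort
metrics = diagnostic_metrics(70, 10, 70, 10)
```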