2.1. Study design
This retrospective study was approved by the Institutional Review Board of our institution (approval number: R02-054), and the requirement for written informed consent was waived. The inclusion criteria were as follows: (A) women older than 20 years; (B) pelvic MRI performed according to our hospital protocol between January 2015 and May 2020; and either (C) hysterectomy with pathologically confirmed endometrial cancer (cancer group) or (D) pathologically or clinically definitive benign lesions (non-cancer group). The exclusion criteria were as follows: (A) a history of treatment for uterine disease and (B) cancers that did not form a macroscopic mass according to the pathology reports. A flowchart of the patient selection process is presented in Figure 1.
Figure 2 shows a flow diagram of the study design. In Experiment 1 (Figure 2a), convolutional neural networks (CNNs) were constructed to diagnose the presence of endometrial cancer. Single and combined image sets of T2-weighted images (T2WI), apparent diffusion coefficient of water (ADC) maps, and contrast-enhanced T1-weighted images (CE-T1WI) were used to determine the optimal imaging conditions for the CNN, and the diagnostic performance of the CNNs was compared with that of experienced radiologists. Experiment 2 (Figure 2b) examined whether diagnostic performance could be improved by adding to the training data sequences and cross-sections other than those of the test image set.
2.2. MRI acquisition
MRI was performed on 3T or 1.5T scanners (Ingenia® and Achieva®, respectively; Philips Medical Systems, Netherlands) with a 32-channel phased-array body coil. The protocol, which covered the entire uterus with images aligned to the uterine axis, included T2WIs, diffusion-weighted images (DWIs; b-values of 0 and 1000 s/mm²), and equilibrium-phase CE-T1WIs (Table 1). The contrast agent for CE-T1WIs was gadopentetate dimeglumine (Magnevist®, 0.5 mol/L; 5 mmol) or gadobutrol (Gadovist®, 1.0 mol/L; both Bayer, Germany). The dose was adjusted to the patient's weight as recommended (0.2 mL/kg). The contrast agent was injected intravenously as a bolus at 4 mL/s (2 mmol/s); Gadovist® was diluted with saline before injection at 4 mL/s.
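The weight-based dosing and injection rate above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the injected solution has a concentration of 0.5 mmol/mL (Magnevist® as supplied). The dilution ratio used for Gadovist® is not stated, so treating diluted Gadovist® as 0.5 mmol/mL is an assumption, and the function name `contrast_dose` is hypothetical.

```python
def contrast_dose(weight_kg: float,
                  ml_per_kg: float = 0.2,
                  conc_mmol_per_ml: float = 0.5,
                  rate_ml_per_s: float = 4.0) -> dict:
    """Return injected volume (mL), gadolinium amount (mmol), and bolus duration (s).

    Hypothetical helper. conc_mmol_per_ml = 0.5 assumes a Magnevist-strength
    solution (0.5 mol/L = 0.5 mmol/mL); applying it to diluted Gadovist is an assumption.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    volume_ml = weight_kg * ml_per_kg            # weight-based volume (0.2 mL/kg)
    amount_mmol = volume_ml * conc_mmol_per_ml   # gadolinium delivered
    duration_s = volume_ml / rate_ml_per_s       # bolus length at 4 mL/s
    return {
        "volume_ml": volume_ml,
        "amount_mmol": amount_mmol,
        "duration_s": duration_s,
        "mmol_per_s": rate_ml_per_s * conc_mmol_per_ml,  # 4 mL/s x 0.5 mmol/mL
    }
```

For a hypothetical 50 kg patient this gives 10 mL (5 mmol) delivered in 2.5 s; the stated 4 mL/s corresponds to 2 mmol/s only at a concentration of 0.5 mmol/mL.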
Table 1
MRI acquisition parameters
Scanner | Sequence | Cross-section | Type | TR/TE (ms) | FA (°) | Slice/Gap (mm) | FOV (mm) | Matrix |
Ingenia® 3.0T | T2WI | Sg | 2D-TSE | 1400/110 | 90 | 3-5/0.3-0.5 | 280 | 640 × 640 |
| T2WI | Ax | 2D-TSE | 4955-5789/100-110 | 90 | 3-5/0.3-0.5 | 280 | 704 × 704 |
| DWI | Ax | EPI | 6500-7500/77-79 | 90 | 3-5/0.3-0.5 | 280 | 224 × 224 |
| CE-T1WI | Sg | 3D-GRE SPIR | 4/2 | 10 | 3.3/1.6 | 280 | 576 × 576 |
| CE-T1WI | Ax | 3D-GRE SPIR | 4/2 | 10 | 3.3/1.6 | 280 | 576 × 576 |
Achieva® 1.5T | T2WI | Sg | 2D-TSE | 1400/100-110 | 90 | 3-5/0.3-0.5 | 280 | 512 × 512 to 640 × 640 |
| T2WI | Ax | 2D-TSE | 1400-6013/100-110 | 90 | 3-5/0.3-0.5 | 280 | 512 × 512 to 704 × 704 |
| DWI | Ax | EPI | 3963-7500/70-77 | 90 | 3-5/0.3-0.5 | 280 | 224 × 224 to 256 × 256 |
| CE-T1WI | Sg | 3D-GRE SPIR | 4-5/2 | 15 | 4.4/2.2 | 280 | 336 × 336 to 576 × 576 |
| CE-T1WI | Ax | 3D-GRE SPIR | 5/2 | 15 | 2/1 | 250-280 | 320 × 320 to 576 × 576 |
TR, repetition time; TE, echo time; FA, flip angle; FOV, field of view; Sg, sagittal; Ax, axial; T2WI, T2-weighted image; DWI, diffusion-weighted image; CE-T1WI, contrast-enhanced T1-weighted image; TSE, turbo spin echo; EPI, echo planar imaging; GRE SPIR, gradient echo with spectral presaturation with inversion recovery.
2.3. Data set
Image slices containing the endometrium were extracted to create the dataset. In the cancer group, only slices depicting the tumor were extracted, as determined by the consensus of two radiologists (A.U., T.S.) with reference to the sequences and the pathological findings. Slices at the same cross-sectional positions were extracted for all sequences.
A total of 485 patients were randomly assigned to the training and test groups. The training data comprised images from 388 patients (204 in the cancer group and 184 in the non-cancer group): 2,905 axial images (1,471 cancer and 1,434 non-cancer) for each of the T2WI, ADC map, and CE-T1WI, and 1,105 sagittal images (624 cancer and 481 non-cancer) for each of the T2WI and CE-T1WI. For testing, only the central image of each patient's stack was extracted, yielding 97 images (51 cancer and 46 non-cancer) for each sequence and cross-section.
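The reported counts correspond to exactly 20% of each group being held out for testing (51 of 255 cancer patients and 46 of 230 non-cancer patients), which is consistent with, but does not prove, a random split performed within each group. The sketch below shows such a patient-level split and the selection of the central slice of each stack. The per-group split, the seed, taking the lower-middle slice when a stack has an even number of slices, and the function names are all assumptions.

```python
import random

def split_patients(groups, test_fraction=0.2, seed=0):
    """Randomly split patient IDs into training and test sets within each group.

    groups: mapping of group name -> list of patient IDs (hypothetical layout).
    Splitting by patient, not by image, keeps all slices of one patient in one set.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for name, ids in groups.items():
        ids = list(ids)
        rng.shuffle(ids)
        n_test = int(round(len(ids) * test_fraction))
        test[name] = ids[:n_test]
        train[name] = ids[n_test:]
    return train, test

def central_index(n_slices):
    """Index of the central slice in a stack (lower-middle slice if n is even)."""
    if n_slices < 1:
        raise ValueError("stack must contain at least one slice")
    return (n_slices - 1) // 2
```

With 255 cancer and 230 non-cancer IDs and `test_fraction=0.2`, this reproduces the 204/51 and 184/46 training/test counts given above.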
Digital Imaging and Communications in Medicine (DICOM) images were converted to Joint Photographic Experts Group (JPEG) images using the viewing software Centricity Universal Viewer (GE Healthcare, Chicago, Illinois, United States), because the graphical deep learning software used in this study could not read DICOM data directly. The JPEG images were then brought to 240 × 240 pixels by trimming the margins using XnConvert (Pierre-Emmanuel Gougelet, Reims, France). In addition to the five single image sets, four combined image sets (axial T2WI + ADC map, axial T2WI + CE-T1WI, sagittal T2WI + CE-T1WI, and axial T2WI + ADC map + CE-T1WI) were created for training and testing. Using ImageMagick (15), axial images were combined vertically (240 × 480 or 240 × 720 pixels) and sagittal images horizontally (480 × 240 pixels).
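The concatenation step can be sketched with NumPy, where an image array has shape (height, width); vertical stacking corresponds to ImageMagick's `-append` option and horizontal stacking to `+append`. The order of sequences within a combined image (T2WI first) is an assumption, and the function names are hypothetical.

```python
import numpy as np

TILE = 240  # each single image is 240 x 240 pixels

def _check(images):
    # Refuse inputs that were not trimmed to the expected tile size.
    for img in images:
        if img.shape[:2] != (TILE, TILE):
            raise ValueError(f"expected {TILE}x{TILE} image, got {img.shape[:2]}")

def combine_axial(*images: np.ndarray) -> np.ndarray:
    """Stack axial images top to bottom, e.g. T2WI over ADC map (over CE-T1WI)."""
    _check(images)
    return np.vstack(images)  # shape (240 * k, 240): width 240, height 480 or 720

def combine_sagittal(t2wi: np.ndarray, ce_t1wi: np.ndarray) -> np.ndarray:
    """Place sagittal T2WI and CE-T1WI side by side."""
    _check([t2wi, ce_t1wi])
    return np.hstack([t2wi, ce_t1wi])  # shape (240, 480): width 480, height 240
```

Note that NumPy reports shapes as (rows, columns), so the 240 × 480 (width × height) axial image appears as an array of shape (480, 240).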
2.4. Experiment 1: Diagnostic performance for single and combined image sets: CNN vs. radiologists
The diagnostic performance of the CNNs was compared with that of three board-certified radiologists with 27, 26, and 9 years of experience in pelvic MRI interpretation (T.M., K.M., and T.I.) using the five single image sets and four combined image sets. The same type of single or combined image set was used for training and testing. The radiologists, blinded to clinical and pathological findings, independently reviewed the 97 test images of each image set in random order and reported the presence or absence of cancer. Interpretation began with the single image sets (ADC map first), followed by the combined image sets, with an interval of one week between interpretation sessions.
2.5. Experiment 2: CNN in testing single image sets using different image sets for training
Experiment 2 investigated whether adding different types of image sets to the training data improved the diagnostic performance of the CNNs. To test each of the five single image sets, CNNs were trained on (i) all images of the same sequence regardless of cross-section, (ii) all images of the same cross-section regardless of sequence, and (iii) all images regardless of sequence and cross-section. Only single image sets were used for training and testing.
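The three kinds of training pool can be enumerated programmatically. In the sketch below, each single image set is a (sequence, cross-section) pair. Because ADC maps were acquired only in the axial plane, the "same sequence" pool for the axial ADC map would be identical to the single image set itself and is skipped, which matches the rows listed in Table 2. The names are illustrative, not taken from the study software.

```python
# Single image sets as (sequence, cross-section) pairs; ADC maps exist only in the axial plane.
SINGLE_SETS = [("ADC", "Ax"), ("T2WI", "Ax"), ("T2WI", "Sg"),
               ("CE-T1WI", "Ax"), ("CE-T1WI", "Sg")]

def training_pools(test_set):
    """Return the Experiment 2 training pools for one test image set.

    A pool that would contain only the test set itself is skipped, because it
    would be the same as the Experiment 1 training set.
    """
    if test_set not in SINGLE_SETS:
        raise ValueError(f"unknown image set: {test_set}")
    sequence, plane = test_set
    candidates = {
        "same sequence": [s for s in SINGLE_SETS if s[0] == sequence],
        "same cross-section": [s for s in SINGLE_SETS if s[1] == plane],
        "all": list(SINGLE_SETS),
    }
    return {name: sets for name, sets in candidates.items() if len(sets) > 1}
```

For example, `training_pools(("T2WI", "Sg"))` returns pools corresponding to "All T2WI", "All sagittal", and "All" in Table 2.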
2.6. Deep learning with convolutional neural networks
Deep learning was conducted on a Deep Station Entry workstation (UEI, Tokyo, Japan) equipped with a GeForce RTX 2080 Ti graphics processing unit (NVIDIA, California, USA) and a Core i7-8700 central processing unit (Intel, California, USA), using the graphical deep learning software Deep Analyzer (GHELIA, Tokyo, Japan). The following conditions, optimized in ablation and comparative studies in previous research, were used: a CNN with the Xception architecture (16), pre-trained on ImageNet (17), a dataset of natural images. The optimizer was Adam (learning rate = 0.0001, β1 = 0.9, β2 = 0.999, eps = 1e-7, decay = 0, AMSGrad = false). The batch size was selected automatically. Horizontal flip, rotation (±4.5°), shearing (0.05), and zooming (0.05) were applied automatically for data augmentation. CNNs were trained with training/validation split ratios of 9:1, 8:2, or 7:3 and with 50, 100, 200, 500, or 1000 epochs, and the diagnostic performance of each was validated. For each image set, the split ratio and number of epochs were chosen as those yielding the best performance among the CNNs with both sensitivity and specificity above 0.75 (Table 2).
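Deep Analyzer is a graphical tool, so its internal implementation is not reproduced here. As a rough, non-authoritative equivalent, the configuration above could be expressed in TensorFlow/Keras as follows. The binary sigmoid output, the three-channel 240 × 240 input, the scaling of pixel values to [-1, 1], and the omission of shearing (no shear layer is assumed to be available) are assumptions of this sketch, not details reported for the study.

```python
from tensorflow import keras

IMG_SIZE = 240  # images were trimmed to 240 x 240 pixels

def build_model(weights="imagenet", input_shape=(IMG_SIZE, IMG_SIZE, 3)):
    # Augmentation approximating the text: horizontal flip, +/-4.5 degree rotation, 5% zoom.
    # Shearing (0.05) is omitted in this sketch.
    augment = keras.Sequential([
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(4.5 / 360.0),  # factor is a fraction of a full turn
        keras.layers.RandomZoom(0.05),
    ], name="augmentation")

    # Xception backbone; weights="imagenet" loads ImageNet pre-training.
    base = keras.applications.Xception(include_top=False, weights=weights,
                                       input_shape=input_shape, pooling="avg")

    inputs = keras.Input(shape=input_shape)
    x = augment(inputs)                                       # active only during training
    x = keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(x)   # map 0-255 to [-1, 1]
    x = base(x)
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)  # assumed P(cancer) head
    model = keras.Model(inputs, outputs)

    # Adam settings from the text; decay = 0 means no learning-rate schedule is attached.
    optimizer = keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999,
                                      epsilon=1e-7, amsgrad=False)
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Training would then call `model.fit` with the chosen number of epochs and a validation split; the batch size, which Deep Analyzer selected automatically, is left unspecified here.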
Table 2
Best training/validation split ratio and number of epochs for each image set in Experiments 1 and 2
Test image set | Training image set | Training/validation split ratio | Epochs |
Experiment 1 | | | |
Axial ADC map | Axial ADC map | 9:1 | 100 |
Axial T2WI | Axial T2WI | 9:1 | 50 |
Sagittal T2WI | Sagittal T2WI | 8:2 | 50 |
Axial CE-T1WI | Axial CE-T1WI | 8:2 | 200 |
Sagittal CE-T1WI | Sagittal CE-T1WI | 8:2 | 100 |
Combined axial T2WI + ADC map | Combined axial T2WI + ADC map | 9:1 | 100 |
Combined axial T2WI + CE-T1WI | Combined axial T2WI + CE-T1WI | 9:1 | 100 |
Combined sagittal T2WI + CE-T1WI | Combined sagittal T2WI + CE-T1WI | 9:1 | 50 |
Combined axial T2WI + ADC map + CE-T1WI | Combined axial T2WI + ADC map + CE-T1WI | 9:1 | 200 |
Experiment 2 | | | |
Axial ADC map | All axial | 8:2 | 50 |
Axial ADC map | All | 9:1 | 50 |
Axial T2WI | All T2WI | 8:2 | 100 |
Axial T2WI | All axial | 9:1 | 50 |
Axial T2WI | All | 9:1 | 50 |
Sagittal T2WI | All T2WI | 8:2 | 200 |
Sagittal T2WI | All sagittal | 8:2 | 200 |
Sagittal T2WI | All | 8:2 | 100 |
Axial CE-T1WI | All CE-T1WI | 8:2 | 50 |
Axial CE-T1WI | All axial | 8:2 | 100 |
Axial CE-T1WI | All | 8:2 | 100 |
Sagittal CE-T1WI | All CE-T1WI | 9:1 | 100 |
Sagittal CE-T1WI | All sagittal | 9:1 | 50 |
Sagittal CE-T1WI | All | 9:1 | 100 |
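T2WI, T2-weighted image; ADC, apparent diffusion coefficient; CE-T1WI, contrast-enhanced T1-weighted image. In Experiment 2, "All T2WI" and "All CE-T1WI" denote all images of that sequence regardless of cross-section, "All axial" and "All sagittal" denote all images of that cross-section regardless of sequence, and "All" denotes all images regardless of sequence and cross-section.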
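The selection rule described in Section 2.6 (sensitivity and specificity both above 0.75, then the best performance) can be written as a small filter over the 15 candidate settings per image set. Because the text does not define "best performance", ranking by accuracy here is an assumption; the names are hypothetical, and the sketch illustrates the rule only, not the study's results.

```python
from itertools import product

SPLIT_RATIOS = ("9:1", "8:2", "7:3")
EPOCHS = (50, 100, 200, 500, 1000)
GRID = list(product(SPLIT_RATIOS, EPOCHS))  # 15 candidate settings per image set

def select_setting(results, threshold=0.75, score="accuracy"):
    """Pick the best (split ratio, epochs) among CNNs whose sensitivity and
    specificity both exceed `threshold`.

    results: {(split_ratio, epochs): {"sensitivity": ..., "specificity": ..., "accuracy": ...}}
    The ranking metric `score` is an assumption. Returns None if nothing passes.
    """
    eligible = {
        setting: m for setting, m in results.items()
        if m["sensitivity"] > threshold and m["specificity"] > threshold
    }
    if not eligible:
        return None
    return max(eligible, key=lambda s: eligible[s][score])
```

A setting with high accuracy but specificity of 0.75 or below is discarded before ranking, so the chosen setting is always balanced between the two error types.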
2.7. Statistical analysis
Statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan), a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria), and SPSS software (SPSS Statistics 27.0; IBM, Armonk, NY, USA). Clinical characteristics were compared between the groups using the Mann-Whitney U test and the chi-square test. Sensitivity, specificity, and accuracy for cancer diagnosis were evaluated on the test dataset, and receiver operating characteristic (ROC) analysis was performed to evaluate diagnostic performance (18). For each estimate, 95% confidence intervals (CIs) were calculated, and differences were tested for statistical significance; P < 0.05 was considered statistically significant.
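The reported metrics follow directly from the 2 × 2 confusion matrix on the test set, and the area under the ROC curve (AUC) equals the probability that a randomly chosen cancer image receives a higher score than a randomly chosen non-cancer image, with ties counted as one half. The sketch below computes these quantities in plain Python. The text does not state how the 95% CIs were obtained, so the Wilson score interval used here is an assumption (EZR and SPSS may use other methods, such as exact binomial intervals), and the function names are hypothetical.

```python
import math

def wilson_ci(k, n, z=1.96):
    """Wilson score 95% CI for a proportion k/n (assumed CI method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, each with a CI; 1 = cancer, 0 = non-cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    out = {}
    for name, k, n in (("sensitivity", tp, tp + fn),
                       ("specificity", tn, tn + fp),
                       ("accuracy", tp + tn, len(pairs))):
        out[name] = (k / n, wilson_ci(k, n))
    return out

def roc_auc(y_true, scores):
    """AUC as P(score of cancer case > score of non-cancer case), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one cancer and one non-cancer case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))
```

For binary reads such as the radiologists' present/absent reports, this AUC reduces to (sensitivity + specificity) / 2.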