Classification Between Prostate Transitional Zone Cancer and Hyperplasia Using Deep Transfer Learning from Disease-Related Images: A Retrospective Study

Background The diagnosis of prostate transition zone cancer (PTZC) remains a clinical challenge due to its similarity to benign prostatic hyperplasia (BPH) on MRI. Deep Convolutional Neural Networks (DCNNs) have shown high efficacy in medical imaging but are limited by small data sizes. A transfer learning (TL) method was combined with deep learning to overcome this challenge. Methods A retrospective investigation was conducted on 217 patients enrolled from our hospital database (208 patients) and The Cancer Imaging Archive (9 patients). Based on the T2-weighted images (T2WIs) and apparent diffusion coefficient (ADC) maps of these patients, DCNN models were trained and compared between different TL databases (ImageNet vs. disease-related images) and protocols (training from scratch, fine-tuning, or transductive transfer). Results PTZC and BPH can be classified by a traditional DCNN. The efficacy of transfer learning from ImageNet was limited but was improved by transferring knowledge from disease-related images. Furthermore, transductive transfer learning from disease-related images had efficacy comparable to the fine-tuning method. Limitations include the retrospective design and relatively small sample size. Conclusion For PTZC, despite the small sample size, an accurate diagnosis can be achieved via deep transfer learning from disease-related images.


Background
About 25% of prostate cancers originate in the transition zone (TZ), and their diagnosis remains a clinical challenge due to their similarity to benign prostatic hyperplasia (BPH) on MRI [1,2]. Conventional transrectal ultrasound-guided biopsy faces the dilemma of both underdiagnosis and overdiagnosis because of its invasive nature, the small size of minute tumors, and the intrinsic limitations of H&E slides [3,4].
In addition, screening efficacy in developing areas is impaired by the lack of radiology interpretation expertise. Consequently, several machine learning methods have been investigated to classify prostate cancer versus normal tissue or BPH [5,6].
Traditional machine learning methods are laborious because of the complex feature extraction procedure [7,8]. Most importantly, the selection of features may be influenced by different data sources and processing software, so their generalization is limited. Deep Convolutional Neural Networks (DCNNs) automatically extract features for medical imaging diagnosis based on fixed architectures [9-11]. Moreover, several deep learning studies were based on multi-center databases, which demonstrated the robustness of DCNN models [12,13]. However, the DCNN model is a data-dependent classifier, in which a larger database yields a better result, but prostate TZ cancer (PTZC) images are usually scarce [2]. DCNNs were developed to imitate how the visual cortex of the brain processes and recognizes images, but the learning procedure has always been regarded as a "black box" [14]. Humans can learn one thing much more easily and quickly if they have learned similar things before, an ability called transfer learning (TL). In machine learning, TL is designed to transfer information from a certain source domain to a target domain [15,16]. This method is usually combined with deep learning to overcome the issue of small sample size [17,18]. By transferring similar features from everyday pictures to disease images, previous DCNN studies proved the efficacy of TL based on a big natural image database called ImageNet [12,19,20].
However, a more effective way for humans to learn one disease is to analogically learn another, similar disease. The imaging manifestation of PTZC is similar in some ways to that of prostate peripheral zone cancer (PPZC), and the data for PPZC are far more plentiful [21]. Consequently, to deal with the scarcity of PTZC data and better differentiate PTZC from BPH, we decided to imitate the analogical learning ability of the human brain by combining TL and DCNN. In the current study, we trained DCNN models and compared different TL databases [ImageNet (natural images) vs. PPZC images (disease-related images)] and protocols (training from scratch, fine-tuning, or transductive transfer).

Patients
From May 2010 to March 2016, the detailed clinical information and MRI images of 309 patients from our hospital were retrospectively collected. After excluding patients who had received previous surgery or medication, lacked a pathological diagnosis, or had poor MRI quality, 208 pathologically confirmed prostate cancer or BPH patients were enrolled in the current study. These patients underwent a series of MR scans followed by radical prostatectomy or MRI-guided biopsy within a month. A Gleason score of 6 or higher was considered to indicate a malignant tumor. Imaging data of these 208 patients from the local dataset were enrolled, and their basic information is shown in Table 1. In addition, 9 PTZC patients from The Cancer Imaging Archive (TCIA) [22] were enrolled, resulting in a final enrollment of 217 patients. Among these 217 patients, 81 were PPZC patients, 30 were PTZC patients, and 106 were BPH patients (Fig. 1). Detailed scanning parameters are listed in Table S1. Patients had received axial T2-weighted imaging (T2WI) and axial multiple-b-value DWI (multi-b DWI) scans. Apparent diffusion coefficient (ADC) maps were calculated using two different b values (0 and 1,000 s/mm²). Images from TCIA included axial T2WIs and axial ADC maps.

Preprocessing
Images were converted from DICOM to bitmap format. The locations of individual lesions on the MR images were independently determined by 2 radiologists, each with 15 years of experience (C.Y. and YZ.S.) (Fig. 2). After these 2 experienced radiologists read both image modalities for a patient, ROIs were drawn by hand on all modalities and planes in which the tumor was visible. If their decisions were inconsistent, a senior radiologist with more than 20 years of experience (GB.C.) made the final decision. After that, the target lesion was cropped with a rectangular ROI, in which the tumor was located at the center of the image and occupied about 80% of its area. These cropped images were then resized to a 256 × 256 matrix using bilinear interpolation.
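The cropping rule above (lesion centered, bounding box covering about 80% of the crop area) can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the study's actual code: the function name and the choice to scale both sides equally are ours, and we assume the lesion's center and bounding-box size are already known from the hand-drawn ROI.

```python
import math

def crop_window(cx, cy, lesion_w, lesion_h, occupancy=0.8):
    """Return (left, top, right, bottom) of a rectangular crop centered on
    the lesion at (cx, cy) such that the lesion's bounding box occupies
    about `occupancy` of the crop area (both sides scaled equally)."""
    s = math.sqrt(1.0 / occupancy)  # sqrt(1/0.8) ~ 1.118 per side
    w, h = lesion_w * s, lesion_h * s
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

The resulting window would then be clipped to the image bounds and resized to 256 × 256 with bilinear interpolation, as described above.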
Data augmentation plays a vital role in the utilization of DCNNs and can significantly improve the efficacy of a DCNN classifier [23]. Images were augmented using random cropping, mean subtraction, and mirroring, which are prebuilt options within the Caffe framework. Further augmentation included 90° rotation, vertical flipping, addition of standard Gaussian white noise, and histogram equalization, performed in MATLAB (Matrix Laboratory 2016b, MathWorks, Natick, MA) [24]. Since some of these processed images are intrinsically different from the original images, test sets were not augmented throughout the project to prevent biases.
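The geometric and noise augmentations listed above can be sketched as follows. This is a pure-Python toy on list-of-rows "images" for illustration only; the study used Caffe's built-in options and MATLAB, and all function names here are our own.

```python
import random

def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def mirror(img):
    """Horizontal flip (mirror image)."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Vertical flip."""
    return img[::-1]

def add_gaussian_noise(img, sigma=1.0, seed=0):
    """Add Gaussian white noise to every pixel (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]

def augment(img):
    """Return the original image plus its geometric variants; as in the
    study, a test set would be left untouched."""
    return [img, rotate90(img), mirror(img), vertical_flip(img)]
```

Each variant is a valid new training sample because the diagnostic content of the lesion is invariant to these transforms, while the test set stays unaugmented to avoid bias.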

Training procedure
Prostate images were processed via a series of operations to produce a predicted probability for each image. A DCNN includes diverse layers that form a pipeline of imaging feature extraction and exports the final output for the labels (i.e., the probability of malignancy or hyperplasia for each cropped image). These procedures are governed by the "weights" of the whole network, which were randomly initialized before the training procedure. A DCNN is trained to discover and optimize these "weights". After dozens or hundreds of training iterations, an optimized set of "weights" can be obtained that exerts a satisfactory predictive ability [18].
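The discover-and-optimize loop described above can be shown in miniature with a single sigmoid neuron: weights start random and are nudged over many iterations until the predicted probability matches the label. This is a didactic sketch, not the AlexNet training code; all names and the toy 1-D data are our own.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Forward pass: weighted sum of the inputs squashed to a probability."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(samples, labels, steps=200, lr=0.5, seed=0):
    """Randomly initialize the weights, then repeatedly adjust them so the
    predicted probability moves toward each label (gradient descent on
    the log-loss) -- the same loop a DCNN runs, in miniature."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    b = 0.0
    for _ in range(steps):
        for x, y in zip(samples, labels):
            g = predict(w, b, x) - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```

A real DCNN differs only in scale: millions of weights arranged in convolutional layers instead of one linear unit, optimized by the same principle.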
Transfer learning procedure
TL can improve a classifier in one domain by transferring knowledge from a larger, relevant domain. In some cases, this goal can be achieved by using the "off-the-shelf" "weights" trained with data from the relevant domain (Fig. 3c). The transferred "weights" can be used directly to classify the target data, a process often called transductive TL [16]. In other cases, the "weights" of the network are retrained with the target data, a process often called "fine-tuning" (Fig. 3b) [17]. Both methods were tested in the current study.
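The contrast between the two protocols can be illustrated with a toy nearest-centroid "model" standing in for network weights: transductive TL uses the source model as-is on the target data, while fine-tuning re-adjusts the transferred parameters on the target data. This is purely illustrative, assuming a 1-D feature and two classes; it is not the study's Caffe workflow.

```python
def centroid_model(data):
    """Fit one centroid per class; the centroids stand in for a
    network's learned "weights" in this toy."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def classify(model, x):
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda y: abs(x - model[y]))

def fine_tune(source_model, target_data, alpha=0.5):
    """Fine-tuning protocol: start from the source "weights" and retrain
    them on the target data (alpha controls how far they move).
    Transductive TL would skip this step and use source_model directly."""
    target_model = centroid_model(target_data)
    return {y: (1 - alpha) * c + alpha * target_model.get(y, c)
            for y, c in source_model.items()}
```

In the transductive protocol the source model classifies target images off the shelf, so no target labels are needed for training; fine-tuning trades that convenience for adaptation to the target distribution.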

Statistical analysis
For each image, the probability of being a malignant lesion was defined as the final output. On the test datasets, receiver operating characteristic (ROC) curves were plotted according to these output values, and the areas under the curves (AUCs) and their 95% confidence intervals (CIs) were determined [25]. Comparisons between AUCs were made with the DeLong and Clarke-Pearson method [26], and P values less than 0.05 were considered statistically significant. For multiple comparison tests, P values were corrected with a post hoc Bonferroni method [27]. The optimal accuracy, sensitivity, and specificity were determined from the optimal cutoff value (Fig. 4a-iii). Then, 5 AlexNet models were trained through a 5-fold cross-validation method. In each training procedure, 4/5 of the data in the training set was used to train the model, while the remaining 1/5 was used as the validation set to select the optimal model. After that, the test set was used to evaluate the efficacy of those 5 models, and for each image, 5 probabilities (of being malignant) were derived. In the end, the 5 probabilities for each image were averaged to calculate the final output value.
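The cross-validation split, the 5-model probability averaging, and the quantity summarized by the ROC curve can be sketched as follows. A minimal sketch under our own assumptions (the study computed AUCs and CIs with dedicated statistical software; function names here are illustrative):

```python
def five_fold_splits(data, k=5):
    """Partition the training data into k folds; each round trains on
    k-1 folds and validates on the held-out fold."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, folds[i]

def ensemble_probability(per_model_probs):
    """Average the k models' malignancy probabilities for one image."""
    return sum(per_model_probs) / len(per_model_probs)

def empirical_auc(pos_scores, neg_scores):
    """AUC as the probability that a malignant image outscores a benign
    one (ties count half) -- the quantity the ROC curve summarizes."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

Averaging the 5 fold-wise probabilities reduces the variance introduced by any single train/validation split before the ROC analysis is applied.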
Even with the small sample size, PTZC and BPH could be distinguished using a DCNN model (Fig. 4b, Without TL model). Using only PTZC and BPH data, T2WIs were associated with an AUC of 0.73 (95% CI = 0.63-0.83) and a sensitivity, specificity, and accuracy of 69%, 75%, and 81%, respectively. ADC images were associated with an AUC of 0.94 (95% CI = 0.90-0.99) and a sensitivity, specificity, and accuracy of 84%, 97%, and 89%, respectively. The diagnostic efficacy of the AlexNet DCNN model using ADC images was quite satisfactory, but that using T2WIs needed further improvement.
The performance of TL from natural pictures (ImageNet) is limited by small data size
An AlexNet DCNN model was pre-trained with 1.2 million natural color pictures from ImageNet (Fig. 4a-ii) [28] and then fine-tuned using the aforementioned target data (60 PTZC and 60 BPH patients).

TL from disease-related images (PPZC images) improved the diagnostic efficacy of the DCNN model
Another TL model was pre-trained with the images of the remaining 76 BPH and 81 PPZC patients (Fig. 4a-i). This pre-trained model was fine-tuned with the aforementioned target dataset (60 PTZC and 60 BPH patients).
Using the model trained from the disease-related dataset (Fig. 4b, TL-Related dataset), T2WIs were associated with an AUC of 0.86 (95% CI = 0.79-0.93) and a sensitivity, specificity, and accuracy of 90%, 69%, and 80%, respectively. The diagnostic efficacy of the TL-Related dataset model was significantly higher than that of the Without TL model (P = 0.00014) or the TL-ImageNet model (P = 0.00046).
ADC images were associated with an AUC of 0.97 (95% CI = 0.90-0.99) and a sensitivity, specificity, and accuracy of 90%, 94%, and 92%, respectively. However, there was no significant difference between the AUCs of the TL-Related dataset model and the TL-ImageNet model (P = 0.88), or between the TL-Related dataset model and the Without TL model (P = 0.29).

The transductive method is a novel and effective way to perform TL
A transductive GoogLeNet model and a transductive AlexNet model were separately trained with the TL-Related dataset (images of 81 PPZC and 76 BPH patients), and these models were directly used to classify all PTZC and BPH images (Fig. 3c). Finally, ROC curves and AUCs were obtained (Fig. 5).

Ensemble contributes to the stabilization of output values
Because each lesion may have multiple planes, ensembling was performed by averaging the output values of these planes to obtain a stable prediction. Of these 20 patients, 2 patients were misdiagnosed on T2WIs, while on ADC images, only 1 patient was misdiagnosed (Fig. 6).
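The plane-level ensembling described above amounts to averaging each lesion's per-plane probabilities and thresholding the result. A minimal sketch, assuming per-plane probabilities are already available; the function names and the 0.5 cutoff are our own illustrative choices:

```python
def patient_level_probability(plane_probs):
    """Average the per-plane malignancy probabilities of one lesion
    into a single, more stable value."""
    return sum(plane_probs) / len(plane_probs)

def diagnose(planes_by_patient, cutoff=0.5):
    """Turn each patient's per-plane outputs into a final call."""
    return {pid: ('malignant' if patient_level_probability(ps) >= cutoff
                  else 'benign')
            for pid, ps in planes_by_patient.items()}
```

Averaging across planes damps the effect of a single atypical slice, which is why the patient-level call is more stable than any one plane's output.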

Discussion
Based on the current investigations, we revealed that PTZC and BPH can be distinguished by a traditional DCNN. The efficacy of TL from ImageNet to T2WI and ADC images was limited, but TL from disease-related images significantly improved the diagnostic efficacy of T2WI. In addition, transductive TL from disease-related images had diagnostic efficacy similar to the fine-tuning method on both T2WI and ADC images. We also found that the DCNN model is robust enough to process images from different sources.
The efficacy of TL from ImageNet was shown to be significant in previous studies, but their findings contradicted ours [12,18,29]. In our study, the diagnostic efficacy was improved by TL from natural images, but the improvement was not significant. This may be because the DCNN model had indeed learned useful texture characteristics from natural images and applied them to the diagnosis of prostate TZ cancer, but the efficacy was still limited by the small sample size of this rare disease. In contrast, the DCNN model trained with disease-related images performed significantly better, which suggests that the DCNN model is to some extent like the human brain network, and that learning directly from a related disease is a more effective approach. There have also been studies focused on transductive TL, but to our knowledge, very few have applied this method to medical images [30-32]. It is worth testing the applicability of transductive DCNNs to diseases before the method can step from bench to bedside. In the future, this TL method could be used to diagnose rare diseases, for example, differentiating lung cancer lymphatic metastasis from normal lymphatic tissue by transferring information from lung cancer.
Previous research suggests that DCNNs are effective in classifying prostate cancer versus BPH or other benign lesions, but few studies were specifically conducted to classify PTZC versus BPH [33-35]. This issue deserves attention, because the diagnostic accuracy between PTZC and BPH in previous studies may have been biased by the far larger number of PPZC patients. These studies also revealed that T2WI and the ADC map are the two most efficient protocols. We found that the diagnostic efficacy of T2WI was lower than that of ADC images when the sample size was small, which was partly compensated for by TL. As a result, we surmise that although the imaging features of ADC images are simple and effective, the potential of T2WI could be tapped more fully by experienced radiologists. In developing regions where advanced MRI is lacking, full utilization of T2WIs using the current TL-DCNN strategy could be a critical and practical way to improve diagnostic efficacy.
Our study has two limitations. First, although we applied a TL method to compensate for the shortcomings caused by the small sample size, its help for practical applications is still limited. Second, because of the retrospective nature of the current study, some bias cannot be ruled out; thus, a randomized controlled trial (RCT) should be conducted in the future.

Conclusion
The diagnostic efficacy of the AlexNet model for differentiating PTZC and BPH could be significantly improved by transferring the disparity information between PPZC and BPH, which was clearly better than TL from ImageNet. Furthermore, the transductive TL model trained with the data of PPZC and BPH patients could be directly used to classify PTZC versus BPH.

Consent for publication
Written informed consent for publication was obtained from all participants.

Availability of data and material
Please contact authors for data requests.
Competing interests

Figure legends
Figure 1. The patient recruiting procedure. PZC = peripheral zone cancer, TZC = transitional zone cancer, BPH = benign prostatic hyperplasia.
Figure 3. The procedure of TL. TL was conducted by transferring the adjustable weights from a model trained with data either from ImageNet (A) or from the disease-related domain (C) to our target domain. (B) Feature extraction is conducted with the "weights" of the network; after dozens or hundreds of training iterations, these "weights" are optimized. ReLU = rectified-linear activation.
Figure 6. T2WIs and ADC map of the patient who was misdiagnosed in both protocols after the ensemble procedure.

Supplementary Files
This is a list of supplementary files associated with this preprint.