Identifying Malignant Lymph Nodes of Prostate Cancer Patients Using a Combination of Pre-Trained Deep Models and Traditional Machine Learning Classifiers

Prostate cancer is the second most common new cancer diagnosis in the United States. The prostate gland sits beneath the urinary bladder and surrounds the first part of the urethra. Usually, prostate cancer is slow-growing, stays confined to the prostate gland, and can be treated conservatively (active surveillance) or with surgery. However, if the cancer has spread beyond the prostate, such as to the lymph nodes, this suggests the cancer is more aggressive and surgery alone is not adequate. In those cases, radiation and/or systemic therapies (e.g., chemotherapy, immunotherapy) are required. The challenge is that it is often difficult for radiologists to differentiate malignant lymph nodes from non-malignant ones with current medical imaging technology. In this study, we design a scalable hybrid approach that uses a deep learning model to extract features that are fed into a machine learning classifier to automatically identify malignant lymph nodes in patients with prostate cancer.


Introduction
"Cancer," "malignancy," and "tumor" are all terms for uncontrolled growth of cells in the human body. Cancer not only threatens the organ it originates in, but also has the potential to spread to other organs.
There are many different types of cancer. In 2017 in the United States, prostate cancer was the second most common new cancer diagnosis and the third most common cause of cancer-related death [1]. Only males have prostate glands, so prostate cancer is only found in men. The prostate gland sits beneath the urinary bladder and surrounds the first part of the urethra. The urethra carries urine and semen outside the body through the penis. The seminal vesicles, which generate the major component of semen, are glands located above the prostate, behind the urinary bladder. The prostate gland also generates components of semen [2]. Prostate cancer, as the name suggests, is excessive growth of cells originating in the prostate gland. Cancer follows two typical routes of spread. The first route is through small drainage channels in the lymphatic system. The lymphatic system helps drain tissues and provides a meeting place for white blood cells. Cancer can follow these channels to the nodes of the lymphatic system, the lymph nodes, and set up a site of excessive growth. The second route of metastasis is through the blood, also known as hematogenous spread. This route is more concerning because the blood carries cancer cells to other organs, especially organs with a rich vascular supply. However, prostate cancer is often slow-growing and initially confined to the prostate gland [3].
There are many treatment regimens for prostate cancer. Some patients may need no treatment, opting instead for active surveillance, which is periodic checking-in on the state of their disease. Other patients need surgery, chemotherapy, immunotherapy, radiation therapy, or often a combination of these. The decision to intervene, and the choice of the best intervention, hinge on the cancer's stage. Staging cancer is a nontrivial task, as it involves determining both primary tumor growth and secondary, metastatic extent. Prostate cancer can grow into adjacent organs, e.g., the organs described above, but often spreads to pelvic/retroperitoneal lymph nodes and to bones. As overall prostate cancer tumor burden often determines the treatments offered, reliable staging is important. In this respect, imaging plays a key role.
Magnetic Resonance Imaging (MRI) has emerged as an important imaging modality for assessment of tumor invasion and pelvic lymph node metastases [4]. However, determination of lymph node metastatic status can be challenging because abnormal (cancerous) and normal lymph nodes often appear similar.
As such, the sensitivity of imaging for lymph node metastasis in prostate cancer is low [5].
In this study, we have designed an integrated deep learning-machine learning pipeline for the purpose of identifying malignancy in the lymph nodes of prostate cancer patients.

Literature Review
Owing to the computational capabilities associated with most modern computers, as well as the availability of enormous quantities of data, deep learning is in pervasive use in medical imaging applications. The most common type of deep neural network is the convolutional neural network (CNN).
CNNs are modelled on the visual cortex of the human brain and are extensively used for imaging, video and audio applications. The authors of [6] conducted a detailed survey of the efficacy of CNNs in various types of object detection applications, discussing in detail the structure and workings of LeNet, AlexNet, ZFNet, GoogLeNet, VGGNet, ResNet, SENet, DenseNet, Xception and PNAS/ENAS, along with their variations.
One main application of CNN may be to identify salient features. For example, [7] studied the applicability of CNNs in performing component-based and age-invariant face recognition. The proposed methodology addressed the association of facial components with a relevant face, with commendable results.
Features extracted from CNNs were subsequently fed into two dimensionality reduction algorithms: Fisher linear discriminant analysis (FLDA) and locality preserving projections (LPP). The reduced features were classified using the nearest neighbor algorithm, yielding accuracies of 91% and 90% respectively, better than those obtained using histogram of oriented gradients (HOG) and Gabor transform features. Beyond feature extraction, CNNs have been studied for detection, classification and segmentation tasks in medical research. For example, [8] proposed a CNN-based double-branched model wherein one branch was utilized for feature extraction and the other for segmentation, for multiple abnormality detection from medical images. [9] proposed CemrgApp, a CNN-based model, to classify cardiovascular properties from cardiovascular magnetic resonance imaging (CMRI) scans of different cardiac patients for efficient diagnosis and treatment. A multi-label CNN was used to segment the atria and atrial structures from the CMRI scans; the proposed framework was trained on 207 manually annotated CMRI scans, ultimately achieving a Dice score of 0.91 ± 0.02 for atrial blood pool segmentation. [10] and [11] implemented CNNs and their variants for automatic lesion detection and multiple abnormality detection from medical images. [12] developed a deconvolutional CNN for classification of acute lymphoblastic leukemia, a cancer of the white blood cells. [13] designed a multi-network feature extraction model using pre-trained deep CNNs to aid breast cancer diagnosis. [14] offered a concise introduction to multiscale CNNs and their applicability to the classification of cells from medical images. [15] developed a multiscale all-convolutional neural network (MA-CNN) for breast cancer classification using mammogram images. [16] designed deep CNN ensembles for segmentation of infant brain MRI images. [17] segmented anomalies in abdominal CT images with a CNN and then classified them using a fuzzy SVM.
While the success of CNNs attracts great attention in medical research, they are not without limitations. As indicated in [18], clinical studies often have limited samples, which poses a great challenge to CNN models. One solution is transfer learning, a technique used to train deep networks on small datasets. Transfer learning refers to the migration of knowledge between applications. Owing to restrictions on sample quality, data availability, lack of domain knowledge, and so on, it is often challenging to develop robust models based only on the resources available for the application at hand. In such scenarios, researchers train models that have previously acquired knowledge from similar tasks or datasets and transfer the pre-trained model to the dataset of interest. For example, [19] selected color optic disc-centered fundus images using active learning, then identified glaucoma using transfer learning on a deep CNN. In [20], a problem-based DCNN architecture called ChestNet was proposed. This variant was pre-trained on sets of relevant and irrelevant data before finally being trained on the Pediatric Chest X-ray dataset for detection of pulmonary consolidation. [21] gives a concise yet informative description of ChestNet and its applicability to the detection of thoracic diseases on chest images. [22] developed an ensemble of five of the most commonly used deep CNN models (AlexNet, DenseNet121, InceptionV3, ResNet18 and GoogLeNet), pre-trained on ImageNet, for pneumonia detection in the Guangzhou Women and Children's Medical Center dataset of chest X-rays.
In many cases, it is challenging for humans or deep learning methodologies alone to extract the most important set of features from medical image datasets. Hence, researchers often combine machine learning and deep learning approaches in order to utilize the representational capabilities of deep models while not overfitting the data. [23] followed one such approach. The researchers had access to 58 in-house brain MR images and 128 MR images from The Cancer Genome Atlas, all of patients with high-grade glioma. For each patient, they calculated 348 hand-crafted radiomics features and extracted 8192 features using a pre-trained deep CNN. Next, they performed feature selection and Elastic Net-Cox modeling to classify patients into long- and short-term survivors. [24] was a detailed study of ROI-based opacity classification of diffuse lung diseases in chest CT images. It used the Cifar-10 and Cifar-100 datasets for pre-training deep CNNs, then a CT image dataset of diffuse lung diseases for parameter tuning and classification. It delved into the structure of the CNN used and how to implement pre-training and parameter tuning, and offered insights about the relation between the type and characteristics of the datasets used for pre-training and parameter tuning and the effectiveness of the transfer learning model. In [25], the researchers implemented transfer learning on a CNN and an extreme learning machine to classify between malignant and benign pulmonary nodules on CT images. The deep CNN, pre-trained on the ImageNet dataset, was used to extract high-level features of pulmonary nodules. These features supported the classification of benign and malignant pulmonary nodules using the extreme learning machine (ELM) model.

Conventional machine learning classifiers depend heavily on manually extracted features, which are often not good representations of the data. CNNs, on the other hand, have an innate capability of extracting consequential feature vectors from images, even though the extracted features might not be the most appealing or meaningful to the human eye. As reviewed above, existing research shows some success in using deep learning models (e.g., pre-trained models taking advantage of transfer learning) to extract features used in machine learning models. Motivated by this, we propose an integrated deep learning-machine learning pipeline to utilize the representational capabilities of CNNs while not overfitting the model on a small prostate imaging dataset. We extract features from raw images using a pre-trained CNN; these features are subsequently used as a feature vector for classification by a classical machine learning classifier. Prior to classification, we perform feature selection on the extracted features in order to mine out the most consequential features. The feature selection framework comprises a sequential combination of statistical and machine learning algorithms. The Random Forest algorithm is utilized as part of the feature selection framework to automatically select the most important features and generate their respective importance values as percentages. The reduced features yield a classification accuracy of 76.19%, precision of 81.08% and F1-score of 75.93% when classified using a decision tree classifier with 10-fold cross-validation.

Methodology
We have designed a combined deep learning-machine learning model wherein the deep learning module is employed for feature extraction from the raw images, and the features are subsequently classified using a machine learning classifier. In this section, we present the various components, as well as the classification results, of the combined framework. Figure 1 provides a schematic representation of the overall workflow, starting from the raw images of lymph nodes to their automatic categorization as malignant/non-malignant. The methodology includes feature extraction, feature selection and classification. In the proposed model, we first extract a 512-element feature vector per image from the average pooling layer of the pre-trained ResNet18 model. We then perform our feature selection algorithm on this feature vector, which leaves us with the most important set of features; these are subsequently classified using a machine learning classifier. By employing such a combination of deep learning (for feature extraction) and machine learning (for classification) models, we ensure that the model has a nuanced representation of the images while assuaging the problem of overfitting on the small dataset.

Feature extraction using deep learning
The inability of the machine learning model to generate stable classification performance stems from the fact that the second-order statistical features are not representative of the original image set. In other words, we need a higher-dimensional representation of the image data, and we require a set of features capable of describing the raw scans in greater detail. However, after performing classification using CNNs and pre-trained deep networks, we quickly realized that our image dataset of 126 samples kept running into the overfitting problem. Therefore, we decided to extract features using a deep model and then perform classification using a machine learning classifier.
A ResNet18 model pre-trained on the ImageNet dataset, comprising 71 layers when every operation in its layer graph is counted, was used to extract features from the raw scans. The outputs of the fifth and final pooling layer were extracted and used as features for classification between malignant and non-malignant scans. Note that the outputs obtained using this approach are concatenated to generate a 512-element feature vector corresponding to each lymph node scan. Figure 2 provides a pictorial representation of the ResNet18 model utilized in this work. The model comprises a total of 71 layers, out of which the outputs of the "average pooling" layer are used for the purpose of classification.
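To make this step concrete, the following is a minimal sketch of the feature extraction procedure, written in Python with PyTorch/torchvision for illustration; the original implementation may have used a different framework, and the preprocessing choices (replicating the gray-level scan to three channels, 224x224 resizing, ImageNet normalization) are assumptions rather than details reported here.

    # Hedged sketch: extract a 512-element feature vector per lymph node image
    # from the global average-pooling layer of an ImageNet-pre-trained ResNet18.
    # PyTorch/torchvision are used for illustration only.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    # Drop the final fully connected layer so the forward pass ends at the
    # global average-pooling layer, which outputs 512 channels.
    resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

    # Assumed preprocessing: replicate the gray-level scan to 3 channels and
    # apply standard ImageNet resizing and normalization.
    preprocess = transforms.Compose([
        transforms.Grayscale(num_output_channels=3),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def extract_features(image_path):
        """Return the 512-element pooled feature vector for one scan."""
        batch = preprocess(Image.open(image_path)).unsqueeze(0)  # (1, 3, 224, 224)
        with torch.no_grad():
            feats = extractor(batch)                             # (1, 512, 1, 1)
        return feats.flatten().numpy()                           # (512,)

Stacking these per-image vectors over the 126 scans yields a 126 x 512 feature matrix for the subsequent feature selection step.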

Feature selection mechanism
We employed an ensemble technique to mine out the most consequential features from the feature vector obtained using the pre-trained model, in order to classify malignant and non-malignant lymph nodes. Note that the feature matrix is arranged such that the rows represent the samples and the columns represent the features. The feature selection process is depicted in Table 1. We employ the Random Forest algorithm, one of the most widely used machine learning algorithms for both classification and feature selection. In Python, Random Forest implementations provide a built-in importance measure which, based on the feature set and the dependent variable, returns the most important features along with their percentage of importance. Figure 3 provides a schematic depiction of the feature selection algorithm.
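As a hedged illustration of this step, the sketch below uses scikit-learn's RandomForestClassifier, whose feature_importances_ attribute provides the built-in importance measure described above; the number of trees and the 1% importance cut-off are illustrative assumptions, as the exact settings are not specified here.

    # Hedged sketch: rank the 512 ResNet18 features with a Random Forest and
    # keep only the most important columns. Hyperparameters and the threshold
    # are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def select_important_features(X, y, threshold=0.01):
        """X: (n_samples, 512) feature matrix (rows = samples, columns = features);
        y: labels (1 = malignant, 0 = non-malignant).
        Returns the reduced matrix, the kept column indices, and their importances (%)."""
        rf = RandomForestClassifier(n_estimators=500, random_state=0)
        rf.fit(X, y)
        importances = rf.feature_importances_            # sums to 1.0 over all columns
        keep = np.where(importances >= threshold)[0]     # columns above the cut-off
        return X[:, keep], keep, 100.0 * importances[keep]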


Machine learning classification
The salient features obtained from the feature selection algorithm (see the feature selection mechanism above) were fed into a machine learning classifier to differentiate malignant from non-malignant lymph nodes. 10-fold cross-validation was conducted to avoid overfitting. Five classifiers were explored: Decision Tree (DT), Discriminant Analysis (DA), Naïve Bayes (NB), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN).
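A minimal sketch of this stage is given below, assuming scikit-learn implementations of the five classifiers with default hyperparameters; X_selected and y denote the reduced feature matrix and the malignant/non-malignant labels from the previous steps, and are assumptions about variable names rather than the original code.

    # Hedged sketch: 10-fold cross-validation of the five classifiers on the
    # selected features. Hyperparameters are scikit-learn defaults (an assumption).
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    classifiers = {
        "DT":  DecisionTreeClassifier(random_state=0),
        "DA":  LinearDiscriminantAnalysis(),
        "NB":  GaussianNB(),
        "SVM": SVC(probability=True, random_state=0),
        "KNN": KNeighborsClassifier(),
    }

    # X_selected, y: reduced feature matrix and labels from the feature selection step.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X_selected, y, cv=cv, scoring="accuracy")
        print(f"{name}: mean accuracy = {scores.mean():.4f} (+/- {scores.std():.4f})")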

Dataset
A radiological dataset was provided by Mayo Clinic, Arizona. The dataset comprised multiple de-identified gray-level MRI scans of lymph nodes, obtained using a varied range of image contrast types from a clinical trial of 15 prostate cancer patients. The patients underwent prostate MRI before prostatectomy and pelvic lymph node dissection as part of the trial. Tissue was submitted for pathologist review, the location of each lymph node was confirmed, and labels for the pre-operative MRI data were generated. The labels were "positive" (harboring metastatic cancer cells) and "negative" (no cancer metastases found). There was a total of 126 lymph node images: 41 positive and 85 negative. The four MRI sequences, each with different tissue contrast characteristics, are as follows: Apparent Diffusion Coefficient (ADC), Fast Recovery Fast Spin Echo (FRFSE), Pelvis (T2 FatSat) and MRI with Gadolinium Contrast (Water-GAD). Figure 4 shows a prostate MRI from the same patient in all four MRI sequences.

Results
Figure 5 depicts the receiver operating characteristic (ROC) curves on the reduced-dimensional data for the five classifiers with 10-fold cross-validation: Decision Tree (DT), Discriminant Analysis (DA), Naïve Bayes (NB), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The DT model has the highest AUC (95.86%), followed by DA (93.41%), NB (87.64%), SVM (56.89%) and KNN (55.81%). The best performing model, the 10-fold DT classifier, yields 76.19% accuracy, 69.05% sensitivity, 81.08% precision, and 75.93% F1-score. Figure 6 depicts the ROC curves on the reduced-dimensional data as well as on the original CNN features, for the 10-fold decision tree classifier. The AUC for the 10-fold decision tree classifier on the original 512-element feature matrix obtained from the ResNet18 model is 68.00%, whereas that on the selected feature set is 96.40%.
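For reference, cross-validated AUC values like those above can be obtained from out-of-fold predicted probabilities; the sketch below shows one way to do this for the decision tree with scikit-learn, again assuming the X_selected and y variables from the feature selection step (the original evaluation code is not reproduced here, so this is illustrative only).

    # Hedged sketch: cross-validated ROC/AUC for the decision tree classifier
    # on the selected features, using out-of-fold predicted probabilities.
    from sklearn.model_selection import StratifiedKFold, cross_val_predict
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import roc_curve, roc_auc_score

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    dt = DecisionTreeClassifier(random_state=0)

    # Probability of the positive (malignant) class for every sample, predicted
    # while that sample was held out of training.
    proba = cross_val_predict(dt, X_selected, y, cv=cv, method="predict_proba")[:, 1]
    fpr, tpr, _ = roc_curve(y, proba)                 # points of the ROC curve
    print(f"10-fold DT AUC on selected features: {roc_auc_score(y, proba):.4f}")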

Discussion And Conclusions
This case study proposes an integrated pipeline to help detect malignant lymph nodes in patients with prostate cancer. Our approach achieved higher sensitivity and comparable specificity relative to the sensitivity and specificity reported in prior studies relying on imaging alone. For example, [5] states that the area under the receiver operating characteristic curve (AUC) varies between 0.69 and 0.81 for prostate cancer detection on multi-parametric MRI, which includes diffusion-weighted imaging (DWI). The authors of [29] designed an automatic deep CNN-based architecture to detect prostate cancer on diffusion-weighted magnetic resonance imaging (DWI).

In medical imaging research, one of the primary challenges is that interpretation varies greatly between radiologists. The PI-RADS framework is pervasively utilized for image interpretation; however, [31] highlights the many outstanding issues of inter-observer variability associated with the PI-RADS model. Owing to the innate difficulties associated with identifying anomalies in prostate MRIs, there is limited consensus among researchers and radiologists alike with respect to determining the best identification methodologies. Sample size, as in the case of our study, often proves to be an important factor in the selection of the classification model. While we know that machine learning performs better on datasets with limited samples, we also acknowledge the capability of deep models to extract more meaningful features. In [32], the authors demonstrated that, on a dataset of multiparametric MRIs obtained from 52 prostate cancer patients, hierarchical clustering performed better than deep models in differentiating between normal and tumor prostate tissues.

While the statistical measures that we have employed to evaluate our model work well, we acknowledge that there might be more medically sound metrics for measuring performance which, to a certain degree, also predict the chance of survival. However, in order to evaluate such parameters, we would need an in-depth understanding of the associated biomedical processes, as well as access to a wide variety of radiological features. In this study, it is interesting to observe that the Discriminant Analysis classifier performs well, which suggests that there might be two well-defined Gaussian clusters present in the data. In the future, we intend to explore generative modeling-based classification techniques. In addition, we intend to extend our analysis to localize the lesion region in lymph node images where cancer has been detected. From previous experience with similar medical imaging datasets, we have noticed that having an additional set of "difference features" helps not only in localizing the lesion region, but also in monitoring the progression of the lesion over time. This basically involves subtracting time-variant images from a fixed baseline. In this case, since we have a control image of an unaffected portion of the lymph node corresponding to every cancerous region, we could use the control as a baseline image to perform localization. However, for that to work effectively, we need more positively labeled (cancerous) samples; at the moment, we have too few lesion lymph node images to train a full-fledged model to perform localization.
However, on acquiring more samples, if this sees fruition, we should be able to use the classifier in cascade with the localization framework.

Tables
Due to technical limitations, Table 1 is only available as a download in the Supplemental Files section.

Figure 1. A schematic representation of the overall workflow, starting from the raw images of lymph nodes to their automatic categorization as malignant/non-malignant. The methodology includes feature extraction, feature selection and classification.

Supplementary Files
This is a list of supplementary files associated with this preprint.