2.1. Overview
The aim of this paper is to characterize two major types of brain tumors, metastases and glioblastomas, based on microstructural characteristics of the peritumoral edema derived from the DTI-based free water volume fraction (FW-VF) map. We create a 2D CNN-based classifier, trained on FW-VF patches extracted from random locations inside the peritumoral edema, to distinguish brain metastases from glioblastomas. We first describe the patient data and then the details of the CNN-based classifier. The performance of the CNN trained on the FW-VF map is then compared to that of CNNs trained on: i) standard fractional anisotropy (FA), ii) mean diffusivity (MD), iii) a combination of FA and MD, iv) free water corrected fractional anisotropy (FW-FA), v) free water corrected axial diffusivity (FW-AX), vi) free water corrected radial diffusivity (FW-RAD), and vii) a combination of FW-VF and FW-FA. Lastly, the CNN is compared with classifiers based on texture and radiomic features.
2.2. Patient data and creation of free water volume fraction (FW-VF)
This study was approved by the institutional review board of the University of Pennsylvania. Informed consent was obtained from all participants or their legally authorized representatives. All methods were carried out in accordance with relevant guidelines and regulations. The dataset included 143 patients with brain tumors: 89 glioblastomas and 54 metastases. The ages of the patients ranged from 19 to 87 years; there were 66 males and 77 females. dMRI/DTI data were acquired on two types of scanners, 123 patients on the Siemens 3T TrioTim and 20 patients on the Siemens 3T Verio, both with TR/TE = 5000/86 ms, resolution = 1.72 × 1.72 × 3 mm, three b = 0 s/mm² volumes, and 30 diffusion-weighted volumes with b = 1000 s/mm².
The dMRI data were pre-processed using local PCA denoising [22], eddy current and motion correction with FSL EDDY [23], and skull-stripping with BET [24]. Masks of the tumor and edema for each patient were created using GLISTR [25, 26], a semi-automated tumor segmentation tool. FA and MD maps were computed after DTI fitting with DIPY using weighted least squares [27]. To estimate the free water volume fraction from single-shell DTI, we used Freewater EstimatoR using Interpolated Initialization (FERNET) [17], a free water elimination paradigm that uses a novel interpolated initialization approach to estimate the free water compartment in single-shell diffusion MRI data. FERNET provides a free water volume fraction (FW-VF) map, as well as free water corrected fractional anisotropy (FW-FA), axial diffusivity (FW-AX), and radial diffusivity (FW-RAD) maps, for every patient from their pre-processed dMRI data [28].
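For readers unfamiliar with the scalar maps named above, the following is a minimal sketch of the standard definitions of MD and FA from the three diffusion tensor eigenvalues. It is not the DIPY or FERNET implementation; the function name `fa_md_from_eigenvalues` and the input layout are assumptions made for illustration.

```python
import numpy as np

def fa_md_from_eigenvalues(evals):
    """Compute FA and MD from tensor eigenvalues.

    evals: array of shape (n_voxels, 3), one row of eigenvalues per voxel
    (hypothetical layout chosen for this sketch).
    """
    evals = np.asarray(evals, dtype=float)
    # Mean diffusivity: average of the three eigenvalues.
    md = evals.mean(axis=-1)
    # Fractional anisotropy: normalized dispersion of the eigenvalues.
    num = np.sqrt(((evals - md[..., None]) ** 2).sum(axis=-1))
    den = np.sqrt((evals ** 2).sum(axis=-1))
    # Voxels whose eigenvalues are all zero (background) get FA = 0.
    ratio = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    fa = np.sqrt(1.5) * ratio
    return fa, md
```

An isotropic voxel (three equal eigenvalues) yields FA = 0, whereas a voxel with a single non-zero eigenvalue yields FA = 1.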
2.3. CNN-based classifier using FW-VF for discriminating metastatic tumors from glioblastomas
We created a CNN classifier trained on patches derived from the peritumoral region to assign a label of metastasis or glioblastoma. Figure 1 shows the pipeline of our approach. We extracted input patches for our CNN from random locations in the peritumoral area of metastatic and glioblastoma subjects. A set of 16 × 16 patches from the peritumoral edema was extracted for every subject. This was the largest patch size we could fit into the peritumoral edema without overlapping the tumor, and it was large enough for the CNN classifier to capture specific patterns. When a patient had more than one tumor, we selected patches from the peritumoral areas of all of the tumors. We iteratively chose patches at random coordinates within the edema and discarded patches that included the tumor itself. The number of patches for every subject was estimated as the number of edema voxels in that subject divided by the number of voxels in a patch (16 × 16 = 256). All patches extracted from a subject were assigned the tumor label of that subject, that is, metastasis or glioblastoma.
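The patch sampling described above can be sketched as follows for a single 2D slice. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `sample_edema_patches`, the choice to center each window on a randomly drawn edema voxel, and the `max_tries` safeguard are all hypothetical.

```python
import numpy as np

def sample_edema_patches(edema_mask, tumor_mask, patch=16, seed=0, max_tries=10000):
    """Return top-left corners of random patch x patch windows in one slice.

    edema_mask, tumor_mask: 2D boolean arrays of identical shape.
    Windows that touch the tumor mask or leave the image are discarded.
    """
    rng = np.random.default_rng(seed)
    half = patch // 2
    # Target count: number of edema voxels divided by voxels per patch.
    n_target = int(edema_mask.sum()) // (patch * patch)
    centers = np.argwhere(edema_mask)
    corners = []
    tries = 0
    while len(corners) < n_target and tries < max_tries:
        tries += 1
        r, c = centers[rng.integers(len(centers))]
        r0, c0 = int(r) - half, int(c) - half
        inside = (r0 >= 0 and c0 >= 0
                  and r0 + patch <= edema_mask.shape[0]
                  and c0 + patch <= edema_mask.shape[1])
        if not inside:
            continue
        # Discard any window that includes tumor voxels.
        if tumor_mask[r0:r0 + patch, c0:c0 + patch].any():
            continue
        corners.append((r0, c0))
    return corners
```

Each returned corner `(r0, c0)` indexes a window `fw_map[r0:r0 + 16, c0:c0 + 16]`, which would inherit the tumor label of its subject.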
The classifier was based on convolutional neural networks (CNNs) [29], a special kind of neural network whose architecture is composed of a set of convolutional and pooling layers. Our convolutional (conv) layers were connected to local parts of the input patches to detect local features. The network contained 6 convolutional layers followed by pooling (pool) layers that reduced the dimension of the features. We added a max pooling layer and a global average pooling layer; the latter calculated the average of each feature map and prepared the feature vector for the classification layer. A softmax layer (soft) at the end produced, for every input patch, a probability value indicating its membership in each class (metastasis or glioblastoma, in our case) [30]. As we trained our classifier on patches with different patterns of extracellular water, this probability reflected the local signature of extracellular water for each patch. Each patch was assigned to the class with the maximum probability value.
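The final stages of such a network (global average pooling, softmax, and the maximum-probability rule) can be illustrated with a minimal NumPy sketch. The convolutional layers are omitted, and the linear layer with `weights` and `bias` between pooling and softmax is an assumption made for illustration; the text does not specify it.

```python
import numpy as np

CLASSES = ("metastasis", "glioblastoma")

def global_average_pool(feature_maps):
    # feature_maps: (channels, height, width) -> one average per channel.
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    # Subtract the maximum for numerical stability.
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def classify_patch(feature_maps, weights, bias):
    """Map the feature maps of one patch to class probabilities and a label.

    weights: (n_classes, channels), bias: (n_classes,), both hypothetical.
    """
    pooled = global_average_pool(feature_maps)
    probs = softmax(weights @ pooled + bias)
    # The patch label is the class with the maximum probability.
    return CLASSES[int(np.argmax(probs))], probs
```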
The data were randomly divided into two subsamples to train (113 patients) and test (30 patients) the classifier. To avoid overfitting of the classifier, data augmentation was performed on the patches by shifting them in different directions, allowing up to 20% overlap with healthy brain. We created ~6000 patches (~2700 metastases and ~3300 glioblastomas) from the 113 training subjects. The CNN was trained for 200 epochs using the RMSprop optimizer and a cross-entropy loss function.
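One way to read the shift-based augmentation is sketched below. It is a sketch under assumptions: the shift size `step`, the four shift directions, and the function name `shifted_patches` are hypothetical, and every voxel of a window that lies outside the edema mask (and is not tumor) is counted as healthy brain.

```python
import numpy as np

def shifted_patches(fw_map, edema_mask, tumor_mask, r0, c0,
                    patch=16, step=4, max_outside=0.20):
    """Return shifted copies of the patch whose top-left corner is (r0, c0).

    A shifted window is kept only if it stays in the image, contains no
    tumor, and has at most max_outside of its voxels outside the edema.
    """
    kept = []
    for dr, dc in [(-step, 0), (step, 0), (0, -step), (0, step)]:
        r, c = r0 + dr, c0 + dc
        if r < 0 or c < 0 or r + patch > fw_map.shape[0] or c + patch > fw_map.shape[1]:
            continue
        window = (slice(r, r + patch), slice(c, c + patch))
        if tumor_mask[window].any():
            continue
        # Fraction of the window that is not edema.
        outside = 1.0 - edema_mask[window].mean()
        if outside <= max_outside:
            kept.append(fw_map[window].copy())
    return kept
```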
For testing, the CNN classifier assigned labels to all patches in the peritumoral area of the test subjects. The final label of a subject was determined by majority voting over the patch-level classifications.
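The subject-level decision can be sketched in a few lines. The function name `subject_label` is hypothetical, and the handling of ties (the label encountered first wins) is an assumption, since the text does not specify a tie-breaking rule.

```python
from collections import Counter

def subject_label(patch_labels):
    """Return the most frequent patch label for one subject."""
    counts = Counter(patch_labels)
    # most_common(1) returns a list holding the (label, count) pair
    # with the highest count.
    return counts.most_common(1)[0][0]
```

For example, a subject with three patches labeled glioblastoma and two labeled metastasis would be labeled glioblastoma.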
2.4. Evaluation of the classifier performance
The following measures were used to evaluate the performance of the CNN classifier: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP).
With glioblastoma as our positive class, true positive (TP) is the number of cases correctly recognized as glioblastoma, false positive (FP) is the number of cases incorrectly recognized as glioblastoma, true negative (TN) is the number of cases correctly recognized as metastasis, and false negative (FN) is the number of cases incorrectly recognized as metastasis. Thus, sensitivity represents the recall for the glioblastoma class and specificity represents the recall for the metastasis class.
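As a concrete illustration of these definitions, the sketch below computes accuracy, sensitivity, and specificity from lists of true and predicted labels, with glioblastoma as the positive class. The function name `classification_metrics` and the use of NaN for an empty denominator are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred, positive="glioblastoma"):
    """Accuracy, sensitivity, and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)

    def ratio(num, den):
        # Return NaN when a class is absent instead of dividing by zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall for glioblastoma
        "specificity": ratio(tn, tn + fp),  # recall for metastasis
    }
```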
2.4.1. Cross validation and test results
We evaluated our CNN classifier in both a 5-fold cross-validation and an independent test setting. For cross-validation, the patches from all training subjects were shuffled and randomly partitioned into 5 equally sized subsamples. For each run, a single subsample was retained as test data while the remaining 4 subsamples were used as training data, and this process was repeated for all 5 subsamples. The reported measures were averaged over all folds and represent the performance of the 2D CNN classifier at the patch level.
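The patch-level partitioning described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `five_fold_splits` and the fixed seed are assumptions, and when the number of patches is not divisible by 5 the fold sizes differ by at most one.

```python
import random

def five_fold_splits(n_patches, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_patches))
    # Shuffle the patch indices before partitioning.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # The remaining k - 1 folds form the training set.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

The measures from each of the 5 runs would then be averaged to obtain the reported cross-validation performance.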
In addition to our cross-validation settings, the CNN classifier was evaluated on a set of 30 independent test subjects that were kept out of the training process. We used patches around all voxels in the peritumoral area of test cases and applied our CNN to them. The subject class label was calculated by majority voting among patches.
2.4.2. Comparison of efficacy of FW-VF classifier with those created from the other dMRI-derived maps
To demonstrate the superiority of FW-VF in discriminating metastases from glioblastomas, we retrained the CNN using patches derived from the free water corrected fractional anisotropy (FW-FA), axial diffusivity (FW-AX), and radial diffusivity (FW-RAD) maps, as well as the traditional mean diffusivity (MD) and fractional anisotropy (FA) maps. The results were compared based on accuracy, sensitivity, and specificity. We also created a combination classifier (FW-VF and FW-FA maps) and compared its performance with that of the single-feature classifiers. A combination classifier was also created for the FA and MD maps.
2.4.3. Comparison of CNN with radiomic and texture-based classifiers
Finally, we compared our CNN classifier with classifiers trained on traditional texture features. We applied Gabor feature extractors and radiomic features [31] in combination with random forest classifiers. Gabor features were constructed from the responses of Gabor filters at several frequencies (scales) and orientations [32]. We applied Gabor filters with 4 directions and 4 scales. Radiomic features included size and shape-based features, descriptors of the image intensity histogram, descriptors of the relationships between image voxels, textures extracted from filtered images, and fractal features [33]. We used the PyRadiomics [34] package to extract radiomic features, followed by principal component analysis, which reduced the feature dimensions while retaining 98% of the variation in the data.
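The dimensionality reduction step can be illustrated with a minimal NumPy sketch that keeps the smallest number of principal components whose cumulative explained variance reaches 98%. It does not reproduce the PyRadiomics extraction or the authors' PCA implementation; the function name `pca_reduce` is hypothetical, and the feature matrix is assumed to have at least one non-constant column.

```python
import numpy as np

def pca_reduce(features, variance_kept=0.98):
    """Project features (n_samples, n_features) onto leading components.

    Returns the projected data and the number of components retained.
    """
    centered = features - features.mean(axis=0)
    # Singular values of the centered data give the component variances.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # First position where the cumulative variance reaches the threshold.
    n_components = int(np.searchsorted(cumulative, variance_kept)) + 1
    n_components = min(n_components, len(s))
    return centered @ vt[:n_components].T, n_components
```

The reduced feature matrix would then be passed to the random forest classifier.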