This experimental, analytic, comparative study aimed to assess the reliability of a noninvasive tool for classifying BMs of NSCLC origin according to their EGFR status. Data were collected retrospectively from the medical files of the study population. Pathological data based on histology were extracted from pathology reports following tissue biopsy, and EGFR status was determined, as part of the routine clinical workup, by reverse transcription polymerase chain reaction.
We retrospectively collected the records of all NSCLC patients with BMs who underwent resection of their BMs in 2 institutions: Tel-Aviv Medical Center, Tel-Aviv, Israel, between 2006 and 2019 (46 patients), and Fondazione IRCCS Istituto Neurologico C. Besta, Milan, Italy (13 patients). The patients were divided into 2 groups according to whether they were positive or negative for an EGFR mutation.
Included were all diagnosed NSCLC patients with BMs who underwent resection of their BMs and for whom a histology-based pathological report, the molecular-based EGFR status, and a preoperative magnetic resonance imaging (MRI) study of sufficient quality were available. Our analysis was restricted to those metastases that had been resected and whose EGFR status had been determined. Preoperative MRI scans with major artifacts or of low quality, and scans of patients who had undergone radiation treatment to their BMs prior to the MRI, were excluded. The study was approved by the local institutional review boards (IRB) of both centers, Tel-Aviv Medical Center and Fondazione IRCCS Istituto Neurologico C. Besta (IRB approval numbers 0200-10 and 81/2021, respectively).
Analysis was performed on the post-contrast T1-weighted MRI images (T1W+c) and included bias field correction with an intensity inhomogeneity correction algorithm (SPM, part of MATLAB R2019b), and intensity normalization by the equation x̂ᵢ = (xᵢ − μ)/σ, where xᵢ is the value of a given voxel in the image, and μ and σ are the mean and standard deviation of the brain-extracted image. Tumor segmentation was performed by a senior neurosurgeon using commercial software (AnalyzeDirect 11.0) at the slice (2D) level. The extracted mask was then used to generate cropped images (i.e., delimitation of the lesion mask and its surroundings).
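The z-score normalization step can be sketched as follows. This is a minimal pure-Python illustration operating on a flat list of voxel intensities; the actual pipeline worked on full 3D brain-extracted MATLAB/SPM volumes, and the function name is ours:

```python
from statistics import mean, stdev

def normalize_intensities(voxels):
    """Z-score intensity normalization: shift each voxel by the mean and
    scale by the standard deviation of the brain-extracted image, so the
    normalized image has mean 0 and unit standard deviation."""
    mu = mean(voxels)
    sigma = stdev(voxels)
    return [(x - mu) / sigma for x in voxels]
```

Normalizing to zero mean and unit variance makes intensity ranges comparable across scanners and acquisition protocols, which matters for a two-center dataset such as this one.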
The entire dataset was split at the subject level into 80% training and 20% validation sets in a stratified 5-fold cross-validation manner, proportional to group size and ensuring that all images belonging to a given patient were allocated to the same fold.
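A subject-level, stratified fold assignment of this kind can be sketched in pure Python (the function name and data layout are ours for illustration; in practice a utility such as scikit-learn's `StratifiedGroupKFold` would typically be used):

```python
import random
from collections import defaultdict

def subject_level_folds(patient_labels, n_folds=5, seed=0):
    """Assign each patient (not each image) to one of n_folds folds,
    stratified by EGFR label so the mutation-positive/negative ratio is
    roughly preserved in every fold. Because assignment is per patient,
    all images of a patient land in the same fold by construction.
    patient_labels: dict mapping patient id -> label (0 or 1)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pid, label in patient_labels.items():
        by_label[label].append(pid)
    folds = defaultdict(list)
    for label, pids in by_label.items():
        rng.shuffle(pids)
        for i, pid in enumerate(pids):
            folds[i % n_folds].append(pid)  # round-robin within each label
    return folds
```

Splitting at the subject level prevents data leakage: adjacent slices of the same tumor are highly correlated, so letting them straddle the train/validation boundary would inflate the reported performance.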
DL model training and evaluation were performed by means of the Fast.ai framework, built on top of the PyTorch environment.
The input data for the DL analysis were cropped images of the mid-tumor region and ±2 slices (a total of 5 slices), all extracted from the normalized T1W+c image and resized to 96×96 pixels (Fig. 1). Data augmentation was performed in order to increase the dataset size and variance, and included random rotations, zooming, and contrast modification. In addition, mixup augmentation was applied to combine training samples by means of their linear combinations.
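The mixup idea can be sketched as follows (a simplified pure-Python version; the study used Fast.ai's built-in mixup callback, and images are represented here as flat lists of floats for brevity):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Mixup augmentation: blend two training images (and their one-hot
    labels) with a mixing weight lambda drawn from a Beta(alpha, alpha)
    distribution, producing a convex combination of the two samples."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the labels are mixed along with the images, the network is discouraged from memorizing individual samples, which is valuable for a dataset of this size.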
A ResNet-50 convolutional neural network was set as the network's architecture. Network training was carried out by means of an F1 loss function with an initial learning rate of 4e-2 and a batch size of 32. The metric for evaluating the model during training was the F1 score. Data oversampling was employed in order to cope with the imbalanced dataset, enabling sampling of the 2 groups in roughly equal amounts.
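One common way to realize such oversampling is inverse-frequency sample weighting, sketched below (an illustrative helper of ours; in a PyTorch/Fast.ai pipeline these weights would typically be handed to a `WeightedRandomSampler`):

```python
from collections import Counter

def oversampling_weights(labels):
    """Per-sample weights for class-balanced sampling: each sample is
    weighted inversely to its class frequency, so drawing samples with
    these weights (with replacement) yields the two EGFR groups in
    roughly equal amounts per batch."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]
```

With these weights, the total sampling mass of each class is identical (it sums to 1 per class), so the minority class is drawn as often as the majority class in expectation.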
Due to the relatively small data size available for this study, transfer learning was performed: the network was trained using a pre-trained ResNet-50 model, trained on the ImageNet dataset, as previously described in detail elsewhere [31, 32]. Training was performed for a total of 40 epochs while preserving the model that achieved the best level of accuracy during the process.
Post-processing of the predicted results was performed at the subject level by calculating a prediction score over the adjacent slices, tested with median, maximum, minimum, and mean aggregation metrics (Fig. 1).
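The slice-to-subject aggregation can be sketched as follows (the function name is ours; it simply collapses the per-slice scores of one patient's 5 slices with the aggregation metric under test):

```python
from statistics import mean, median

def subject_score(slice_scores, how="median"):
    """Aggregate the per-slice prediction scores of one subject (here,
    the 5 slices around the mid-tumor region) into a single subject-level
    score using one of the metrics examined: median, max, min, or mean."""
    aggregators = {"median": median, "max": max, "min": min, "mean": mean}
    return aggregators[how](slice_scores)
```

Comparing several aggregation metrics is a pragmatic way to decide whether a subject-level call should follow the typical slice (median/mean) or the most confident slice (max/min).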
The classification results were evaluated on the validation dataset of each of the 5 folds using accuracy, precision, recall, F1 score, and the receiver operating characteristic (ROC) curve.
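For reference, the threshold-based metrics reduce to simple confusion-matrix counts; a minimal sketch (our own helper, treating the EGFR mutation-positive group as the positive class):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels, with the
    EGFR mutation-positive group coded as the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Unlike these point metrics, the ROC curve sweeps the decision threshold over the continuous subject-level scores, which is why it is reported alongside them.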