Benign and malignant SGTs were identified using a digital database, and the corresponding hematoxylin and eosin (H&E) stained slides were retrieved from the department archive (Unit of Oral and Maxillofacial Pathology, School of Clinical Dentistry, University of Sheffield, UK; Health and Care Research Wales ethics reference: 20/WS/0017). This department receives a large number of regional, national and international consult cases for expert opinion, and both internally and externally stained slides were included in the cohort to ensure robustness of training. The retrieved cases included two common benign SGT subtypes, i.e., pleomorphic adenoma (PA) and basal cell adenoma (BCA), and four malignant SGT subtypes, i.e., mucoepidermoid carcinoma (MEC), adenoid cystic carcinoma (AdCC), acinic cell carcinoma (ACC) and carcinoma ex pleomorphic adenoma (Ca-ex-PA). The diagnosis of each case was confirmed before the slides were anonymized. Whole slide images (WSIs) were generated using an Aperio CS2 scanner (Leica Biosystems, Nussloch, Germany) at 40× magnification. The scanner was calibrated prior to each scanning session, and images were stored on a dedicated server. Anonymized WSIs of the selected cases were downloaded from the server for analysis.
QuPath, an open-source bioimage analysis software, was used for the analysis. It is freely available for Windows, Mac, and Linux operating systems and supports most common WSI and image formats through the Bio-Formats and OpenSlide libraries. In addition to cell detection and IHC quantification, it allows users (including pathologists) to train on their own datasets and build ML algorithms for automated classification of cells and tissue into different subtypes (14), as well as for locating and delineating these features (segmentation). A wide range of features was analyzed and compared between tumors, including cellularity and nuclear and cytoplasmic features such as circularity, eccentricity, H&E optical density and nucleus/cell area ratio.
Case Selection and Building of Machine Learning Classifiers
For comparison between benign and malignant SGT, WSIs of H&E stained sections were used, including 120 from benign tumors (PA and BCA) and 120 from malignant tumors (MEC, AdCC, ACC, and Ca-ex-PA). We used 67% of the cases (n=160) for training and the remainder (n=80) for testing. Training was performed on an equal number of benign and malignant cases (i.e., n=80 WSIs for each category, 160 in total). To train the benign vs malignant (BvM) detection classifier, ROIs were selected in each WSI using fixed-size areas of 142,884 μm² (1,500 × 1,500 pixels) to ensure standardization across cases. Next, cell detection analysis was performed, after which the detected cells/nuclei were assigned to a specific class/ground truth (i.e., benign or malignant based on the histological review/diagnosis, or the appropriate subtype). At least five different ROIs from each WSI were used for training, ensuring that morphologically different tumor-rich areas were included, e.g., cribriform pattern, clear cell areas, and solid or tubular patterns (where applicable). Using the ROIs in the training cases, an ML classifier (Random Forest/RF) was built and validated through visualization of nuclear segmentation in the unseen cases. A total of 80 unseen WSIs with 400 fixed-size ROIs were used to blindly test and validate the BvM classifier for automated classification, followed by quantification and statistical analysis of the color and morphometric features leading to the classifier’s 'decision' (Figure 1) (Supplementary Table 1).
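The ROI dimensions and cohort counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the pixel size it prints is inferred from the stated ROI area and side length (it works out to roughly 0.25 μm per pixel) and is not a value reported in the text, and the variable names are our own.

```python
import math

ROI_SIDE_PX = 1500        # ROI edge length in pixels (from the text)
ROI_AREA_UM2 = 142_884    # ROI area in square micrometres (from the text)

# Side length in micrometres, then the implied size of one pixel.
# The pixel size is an inference from the two numbers above, not a reported value.
roi_side_um = math.sqrt(ROI_AREA_UM2)
implied_um_per_px = roi_side_um / ROI_SIDE_PX

# Cohort bookkeeping for the BvM task (all counts from the text).
n_total, n_train = 240, 160
n_test = n_total - n_train
rois_per_test_wsi = 400 / n_test

print(f"ROI side: {roi_side_um:.0f} um, implied pixel size: {implied_um_per_px:.3f} um/px")
print(f"train fraction: {n_train / n_total:.1%}, test WSIs: {n_test}, "
      f"ROIs per test WSI: {rois_per_test_wsi:.0f}")
```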
The second part of the study aimed to build an additional classifier for malignant tumor subtyping (MST) for automated identification of the more common malignant SGT. For this part of the study, 120 WSIs were used for training and testing. Although these cases were the same as those used in the previous part of the study, a new classifier was built to ensure that the test cases remained 'unseen'. Two-thirds of the WSIs (n=80) were used for training and the remainder (n=40) for testing. An RF classifier was trained using 80 WSIs with 400 ROIs, covering four different tumors (MEC, AdCC, ACC, and Ca-ex-PA; n=20 WSIs of each tumor). The ROI dimensions were maintained at 1,500 × 1,500 pixels, and cell/nuclear detection was performed as described previously. All detected cells in each ROI were assigned to the class of the corresponding tumor type (i.e., MEC, AdCC, ACC, or Ca-ex-PA). A total of 40 unseen WSIs (10 of each SGT) with 200 ROIs were used to test the MST classifier and to perform analysis and quantification of features (Figure 1).
Comparison with Customized Machine Learning Models
We additionally aimed to understand whether using optimized ML methods (outside of the QuPath environment) would provide superior performance across the two tasks (i.e., malignant vs benign, and tumor subtyping). Since all of the previous work was performed in QuPath (e.g., nuclear segmentation, feature extraction, classification), the performance of the QuPath-based RF classifier was compared with that of the same classifier trained and tested outside QuPath using the Scikit-learn 1.0.1 toolbox with Python (15). The two tasks were repeated by extracting the QuPath-generated nuclear segmentation and features, optimizing the RF models on the training cohort, and then testing them on the unseen test set.
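As an illustration of what optimizing an RF on the training cohort before a single evaluation on the test set might look like in Scikit-learn, a minimal sketch follows. It is not the study's pipeline: the features are synthetic stand-ins generated at the case level, the feature count, hyperparameter grid and cross-validation scheme are all assumptions, and the helper name make_cases is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)

def make_cases(n_cases, rng):
    """Synthetic stand-in for exported features (NOT study data)."""
    labels = np.repeat([0, 1], n_cases // 2)    # 0 = benign, 1 = malignant
    features = rng.normal(size=(n_cases, 6))    # 6 hypothetical features
    features[:, 0] += 2.0 * labels              # inject a class signal for the demo
    return features, labels

X_train, y_train = make_cases(160, rng)         # cohort sizes follow the BvM task
X_test, y_test = make_cases(80, rng)

# Hypothetical search grid; the text does not list the hyperparameters tuned.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)        # tuning sees the training cohort only

y_pred = search.predict(X_test)     # the unseen test set is used once, at the end
test_f1 = f1_score(y_test, y_pred)
print(search.best_params_, round(test_f1, 3))
```

Keeping the grid search inside the training cohort is what preserves the 'unseen' status of the test cases; any tuning against the test set would inflate the reported performance.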
Comparison with Deep Learning Networks
Next, the utility of DL methods for classification was analyzed. For direct prediction with DL, the ROIs were tessellated into smaller patches (256 × 256 pixels, at 40× magnification) before training and testing multiple state-of-the-art convolutional neural networks (CNNs) for automated prediction. Here, we used the ResNet-18, ResNet-50, EfficientNet-B0 and EfficientNet-B3 models, built with PyTorch 1.10 (16, 17). At inference, patch-level outputs were aggregated per subject by taking the argmax across all patches to obtain a single prediction.
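The tessellation and patch-to-subject aggregation steps can be sketched as follows, using NumPy only. Two assumptions are made that the text does not specify: edge pixels that do not fill a whole 256 × 256 tile are discarded, and the 'maximum argument' is read as selecting the class with the single highest patch-level probability. The function names tessellate and case_prediction are hypothetical.

```python
import numpy as np

def tessellate(roi, patch=256):
    """Split an (H, W, C) ROI into non-overlapping patch x patch tiles.

    Edge pixels that do not fill a whole tile are dropped (an assumption).
    """
    h, w = roi.shape[:2]
    tiles = [
        roi[r:r + patch, c:c + patch]
        for r in range(0, h - patch + 1, patch)
        for c in range(0, w - patch + 1, patch)
    ]
    return np.stack(tiles)

def case_prediction(patch_probs):
    """Reduce (n_patches, n_classes) probabilities to one class index.

    Takes the per-class maximum over patches, then the argmax over classes
    (one possible reading of the aggregation described in the text).
    """
    return int(np.argmax(patch_probs.max(axis=0)))

roi = np.zeros((1500, 1500, 3), dtype=np.uint8)    # blank stand-in for one ROI
tiles = tessellate(roi)
print(tiles.shape)                                  # 5 x 5 full tiles per ROI

# Toy patch-level probabilities for a two-class problem (not study data).
patch_probs = np.array([[0.6, 0.4], [0.1, 0.9], [0.7, 0.3]])
print(case_prediction(patch_probs))
```

Other aggregation rules, such as averaging patch probabilities or majority voting, would be equally straightforward to substitute in case_prediction.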
In this work, we have therefore performed three different sets of experiments:
- Feature generation using QuPath and employing the built-in RF for classification
- Feature generation using QuPath followed by using an optimized RF (outside the platform) for classification
- DL for classification (based on raw image patches)
Spatial Analysis
Dynamic interactions between tissue features result in topographical characteristics that can explain relationships between different structures. The spatial analysis was performed using a set of features related to the orientation of objects at a certain location. These included proximity, measured as the centroid distance between cells, as well as Delaunay cluster analysis, including the number of neighboring cells, intercellular distance, and mean triangle area. Delaunay triangulation is a geometric construction that connects a set of points (here, cell centroids) into triangles such that no point lies inside the circumcircle of any triangle; it can be computed in optimal time (18).
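To make the Delaunay-derived features concrete, the sketch below computes the mean number of neighbors, mean edge length (as a proxy for intercellular distance) and mean triangle area from a set of centroids using SciPy. It is an assumption-laden illustration rather than QuPath's implementation: QuPath may apply distance limits or clustering rules that are not reproduced here, and the function name delaunay_features and the toy coordinates are our own.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_features(centroids):
    """Summarize a Delaunay triangulation of (n, 2) centroid coordinates."""
    tri = Delaunay(centroids)
    simplices = tri.simplices                     # (n_triangles, 3) vertex indices

    # Triangle areas from the 2D cross product of two edge vectors.
    a, b, c = (centroids[simplices[:, i]] for i in range(3))
    ab, ac = b - a, c - a
    areas = 0.5 * np.abs(ab[:, 0] * ac[:, 1] - ab[:, 1] * ac[:, 0])

    # Unique undirected edges; each triangle contributes three.
    edges = np.vstack([simplices[:, [0, 1]], simplices[:, [1, 2]], simplices[:, [0, 2]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    lengths = np.linalg.norm(centroids[edges[:, 0]] - centroids[edges[:, 1]], axis=1)

    # Neighbors of a cell = number of triangulation edges meeting at its centroid.
    neighbors = np.bincount(edges.ravel(), minlength=len(centroids))

    return {
        "mean_neighbors": float(neighbors.mean()),
        "mean_edge_length": float(lengths.mean()),
        "mean_triangle_area": float(areas.mean()),
    }

# Toy layout: four cells at the corners of a unit square and one in the middle.
centroids = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]], dtype=float)
feats = delaunay_features(centroids)
print(feats)
```

In the toy layout the central cell is linked to all four corners, giving four triangles of equal area; densely packed tumor regions would show shorter edges and smaller triangles than loosely arranged ones.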
Statistical Analysis
The t-test and one-way ANOVA with multiple comparisons were used to assess the statistical significance of differences in geometrical, spatial, and staining features. Microsoft Excel 2016 (Microsoft Office Software, USA) was used to organize exported data and perform statistical analyses. The performance of the detection classifiers was measured using precision, recall, F1 score, and AUROC, generated at the case level.
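Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = 2 × (precision × recall) / (precision + recall)
AUROC = area under the receiver operating characteristic (ROC) curve, which plots sensitivity (true positive fraction = TP / (TP + FN)) on the y-axis against 1 - specificity (false positive fraction = FP / (FP + TN)) on the x-axis
Here TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively.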
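A minimal sketch of these case-level metrics in plain Python is given below, assuming binary labels (1 = positive class) and a continuous classifier score per case. The AUROC is computed through its rank-based equivalent, i.e., the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The toy labels, scores and the 0.5 decision threshold are illustrative assumptions, not study data.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy case-level labels and scores (not study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]    # assumed decision threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auroc(y_true, scores)
print(precision, recall, f1, auc)
```

Unlike precision, recall and F1, the AUROC does not depend on the chosen threshold, which is why it is computed from the scores rather than from the thresholded predictions.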