Artificial intelligence-based morphologic classification and molecular characterization of neuroblastic tumors from digital histopathology

A deep learning model using attention-based multiple instance learning (aMIL) and self-supervised learning (SSL) was developed to perform pathologic classification of neuroblastic tumors and assess MYCN-amplification status using H&E-stained whole slide digital images. The model demonstrated strong performance in identifying diagnostic category, grade, mitosis-karyorrhexis index (MKI), and MYCN-amplification on an external test dataset. This AI-based approach establishes a valuable tool for automating diagnosis and precise classification of neuroblastoma tumors.


Introduction
Neuroblastoma is a neuroblastic tumor (NT) and the most common extracranial pediatric solid tumor, affecting nearly 800 children in the United States annually.1 To select optimal treatment strategies, patients are risk-stratified according to prognostic clinical, pathologic, and molecular variables, including age, stage, histopathology, and MYCN-amplification.2,3 Approximately 40% of patients with neuroblastoma are classified as high-risk, a group with an overall three-year event-free survival of about 60%.4 MYCN-amplification is present in 20% of NTs and, when identified, places the patient in the high-risk category.5 The pathologic classification of NTs is a major contributor to risk stratification. The International Neuroblastoma Pathology Committee (INPC) uses combinations of four features to classify tumors as having favorable or unfavorable histology: age, diagnostic category (neuroblastoma, ganglioneuroblastoma intermixed, ganglioneuroma, or ganglioneuroblastoma nodular), grade of differentiation, and mitosis-karyorrhexis index (MKI).6 INPC classification has significant prognostic value on its own: patients with unfavorable histology have a four times higher likelihood of relapse than those with favorable histology.28][9] Machine learning algorithms have been used to analyze digitized NT histology since as early as 2009, with models that segmented cells and extracted texture features from histology images to predict tumor grade.10 More recently, convolutional neural networks (CNNs) have been applied to risk stratification from NT histology.11 Using our open-source deep learning analysis pipeline, Slideflow (2.3.1),3][14] In contrast to conventional CNNs, aMIL models rely on pre-trained features to begin model training (Fig. 1). These features are obtained by passing images through a feature extractor network that has been pre-trained on either domain-specific or non-specific images. CTransPath is a domain-specific model that has been trained on unlabeled H&E-stained slides from The Cancer Genome Atlas (TCGA).12 For limited datasets, such as those obtainable in rare diseases, using domain-specific features to train an aMIL model can offer significant performance advantages over non-specific models such as those pre-trained on ImageNet.15,16 In this study, we leveraged the largest reported cohort of digitized NTs analyzed with these state-of-the-art deep learning methods. We generated a training dataset of whole slide images (WSIs) from patients from the University of Chicago and the Children's Oncology Group. These WSIs were used to develop models for predicting diagnostic category, grade, MKI, and MYCN-amplification status. Model performance was validated on an external test dataset of WSIs from patients seen at Lurie Children's Hospital. We aimed to demonstrate the feasibility of using aMIL models to aid in NT classification and risk stratification.
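The INPC scheme combines age, diagnostic category, grade and MKI into a favorable/unfavorable call. The sketch below encodes a simplified version of the published INPC rules for the neuroblastoma (Schwannian stroma-poor) category to make that combination concrete. It is an illustrative assumption, not code from this study. The function name `inpc_histology` and the category, grade and MKI strings are hypothetical encodings. Ganglioneuroblastoma nodular is deliberately left unresolved, because INPC classifies it according to its neuroblastic nodules.

```python
from typing import Optional

# Hypothetical label encodings (not identifiers from the study's code).
GRADES = {"undifferentiated", "poorly_differentiated", "differentiating"}
MKI_LEVELS = {"low", "intermediate", "high"}


def inpc_histology(age_years: float, category: str,
                   grade: Optional[str] = None,
                   mki: Optional[str] = None) -> Optional[str]:
    """Return 'favorable', 'unfavorable', or None when this sketch does not decide."""
    if category in ("ganglioneuroma", "ganglioneuroblastoma_intermixed"):
        return "favorable"
    if category == "ganglioneuroblastoma_nodular":
        # INPC classifies these by their neuroblastic nodules; not modeled here.
        return None
    if category != "neuroblastoma":
        raise ValueError(f"unknown diagnostic category: {category!r}")
    if grade not in GRADES or mki not in MKI_LEVELS:
        raise ValueError("neuroblastoma requires a known grade and MKI")

    # Unfavorable at any age: undifferentiated, high MKI, or age >= 5 years.
    if grade == "undifferentiated" or mki == "high" or age_years >= 5:
        return "unfavorable"
    if age_years < 1.5:
        # Poorly differentiated or differentiating with low/intermediate MKI.
        return "favorable"
    # 1.5 to <5 years: only differentiating tumors with low MKI are favorable.
    if grade == "differentiating" and mki == "low":
        return "favorable"
    return "unfavorable"


print(inpc_histology(1.0, "neuroblastoma", "poorly_differentiated", "intermediate"))  # favorable
print(inpc_histology(3.0, "neuroblastoma", "poorly_differentiated", "low"))           # unfavorable
```

The age cutoffs (1.5 and 5 years) carry much of the prognostic weight. The same grade/MKI pair can therefore flip from favorable to unfavorable as a patient ages.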
The median age of patients with digitized NTs in the training dataset (n = 172) was 2.63 years (SD = 4.37). Among patients with additional known clinical information, 84 of 138 (60.2%) had metastatic disease and 94 of 133 (70.7%) were high-risk. For diagnostic category, the dataset included 24 ganglioneuroblastomas and 148 neuroblastomas, as confirmed by pathologists (KD, HS, PP). Of the 148 tumors with a diagnostic category of neuroblastoma, 93.2% were poorly differentiated and 25% had high MKI. Of the 135 tumors with known MYCN status, 40 (29.6%) were amplified. The median age of patients in the external test dataset (n = 25) was 3.33 years (SD = 2.90). All patients in the test dataset were high-risk, and 23 of 25 (92%) had metastatic disease. All 23 tumors classified as neuroblastoma were poorly differentiated, and 11 of these 23 (48%) had a high MKI. Eight of the 25 tumors (32%) were MYCN-amplified.
The final models performed well across all outcomes in the training cohort (Fig. 2), with the strongest performance for diagnostic category. The areas under the receiver operating characteristic curve (AUROCs) for diagnostic category, grade, MKI, and MYCN were 0.96, 0.85, 0.71, and 0.77, respectively. The corresponding areas under the precision-recall curve (AUPRCs) were 0.99, 0.99, 0.88, and 0.89. The model was most successful at identifying diagnostic category, with a sensitivity of 0.93 and a specificity of 0.92. For MYCN status, the sensitivity was 0.75 and the specificity was 0.73.
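AUROC and AUPRC are threshold-free ranking metrics. AUROC is the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one. AUPRC can be estimated with average precision. The sketch below is a minimal, dependency-free version and is not the evaluation code used in Slideflow. The helper names `auroc` and `average_precision` are hypothetical, and average precision is only one of several AUPRC estimators; trapezoidal integration of the precision-recall curve gives slightly different values.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: sum of precision times recall increment per threshold."""
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at the same score enter together as one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p))                        # 0.75
print(round(average_precision(y, p), 4))  # 0.8333
```

The pairwise loop is quadratic, which is adequate for cohorts of a few hundred slides. A rank-based formula gives the same AUROC in O(n log n) for larger sets.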
Expert pathologist (PP) review of the model's attention heatmaps, generated using Grad-CAM, revealed that the models focused primarily on neoplastic areas of the tumor. They did not rely on non-tumor tissues such as fibrosis, fibrovascular stroma, or adrenal tissue. In most cases the model accurately identified and focused on the relevant tumor regions. In some instances, however, attention was unevenly distributed across the relevant tumor area. This variation in attention may reflect less well characterized, diffuse histopathological signatures whose associations with standard pathologic descriptions are unclear. Further investigation of these attention patterns is needed to determine whether they reveal novel morphological features or subtypes within neuroblastoma.17 Overall, the pathologist's review confirmed that the model generally based its predictions on the most relevant areas within the neoplastic regions of each sample.
We show the feasibility of using small datasets of H&E-stained WSIs to develop aMIL deep learning models for two tasks: morphologic classification of NTs and assessment of MYCN-amplification status at diagnosis. Prior deep learning models for NTs relied heavily on morphological feature extraction and labeled data. Our method instead leveraged unlabeled data through SSL to improve model performance on a small dataset.10,11 The model performed notably well in identifying diagnostic category and showed a strong ability to identify MYCN-amplification. With additional data, the model's automatic classification could be refined to eventually streamline pathologist workflows. The model's ability to identify MYCN-amplification status from histology is an encouraging result, particularly given the limited data used to train it. This suggests that models could also be built to predict other relevant genomic features, such as copy number variations and ploidy. About 50% of high-risk NTs do not harbor MYCN-amplification and typically have other findings, such as 11q aberrations. A deep learning approach may therefore also help identify features that drive aggressive growth in high-risk tumors without MYCN-amplification.18 Immunohistochemistry and fluorescence in situ hybridization each probe a single gene aberration. Deep learning models, by contrast, analyze the image globally and may more readily identify morphological signatures produced by combinations of gene alterations, which could further aid in stratifying NTs.
Limitations of this study arise largely from data availability. Because NTs are rare, it remains difficult to collect sufficient samples to train a robust deep learning model. Our approach uses a network architecture designed to mitigate this limitation; however, the model could be further improved with more data. Additionally, this study seeks to aid molecular pathology diagnostics and is not intended to replace pathologists. The model's predictions act as a second pair of eyes and could alert a pathologist to review specific, notable aspects of the histology.
This work is an important step toward automating the diagnosis and precise classification of NTs with deep learning-based image analysis. Ultimately, this approach could increase global access to molecular and pathological classification of tumors in regions without access to expert pathologists. We also demonstrate that aMIL models can perform well on small datasets. This model architecture could be extended to other rare cancers that suffer from low data availability. This artificial intelligence-based approach adds another data modality to the pathologist's toolbox for NT classification.

Dataset description
H&E-stained slides from the time of initial diagnosis were obtained from the University of Chicago (n = 102), the Children's Oncology Group (n = 70), and Lurie Children's Hospital (n = 25). The images were reviewed by trained pathologists (HS, PP, KD). The pathologists annotated the tumor regions and defined the diagnostic category (ganglioneuroblastoma/neuroblastoma), grade (differentiating/poorly differentiated), and MKI (low/intermediate vs. high). MYCN status (amplified/non-amplified) was abstracted from patient records. This study was approved by the Institutional Review Boards of the University of Chicago (IRB20-0659) and Lurie Children's Hospital (IRB 2021-4498).

Image processing
WSIs were captured using an Aperio AT2 DX WSI scanner. To remove normal background tissue and maximize cancer-specific training, image tiles were extracted only from within pathologist-annotated tumor regions. Tiles were extracted from WSIs at 299 × 299 pixels with a width of 302 µm using Slideflow version 2.3.1 and the libvips backend. Grayspace filtering, Otsu's thresholding, and Gaussian blur filtering (sigma = 3, threshold = 0.02) were used to remove background.
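The tile geometry described above can be sketched in a few lines. Slideflow performs this step internally; this stand-alone version only illustrates the arithmetic. It rests on several assumptions that do not describe confirmed Slideflow behavior:

- a non-overlapping tile grid;
- a base scan resolution passed in as `mpp` (microns per pixel);
- a rule that keeps a tile when its center falls inside the annotated tumor polygon.

`tiles_in_roi` and `point_in_polygon` are hypothetical helpers. A 302 µm tile resized to 299 px corresponds to roughly 1.01 µm per output pixel.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: a point is inside if a ray toward +x crosses an odd number of edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tiles_in_roi(slide_w: int, slide_h: int, mpp: float, roi: Sequence[Point],
                 tile_um: float = 302.0) -> List[Tuple[int, int]]:
    """Top-left base-pixel coordinates of grid tiles whose center lies inside roi."""
    size = round(tile_um / mpp)  # tile edge length at base resolution, in pixels
    return [
        (x, y)
        for y in range(0, slide_h - size + 1, size)
        for x in range(0, slide_w - size + 1, size)
        if point_in_polygon(x + size / 2, y + size / 2, roi)
    ]


tumor_roi = [(0, 0), (2500, 0), (2500, 2500), (0, 2500)]  # toy annotation polygon
print(tiles_in_roi(10_000, 10_000, mpp=0.25, roi=tumor_roi))
# [(0, 0), (1208, 0), (0, 1208), (1208, 1208)]
print(round(302 / 299, 4))  # 1.01 (µm per pixel after resizing to 299 px)
```

For example, at an assumed base resolution of 0.25 µm/px, each 302 µm tile spans round(302 / 0.25) = 1208 base pixels before it is resized to 299 × 299.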

Classifier training
Extracted tiles were converted to feature vectors using CTransPath, with 'reinhard mask' stain normalization applied.12 aMIL models were trained on the extracted features in Slideflow using the FastAI API and PyTorch.
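The core of an aMIL classifier is attention pooling over a bag of tile features. The forward pass below follows the attention mechanism of Ilse et al. (2018). It is written in plain Python so that it runs without dependencies. It is a sketch, not Slideflow's implementation: there is no training loop and the weights are random. `AttentionMIL`, `sample_bag` and the toy dimensions are hypothetical. In practice, `in_dim` would match the feature extractor's output size, and bags would be capped at the bag size of 256 listed in the training parameters.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sample_bag(tiles: Sequence, bag_size: int = 256, seed: int = 0) -> list:
    """Keep every tile if the slide has few; otherwise sample bag_size without replacement."""
    if len(tiles) <= bag_size:
        return list(tiles)
    return random.Random(seed).sample(list(tiles), bag_size)


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionMIL:
    """Attention pooling: a_k = softmax_k(w . tanh(V h_k)), z = sum_k a_k h_k."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.V = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.w = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.c = [rng.gauss(0.0, 0.1) for _ in range(in_dim)]  # slide-level classifier
        self.b = 0.0

    def forward(self, bag: Sequence[Vector]) -> Tuple[float, Vector]:
        """Return (predicted probability, per-tile attention weights)."""
        scores = []
        for h in bag:
            hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in self.V]
            scores.append(sum(wj * u for wj, u in zip(self.w, hidden)))
        attn = softmax(scores)
        dim = len(bag[0])
        # Slide embedding: attention-weighted average of the tile feature vectors.
        z = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(dim)]
        logit = sum(ci * zi for ci, zi in zip(self.c, z)) + self.b
        return 1.0 / (1.0 + math.exp(-logit)), attn


rng = random.Random(1)
tiles = [[rng.random() for _ in range(8)] for _ in range(1000)]  # 1000 toy 8-d tile features
bag = sample_bag(tiles, bag_size=256)
prob, attn = AttentionMIL(in_dim=8, hidden_dim=4).forward(bag)
print(len(bag), round(sum(attn), 6), 0.0 < prob < 1.0)  # 256 1.0 True
```

The attention weights sum to one over the bag, so they can be read directly as each tile's share of the slide-level representation.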
The aMIL model parameters were a weight decay of 1e-5, a bag size of 256, a batch size of 32, and training for 10 epochs. aMIL models were evaluated with 5-fold cross-validation, calculating the average AUROC, AUPRC, sensitivity, specificity, and F1 score across folds. Patients were excluded from a given model if the outcome of interest was unknown.
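Cross-validation with per-outcome exclusion can be sketched as below. Several pieces are assumptions for illustration: the function names, the class-stratified assignment of patients to folds, and the `fit_and_predict` callback. The study's actual fold assignment inside Slideflow may differ. Folds are built at the patient level, so no patient appears in both the training and validation split of the same fold. AUROC and AUPRC would be averaged across folds in the same way as the threshold metrics shown here.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

Labels = Dict[str, Optional[int]]  # patient ID -> 1, 0, or None (unknown)


def stratified_folds(patient_labels: Labels, k: int = 5, seed: int = 0) -> List[List[str]]:
    """Assign patients with a known outcome to k folds, balancing classes."""
    known = {p: y for p, y in patient_labels.items() if y is not None}  # drop unknowns
    rng = random.Random(seed)
    folds: List[List[str]] = [[] for _ in range(k)]
    for cls in (0, 1):
        members = sorted(p for p, y in known.items() if y == cls)
        rng.shuffle(members)
        for i, patient in enumerate(members):
            folds[i % k].append(patient)
    return folds


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
    }


def cross_validate(patient_labels: Labels,
                   fit_and_predict: Callable[[List[str], List[str]], Dict[str, int]],
                   k: int = 5) -> Dict[str, float]:
    """Train on k-1 folds, predict the held-out fold, and average metrics across folds."""
    folds = stratified_folds(patient_labels, k)
    per_fold = []
    for i, val_ids in enumerate(folds):
        train_ids = [p for j, fold in enumerate(folds) if j != i for p in fold]
        preds = fit_and_predict(train_ids, val_ids)  # hypothetical model callback
        y_true = [patient_labels[p] for p in val_ids]
        per_fold.append(threshold_metrics(y_true, [preds[p] for p in val_ids]))
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


labels: Labels = {f"pt{i:02d}": int(i < 10) for i in range(40)}
labels["pt_missing"] = None  # e.g. MYCN status unknown: excluded from this model


def oracle(train_ids: List[str], val_ids: List[str]) -> Dict[str, int]:
    return {p: labels[p] for p in val_ids}  # stand-in "model" that returns true labels


print(cross_validate(labels, oracle))  # {'sensitivity': 1.0, 'specificity': 1.0, 'f1': 1.0}
```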

Model Validation
The aMIL model developed during training was applied to the unseen external test dataset. Samples were evaluated in a single run, with no hyperparameter tuning on the test data, to prevent validation leakage.
Model performance was assessed as above.

Pathologist Explainability Assessment
Explainability heatmaps were generated using Grad-CAM.19 PP reviewed the heatmaps to determine whether the tumor regions the model found important for outcome prediction showed clinical correlation with the given outcome.
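Before a pathologist can review per-tile importance scores, the scores must be mapped back onto slide coordinates, whatever method produced them. A minimal sketch follows, assuming one score per tile indexed by its (column, row) grid position. `scores_to_heatmap` and `top_tiles` are hypothetical helpers. Grad-CAM itself, which derives scores from the gradients of a network's activations, is not reimplemented here.

```python
from typing import Dict, List, Optional, Tuple

GridPos = Tuple[int, int]  # (column, row) of a tile in the slide's tile grid


def scores_to_heatmap(tile_scores: Dict[GridPos, float],
                      grid_w: int, grid_h: int) -> List[List[Optional[float]]]:
    """Min-max normalize per-tile scores into a grid_h x grid_w map (None = no tile)."""
    grid: List[List[Optional[float]]] = [[None] * grid_w for _ in range(grid_h)]
    if not tile_scores:
        return grid
    lo, hi = min(tile_scores.values()), max(tile_scores.values())
    span = hi - lo
    for (col, row), score in tile_scores.items():
        grid[row][col] = (score - lo) / span if span > 0 else 0.0
    return grid


def top_tiles(tile_scores: Dict[GridPos, float], n: int = 5) -> List[GridPos]:
    """Grid positions of the n highest-scoring tiles, e.g. to prioritize for review."""
    return sorted(tile_scores, key=tile_scores.get, reverse=True)[:n]


scores = {(0, 0): 0.2, (1, 0): 0.6, (1, 1): 1.0}
for row in scores_to_heatmap(scores, grid_w=2, grid_h=2):
    print(row)
print(top_tiles(scores, n=1))  # [(1, 1)]
```

Empty grid cells stay `None` rather than 0. This keeps background or non-annotated areas visually distinct from tumor tiles that received low scores.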

Figure 1 attention

Figure 2 Model