Clinical Signi cance and Molecular Annotation of Cellular Morphometric Subtypes in Lower Grade Gliomas discovered by Machine Learning: a retrospective multicentric study


 Lower grade gliomas (LGGs) are heterogenous diseases by clinical, histological and molecular criteria. Here, we developed a machine learning pipeline to extract cellular morphometric biomarkers from whole slide images of tissue histology; and identified and externally validated robust cellular morphometric subtypes of LGGs in multi-center cohorts. The subtypes have significantly independent predictive power for overall survival across all three independent cohorts. In the TCGA-LGG cohort, we found that patients within the poor-prognosis subtype responded poorly to primary therapy and follow-up treatment. Furthermore, LGGs within the poor-prognosis subtype were characterized by higher mutational burden, higher frequencies of copy number alterations, and higher level of tumor-infiltrating lymphocytes and immune checkpoint genes. Higher level of PD-1/PD-L1/CTLA-4 was confirmed by immunohistochemical staining. In addition, the subtypes learned from LGG demonstrates translational impact on Glioblastoma (GBM). Overall, we developed and validated a framework for the cellular morphometric subtype discovery in LGGs associated with specific molecular alterations, immune micro-environment, prognosis and treatment response.


Introduction
Gliomas are the most common primary central nervous system (CNS) malignant tumor. According to statistics from the Central Brain Tumor Registry of the United States (CBTRUS), gliomas account for about 80% of all central nervous system malignancies 1,2 . Historically, according to the 2007 WHO classi cation criteria for CNS tumors, gliomas were categorized into grade 1, 2, 3 or 4 3 . With recent integration of molecular alterations in the classi cation of CNS tumor, the newest WHO classi cation (2016) introduced a paradigm shift in the classi cation of CNS tumors combining both histopathologic and genotypic features 4 to reveal an "integrated" diagnosis that better correlates with patient outcome.
LGGs grow relatively slowly, and due to their characteristic diffuse and in ltrative growth, they are prone to recurrence after surgical resection, occasionally accompanied by an increase in malignancy grade, and have a high clinical disability and mortality rate 5 . In contrast, glioblastoma multiforme (GBM, grade 4) displays a much more unfavorable course.
It is reported that age > 40 years, astrocytic subtype, tumor maximum diameter > 6 cm, tumors crossing the midline, and the patient's degree of neurological impairment are risk factors affecting overall survival (OS) 6,7 . For patients with low risk LGG, the median OS can reach 10.8 years, while for patients with high risk LGG (2 or more aforementioned risk factors), the median OS is only 3.9 years. Moreover, LGGs are highly heterogeneous both at the histopathological and molecular level 4,8 , re ected in signi cant variability in clinical outcome 8,9 . Therefore, to personalize care and treatment of LGG patients, accurate and robust patient strati cation, which is signi cantly associated with clinical outcomes, is mandatory.
It is well known that cellular morphometric properties play a key role in cancer diagnosis and cancer prognosis together with important molecular factors. Although deep neural networks (e.g., CNN-like models) have been successfully applied in many glioma-related studies, including diagnosis [10][11][12] , grade prediction 13,14 , and mutation prediction 15,16 , based on computed tomography (CT), magnetic resonance imaging (MRI), and whole slide images (WSI), the quantitative pro ling and molecular association of cellular morphometric landscape from whole slide images is inadequately investigated due to both technical and conceptual limitations.
To capture the heterogeneous cytoarchitecture of gliomas, we developed an high-throughput computational pipeline that enables quanti cation of tissue histology at the cellular level, insensitive to variations in xation and staining protocols at different laboratories 17 . The extracted cellular morphometric features can be used for tumor classi cation 18 , and molecular association 19 . In addition, our previous study introduced stacked predictive sparse decomposition (SPSD) 20 that could be used to mine underlying cellular morphometric properties within whole slide images (WSI). In the present study, we applied SPSD to LGG cohorts to discover clinically relevant cellular morphometric subtypes (CMSs), and evaluated clinical impacts and molecular correlation of CMSs.

Study Design and Characteristics of Patient Cohorts
We used three retrospective cohorts of LGG patients to evaluate and independently validate the prognostic impact of cellular morphometric subtypes, which were identi ed from WSIs powered by arti cial intelligence. And we also used two retrospective cohorts of GBM patients to evaluate the generalization capability and translational impact of cellular morphometric subtypes learned from LGG onto GBM (Fig. 1). Speci cally, the TCGA-LGG cohort served as the discovery set and included a total of 488 LGG patients with both diagnostic histopathological slides and clinical information. There were 271 (55.5%) male and 217 (44.5%) female patients, with a median age of 41 years (range: 14-87 years) (Supplementary Table 1 Table 5).

Identi cation of Cellular Morphometric Biomarkers using Unsupervised Representation Learning
In this study, our cellular delineation pipeline 17 recognized and delineated over 400 million cellular objects from 841 diagnostic slides of 488 TCGA-LGG patients; over 25 million cellular objects from 75 diagnostic slides of 70 ZN-LGG patients; over 10 million cellular objects from 37 diagnostic slides of 37 SU-LGG patients; over 400 million cellular objects from 848 diagnostic slides of 380 TCGA-GBM patients; and over 25 million cellular objects from 85 diagnostic slides of 79 ZN-GBM patients, where each cellular object was represented with 15 morphometric properties (Extended Data Fig. 1a and Supplementary   Table 6).
Next, we optimized and trained our SPSD 20 model based on pre-quanti ed cellular objects randomly selected from TCGA-LGG cohort to discovery the underlying cellular morphometric biomarkers (Extended Data Fig. 2). After training, the pre-built SPSD model reconstructed each cellular object as a sparse combination of the pre-identi ed 256 cellular morphometric biomarkers, which led to the novel representation of each single cellular object as the 256 sparse code (reconstruction coe cient); and thereafter, the corresponding 256-dimentinal cellular morphometric context representation of each patient as an aggregation (Extended Data Fig. 1b) of all delineated cellular objects belonging to the same patient (Supplementary Tables 7-11). The nal patient-level cellular morphometric context representation was optimized by using the top 30 (sparsity constraint of SPSD model) out of 256 cellular morphometric biomarkers with the largest variations, which contributed to 98.84% of the total variations of the data.

Clinical and Biological Evaluation of Cellular Morphometric Biomarkers
We next evaluated association of the 30 cellular morphometric biomarkers with respect to histological meanings, prognosis as well as cancer biology. Our survival analysis revealed that 20 out of 30 cellular morphometric biomarkers had signi cant prognostic impact (FDR < 0.05), where 5 of them were prognostically favorable (hazard ratio (HR) < 1) and 15 prognostically unfavorable (HR > 1) (Fig. 2a

Identi cation and Validation of Cellular Morphometric Subtypes
Consensus cluster analysis using 30 cellular morphometric biomarkers identi ed three subtypes from TCGA-LGG cohort with a signi cantly different prognosis (Log-rank P < 0.0001; Extended Data Fig. 7).
Given the small number of patients (n=4) in subtype 3, as well as its prognostic similarity to subtype 2 patients, we merged subtypes 3 and 2, and thereafter refer this combination as subtype 2 in the rest of this study (Fig. 3a). Accordingly, the TCGA-LGG cohort contained 389 subtype 1 and 99 subtype 2 patients. The patient-level cellular morphometric context representation in the TCGA-LGG cohort formed signi cantly distinct clusters (P = 0.001, Fig. 3b). Importantly, two CMSs, predicted with pre-built subtype model, were portioned in two validation sets. Speci cally, the ZN-LGG cohort strati ed into subtype 1 (38 patients) and subtype 2 (32 patients); while the SU-LGG cohort strati ed into subtype 1 (16 patients) and subtype 2 (21 patients). Moreover, the patient-level cellular morphometric context representation in both validation cohorts also formed signi cantly distinct clusters (P=0.001, Fig. 3e, h).

Clinical signi cance of Cellular Morphometric Subtypes
We examined the association between CMSs and clinical and tumor characteristics in the TCGA-LGG cohort. Surprisingly, there was no signi cant association between CMSs and any clinical or molecular prognostic factor (including, age, grade, histological type, IDH mutation, 1p/19q codeletion, MGMT promoter status, TERT promoter status and ATRX status) (Supplementary Table 1). This nding was con rmed in both validation cohorts (Supplementary Tables 2 and 3).
In the TCGA-LGG cohort where genetic alteration burden information was available, Maftool analysis showed signi cantly higher TMB (P=0.003) and SCNA score (P=0.012) in subtype 2 patients (Extended Data Fig. 8), indicating higher level of genomic instability of tumors from subtype 2.
Kaplan-Meier analysis showed signi cantly shorter OS of subtype 2 patients than subtype 1 patients (P=0.001, Fig. 3c). Furthermore, univariate and multivariate CoxPH models indicated the independent prognostic impact of CMSs in the TCGA-LGG cohort after adjusting for other signi cant clinical and molecular factors including age, histological type, grade, IDH mutation status, and ATRX mutation status LGG patients, which was further con rmed by calibration analysis on the training (Fig. 4b,c) and testing set (Fig. 4d,e), respectively. In clinical practice, physicians can draw a vertical line from each variable to the "Point" line ( Fig. 4a) to calculate the score of the corresponding variable, and thereafter the total score of a patient as the sum of all individual score above, with which, the clinician can predict the 3-year and 5-year OS rates of a new LGG patient. Meanwhile, a dynamic nomogram further demonstrates its potential clinical implications at: https://liuxiaoping.shinyapps.io/LGG_nomogram/. In addition, Chi-square test showed signi cantly poor response of subtype 2 patients with respect to both primary therapy (P<0.001) and follow-up treatment (P=0.002) (Supplementary Table 1).
Importantly, the double-blind deployment of the pre-built patient subtype model on both validation cohorts followed by independent survival analysis con rmed the signi cantly worse OS of subtype 2 patients (P=0.027 in ZN-LGG, P=0.005 in SU-LGG, Fig. 3f, i). Furthermore, univariate and multivariate CoxPH models con rmed the independent prognostic impact of CMSs after adjustment for other Moreover, the T cell in ltration score (P=0.00097) and overall immune in ltration score (P=0.029) were signi cantly higher in subtypes 2 (Fig. 7). To reveal the possibility of immune escape in subtype 2 patients, we examined the expression levels of two immune inhibitory receptor CTLA4 and PD-1 and the ligand of PD-1 (i.e., PD-L1). In the TCGA-LGG cohort (Fig. 8a), the expression of PD-1 (P=0.00044), PD-L1 (P=0.03), and CTLA4 (P=0.17) were signi cantly or tended to be higher in patients with subtype 2. Finally, we validated the expression levels of these immune inhibitory molecular markers in the ZN-LGG cohort using IHC, and the results con rmed the signi cant upregulation of PD-1 (P=8e-05), PD-L1 (P=0.018) and CTLA4 (P=0.00089) in subtype 2 (Fig. 8b,c). Overall, these results indicated possible mechanisms for immune escape or immune tolerance in subtype 2 tumors, which could explain the poor prognosis of subtype 2 patients.

Discussion
In this study, we extracted the cellular morphometric biomarkers from WSIs of LGG patients through unsupervised representation learning strategy, and subsequently de ned two CMSs based on these biomarkers. The reproducibility and robustness of CMSs was demonstrated in two independent validation cohorts of LGG patients, and the transferability of CMSs was demonstrated in two independent validation cohorts of GBM patients. Moreover, the importance of CMSs lies in its independent prognostic signi cance after adjusting for other clinical and molecular factors, the association with treatment response, and the relation to underlying molecular and phenotypic alterations. Furthermore, we constructed a nomogram for predicting the survival rate of LGG patients at 3 and 5 years based on important clinical factors, molecular factors, and CMS.
Different from many end-to-end CNN-like systems, which mainly focuses on the prediction of clinical/molecular knowledge, the emphasis of present study focuses on novel knowledge discovery with interpretability, robustness and independent clinical value through multicentric validation. As a further justi cation, we evaluated a superior CNN-like system (i.e., SCNN: survival convolutional neural networks), which was speci cally designed and optimized for the prediction of cancer outcomes from brain tumor histology and genomics 22 . Interestingly, SCNN risk score does not provide independent and signi cant prognostic value on both TCGA-LGG (P=0.182, Extended Data Fig. 9b) and TCGA-GBM (P=0.533, Extended Data Fig. 9c) cohorts, in the presence of our CMS subtype and other important clinical/molecular factors, which suggest that our patient strat cation stratergy out-performs the supervised CNN-like system (i..e, SCNN) for precision prognosis.
SCNA score, closely related to the occurrence and progression of many tumors (including glioma), is related to poor prognosis 23,24 . Meanwhile, TMB levels, closely related to degree of malignancy and poor prognosis of glioma, is often used as a biomarker predicting the e cacy of anti-PD1 therapy [25][26][27] . And our study con rmed the signi cantly higher SCNA score and TMB level in subtype 2 patients, which explains the poor prognosis, and provides justi cation for further investigation of anti-PD-1 immunotherapy for subtype 2 patients, especially when subtype 2 patients were associated with poor response to primary therapy and follow-up treatment.
The tumor immune microenvironment is closely related to the progression of tumor cells. In the glioma microenvironment, NK cells, macrophages, neutrophils, CD4 + T cells, CD8 + T cells, regulatory T cells, etc.
in uence disease outcome 28

. Our study showed, T cells (including CD4 + T cells, CD8 + T cells, gamma delta T Cell, regulatory T cells), B cells, plasma cells, macrophages, NK cells, neutrophils, mast cells, etc.
were higher in subtype type 2 patients, suggesting a higher immune in ltration in subtype 2 patients.
Given the signi cantly higher expression levels of PD-1, PD-L1, and CTLA4 as well as the relatively high TMB in subtype 2 patients, subtype 2 patients may bene t from anti-PD-1 immunotherapy.

CTLA4 inhibits T cell activation by inducing antigen-presenting cells to express CD80 and CD86 29 .
Regulatory T cells can inhibit T cell function by secreting IL-10 and TGF-β 30 . Studies have reported that neutrophils in ltration in tumor tissues can promote tumor progression and metastasis, and in glioma, neutrophils can promote tumor proliferation by inducing angiogenesis [31][32][33] . NK cells are important component of the human immune system. However, Poli et al. showed that NK cells are in a state of inactivation in glioma 34 . The above analysis of immune cell components showed that although subtype 2 is enriched for immune cells, due to some immunosuppressive cell (regulatory T cell) in ltration, T cell function inactivation and other factors, subtype 2 may be in immune tolerance. This study has some shortcomings. First, relatively few LGG patients were included in the independent ZN-LGG and SU-LGG validation cohorts, so the conclusions of this study need to be further veri ed in large scale clinical studies. Second, our ndings raise the possibility that subtype 2 LGG patients could bene t from anti-PD-1 immunotherapy; however, since LGG patients have not been recommended for anti-PD-1 immunotherapy based on existing clinical practice, we could not nd any retrospective dataset to test this and will investigate it in our future prospective study.
In conclusion, we developed a pathology-image-based LGG subtyping, that seems to stratify LGG patients into two groups with different OS associated with treatment responses, copy number alterations and TMB levels and immune tolerance.

Data collection
The patient data in this retrospective study, including diagnostic slides of tissue histology and the corresponding clinical information, were collected from TCGA-LGG cohort (Supplementary Table 1 Table 5) to form the discovery cohort and the multi-center independent validation cohorts, respectively. The inclusion criteria were primary LGG and GBM with diagnostic slides and overall survival information available. This study was approved by the institutional review board (IRB) of both Zhongnan Hospital of Wuhan University and Stanford University.
De nition of treatment response in TCGA-LGG cohort According to published literature 35 , the treatment response in TCGA project was assessed using Response Evaluation Criteria in Solid Tumors (RECIST) measure that evaluates the size of measurable lesions of solid tumors 36 . Based on RECIST, the patient's response is divided into four registrations: complete remission, partial remission, progressive disease and stable disease. Since we focus on whether the patients responded to treatments, the treatment response in TCGA-LGG cohort is categorized as: Response (including complete remission and partial remission), and non-Response (including progressive disease and stable disease) groups.

Identi cation of Cellular Morphometric Biomarkers
We developed an unsupervised feature learning pipeline based on SPSD 20 for the unsupervised discovery of underlying cellular morphometric characteristics from the 15 cellular morphometric features extracted from the diagnostic slides of TCGA-LGG cohort. And thereafter identi ed 256 cellular morphometric biomarkers for cellular object representation. Speci cally, in this study, we used single network layer with 256 dictionary elements (i.e., cellular morphometric biomarkers) and sparsity constraint 30 at a xed random sampling rate of 1000 cellular objects per diagnostic slide from TCGA-LGG cohort (Extended Data Fig. 2a), where the network parameters (i.e., dictionary size and sparsity) were experimentally optimized to maintain the data reconstruction error ratio under certain threshold (i.e., 10% in this study, Extended Data Fig. 2b,c). The pre-trained SPSD model reconstructs each cellular region (represented as a vector of 15 morphometric properties) as a sparse combination of pre-identi ed 256 cellular morphometric biomarkers, and thereafter represents each cellular object as the sparse code (i.e., sparse coe cients) during reconstruction, where the sparsity constraint leads to the reconstruction mainly contributed by the top 30 cellular morphometric biomarkers.

Clinical and Biological Evaluation of Cellular Morphometric Biomarkers
We evaluated the prognostic impact of top 30 out of 256 CMBs with largest variations mined from TCGA- LGG cohort with Cox proportional hazards regression (CoxPH) model (survival package in R, Version 3.2-3), and examined the effects of high or low levels of each prognostic signi cant CMB on OS using Kaplan-Meier analysis (survminer package in R, Version 0.4.8) and log-rank test (survival package in R, Version 3.2-3), where TCGA-LGG cohort was divided into CMB-high and CMB-low groups per CMB (survminer package in R, Version 0.4.8). Meanwhile, we evaluated biological signi cance between these CMB-high and CMB-low groups by assessing their relationship with biological factors available in the TCGA-LGG cohort using Mann-Whitney test.

Construction of Patient-Level Cellular Morphometric Context Representation
The patient-level cellular morphometric context representation was constructed based on pre-identi ed 256 cellular morphometric biomarkers as an aggregation (i.e., max-pooling) of all the cellular sparse codes extracted via pre-built SPSD model from the cellular objects belonging to the diagnostic slides of the same patients. Speci cally, it consists of steps as follows, (1) delineation of cellular architecture and extraction of cellular morphometric properties from diagnostic slides of each patient; (2) construction of cellular sparse codes for the cellular objects belonging to each patient based on pre-identi ed 256 cellular morphometric biomarkers and pre-built SPSD model; (3) aggregation (i.e., max-pooling) of all cellular sparse codes belonging to the same patient to form the patient-level cellular morphometric representation; and (4) selection of the top 30 cellular morphometric biomarkers with the largest variations identi ed in TCGA-LGG cohort as the nal patient-level cellular morphometric representation.

Identi cation and Application of Patient Subtype
The patient subtype was identi ed based on patient-level cellular morphometric context representation through consensus clustering strategy 37 (ConsensusClusterPlus package in R, Version 1.50.0) with hierarchical clustering, Pearson's correlation and 500 bootstrapping iterations; and the optimal number of subtypes was determined by the consistency of cluster assignment (consensus matrix) and the prognostic impact of subtypes. For a new patient, the subtype was assigned as follows: (1) construct patient-level cellular morphometric context representation with pre-built cellular morphometric biomarkers and SPSD model; (2) calculate the Pearson's distances between the new patient's representation and the mean representation of each pre-identi ed patient subtype; and (3) assign the new patient to its closest subtype yielding smallest Pearson's distance.

Clinical Evaluation and Validation of Patient Subtype
We evaluated and independently validated the clinical impact of pre-identi ed patient subtype from TCGA-LGG cohort, ZN-LGG cohort, SU-LGG cohort, TCGA-GBM cohort, and ZN-GBM cohort, respectively, where the latest clinical data of TCGA-LGG and TCGA-GBM cohorts was downloaded from Genomic Data Commons (GDC, https://portal.gdc.cancer.gov/), and the subtype assignment of each patient in independent validation cohorts (i.e., ZN-LGG, SU-LGG, TCGA-GBM, and ZN-GBM) was achieved through the application of pre-built TCGA-LGG patient subtype model as described previously. The evaluation and validation reside in three folds as follows, (1)  LGG patents, where the multivariate CoxPH model was constructed with selected variables (i.e., clinical factors, molecular factors, and patient subtype) based on their signi cant and independent prognostic impact. Speci cally, during nomogram construction and validation, the patients in TCGA-LGG cohort were randomly partitioned into training set (60% patients) and testing set (40% patients) through strati ed sampling strategy. Then, a nomogram was constructed (rms package in R, Version 6.0-1) on the training set to predict the 3-year, and 5-year overall patient survival. The performance of nomogram was evaluated based on concordance-index (C-index) with 1000 bootstraps on TCGA-LGG training set and test set, followed by calibration analysis to calibrate the performance of the nomogram; and (3) Treatment response. The treatment response was categorized as: Response (including complete remission and partial remission); and Non-response (including progressive disease and stable disease). And the differences in treatment response were assessed with Chi-square test for both primary therapy and follow-up treatment.

Differences in Gene expression, Mutation load, and Immune microenvironment between Subtypes
Differentially expressed genes (DEGs) between patient subtypes were estimated (edgeR package in R, in ltration score were estimated via R package "ConsensusTME" (version: 0.0.1.9000) 42 , and total T cell in ltration score was calculated according to the method introduced by Senbabaoglu et al. 43 .

Immunohistochemical (IHC) Staining
IHC staining was carried out on 4-μm sections of formalin-xed and para n-embedded tissues according to the standard protocol on the entire ZN-LGG cohort (70 patients in total). Brie y, sections were dewaxed and rehydrated in serial alcohol washes, and then the endogenous peroxidase activities were blocked.
After the nonspeci c sites were saturated with 5% normal goat serum, the sections were incubated Statistical Analysis Survival differences between subtypes or groups were examined using log-rank test. Differences in the treatment response of primary therapy and follow-up treatment between subtypes were examined using Chi-square test. Differences in respect of the expression of four negative immune regulators CTLA4, PD-1 and PD-L1, the immune cell in ltration, and genomic heterogeneity (tumor mutation burden, somatic copy number alteration) between subtypes were analyzed with Mann-Whitney non-parametric test. P value (FDR corrected if applicable) less than 0.05 was considered to be statistically signi cant. All analysis was performed with R ( Competing interests: The authors declare no con icts of interest Data and materials availability: TCGA-LGG and TCGA-GBM data are available at TCGA-LGG and TCGA-GBM projects, respectively; Patient-level cellular morphometric context data as well as the corresponding clinical data are available as supplementary materials; WSIs from ZN-GBM, ZN-LGG and SU-LGG cohorts are available upon request to the corresponding authors after the IRB approval from the corresponding hospitals; Matlab code for cellular morphometric biomarker detection and application with pre-trained SPSD model will be available at https://bbds.lbl.gov/project/lgg-cms upon the acceptance of manuscript.

Figure 1
Graphical illustration of our study design with knowledge mining and double-blind validation.  LGG patient subtype provides signi cant and independent prognostic impact.     Application of LGG cellular morphometric biomarkers and subtypes to GBM cohorts leads to distinct patient strati cation (a-b) with signi cant differences in OS (c-d) and independent prognostic values on TCGA-GBM (e) and ZN-GBM (f) cohorts, respectively. Note, G-CIMP and MGMT status are non-signi cant prognostic factors via univariate CoxPH analysis in ZN-GBM cohort, and therefore was not included in multivariate CoxPH analysis as shown in (f).

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.