Superpixel Image Segmentation of the Tumoral Microenvironment in Colorectal Cancer: Frontiers Beyond the Microscope

Colorectal cancer (CRC) is the most common malignancy of the gastrointestinal (GI) tract and accounts for 9% of all cancers. The stroma and the tumoral microenvironment represent brave new frontiers for patients with colorectal cancer. Here we demonstrate novel superpixel image segmentation (SIS) techniques for whole slide images (WSI) to unravel this biology. Findings of significance include the association of low proportionated stromal area (PSA), high immature stromal percentage (ISP) and high myxoid stromal ratio (MSR) with worse prognostic outcomes in our CRC patients. Overall, stromal markers outperformed all others at predicting clinical outcomes. In particular, MSR may be able to prognosticate patients independent of tumor stage and may be an effective way to prognosticate CRC patients that circumvents the need for more extensive deep learning (DL) based computational profiling. The approaches demonstrated here can be performed by a trained pathologist and easily recorded during synoptic cancer reporting with appropriate quality assurance. Future well-designed, robust clinical trials will have the ultimate say in determining whether digital image analysis and superpixel image segmentation can better tailor the need for adjuvant therapy in patients with colorectal cancer.


Introduction
Colorectal cancer (CRC) is a diverse disease entity that can be genetically inherited, although it mostly occurs sporadically [1]. Today, the field of histopathology is transitioning from traditional, isolated assessment of tumoral differentiation towards more holistic approaches [2]: ones which encompass the tumoral microenvironment, incorporate molecular genetics and provide computational modeling to treat and prognosticate patients with CRC. The tumoral microenvironment is an ecosystem-like network of interacting cells and molecules, which regulate the extracellular matrix as well as cancer immune cell functioning [3].
Previous work has been undertaken to characterize tumor budding (TB) [4] and stromal differentiation (SD) [2] in the CRC tumoral microenvironment. Histologically, SD has been demonstrated to be clinically significant in cancers of the breast [5], cervix [6] and stomach [7]; however, TB and SD have been most intensively studied in colorectal cancer. Overall, immature SD and high TB are seen as bad players, being associated with reduced survival and higher pathologic stage [8].
The tumor-stroma-ratio (TSR) has also been examined by many groups in colon cancer [9]. It represents the ratio of the stroma to the tumor and is measured manually by a pathologist using conventional light microscopy (CLM). Questions remain regarding manual assessment and reproducibility, and traditional TSR is performed only at the invasive front of the tumor; it does not measure the tumor in its entirety. Despite this, the TSR is seen as a promising prognostic tool, and tumors which harbor significant amounts of stroma have worse prognostic outcomes, suggesting a role in tumor-node-metastasis classification [9]. Current trends are pushing for more computation-based approaches as we enter the digital era of pathology. Most recently, Guedj et al. [10] evaluated the prognostic potential of digital image analysis (DIA) for quantifying the stromal compartments of intrahepatic cholangiocarcinoma in order to calculate a proportionated stromal area (PSA) and found low values to be associated with worse outcomes.
In breast cancer, DIA has found that the combination of triple-negative biomarker status and high stroma was associated with poor prognosis, while in luminal tumors, high stroma was associated with favorable prognostic outcomes [11]. This suggests that the role of stroma may depend on tumoral subtype, justifying the importance of validating DIA in CRC.
A pitfall of previous publications is that they did not differentiate stroma based on its myxoid differentiation. It may be important to look at stroma not only quantitatively (PSA), but also qualitatively, by assessing the degree of myxoid stromal differentiation.
Building on recent attempts to classify the proportionated stromal area (PSA) [10], here we introduce two new concepts to DIA in the setting of stroma: immature stromal percentage (ISP) and myxoid stromal ratio (MSR). It is possible that tumors with immature (myxoid) differentiation have worse survival outcomes, and if so, then both the quality and quantity of the stroma in CRC are important.
For pathologists, looking at the invasive front in isolation may be performed, but cautiously; digital image analysis may allow for a more holistic assessment of the tumoral microenvironment, which in addition to showing tumor-to-stroma ratios, also quantifies the degree of immature, myxoid stromal differentiation (degeneration).
Current paradigms are shifting away from manual qualitative assessment and towards quantitative scoring [12]. With so many open access digital pathology solutions, more opportunities are becoming available for transitioning to digital-based methodologies. The tumoral microenvironment harbors significant biological diversity, which now must be unraveled from whole slide images (WSI).

Materials And Methods
Institutional Review Board: This study received approval for all experimental protocols by the Institutional Review Board (IRB) of the Human Research Protection Program licensing committee at Northwell Health. All methods were carried out in accordance with all guidelines and regulations. Patient consent was not required due to the retrospective nature of the study (Northwell Health IRB number: 18-0128).

Study Design:
A total of 60 cases of colorectal carcinoma diagnosed in our health system between 2012 and 2017 were retrospectively analyzed. Cases were only selected if they were primary resection specimens, had received no adjuvant therapy and had sufficient non-frozen, formalin-fixed, paraffin-embedded tumor material available for research purposes. One representative block was selected per case, and a single hematoxylin & eosin (H&E) slide was stained from the largest portion of tumor. Quality assurance steps were taken to further review slides for any folded, blurred, or obstructed morphologies. Slides were also evaluated for stromal differentiation, tumor budding and tumor-infiltrating lymphocytes. Further clinicopathological data were collected from the electronic medical records for each case, and cancer-free survival (CFS) data were provided by the cancer registry of Northwell Health. The following variables were included for exploratory analyses: cancer-free survival, age, gender, tumor budding score, pathologic stage, tumor-infiltrating lymphocytes (TIL), nodal status, tumor grade, stromal differentiation, mismatch repair (MMR) status, Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) and B-Raf Proto-Oncogene (BRAF).

Slide Digitalization:
Histological features in this study were assessed using virtual slides. Slides were scanned on a Leica Aperio AT2 (Leica Biosystems, Buffalo Grove, Illinois, USA) whole slide scanner at 20×. The Aperio vendor-agnostic whole slide image viewer was used by pathologists to assess the histologic features described in our study. Digital slides were stored in the TIFF image format with JPEG image compression.

Superpixel Image Segmentation:
We incorporated QuPath [13], an open-source platform for whole slide image analysis, to evaluate the tumoral microenvironment. QuPath comes with built-in trainable machine learning algorithms. All WSI files were imported and orientated appropriately. This was performed on QuPath version 0.2.1, and superpixel image segmentation (SIS) was used to quantify stromal (myxoid and non-myxoid) and tumoral components. Superpixel image segmentation automatically grouped pixel similarities between different labeled cellular populations. In our study this was demonstrated as red (tumor), yellow (non-myxoid stroma), and blue (myxoid stroma). The entire area of the tumor was selected as shown in Figure 1. In this publication we classified the proportionated stromal area (PSA) as performed by Guedj et al. [10], defined as the ratio of total stromal area to tumor area. We developed the theoretical concepts of immature stromal percentage (ISP) and myxoid stromal ratio (MSR) for the purpose of this study. ISP is defined as the percentage of immature (myxoid) stroma relative to the entire tumoral area, while MSR is defined as the ratio of myxoid to non-myxoid stroma in the tumoral microenvironment.
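To make the three stromal metrics concrete, the following is a minimal sketch that derives them from per-class areas such as those a superpixel classifier might report. The function and field names are hypothetical and are not part of QuPath. One assumption is made loudly: "the entire tumoral area" in the ISP definition is taken here to mean the whole annotated region (tumor plus all stroma), which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StromalMetrics:
    psa: float  # total stroma / tumor area (ratio)
    isp: float  # myxoid stroma as a percentage of the annotated area
    msr: float  # myxoid / non-myxoid stroma (ratio)


def stromal_metrics(tumor_area: float,
                    non_myxoid_area: float,
                    myxoid_area: float) -> StromalMetrics:
    """Compute PSA, ISP and MSR from class areas given in the same unit.

    ASSUMPTION: the ISP denominator is tumor + all stroma; the source
    only says "the entire tumoral area".
    """
    if tumor_area <= 0 or non_myxoid_area <= 0:
        raise ValueError("tumor and non-myxoid stromal areas must be positive")
    if myxoid_area < 0:
        raise ValueError("myxoid stromal area cannot be negative")

    total_stroma = non_myxoid_area + myxoid_area
    annotated_area = tumor_area + total_stroma
    return StromalMetrics(
        psa=total_stroma / tumor_area,
        isp=100.0 * myxoid_area / annotated_area,
        msr=myxoid_area / non_myxoid_area,
    )


if __name__ == "__main__":
    # Toy areas (arbitrary units), not study data.
    print(stromal_metrics(tumor_area=50.0, non_myxoid_area=40.0, myxoid_area=10.0))
```

Keeping the three values in one immutable record makes it harder to mix up a ratio (PSA, MSR) with a percentage (ISP) downstream.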
Tumor Budding: For tumor budding assessment, a detailed search was done for the area having the highest grade of tumor budding. The buds were then counted in the hotspot region (20× objective lens).
Tumor-Infiltrating Lymphocytes: TILs were defined as small mononuclear inflammatory cells which infiltrated between tumor cells. Tumors were assessed based on the 4-tier classification system previously validated for the quantification of inflammatory cells in colorectal cancer by Klintrup et al. [15]. Scoring was assessed at the deepest point of the invasive tumor. A score of 0 meant no inflammatory cells and a score of 1 denoted a mild patchy increase in mononuclear cells, while scores of 2 and 3 denoted a moderate (band-like) and florid (cup-like) infiltrate respectively, often accompanied by the destruction of cancer cell islands. Scoring was classified as low grade (0-1) or high grade (2-3).
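The collapse of the 4-tier score into two grades can be written as a small helper. This is only an illustrative sketch; the function name is hypothetical and the mapping simply follows the low (0-1) and high (2-3) grouping described above.

```python
def til_grade(score: int) -> str:
    """Map a 4-tier inflammatory infiltrate score (0-3) to 'low' or 'high'."""
    if score not in (0, 1, 2, 3):
        raise ValueError("score must be 0, 1, 2 or 3")
    # Scores 0-1 are low grade; scores 2-3 are high grade.
    return "low" if score <= 1 else "high"


if __name__ == "__main__":
    for s in range(4):
        print(s, til_grade(s))
```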

Stromal Differentiation:
For stromal differentiation, scoring was primarily based on the grading system proposed by Ueno et al. [16]. We analyzed the extramural desmoplastic front at low magnification. Myxoid stroma was defined as an amorphous stromal substance made of amphophilic material with a basophilic to grey extracellular matrix. This was usually intermingled with randomly oriented hyalinized collagen. Stroma was regarded as immature when at least one 40× field of myxoid change was observed. We categorized stroma as mature when the fibrotic stroma did not contain myxoid change and was most commonly composed of fine mature collagen fibers stratified into multiple layers.

Mismatch Repair Status:
Immunohistochemistry was performed in order to determine MMR status on formalin-fixed and paraffin-embedded tissue.
Next Generation Genomic Sequencing: Molecular testing was performed on a subset of patients: 9 cases underwent BRAF and 12 cases underwent KRAS molecular testing. Genomic alterations of BRAF and KRAS were tested by next generation genomic sequencing on formalin-fixed, paraffin-embedded tissue, and mutational analysis was performed at GenPath laboratories (Elmwood Park, NJ).
Statistical Analysis: Comparative analysis was performed using the unpaired t-test to examine differences in the clinicopathologic profile. The Kaplan-Meier method was used to evaluate the cancer-free survival rate as a function of time. The log-rank method was used to compare differences between groups. These statistical analyses were performed using GraphPad Prism version 8.4.2. Univariate and multivariate analyses of cancer-free survival using Cox proportional-hazard regression were performed on SPSS Statistics version 23.0.0.3. A p-value < 0.05 was considered statistically significant.
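For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines of plain Python. This is an illustration only and not the software used in the study (which relied on GraphPad Prism and SPSS); the function name and the toy follow-up data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns (time, survival probability) pairs at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Toy data (months), not study data.
    for t, s in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
        print(f"t={t}: S(t)={s:.3f}")
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at observed event times.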

Clinicopathologic and Patient Characteristics:
This study comprised data from 60 patients with colorectal adenocarcinoma who underwent surgical resection at our health system. Surgeries included block resection (1), right hemicolectomy (20), left hemicolectomy (27), transverse colectomy (5) and rectosigmoidectomy (7). We found a spectrum of tumoral, non-myxoid and myxoid stromal components by superpixel image segmentation. Heatmaps for percentages can be viewed in Figure 2a, while non-linear regression curves can be seen in Figure 2b for PSA, ISP and MSR respectively. Based on non-linear regression, cut-off values were designated as 0.9 (PSA), 10% (ISP) and 0.19 (MSR). Examples of superpixel hotspots with scores are demonstrated in Figure 3.
Means were calculated for PSA, ISP and MSR in order to perform t-test analysis for the clinicopathologic profile. Firstly, tumor budding was found to be associated with a lower PSA (0.0116), high ISP (0.05) and high MSR (0.044). Pathologic stage was found to be associated with a lower PSA (0.0103) and a high ISP (0.013). Lymph node stage was found to be associated with high ISP (0.030) and high MSR (0.04). Tumor grade was found to be associated with high MSR (<0.001). Stromal differentiation was found to be associated with high ISP (<0.0001) and high MSR (<0.0001). MMR deficiency was found to correlate with PSA (0.0132) and ISP (0.04). KRAS mutation status (0.015). The remaining variables were not significant. The results for t-test analyses can be viewed in Table 1.
Based upon Cox regression of cancer-free survival (CFS), pathologic stage was found to be a poor prognostic factor on univariate analysis (P=0.03) but not on multivariate analysis (P=0.062). High TILs were found to have prognostic significance only on multivariate analysis (P=0.029). Low PSA was associated with worse prognostic outcomes on univariate (P=0.003) and multivariate analyses (P=0.0018). High ISP was found to be associated with poor prognostic outcomes on univariate (P<0.001) and multivariate analysis (P<0.003). High MSR was found to be associated with poor outcomes on univariate (P=0.028) and multivariate (P=0.03) analysis. Stromal differentiation was found to have prognostic significance on univariate (P=0.048) and multivariate analysis (P=0.019). The remaining clinicopathological variables were not significant (P>0.05) on Cox proportional-hazard regression analysis (Table 2).
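Applying the reported cut-offs to a case can be sketched as follows. The threshold values come from the text; the function name is hypothetical, and the handling of values exactly equal to a cut-off (treated here as not flagged) is an assumption, since the text does not say which side of the boundary they fall on.

```python
from typing import Dict

# Cut-offs reported in the text (PSA and MSR are ratios, ISP is a percentage).
PSA_CUTOFF = 0.9
ISP_CUTOFF = 10.0
MSR_CUTOFF = 0.19


def stromal_risk_flags(psa: float, isp: float, msr: float) -> Dict[str, bool]:
    """Dichotomize the three stromal metrics at the reported cut-offs.

    ASSUMPTION: strict inequalities are used, so a value exactly at a
    cut-off is not flagged.
    """
    return {
        "low_psa": psa < PSA_CUTOFF,    # low PSA was associated with worse outcomes
        "high_isp": isp > ISP_CUTOFF,   # high ISP was associated with worse outcomes
        "high_msr": msr > MSR_CUTOFF,   # high MSR was associated with worse outcomes
    }


if __name__ == "__main__":
    # Toy values, not study data.
    print(stromal_risk_flags(psa=0.5, isp=15.0, msr=0.30))
```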

Discussion
Digital pathology may finally be coming of age. Whole slide scanners are becoming ubiquitous in the practice of surgical pathology, and this will pave the way for new opportunities in computational pathology and digital image analysis.
The advent of open access image analysis software has been paramount to the success of digital pathology. QuPath [13] works to meet the need for a friendly, understandable digital pathology solution. It offers a comprehensive panel for image analysis, and here we demonstrate the utility of SIS for calculating PSA and myxoid stroma percentages.
Importantly, PSA is not the same as the tumoral-stromal-ratio (TSR) [9]. The PSA technique includes the entire tumor area and is computational, while traditional TSR is calculated only at the invasive tumoral border and performed manually by a pathologist on glass slides, allowing for observer bias.
In colon cancer, TSR has been found to correlate with prognostic outcomes, where high stroma ratios at the invasive tumor front are associated with poor outcomes [9]. However, we found the opposite effect, with low PSA values having worse prognostic outcomes. This may be secondary to the fact that PSA encompasses the entire tumor, not just the invasive front.
If pathologists look only at the invasive front in isolation, then this may overestimate the stroma-to-tumor ratio. DIA may allow for a more holistic assessment of the tumoral microenvironment, which in addition to showing tumor-to-stroma ratios, also quantifies the ratio (MSR) and percentage (ISP) of immature, myxoid stromal differentiation (degeneration). For the PSA, it is possible that more tumor means more tumor budding, which correlates with worse prognosis, whereas for the TSR performed at the tumor invasive front, more stroma could mean more immature SD, which is also associated with worse prognostic outcomes [17].
Regarding DIA, our findings are also consistent with what was found in luminal tumors of the breast [11] and in intrahepatic cholangiocarcinoma [10], where higher proportions of tumor (low PSA) were found to be associated with worse overall survival outcomes.
A key point to consider is the variation in DIA methods between the two mentioned studies and this publication. In the breast publication [11], a single region of interest (ROI) was identified from a WSI and tissue microarrays (TMA) were created from tumor hotspots. In this publication we performed DIA on the entire tumor from a complete WSI, as was also done in the study of intrahepatic cholangiocarcinoma [10]. WSIs may be the better solution, especially considering the significant tumor heterogeneity present in CRC; we feel this is the more pragmatic approach. Here we chose to use the slide with the largest section of tumor, as we felt this most accurately represented the tumoral microenvironment.
Superpixel methods [18] are gaining traction in the field of medicine, and recent attempts have been made to combine SIS and DL. In dermatology, the combination of superpixels and deep learning models outperformed other competing methods [19]. The advantage of SIS in the deep learning space is that it can represent the structure of an image in adaptive sizes and shapes, with the ability to improve classification performance, especially for noisy classification (corrupted labels) and boundary misclassification [20]. Future studies could combine applications in deep learning with superpixel methods in order to further analyze the tumoral microenvironment. This is the road to precision oncology.
Recent advances have been made in DL for CRC by Skrede et al. [21], who applied ten convolutional neural networks to supersized heterogeneous WSI. The DoMore-v.1 assay was able to differentiate prognostic groups independently of stage and was tested on large patient populations, suggesting its superiority to other genomic and pathological prognostic markers. The algorithm predicted cancer-specific survival in stage II patients (HR 2.71, 95% CI 1.25 to 5.86, p=0.011) and in stage III patients (HR 4.09, 95% CI 2.77 to 6.03, p<0.0001) [21].
More practically, the differentiation of the extracellular matrix (ECM) leads to the characteristic immature, myxoid stroma seen on routine histologic evaluation. Deciphering the ECM in CRC will facilitate novel therapeutic approaches [3]; most likely, myxoid degeneration decreases a tumor's physical barrier, leading to improved therapeutic delivery. Mature SD may act as a barrier, as suggested by preclinical studies which have shown ECM degradation to improve drug uptake and response [3].
In our study we were able to more accurately characterize SD through DIA by calculating the ISP and MSR. Through SIS we were able to predict patient outcomes and clinical profiles better than manual analysis of SD. Importantly, these techniques could be used to tailor the need for adjuvant chemotherapy, although well-designed, robust clinical studies will be needed to determine this.
Interestingly, MSR was not found to be associated with tumor stage, suggesting that it may be able to predict clinical outcomes independent of tumor stage; it may be the key to unraveling the clinical heterogeneity present in CRC. Today, the use of adjuvant treatment in stage II colon cancer has garnered much controversy [22], and recommended options range from observation, to single-agent chemotherapy, to combination regimens. We need to better tailor the need for adjuvant therapy.
For patients with colorectal cancer, a rudimentary assessment of tumoral differentiation and stage may be insufficient. Looking forward, pathologists will have to borrow concepts from biology and develop novel approaches to prognosticate and treat patients with CRC. The tumoral microenvironment is a prime candidate for future applications in digital pathology, and the techniques described in this body of work can be performed easily by a surgical pathologist.

Declarations

Data Availability
Pathology data and the statistical analyses for the current study are available from the corresponding author upon reasonable request.

Funding
No funding was provided for the production of this manuscript.

Figure caption (beginning missing): ... and super-pixelated for tumor (red), non-myxoid stroma (green) and myxoid stroma (blue). Only areas selected for annotation (yellow) are included in the component percentages.