Machine-Learning Approach Using FDG-PET-Based Radiomics in the Characterization of Mediastinal Bulky Lymphomas.

Mediastinal bulky involvement is common in lymphomas, particularly in classical Hodgkin lymphoma (cHL), primary mediastinal B-cell lymphoma (PMBCL) and grey zone lymphoma (GZL). Despite advanced methods of assessment, traditional imaging appears insufficient in differentiating histology. This study tested the diagnostic value of 18 F-FDG PET/CT volumetric and texture parameters in the histological differentiation of mediastinal bulky disease due to GZL, PMBCL and cHL, also using a machine-learning approach. Methods: We retrospectively reviewed patients with mediastinal bulky disease with a histopathological diagnosis of cHL, PMBCL or GZL who underwent pre-treatment 18 F-FDG PET/CT. Lesions were delineated using a fully automated preselection of 18 F-FDG-avid structures defined by a threshold SUV ≥ 2.5. Volumetric and radiomic parameters were measured using LIFEx software both for the bulky lesion (BL) and for all lesions (AL) on 18 F-FDG PET/CT. Analysis of selected radiomic features was performed with Machine Learning classifiers based on Logistic Regression. Results: We reviewed 117 patients (29 PMBCL, 80 cHL, 8 GZL). The analysis showed significant differences between the 3 lymphoma groups regarding SUVmax, SUVmean, BL/AL-MTV and BL/AL-TLG. Several PET textural features, both first-order and second-order grey-level, showed significant differences between the 3 groups. Finally, the machine-learning classifier provided good accuracy in the discrimination between groups. Logistic Regression showed good performance, confirming true positive rate (TPR) and true negative rate (TNR) greater than 80% in the characterization of PMBCL and cHL. The multiclass classifier showed TPR greater than 70% and TNR lower than 5% in the identification of PMBCL and cHL and TPR of 44% in GZL. as an imaging biomarker, and the use of radiomic features for early characterization of mediastinal bulky lymphoma.


Introduction
Mediastinal bulky involvement in aggressive lymphomas poses several clinical and diagnostic challenges, both because of the risks of a voluminous mass in that anatomical district and because of the technical difficulties in obtaining a diagnostic sample without an aggressive surgical approach. Bulky disease is recognized as a negative prognostic factor, so its early recognition and accurate characterization are crucial in the staging process of lymphoma [1]-[4]. Several definitions and measurement methods have been suggested to define the most appropriate dimensions, but no consensus has been reached for either Hodgkin Lymphoma (HL) or non-Hodgkin Lymphoma (NHL) [5][6][7][8].
Another diagnostic challenge in bulky disease is biopsy. Satisfactory sampling is essential for the diagnosis and classification of malignant lymphoma. Fundamental steps are the identification of the best site and the easiest access, and the choice of the most appropriate technique, to avoid inconclusive results [9][10][11]. Surgically excised tissue biopsy is widely accepted as the gold standard for the diagnosis of lymphoma according to current international guidelines; however, it probes only confined parts of the tumor, especially in bulky disease [1,3]. Mediastinal bulky lymphoma may be due to several lymphoma histologies, and different histologies may coexist even within the same bulky mass. Both inter- and intra-patient heterogeneity may be found, varying with the histopathological pattern of lymphoma [2]. Finally, despite the similarities in clinical symptoms and dimensions of mediastinal bulky lymphomas, treatment strategies and clinical outcomes differ markedly between the 3 most frequent histologies, which are cHL, primary mediastinal B-cell lymphoma (PMBCL) and a rare entity termed B-cell lymphoma unclassifiable with features intermediate between diffuse large B-cell lymphoma (DLBCL) and cHL, also known as grey zone lymphoma (GZL).
Because the treatment approaches are different, the pathologist must make these distinctions precisely, and the importance of differentiating histology has been highlighted. Often this differentiation is far from straightforward, in particular for GZL, whose differential diagnosis may be difficult even in nodal samples because a spectrum of morphologies between DLBCL and cHL, with non-concordant immunohistochemical findings, may be observed within the same tumor specimen [12]. In this scenario, morphological and functional imaging represents a fundamental diagnostic method to fully characterize bulky mediastinal lymphoma. Contrast-enhanced multidetector computed tomography (ceMDCT) is the most widely used technique during staging, as it allows both superficial and deep lymph node stations to be evaluated with good accuracy, identifying lymph nodes with altered morphology as pathological. 18 F-FDG PET/CT is also considered an indispensable tool for staging and for defining the response to treatment in different lymphomas. Its role derives both from the intrinsic characteristics of the method (glucose metabolism), which can provide information on specific functional patterns, and from the continuous refinement of many methodological and interpretative aspects [1,4,6].
Texture analysis is a recently developed, high-throughput way to extract from images numerical information that the naked eye cannot detect, and it can thus quantify more image features than standard analyses [13]. The literature on functional imaging radiomics for lymphoma is still scant. Nevertheless, published research in this field highlights two main areas of clinical interest: outcome prediction and histological differentiation from other types of cancer [13][14][15].
In this study, we assessed the diagnostic value of 18 F-FDG PET volumetric and texture parameters in the histological differentiation of mediastinal bulky lymphomas due to GZL, PMBCL and cHL. This study uses machine-learning algorithms on relevant features to predict the different histological types of the mediastinal bulky masses. The aim is to understand how PET radiomic features extracted from bulky masses may predict lymphoma histology and in the future support their histological diagnosis.

Population
From January 2010 to April 2021, a cohort of untreated patients with a histopathological diagnosis of PMBCL, cHL, or GZL was retrospectively evaluated in a monocentric study conducted by the Nuclear Medicine and Haematology divisions of Azienda Ospedaliera Universitaria Careggi (Florence, Italy). The study was approved by the Institutional Ethics Review Board of the centre and all patients provided written informed consent. Eligibility criteria included age ≥18 years and histologically proven PMBCL, cHL, or GZL. Relapsed or refractory patients were excluded from the study.
Demographic patient data, histology, disease stage, and presence of bulky disease were recorded from electronic patient charts and radiology information system.
All patients underwent 18 F-FDG PET/CT at baseline (Fig.1). Staging was assessed according to the Ann Arbor classification and the Lugano criteria [6].

Image acquisition
All patients fasted for at least 4 h before examination, and blood glucose levels were below 200 mg/dl. Images from the mid skull to the pelvis were obtained on a dedicated PET/CT scanner (Philips Gemini TF 16 PET/CT), 60 min after intravenous injection of 0.1 mCi/kg of 18 F-FDG. Before the PET scan, a low-dose CT (120 kV; 50-80 mA) was acquired to allow attenuation correction and lesion localization. PET images were reconstructed using an iterative algorithm (3D LOR RAMLA reconstruction with TOF, FOV: 576, matrix: 144×144, voxel dimension: 4×4×4 mm) and fused images of matching pairs of PET and CT images were available for review in axial, coronal, and sagittal planes and in maximum intensity projections (MIPs).

Data analysis
All available pre-treatment imaging studies, including ceMDCT and low-dose CT performed in conjunction with 18 F-FDG PET imaging, were reviewed. The longest diameter of the largest individual or conglomerate lymph node mass was measured in the transverse plane and coronal plane to confirm bulky disease.
PET images were independently evaluated by a nuclear medicine physician and a resident in nuclear medicine, using the open-source software platform LIFEx (Orlhac F, Nioche C, Buvat I. LIFEx user guide. https://www.lifexsoft.org) [16].
Each measurement was performed with the SUV thresholding method, based on the absolute SUV value >2.5 on both bulky lesion (BL) and all lesions (AL), if present. SUVmax and SUVmean of the BL were also evaluated, calculated in SUVbw units for both variables.
To make segmentation as user independent as possible, we kept default settings suggested by the manufacturers in the user guide. The regions containing only physiological 18 F-FDG uptake (brain, bladder, heart, kidney, intestine, etc.) were edited out. Furthermore, when the pathological tissue was automatically incorporated into the same volume of interest (VOI) with adjacent physiologically active tissue, the VOI was redesigned by defining the limits of the slices in which the lesion was included and redrawing it semi-automatically.
AL-MTV was defined as the sum of the metabolic volumes of every individual lesion. For each lesion, tumor lesion glycolysis was calculated as the product of the lesion volume by the SUVmean within the lesion, and total lesion glycolysis was obtained by summing tumor lesion glycolysis over all lesions.
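As a minimal sketch of the definitions above (not the LIFEx implementation), the per-lesion and all-lesion quantities can be computed from lists of voxel SUVs. The SUV values are made up for illustration, the helper names are hypothetical, the voxel volume assumes the 4×4×4 mm reconstruction grid reported for the scanner, and the inclusive threshold (≥ 2.5) is an assumption, since the text quotes both "≥ 2.5" and ">2.5".

```python
# Sketch: MTV and TLG from per-lesion SUV voxel lists (illustrative values only).
VOXEL_VOLUME_ML = 4 * 4 * 4 / 1000.0  # assumed 4x4x4 mm voxel = 0.064 mL
SUV_THRESHOLD = 2.5                   # assumed inclusive threshold


def lesion_metrics(suv_voxels, threshold=SUV_THRESHOLD):
    """Return (MTV in mL, SUVmean, lesion glycolysis) for one lesion."""
    avid = [v for v in suv_voxels if v >= threshold]
    if not avid:
        return 0.0, 0.0, 0.0
    mtv = len(avid) * VOXEL_VOLUME_ML
    suv_mean = sum(avid) / len(avid)
    # Lesion glycolysis = lesion volume x SUVmean within the lesion.
    return mtv, suv_mean, mtv * suv_mean


def all_lesion_totals(lesions):
    """AL-MTV and AL-TLG as sums over every individual lesion."""
    metrics = [lesion_metrics(lesion) for lesion in lesions]
    al_mtv = sum(m[0] for m in metrics)
    al_tlg = sum(m[2] for m in metrics)
    return al_mtv, al_tlg


bulky = [3.0, 5.0, 7.0, 2.0]  # hypothetical; 2.0 falls below the threshold
other = [4.0, 4.0]            # hypothetical second lesion
bl_mtv, bl_suv_mean, bl_tlg = lesion_metrics(bulky)
al_mtv, al_tlg = all_lesion_totals([bulky, other])
```

In this toy case the bulky lesion keeps three voxels, so BL-MTV and BL-TLG are computed on those alone, while the AL values add the second lesion.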

Texture analysis and Machine Learning
From PET images, we extracted the features defined as first-order and second-order, or higher-order using Local Image Features Extraction (LIFEx) software (http://www.lifexsoft.org) on the BL VOI.
First-order features are typically obtained from the histogram of grey-level values obtained from the considered VOIs and describe the overall variation of the signal in the VOI regardless of the relative position of the voxels. These features are fairly invariant to geometric transforms, therefore robust with respect to image reconstruction and filtering.
They include basic statistics such as mean, median, range, standard deviation, skewness, and kurtosis.
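The statistics listed above can be illustrated with a short sketch based on population moments. This is a generic textbook formulation, not the LIFEx code; in particular, whether LIFEx reports excess or non-excess kurtosis is not stated here, so the non-excess form (normal distribution = 3) is an assumption, and the input values are made up.

```python
import math
import statistics


def first_order_features(values):
    """Histogram-type statistics of the voxel values in a VOI."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments (divide by n, not n - 1).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,  # non-excess form
    }


right_tailed = [1, 1, 1, 2, 10]             # long tail towards high values
left_tailed = [-v for v in right_tailed]    # mirrored distribution
f_right = first_order_features(right_tailed)
f_left = first_order_features(left_tailed)
```

Mirroring the values flips the sign of the skewness and leaves the kurtosis unchanged, which is the sense in which positive and negative skewness indicate right-sided and left-sided asymmetry.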
Second-order features are those usually referred to as texture features, since they consider the spatial relationship between neighbouring voxels in an image and are thus capable of capturing details regarding the heterogeneity of the lesions. They include parameters from the Gray-level co-occurrence matrix (GLCM), neighbourhood Gray-level difference matrix (NGLDM), Gray-level run length matrix (GLRLM), and Gray-level zone length matrix (GLZLM), and indices from sphericity and histogram. A detailed description of the various texture parameters evaluated can be found at http://www.lifexsoft.org. Conventional and advanced metabolic tumor parameters as well as radiomics features assessed are summarized in Table 1.
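To make the idea of a co-occurrence matrix concrete, the following sketch builds a GLCM for a single 2D offset and derives homogeneity and entropy from it. It is a simplified illustration under stated assumptions: LIFEx works on 3D VOIs with resampled grey levels and several directions, and its exact formulas (for example the logarithm base used for entropy) may differ from the generic definitions used here. The two toy images are made up.

```python
import math
from collections import Counter


def glcm_probabilities(image, dx=1, dy=0):
    """Symmetric, normalised co-occurrence probabilities for one 2D offset."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count the pair in both directions
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_homogeneity(p):
    """High when neighbouring voxels have similar grey levels."""
    return sum(v / (1 + abs(i - j)) for (i, j), v in p.items())


def glcm_entropy(p):
    """High when co-occurrences are spread over many grey-level pairs."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # homogeneous toy "lesion"
checker = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]   # heterogeneous toy "lesion"
h_uniform = glcm_homogeneity(glcm_probabilities(uniform))
e_uniform = glcm_entropy(glcm_probabilities(uniform))
h_checker = glcm_homogeneity(glcm_probabilities(checker))
e_checker = glcm_entropy(glcm_probabilities(checker))
```

The uniform image gives maximal homogeneity and zero entropy, while the checkerboard gives lower homogeneity and higher entropy, illustrating why the two indices tend to move in opposite directions.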
Radiomic features from PET/CT images were fed into Machine Learning models in order to build a histological representation of mediastinal lymphomas that could help during the diagnostic phase [17][18][19]. The use of Machine Learning techniques assumes that the training dataset is statistically significant, not only in terms of the number of instances but also in terms of the discriminant power brought by each instance [20][21][22]. Thus, before training any model, an intensive data cleaning of the texture features was performed, removing all features with strong correlations (|PCC| > 0.75, where PCC is the Pearson correlation coefficient) and those without any discriminant power.
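One possible way to apply the |PCC| > 0.75 filter is a greedy pass that keeps a feature only if it is not strongly correlated with any feature already kept. This is a sketch under assumptions: the rule used to decide which member of a correlated pair is dropped is not described in the text, so the order-dependent strategy below is hypothetical, as are the function names and the feature values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant feature; treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.75):
    """Greedy filter: keep a feature only if |PCC| <= threshold with all kept ones."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: one list of values per feature, one entry per patient.
features = {
    "SUVmean": [2.0, 4.0, 6.0, 8.0],
    "SUVmax": [3.1, 6.0, 9.2, 12.1],     # nearly proportional to SUVmean
    "Skewness": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with SUVmean
}
kept = drop_correlated(features)
```

Because the pass is order dependent, listing the features in order of assumed clinical relevance would make the filter retain the preferred member of each correlated pair.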
Lymphoma type identification is challenging for Machine Learning techniques because the histological classes to be separated are not completely disjoint. This histology-based limitation prevents a direct multiclass study and requires a more complex classification solution. Binary classification between PMBCL and cHL was carried out first; the resulting classifier was then extended with multiclass capabilities through the definition of decision regions embedding the mixed characteristics of GZL.

Statistical analysis
Statistical analysis was performed with the SPSS software (SPSS25) by means of ANOVA to evaluate the differences of the means between the measured variables. Quantitative data are presented as mean ± standard deviation (SD). A p value <0.05 was considered significant.
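For readers unfamiliar with the test, the F statistic of a one-way ANOVA across the three lymphoma groups can be sketched as follows. The analysis in this study was run in SPSS; the code below is only an illustration of the underlying computation, with made-up values and a hypothetical function name, and it does not compute the p value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values of one parameter in the three groups.
pmbcl = [5.0, 6.0, 7.0]
gzl = [4.0, 5.0, 6.0]
chl = [3.0, 4.0, 5.0]
f_stat, df_between, df_within = one_way_anova_f([pmbcl, gzl, chl])
```

The p value follows from the F distribution with the two degrees of freedom returned; in Python, `scipy.stats.f_oneway` returns both the statistic and the p value.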

Patients
One-hundred thirty-one patients with histologically proven PMBCL, cHL or GZL were included in the study. Fourteen patients were excluded from the study because of missing 18 F-FDG PET/CT at baseline. Seven of the analyzed patients (all cHL) were excluded from the clinical evaluation because they were treated outside of our Institution but were included in the texture assessment for availability of staging 18 F-FDG PET/CT. Patient characteristics are summarized in Table 2.
In this study we considered as tumor bulk all the mediastinal masses with a diameter ≥ 5 cm. Median largest mediastinal diameter measured for PMBCL was 11.0 cm (range, 6.

Radiomic features
Several conventional and advanced metabolic parameters, as well as texture features, were significantly different among the three groups. SUVmax and TLG were significantly different among groups, with highest values in PMBCL, intermediate in GZL and lowest in cHL (Fig.2). PMBCL and GZL had similar MTV values, but significantly higher than cHL.
Among first order variables, histogram skewness and kurtosis were significantly different among groups (p<0.001).
PMBCL showed significantly lower (negative) values as compared to the other groups, indicating asymmetrical histograms with left-sided skewness. On the contrary, cHL showed significantly higher positive values than the other groups, indicating asymmetrical histograms with right-sided skewness.
Regarding kurtosis, cHL showed significantly higher kurtosis values, indicating a value distribution with heavier tail, or a greater number of outliers, as compared to PMBCL, which showed significantly lower kurtosis values.
Among the features extracted from NGLDM, only Contrast was significantly different, with PMBCL showing significantly higher contrast than GZL and cHL, and GZL significantly higher than cHL.
In GLZLM, which provides information on the size of homogeneous zones for each Gray-level in 3 dimensions, several texture features were significantly different among groups (Fig.3).

Machine Learning Analysis
The Machine Learning analysis started with the reduction of the size of the dataset for training and testing models to 101 (28 PMBCL, 65 cHL and 8 GZL), because of missing data for some features of the study.
The first step of lymphoma type inference was the binary classification PMBCL-cHL. A stratified splitting in training and test sets (80/20) was performed, resulting in 73 instances (21 PMBCL and 52 cHL) to train models. Due to the small number of instances for training, we used a simple classification technique like Logistic Regression, implemented in Python through the Scikit-Learn package [23]. The classifier was trained on 17 parameters including conventional and advanced metabolic indices, together with texture and other features listed in Table 1. The five most important features for lymphomas discrimination are reported in Table 3, together with the list of parameters dropped due to their correlations [24].
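A stratified 80/20 split can be sketched in plain Python as below. This is an illustration only: the per-class rounding rule and the random seed are assumptions, so the sketch does not reproduce the exact 21/52 training counts reported above. In Scikit-Learn the equivalent step is usually written with `train_test_split(X, y, test_size=0.2, stratify=y)`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction from each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # assumed rounding rule
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes taken from the binary PMBCL-cHL dataset described in the text.
labels = ["PMBCL"] * 28 + ["cHL"] * 65
train_idx, test_idx = stratified_split(labels)
train_counts = {c: sum(labels[i] == c for i in train_idx) for c in ("PMBCL", "cHL")}
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in ("PMBCL", "cHL")}
```

Stratification keeps the PMBCL to cHL ratio roughly equal in the training and test sets, which matters when one class is much smaller than the other.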
Once the model was trained, the decision boundary for predictions was set to maximize PMBCL identification (true positives, TP) within the training set, while ensuring a tolerable misidentification of cHL (false positives, FP). Logistic Regression with the custom decision boundary showed good performance on the testing set as well, confirming a TPR greater than 80%, as reported in the confusion matrix of Figure 4A.
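The choice of a custom decision boundary can be illustrated with a simple search over candidate thresholds on the predicted PMBCL probability. This is a sketch under assumptions: the tolerated false positive rate (`max_fpr`) is hypothetical because no value is given in the text, the probabilities and labels are made up, and the search rule is one plausible reading of "maximize TP while keeping FP tolerable".

```python
def rates(probs, labels, threshold):
    """TPR and FPR when 'PMBCL' (label 1) is predicted for prob >= threshold."""
    tp = sum(p >= threshold and y == 1 for p, y in zip(probs, labels))
    fp = sum(p >= threshold and y == 0 for p, y in zip(probs, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg


def choose_threshold(probs, labels, max_fpr):
    """Return (threshold, TPR, FPR) with the highest TPR among FPR <= max_fpr."""
    best = None
    for t in sorted(set(probs), reverse=True):
        tpr, fpr = rates(probs, labels, t)
        if fpr <= max_fpr and (best is None or tpr > best[1]):
            best = (t, tpr, fpr)
    return best


# Hypothetical training-set probabilities; 1 = PMBCL, 0 = cHL.
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05, 0.45]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
threshold, tpr, fpr = choose_threshold(probs, labels, max_fpr=0.35)
```

In this toy example the selected boundary lies below the default 0.5, so every PMBCL case is captured at the price of a few cHL cases being labelled PMBCL.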
GZL predictions were added to the binary classifier through the construction of three decision regions.
Because of its mixed characteristics, the GZL region was built around the binary decision boundary with an asymmetric interval (Fig. 4B).
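The three decision regions can be sketched as a rule on the binary PMBCL probability, with a GZL band placed asymmetrically around the decision boundary. The boundary value, the widths of the band and the direction of the asymmetry are all hypothetical, since they are not reported in the text.

```python
def classify_three_way(p_pmbcl, boundary, below=0.05, above=0.15):
    """Map a binary PMBCL probability to PMBCL, GZL or cHL.

    The GZL band spans [boundary - below, boundary + above); the two widths
    differ, which makes the interval asymmetric (hypothetical values).
    """
    if p_pmbcl >= boundary + above:
        return "PMBCL"
    if p_pmbcl < boundary - below:
        return "cHL"
    return "GZL"


BOUNDARY = 0.4  # hypothetical binary decision boundary
predictions = [classify_three_way(p, BOUNDARY) for p in (0.9, 0.5, 0.38, 0.2)]
```

Cases whose probability falls close to the binary boundary are those for which the PMBCL-cHL classifier is least decisive, which is why that band is a natural place to assign a class with intermediate characteristics.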
As reported in the confusion matrix, the so-built multiclass classifier showed TPR greater than 70% in the identification of PMBCL and cHL. GZL was identified with TPR of 44% (Fig.4C).
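Per-class TPR and TNR can be read from a multiclass confusion matrix in a one-versus-rest fashion, as in the sketch below. The counts are invented for illustration and are not the values of Figure 4C.

```python
def per_class_rates(confusion, classes):
    """One-vs-rest TPR and TNR; confusion[i][j] = true class i predicted as j."""
    total = sum(sum(row) for row in confusion)
    out = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # class k predicted as something else
        fp = sum(row[k] for row in confusion) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        out[name] = {"TPR": tp / (tp + fn), "TNR": tn / (tn + fp)}
    return out


classes = ["PMBCL", "GZL", "cHL"]
confusion = [        # hypothetical counts, rows = true class
    [8, 1, 1],
    [2, 4, 2],
    [1, 3, 16],
]
class_rates = per_class_rates(confusion, classes)
```

With only a handful of cases in the smallest class, a single reassigned patient changes its TPR by many percentage points, which should be kept in mind when reading per-class rates for GZL.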

Discussion
In this study we demonstrated the diagnostic value of 18  proliferative lymphomas, such as PMBCL, often shows very high metabolic activity [25][26][27]. Our results are consistent with these findings, confirming that differences in proliferative activity may be a possible explanation for the different 18 F-FDG uptake in various histotypes. Moreover, both SUVmax and SUVmean values were significantly lower in cHL compared with PMBCL and GZL, consistent with its peculiar neoplastic tissue architecture, composed of few and scattered neoplastic cells called Hodgkin and Reed-Sternberg (HRS) cells, accounting for less than 1% of the total cell count, surrounded by an overwhelming population of non-neoplastic mononuclear bystander cells with a high metabolism [28,29]. Such histopathological features may explain the differences in metabolic activity of cHL as compared to PMBCL. Finally, we found intermediate behaviour in the GZL group, again consistent with its biological characteristics. Indeed, GZL can show a wide range of cellular patterns, with coexistence of cells resembling HRS cells, cells with centroblastic or immunoblastic cytology and marked nuclear pleomorphism, and cells with more monomorphic cytology, resembling PMBCL [30].
Beyond SUV values, there were significant differences among lymphoma types also regarding volume-based variables, such as MTV and TLG. MTV represents an important prognostic factor in lymphoma, besides stage and other clinical factors such as LDH, Performance Status, age and presence of extranodal disease. Not surprisingly, most studies have demonstrated that MTV is an adverse prognostic factor, regardless of lymphoma histology and choice of quantification method [8][31][32][33].
Our data are in line with the concept that volumetric parameters have greater values in aggressive histotypes, with consequently worse prognosis. Indeed, in our study PMBCL and GZL showed higher values of both MTV and TLG than cHL, and PMBCL differed significantly from GZL in TLG.
In addition, one of the most interesting findings of our study is the significant difference between lymphoma types regarding several texture features extracted from 18 F-FDG PET images. Metabolic heterogeneity is a remarkable feature of malignancy, and texture analysis provides tools for quantifying tumor heterogeneity [34]. Those features measuring increasing heterogeneity within tumors may be associated with differences in regional tumor cellularity, proliferation, hypoxia, angiogenesis, and necrosis, as well as genomic variety, all factors that have been independently associated with more aggressive behaviour, poorer response to treatment and worse prognosis [35][36][37].
We chose to focus on patients presenting with mediastinal bulky lymphoma since it is well established that the complementary information provided by textural analysis strongly depends on tumor size: the information increases substantially with larger volumes, while the level of correlation between the variables tends to decrease substantially [30,38].
In our study, the co-occurrence features, which analyse inter-relationships between pairs of voxels and thus characterize local non-uniformities, were able to significantly differentiate the three lymphoma groups. Orlhac et al. reported that GLZLM LZE exhibited higher values in visually homogeneous lesions than in heterogeneous lesions [39].
They identified two different sets of texture indices able to reflect the different characteristics of uptake heterogeneity.
The first one included homogeneity and entropy derived from GLCM, and run emphasis features derived from GLRLM, all of which were sensitive to the presence of uptake heterogeneity. The second set comprised gray-level zone features from GLZLM, which were mostly sensitive to the average uptake rather than to the local heterogeneity. Thus, homogeneous lesions like cHL had higher values of GLCM homogeneity and GLZLM LGZE and lower values of GLCM entropy and GLZLM HGZE than heterogeneous lesions such as PMBCL and GZL. As expected, entropy, which reflects disorder, varies in the opposite direction to homogeneity. Another important aspect is that SUV measurements and entropy are strongly correlated, and this is also reflected in the trend of our results, with higher values in PMBCL and lower in cHL [40,41]. The more chaotic the image, the higher the entropy [13,42]. The other features evaluated in this study highlight tumor heterogeneity, depending on the type of matrix used and the kind of feature computed on this matrix. Consequently, whereas a single feature cannot be directly linked to a specific biologic process, one could assume that a combination of textural parameters may be closely related to underlying physiologic processes, such as vascularization, neoangiogenesis, perfusion, tumor aggressiveness, or hypoxia [43,44].
Finally, referring to the GLZLM, the feature ZLNU, which measures the variability of the intensity values of the gray level in the image, indicated greater homogeneity in cHL than in PMBCL, which on the contrary showed very high values.
Studies on the prediction of histological subtypes based on radiomic features from 18 F-FDG PET/CT images have produced conflicting results. Previous studies excluded a close correlation between histology and radiomics, even if texture indices may provide some useful information on the spatial organization of tumor cells [45]. More recently, the application of machine learning methods to radiomics has made it possible to build classification models that predict the different subtypes of malignant lymphomas [17,46]. Our data are consistent with these latter results, demonstrating that machine learning analyses can create a classification model with good discriminant power.
Of all available indices, we observed that the top five features (SUVmean, MTV, GLRLM SRE, NGLDM coarseness and GLZLM ZLNU) obtained with our best-performing model (Logistic Regression) are mostly activity dependent, suggesting that metabolic and average differences regardless of spatial relationships are essential for the differentiation of tumor lesions. Nevertheless, Machine Learning can achieve good performance only through the synergy between metabolic activity parameters and high-order features, confirming the complexity of discriminating biological processes in pathological tissue. The other selected features are reasonably considered relevant for discrimination, but with less direct influence on the algorithm's performance.
The resulting multiclass classifier showed promising performance in the separation of the three lymphoma types, at the cost of an unavoidable drop in performance with respect to the binary classification. The model assumption that GZL has intermediate characteristics between cHL and PMBCL derives from its biological traits and is fully reflected by the observed distribution of radiomic variables (Fig. 4). Therefore, despite the presence of histological subtypes that are not completely disjoint, Logistic Regression was able to obtain very good classification results: 1) a very high TPR in the identification of PMBCL, which represents the histotype with the worst prognosis and therefore the most important one for haematologists to identify; 2) just under 5% of PMBCL erroneously classified as cHL; 3) discrimination values for cHL slightly lower than for PMBCL, but nevertheless valid (TPR=73%).
Our radiomic data, in conjunction with machine learning analyses, demonstrate that textural features derived from PET imaging correlate with histologic diagnosis in patients with mediastinal bulky lymphoma and thus may support anatomopathological analyses. In this context, 18 F-FDG PET could be used in several additional ways beyond its standard clinical use. On the one hand, it could represent the best method to guide biopsy in a voluminous bulky mass (where and how many samples?), possibly identifying the metabolic patterns corresponding to specific histologies. On the other hand, in the future 18 F-FDG PET radiomics could help clarify possible discordances between the histopathological diagnosis and the clinical characteristics of the patient. In this last case, radiomic features of the bulky mass could represent an additional variable to help clinicians and pathologists define the diagnosis or guide the decision to perform a second biopsy in a more suspicious area of the bulky lesion.
The present study had some limitations. First, it was a retrospective study. Second, the PMBCL and GZL cohorts are small, especially compared with the cHL group; however, these proportions reflect the incidence of these diseases. It is therefore important to validate these results with a larger sample or a prospective trial in the future. Finally, the relationship between texture parameters and histopathological findings requires further study, to establish the biological basis of each textural feature.

CONCLUSION
Different subtypes of bulky mediastinal lymphoma, namely cHL, GZL and PMBCL, demonstrated different 18 F-FDG PET characteristics and metabolic heterogeneity when examined by radiomics and texture analysis. Machine learning could be successfully combined with radiomics, providing good discriminative sensitivity, especially for cHL and PMBCL.
In the era of precision medicine and personalized treatment, this preliminary study supports the potential of metabolic texture analysis as a future imaging biomarker, with a growing role in clinical diagnosis.

Abbreviations: GLCM gray-level co-occurrence matrix, NGLDM neighborhood gray-level difference matrix, GLRLM gray-level run length matrix, SRE/LRE short/long-run emphasis, LGRE/HGRE low/high gray-level run emphasis, SRLGE/SRHGE short run low/high gray-level emphasis, LRLGE/LRHGE long run low/high gray-level emphasis, GLNUr/RLNU gray-level nonuniformity for run/run length non-uniformity, RP run percentage, GLZLM gray-level zone length matrix, SZE/LZE short/long-zone emphasis, LGZE/HGZE low/high gray-level zone emphasis, SZLGE/SZHGE short-zone low/high gray-level emphasis, LZLGE/LZHGE long-zone low/high gray-level emphasis, GLNUz/ZLNU gray-level nonuniformity for zone or zone length nonuniformity, ZP zone percentage