Prognostic Parameters on Baseline and Interim F-18-FDG-PET/CT in Diffuse Large B-cell Lymphoma Patients

Purpose: FDG-PET/CT is a widely used imaging method in the management of diffuse large B-cell lymphoma (DLBCL). Our aim was to investigate the prognostic performance of different PET biomarkers in a multicentre setting. Methods: We investigated baseline volumetric values (MTV and TLG, also normalized for body weight) segmented with three different methods (>SUV4 [glob4], 41% isocontour [41pc], and a gradient-based lesion growing algorithm [grad]) and interim parameters (Deauville-score, ΔSUVmax, modified qPET, and rPET) alongside clinical parameters (stage, R-IPI), using 24-month progression-free survival (PFS) as the clinical endpoint. Receiver operating characteristics analyses were performed to define optimal cut-off points for the continuous PET parameters. Results: 107 DLBCL patients were included (54 women; mean age: 53.7 years). MTV and TLG calculations showed good correlation among the glob4, 41pc and grad methods; however, the optimal cut-off points were markedly different. Significantly different PFS was observed between low- and high-risk groups according to baseline MTV, bwaMTV, TLG, bwaTLG, as well as the interim parameters Deauville-score, ΔSUVmax, mqPET, and rPET. In univariate Cox-regression analyses, hazard ratios were lowest for bwaMTVglob4 (HR=2.3) and highest for rPET (HR=9.09). In a multivariate Cox-regression model, rPET was shown to be an independent predictor of PFS (p=0.041; HR=9.15). A combined analysis showed that ΔSUVmax-positive patients with high MTV formed a group with distinctly poor PFS (35.3%). Conclusion: Baseline MTV and TLG values and the optimal cut-off points achieved with different segmentation methods varied markedly and showed limited prognostic impact. Interim PET/CT parameters provided more accurate prognostic information, with semiquantitative "Deauville-like" parameters performing best in the present study.


Introduction
Diffuse large B-cell lymphoma (DLBCL) is a clinically, pathologically, and molecularly heterogeneous haematological malignancy, considered the most common subtype of non-Hodgkin lymphomas [1]. The utility of F-18-fluoro-deoxy-glucose (FDG) positron emission tomography/computed tomography (PET/CT) in initial clinical staging is supported by extensive evidence and is incorporated in current recommendations [2].
Aside from well-researched clinical, pathological and molecular prognostic factors, several F-18-FDG-PET/CT-based biomarkers carrying prognostic information have emerged in the last decade (beyond the inherent prognostic value of PET/CT in defining the clinical stage of DLBCL).
Beyond its utility as a baseline investigation, FDG-PET/CT plays an important role in the evaluation of treatment response at the end of therapy, or even earlier, in an interim setting. Robust and widespread evaluation criteria based on the Deauville 5-point scale have been established to decide the presence or absence of complete metabolic remission [9,10]. Aside from the ordinal Deauville-score (DS), continuous values have been investigated in high-grade lymphomas, most notably the proportional decrease of the lesion maximal standardized uptake value (SUVmax) and, to a lesser extent, semiquantitative "Deauville-like" parameters, such as qPET and rPET [11-18].
Our aim was to investigate the prognostic performance of baseline volumetric values (MTV and TLG) and interim parameters (DS and semiquantitative) derived from the FDG-PET/CT scans of DLBCL patients in a multicentre setting.

Methods
We investigated the baseline and interim PET/CT scans of DLBCL patients who received R-CHOP (rituximab combined with cyclophosphamide, doxorubicin, vincristine, and prednisolone) immunochemotherapy and were included in a multicentric study coordinated by the International Atomic Energy Agency (IAEA). The study design has been described in detail before [19]; for the present analysis, a reduced number of patients was included after applying the following exclusion criteria: 1) treatment other than R-CHOP; 2) studies performed on a stand-alone PET scanner; 3) studies performed on different PET/CT scanners in the baseline and interim setting; 4) missing or compromised imaging data; 5) event-free follow-up lasting less than 24 months. Ten centres in the same number of countries (Brazil, Chile, Hungary, India, Italy, Pakistan, the Philippines, South Korea, Thailand, and Turkey) participated in the IAEA study. The research was approved by the respective ethical review board of each participating centre and all subjects signed an informed consent form.
Clinical stage was determined from the baseline PET/CT scans according to the Lugano criteria and R-IPI was calculated for each patient [2,7]. Lymphoma lesions on baseline PET images were delineated with three different methods: 1) a fixed threshold of SUV > 4 (>SUV4; glob4); 2) a 41% isocontour VOI around the local maximum point (41pc); 3) a vendor-specific gradient-based lesion growing algorithm (grad), performed with Mediso InterView Fusion software (Mediso Medical Imaging Systems, Budapest, Hungary). MTV was calculated as the sum of the volumes of all lymphoma lesions on PET images, and TLG was determined as the sum of the products of each lesion's metabolic volume and SUVmean. Both MTV and TLG values were normalized for patient body weight, thus introducing bwaMTV and bwaTLG values. Receiver operating characteristics (ROC) analyses were performed to define optimal cut-off points for MTV, TLG, bwaMTV, and bwaTLG for the three different segmentation methods.
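The volumetric definitions above can be illustrated with a minimal Python sketch. The `Lesion` class, the function names, and the example numbers are hypothetical and serve illustration only; the body weight normalization is assumed here to be a simple division by weight in kilograms, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Lesion:
    """One segmented lymphoma lesion (hypothetical container)."""
    volume_cm3: float  # metabolic volume of the lesion in cm^3
    suv_mean: float    # mean SUV inside the lesion VOI


def metabolic_tumour_volume(lesions: Iterable[Lesion]) -> float:
    """MTV: sum of the volumes of all lesions."""
    return sum(lesion.volume_cm3 for lesion in lesions)


def total_lesion_glycolysis(lesions: Iterable[Lesion]) -> float:
    """TLG: sum over lesions of (metabolic volume x SUVmean)."""
    return sum(lesion.volume_cm3 * lesion.suv_mean for lesion in lesions)


def body_weight_adjusted(value: float, weight_kg: float) -> float:
    """Assumed normalization: value divided by body weight in kg."""
    if weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return value / weight_kg


# Made-up example patient with two lesions and a body weight of 65 kg.
example_lesions = [Lesion(10.0, 5.0), Lesion(20.0, 4.0)]
example_mtv = metabolic_tumour_volume(example_lesions)
example_tlg = total_lesion_glycolysis(example_lesions)
example_bwa_mtv = body_weight_adjusted(example_mtv, 65.0)
example_bwa_tlg = body_weight_adjusted(example_tlg, 65.0)
```

The same functions would apply unchanged to lesions from any of the three segmentation methods, since only the lesion volumes and SUVmean values differ between them.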
Interim PET/CT scans were analyzed visually according to the Deauville criteria, resulting in Deauville-scores (DS) of 1-5, and semiquantitatively. The semiquantitative parameters were the change of lesion SUVmax in percent between the baseline and interim scans (ΔSUVmax), and two semiquantitative "Deauville-like" parameters, for which a spherical VOI with a diameter of 3 cm was placed in the unaffected part of the right liver lobe. Modified qPET (mqPET) is the ratio of the hottest lesion's SUVpeak (the SUVmean of the hottest 1 cm³ in the lesion VOI) to the SUVmean of the liver VOI. The original qPET value, first described by Hasenclever et al. in pediatric Hodgkin's lymphoma, used the mean SUV of the hottest 4 adjacent voxels in the lesion. Our use of the 1 cm³ SUVpeak was based on the lack of adequate software as well as on the hypothesis that in adult patients this volume would not lead to considerable distortion of the results. The ratio PET (rPET), as described before, is the ratio of the SUVmax in the hottest lesion and in the liver reference VOI [16,17].
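As an illustration of the interim parameters described above, the following Python sketch computes ΔSUVmax, mqPET, and rPET from already measured SUVs. The function names and example values are hypothetical. Two assumptions are made: ΔSUVmax is expressed as a signed percentage change (negative for a decrease), and rPET uses the SUVmax of the liver reference VOI as its denominator.

```python
def delta_suvmax_percent(baseline_suvmax: float, interim_suvmax: float) -> float:
    """Signed percentage change of SUVmax from baseline to interim scan.

    Assumed convention: a decrease in uptake gives a negative value.
    """
    if baseline_suvmax <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return (interim_suvmax - baseline_suvmax) / baseline_suvmax * 100.0


def mqpet(lesion_suvpeak: float, liver_suvmean: float) -> float:
    """Modified qPET: SUVpeak of the hottest lesion over liver SUVmean."""
    if liver_suvmean <= 0:
        raise ValueError("liver SUVmean must be positive")
    return lesion_suvpeak / liver_suvmean


def rpet(lesion_suvmax: float, liver_suvmax: float) -> float:
    """Ratio PET: SUVmax of the hottest lesion over liver SUVmax (assumed)."""
    if liver_suvmax <= 0:
        raise ValueError("liver SUVmax must be positive")
    return lesion_suvmax / liver_suvmax


# Made-up example values for a single patient.
example_delta = delta_suvmax_percent(baseline_suvmax=20.0, interim_suvmax=4.0)
example_mqpet = mqpet(lesion_suvpeak=2.6, liver_suvmean=2.0)
example_rpet = rpet(lesion_suvmax=3.2, liver_suvmax=2.0)
```

Because mqPET and rPET are ratios to a liver reference measured on the same scan, they do not depend on the absolute SUV scale of the scanner in the way ΔSUVmax inputs do.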
When establishing the diagnostic performance of the above different prognostic biomarkers, 24-month progression-free survival was the clinical endpoint.
Statistical calculations were performed in the R environment (The R Foundation, https://www.r-project.org) with RStudio software (RStudio PBC; Boston, MA, USA).
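The selection of an optimal cut-off point from a ROC analysis can be sketched as follows. This is an illustration in Python rather than the R code used in the study, and it assumes that the optimum is the threshold maximizing the Youden index (sensitivity + specificity - 1), a common criterion that the text does not explicitly name. Values at or above the threshold are treated as high-risk; all names and numbers are hypothetical.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float],
                  events: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (threshold, Youden J, sensitivity, specificity).

    values: continuous PET parameter per patient (e.g. MTV).
    events: 1 if the patient progressed within follow-up, else 0.
    A patient is called test-positive when value >= threshold.
    """
    n_pos = sum(1 for e in events if e)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")

    best = None
    # Every observed value is tried as a candidate threshold.
    for threshold in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e and v >= threshold)
        fp = sum(1 for v, e in zip(values, events) if not e and v >= threshold)
        sensitivity = tp / n_pos
        specificity = 1.0 - fp / n_neg
        j = sensitivity + specificity - 1.0
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or j > best[1]:
            best = (threshold, j, sensitivity, specificity)
    return best


# Made-up MTV values (cm^3) and progression events for six patients.
example_values = [50.0, 80.0, 120.0, 200.0, 300.0, 400.0]
example_events = [0, 0, 0, 1, 0, 1]
example_cutoff = youden_cutoff(example_values, example_events)
```

For a parameter where lower values indicate higher risk (for example a signed ΔSUVmax), the comparison direction would have to be reversed.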

Patient characteristics
107 patients were included in the present study (mean age: 53.7; range: 16-83 years) with 53 women and 54 men among them. The majority of patients were from Hungary (57) and Chile (36), while eight, four, and two of them were from Thailand, the Philippines, and Italy, respectively. 58% of the patients presented with advanced stage disease. Further patient information is provided in Table 1.

Comparison of volumetric parameters achieved by different delineation methods
MTV and TLG calculations showed good correlation among glob4, 41pc and grad methods (Table 2), despite occasionally resulting in markedly different volumes (Figure 1). ROC analyses yielded markedly different optimal cut-off points for MTV, TLG, bwaMTV, and bwaTLG with the three different segmentation methods (Table 3). Areas under the curve (AUCs) did not show a significant difference between MTV vs. bwaMTV and TLG vs. bwaTLG with the corresponding segmentation methods, the values ranging between 0.62 and 0.68 (Table 3). More diverse values in sensitivity, specificity, positive and negative predictive values, and diagnostic accuracy could be observed, primarily among the same volumetric parameters with different segmentation methods and not between traditional and body weight-adjusted MTV or TLG.

Prognostic value of baseline and interim biomarkers
With the aim of a more transparent data presentation, only the >SUV4-method-based (glob4) volumetric values (MTV, TLG, bwaMTV, bwaTLG) are presented, as it is considered the most easily reproducible segmentation method.
ROC analyses were performed to define optimal cut-off points for the interim PET (I-PET) semiquantitative values, yielding values of -77.22%, 1.32, and 1.54 for ΔSUVmax, mqPET, and rPET, respectively. Sensitivity, specificity, positive and negative predictive values, and diagnostic accuracy of interim parameters are detailed in Table 4. Progression-free survival in the whole cohort was 75% (Figure 2). Interestingly, log-rank survival analysis did not show a significant difference between the PFS of early and advanced stage patients (82% vs. 69%). Dividing the patients into two groups according to calculated optimal cut-offs or predefined values (in case of DS) resulted in significantly different PFS for baseline MTV, bwaMTV, TLG, bwaTLG, as well as interim parameters DS (1-3 vs. 4-5), ΔSUVmax, mqPET, and rPET (Table 5, Figure 3). Univariate Cox-regression analyses showed a significant difference between low- and high-risk groups except for early/advanced stage and low/high bwaTLGglob4, with calculated hazard ratios (HRs) the lowest for bwaMTVglob4 (HR=2.3) and the highest for rPET (HR=9.09) among the remaining prognostic parameters (Table 6). In a multivariate Cox-regression model including DS (1-3 vs. 4-5), ΔSUVmax, rPET, MTV, and clinical stage (early vs. advanced), only rPET was shown to be a significant independent predictor of PFS (p=0.041; HR=9.15) (Figure 4). A combined analysis showed that ΔSUVmax-positive patients with high MTV formed a group with distinctly poor PFS (35.3%) (Figure 5).
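The diagnostic performance measures reported for the dichotomized parameters (sensitivity, specificity, positive and negative predictive values, and diagnostic accuracy) follow from a 2×2 table of test result against progression status. The Python sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study tables.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 table metrics.

    tp: high-risk by the parameter, progressed
    fp: high-risk by the parameter, did not progress
    fn: low-risk by the parameter, progressed
    tn: low-risk by the parameter, did not progress
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each row and column of the table must be non-empty")
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Made-up counts for a cohort of 40 patients.
example_metrics = diagnostic_metrics(tp=6, fp=4, fn=2, tn=28)
```

With a progression rate of about 25%, as in the whole cohort, the negative predictive value tends to be high even for modest tests, so the positive predictive value is often the more informative figure when comparing parameters.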

Discussion
Several different segmentation algorithms have been used to determine baseline MTV in DLBCL patients. Ilyas et al. investigated the SUV≥2.5, the 41%, and the "PERCIST" (≥1.5 × mean SUV + 2 standard deviations in a 3 cm³ right liver lobe VOI) methods. The three segmentation methods yielded different optimal cut-off points for predicting PFS, ranging from 166 to 400 cm³, which is similar to our results of 123-345 cm³ [20]. The same tendency can be observed in MTV measurements of solid tumours, as shown by Zhuang et al., who performed eight different segmentations in non-small cell lung cancer patients that yielded significantly different MTV values [21].
Our data indicate that although MTV and TLG yielded only moderately promising prognostic performance and areas under the curve on ROC analyses, the gradient-based segmentation algorithm resulted in the best values, especially in terms of sensitivity and diagnostic accuracy. However, as this latter algorithm is vendor-specific, its widespread use might be limited. TLG did not have better prognostic performance than MTV with the corresponding segmentation methods.
Apart from optimal cut-off points varying within the same patient cohort, MTV also shows sample dependency, as markedly different values can be found among studies performed with the same (or highly similar) segmentation methodology, as in the standalone studies referenced in the Ilyas paper and in meta-analyses by Xie et al. and Guo et al., with optimal cut-off points ranging between 66 and 601.2 cm³ for the SUV≥2.5 method and between 16.1 and 550 cm³ for the 40-41% methods [4,20,22-26].
As radiomics becomes more prevalent in several imaging research fields, standardization is paramount, and the authors would recommend and support collaborations similar to the Image Biomarker Standardization Initiative to make PET imaging parameters more reliable and comparable among centres [27]. Still, as the basis of nearly all calculations, SUVs are also highly variable among studies, and this points to a limitation of the current multicentric study, as devices had not been cross-calibrated. At present, the reproducibility of SUVs can be supported by the implementation of the EARL Harmonization Programme; however, our study had been concluded before its introduction [28].
To the authors' best knowledge, this is the first time that body weight-adjusted (bwa) MTV and TLG values have been published. The aim behind the introduction of this normalization was to enable a personalized and more accurate measurement of the impact of tumour burden (normalization to body surface area or lean body mass would also be a feasible option; however, our current dataset did not include patient height in all cases, thus making such calculations impossible). Despite bwaMTV and bwaTLG not yielding improved prognostic values over MTV and TLG, respectively, there were a select few cases where body weight-adjusted MTV stratified the patient into the correct risk group as opposed to regular MTV (Figure 6). These values could be further investigated in larger cohorts, as their calculation can be easily carried out.
ΔSUVmax as a prognostic factor has gained a wider presence in the literature in recent years, with the majority of studies finding optimal cut-off points around 66%, to which our finding of 71.22% is close [12]. Interestingly, in our study, ΔSUVmax evaluation did not result in better prognostic values than the visual Deauville-score method in the whole patient cohort.
Semiquantitative "Deauville-like" parameters may be more robust than ΔSUVmax in a multicentric setting, as the variability in SUVs is at least partially mitigated by using ratios with a reference region. Neither qPET nor rPET has an extensive literature in DLBCL, especially not in multicentric studies [13-18]. The optimal cut-off for mqPET was 1.32 in our DLBCL cohort, which is highly similar to the established qPET cut-off in pediatric Hodgkin's lymphoma patients based on a 4-voxel SUVpeak. The optimal cut-off for rPET of 1.54 was higher than the 1.14 and 1.4 values published by Annunziata et al. and Toledano et al., respectively, and close to Fan and coworkers' finding of 1.6 [16-18]. In our study, both mqPET and rPET evaluation yielded moderately more accurate prognostic results than DS stratification.
Interim parameters had higher hazard ratios in univariate Cox-regression analyses than baseline volumetric parameters, while multivariate Cox-regression analysis resulted in rPET as the only independent predictor of PFS. Also, combined analyses showed that good early treatment response (i.e. DS 1-3) has a higher impact on PFS than baseline MTV. This finding contradicts that published by Mikhaeel et al., who found that patients with MTV≥400 cm³ had a worse prognosis, irrespective of DS on interim scans [2]. Furthermore, in the present study the combination of baseline MTV and ΔSUVmax made it possible to define a group with particularly poor prognosis (i.e. patients with high baseline MTV and DS4-5 on interim scan).

Conclusion
Baseline MTV values and the optimal cut-off points achieved with different segmentation methods varied markedly and showed limited prognostic impact. Interim PET/CT parameters provided more accurate prognostic information, with semiquantitative "Deauville-like" parameters (mqPET and rPET) performing best in the present study. A combination of baseline MTV and ΔSUVmax made it possible to separate a patient group with particularly poor prognosis.

Declarations

Funding
The study was funded and supported by the International Atomic Energy Agency (Coordinated Research Project E1.50.20).
Compliance with ethical standards

Conflict of interest
The authors declare that they have no conflict of interest.

Ethics approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent
Informed consent was obtained from all individual participants included in the study.

Availability of data and material
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Authors' contributions
All authors contributed to the material preparation, data collection and analysis. The first draft of the manuscript was written by Sándor Czibor and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

[Figure caption: Progression-free survival curve of the patient population]