Patient cohort
In total, 251 LN from 63 patients were assessed histologically and quantified multiparametrically on a node-by-node basis with [18F]FDG PET/MRI. Of these, 211/251 LN were located within the FOV of the delayed scan, enabling dual-time-point [18F]FDG kinetic calculation, and 219/251 LN were large enough for ADC calculation. 79/251 LN from 54/63 patients met the criteria for SLN in [99mTc]Tc-nanocolloid SPECT/CT. Detailed patient characteristics are presented in Table 1.
Prevalence of LNM dependent on stage and grade of primary tumors
In two patients (6 LN), no grading of the primary tumor was reported because conization had been performed at other centers and no tumor remained at the time of (radical) hysterectomy in our center. The prevalence of LNM increased with the T-stage of the primary tumor, as presented in Figure 1. Patient-based prevalence of LNM was not significantly higher in patients with G3 (40%) than G2 (29.6%) tumors (p=0.35) in this cohort. No LNM occurred in patients with G1 tumors.
Interrelationships of histology and PET/MRI parameters
LNM demonstrated higher SUV, larger diameters, and higher RI and ∆SUV than benign LN, as listed in detail in Supplemental Table 2.
Moreover, this effect was amplified by the grade of the primary tumor, as presented in Figure 2 and Supplemental Table 2. In particular, LNM from G3 tumors presented with significantly higher SUV as well as significantly higher [18F]FDG dynamics between the early and delayed scan measured with RI-SUVavg (p=0.03) and ∆SUVavg (p=0.02) compared to LNM from G2 tumors (p<0.01), as presented in detail in Supplemental Table 3. Furthermore, G3 LNM presented with a larger short-axis diameter than G2 LNM (p<0.01) and a slight increase in sphericity (p=0.08), while ADC revealed no significant difference.
LN short-axis diameter correlated significantly with SUVe, SUVd, BPCSUVe, BPCSUVd, ∆SUVpeak (p<0.01, r:0.477-0.716) but not with RI-SUVpeak (r=0.085) or ADC (r=0.241).
G3 LNM revealed an increase of [18F]FDG uptake between the early and delayed scan compared to benign LN (RI-SUVpeak: p<0.01; ∆SUVpeak: p=0.02), as presented in representative cases in Figures 3a and 3b. A similar trend was observed for RI-SUVpeak in G2 LNM, though it did not reach significance (p=0.19).
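The two dual-time-point measures compared here can be illustrated with a minimal sketch. It assumes the commonly used definitions (∆SUV as the absolute difference between delayed and early SUV, RI as that difference relative to the early SUV, in percent); this section does not restate the formulas, so the exact definitions used in the study may differ. Function names and the example values are hypothetical.

```python
def delta_suv(suv_early: float, suv_delayed: float) -> float:
    """Absolute SUV change between early and delayed scan (assumed definition)."""
    return suv_delayed - suv_early


def retention_index(suv_early: float, suv_delayed: float) -> float:
    """SUV change in percent of the early SUV (assumed definition of RI)."""
    if suv_early <= 0:
        raise ValueError("early SUV must be positive")
    return 100.0 * (suv_delayed - suv_early) / suv_early


# Hypothetical lymph node: SUV rises from 2.0 (early) to 2.5 (delayed).
print(delta_suv(2.0, 2.5))        # 0.5
print(retention_index(2.0, 2.5))  # 25.0
```

Under these assumed definitions, ∆SUV scales with the absolute uptake level, whereas RI is normalized to the early value.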
PET/MRI Parameter evaluation
PET demonstrated a high accuracy in differentiating between LNM and benign LN using SUV-based quantification, with an AUC of up to 0.809 (presented in detail in Figure 4 and Supplemental Table 2), without significant differences between the SUV quantification parameters SUVemax, SUVepeak and SUVemean (p≥0.54).
The delayed PET scan did not result in a significantly higher AUC than the early PET scan (p≥0.55). Blood pool correction improved the AUC in the delayed PET slightly, but did not reach significance (SUVeavg: 0.784 vs. 0.766, SUVdavg: 0.741 vs. 0.767, p=0.73).
Primary tumor grade had a marked impact on the accuracy of LNM detection in PET, with a significantly lower discriminatory power in G2 than in G3 tumors (AUC of SUVeavg, G2: 0.673; G3: 0.901, p<0.01). Error rate (ER = false positive + false negative rate = 1-accuracy) was more than twice as high for G2 LNM (65.5%) than for G3 LNM (30.4%) at their individual optimal SUVeavg cut-off values, as shown in Supplemental Figure 1, while prevalence was comparable (G2: 17.5% vs. G3: 23.0%).
Dual-time-point kinetics calculated with RI and ∆SUV correlated significantly with malignancy, especially in G3 tumors, with an AUC of up to 0.791 (p<0.01). The SUVpeak quantification method achieved the highest AUCs but required blood pool correction. Overall, the ∆SUV calculation method was comparable to the RI-SUV, but performed slightly and non-significantly better in G3 tumors (G3 SUVavg: 0.791 vs. 0.718, p=0.48).
Morphologic LN diameter revealed a significant discriminatory power for the short-axis (AUC 0.741) and the long-axis (AUC 0.777) measurements and performed best in LNM from G3 tumors (AUC 0.904 and 0.881). LN sphericity was not a significant stand-alone parameter for LNM detection, neither in G2 nor in G3 tumors (p≥0.269).
ADC presented a borderline significant discriminatory power (AUC 0.600, p=0.05), with a significantly lower AUC compared to the SUVavg and short-axis diameter (p<0.01 and p=0.03, n=162).
Multiparametric approach
The parameters ADC, sphericity, bpcSUVeavg and grade of the primary tumor were identified as independent predictors for LNM and were included in the calculation of the MS as described above. The response variables of the model were the probabilities of being malignant predicted by the model, calculated as a sum of the predictor values weighted according to their (fixed effect) regression coefficients. After listwise exclusion of cases with missing parameters, the sample size was 171 LN with a prevalence of 21.1%.
Using the MS resulted in a high discriminatory power between malignant and benign LN (AUC: 0.820, 95%CI: 0.736-0.879). At the optimal cut-off value (Youden optimum: 0.042), the MS improved sensitivity from 63.5% to 72.2% compared to SUVeavg at a specificity of 80.7%.
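How a weighted sum of predictors is turned into a predicted probability can be sketched as follows. This is an illustration only: it assumes a logistic link, which this section does not state explicitly, and every coefficient, the intercept, the predictor names and the example node are hypothetical placeholders, not the fitted values of the study.

```python
import math


def malignancy_score(predictors: dict, coefficients: dict, intercept: float) -> float:
    """Predicted probability from a weighted sum of predictors (logistic link assumed)."""
    linear = intercept + sum(coefficients[name] * value
                             for name, value in predictors.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical placeholder coefficients; NOT the fitted values of the study.
coefficients = {"adc": -0.001, "sphericity": 1.0, "bpc_suv_e_avg": 0.8, "grade_g3": 0.5}

# Hypothetical lymph node.
node = {"adc": 900.0, "sphericity": 0.7, "bpc_suv_e_avg": 1.2, "grade_g3": 1.0}

score = malignancy_score(node, coefficients, intercept=-3.0)
print(0.0 < score < 1.0)  # True: the score is a probability
```

A node would then be called positive when its score exceeds a chosen cut-off.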
Furthermore, error rates could be lowered (47.0%) and kept constant over a wider cut-off range compared to the best single parameter SUVeavg (52.7%) as presented in Supplemental Figure 1.
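The reported error rate of the MS can be checked against its operating point with a short calculation. The sketch takes ER as the sum of the false positive and false negative rates, as stated where ER is introduced; the helper names are hypothetical. With this definition, ER equals one minus Youden's J, so the cut-off that maximizes J also minimizes ER.

```python
def error_rate(sensitivity: float, specificity: float) -> float:
    """False negative rate plus false positive rate (inputs as fractions)."""
    return (1.0 - sensitivity) + (1.0 - specificity)


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Reported MS operating point: sensitivity 72.2%, specificity 80.7%.
er = error_rate(0.722, 0.807)
print(round(100 * er, 1))  # 47.1
```

The result agrees with the reported 47.0% up to rounding of the published sensitivity and specificity.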
Further subanalysis by grade of the primary tumor revealed a significantly (p<0.01) better prediction of LNM in G3 tumors (AUC 0.850, 95%CI: 0.755-0.945) compared to G2 tumors (AUC 0.695, 95%CI: 0.526-0.863). In particular, the parameter SUVe showed markedly different predictivity for LNM in G2 compared to G3 tumors (Log-odds: SUV:1.5/17.7, p=0.01).
Additional value of dual-time-point [18F]FDG kinetic
The implementation of dual-time-point parameters significantly improved the model fit. The best parameters were ∆SUVpeak (log-likelihood: -42.66; χ2 difference = 7.11; p<0.01) and RI-SUVpeak (log-likelihood: -43.44; χ2 difference = 5.20; p=0.02). The log-likelihood of the comparison model without these dual-time-point parameters was -46.21 (n=144).
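The model comparison can be retraced as a likelihood-ratio test. The sketch assumes that each extended model adds exactly one parameter to the comparison model, so the test statistic is referred to a χ2 distribution with one degree of freedom; the degrees of freedom are not stated in this section. For one degree of freedom the upper tail probability can be written with the complementary error function, which keeps the example within the standard library. Function names are hypothetical.

```python
import math


def lr_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x: float) -> float:
    """Upper tail probability of a chi-square distribution with 1 df (assumed df)."""
    return math.erfc(math.sqrt(x / 2.0))


# Comparison model (-46.21) vs. model including ∆SUVpeak (-42.66).
stat = lr_statistic(-46.21, -42.66)
print(round(stat, 2))            # 7.1

# p-values for the reported chi-square differences.
print(chi2_sf_df1(7.11) < 0.01)  # True
print(chi2_sf_df1(5.20) < 0.05)  # True
```

The statistic recomputed from the rounded log-likelihoods (7.10) matches the reported χ2 difference of 7.11 up to rounding.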
Implementing the dual-time-point [18F]FDG kinetic parameter ∆SUVpeak in the MS lowered error rates in G2 tumors by about one third, from 65.5% to 44.5%, compared to the best single parameter SUVavg, as presented in Supplemental Figure 1.
Implementation of ∆SUVpeak and RI-SUVpeak resulted in a slight but non-significant increase in discriminatory power (MS + ∆SUVpeak: AUC 0.837, sensitivity 79.3%, specificity 75.7%) compared to the standard MS model (AUC 0.820, sensitivity 72.2%, specificity 80.7%).
Visual vs. multiparametric evaluation
Specificity was set by the visual evaluation, and the corresponding sensitivity was compared between visual and multiparametric LN evaluation using the MS. Applying the MS increased the overall sensitivity from 31.0% to 37.9% compared to the expert consensus at a set specificity of 98.3% (n=144, prevalence 20.1%), although the defined specificity was far from the Youden optimum of the MS (sensitivity 79.3%, specificity 75.6% at cut-off 0.0042).
For G3 tumors, the MS revealed a higher sensitivity than the human reader (58.8% vs. 47.1%) at a set specificity of 96.3% (n=71, prevalence 23.9%), which was close to the Youden optimum (sensitivity of 76.4% at a specificity of 85.1%, cut-off: 0.0908).
For G2 LNM, the MS yielded the same sensitivity as the visual evaluation (8.3%) at a set specificity of 100% (n=73, prevalence 16.4%). However, sensitivity increased from 8.3% to 83.3% when the cut-off was adjusted to the Youden optimum at a specificity of 72.1% (cut-off: 0.0435).
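The sensitivities in this comparison rest on small numbers of malignant nodes, which a back-of-the-envelope calculation makes visible. The numbers of malignant LN below are inferred from the reported sample sizes and prevalences, and the true-positive counts are inferred from the reported percentages; neither is stated in the text, so both should be read as assumptions. Function names are hypothetical.

```python
def n_malignant(n_nodes: int, prevalence: float) -> int:
    """Number of malignant nodes implied by sample size and prevalence."""
    return round(n_nodes * prevalence)


def sensitivity_pct(true_positives: int, malignant: int) -> float:
    """Sensitivity in percent, rounded to one decimal place."""
    return round(100.0 * true_positives / malignant, 1)


g2 = n_malignant(73, 0.164)   # inferred: 12 malignant LN
g3 = n_malignant(71, 0.239)   # inferred: 17 malignant LN

# Inferred true-positive counts that reproduce the reported sensitivities.
print(g2, sensitivity_pct(1, g2), sensitivity_pct(10, g2))  # 12 8.3 83.3
print(g3, sensitivity_pct(8, g3), sensitivity_pct(10, g3))  # 17 47.1 58.8
```

Under these inferred counts, a single additionally detected node shifts the G2 sensitivity by about 8 percentage points.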