A Machine-learning Signature Performs as Well as Experts' Consensus to Diagnose Pelvic Lymph Node Involvement in Bladder Cancer on [18F]-FDG PET/CT: A Pathology-controlled Study

Purpose To develop and validate an objective machine-learning-based signature for the diagnosis of pelvic lymph node (LN) involvement based on [18F]-FDG PET/CT before radical cystectomy in patients with muscle-invasive bladder cancer (MIBC). Methods Between 2010 and 2017, consecutive patients with localized MIBC were retrospectively included and randomly assigned to a training set (n=115) or a validation set (n=58). The reference standard used for LN involvement was pathological status after extended pelvic LN dissection. In the training set, a signature was obtained with a random forest algorithm built from 21 morphological and metabolic imaging features. The signature performance was evaluated using areas under ROC curves (AUCs). Results The prevalence of pelvic LN involvement was 26% (n=30/115) in the training set and 21% (n=12/58) in the validation set. In the training set, the top 5 features selected by the algorithm were derived from pelvic LNs (SUVmax, sum of the products of diameters, product of diameters of the largest LN), the primary bladder tumor (largest tumor diameter), and urine (SUVmax). In the training set, the AUC of the signature was 0.79 (95%CI 0.71-0.88), versus 0.72 (95%CI 0.62-0.81) for the consensus of experts (p=0.24). In the validation set, AUCs were 0.67 (95%CI 0.50-0.84) and 0.62 (95%CI 0.48-0.77), respectively (p=0.70). Conclusion The diagnostic performance of the machine-learning signature was not inferior to the expert consensus review for pelvic LN staging, providing an objective and accurate tool for this task.


Introduction
Bladder cancer is the tenth most common cancer worldwide, with more than half a million new cases diagnosed globally and 200,000 related deaths per year [1]. At diagnosis, approximately 25% of patients have muscle-invasive bladder cancer (MIBC). This diagnosis is achieved through transurethral resection of the bladder tumor (TURBT). MIBC is an aggressive disease characterized by a high rate of nodal and/or distant metastases [2]. Accurate preoperative staging with abdominal CT or MRI is standard practice. In patients with clinically localized disease (cT2-T4aN0M0) and adequate renal function, the current standard of care is radical cystectomy (RC) with neoadjuvant cisplatin-based chemotherapy (NAC). For patients with more advanced disease (i.e. node-positive disease and/or solid organ metastases), cytotoxic chemotherapy remains the standard of care for first-line systemic treatment, while immune checkpoint blockade antibodies have recently been approved as first-line treatment in cisplatin-ineligible patients [3].
There is increasing evidence supporting the role of 2-deoxy-2-(18F)fluoro-D-glucose ([18F]-FDG) positron emission tomography/computed tomography (PET/CT) for the evaluation of metastatic urothelial carcinoma [4][5][6][7][8][9][10][11]. For detection of pelvic lymph node (LN) involvement, this modality has a sensitivity of 50-55% and a specificity of 92% [12], though there is high inter-center variability due to heterogeneous interpretation criteria. Machine-learning (ML) techniques are able to use data gathered from PET images to extract objective imaging biomarkers that are otherwise invisible to the naked eye [13]. ML-defined diagnostic criteria for pelvic LN staging may improve reproducibility, provided that such criteria do not lower diagnostic performance for the detection of pelvic LN disease spread.
The aim of this study was to compare the diagnostic performance obtained with ML-defined criteria based on [18F]-FDG PET/CT with that of an expert consensus review for initial pelvic LN staging of MIBC.

Materials And Methods
Selection of patients
A total of 173 consecutive patients diagnosed with MIBC between October 2010 and May 2017 (≥pT2 proven by TURBT) were retrospectively included. Inclusion criteria were: ≥18 years of age; history of RC with extended pelvic lymph node dissection (ePLND); available [18F]-FDG PET/CT completed for initial staging ≤90 days prior to surgery. Exclusion criteria were history of previous pelvic or genitourinary cancer and prior neoadjuvant chemotherapy (NAC). This study was approved by an institutional ethical committee (IRB00012437). All included patients were informed and did not object to participating in this study.

[18F]-FDG PET/CT protocol
A 6-h fast and a blood glucose level of <12 mmol/L were required before intravenous injection of [18F]-FDG (4 MBq/kg). Intravenous furosemide (20 mg) was administered immediately after [18F]-FDG injection in patients without contraindications such as urinary incontinence. Sixty minutes post-injection and after the patient voided, PET and CT images were acquired in the supine position, with the arms above the head, from vertex to mid-thigh, on integrated PET/CT systems (Discovery 610 Elite and Discovery STE, GE Healthcare) with iterative reconstruction using CT-based attenuation correction. No contrast agent was used for CT.

Clinical staging and surgical technique
In addition to [18F]-FDG PET/CT, preoperative clinical staging included bimanual examination, cystoscopy, and contrast-enhanced CT scan. All patients underwent standardized open radical cystectomy and ePLND. LNs were dissected along the ilio-obturator region and from the internal and external iliac arteries up to the common iliac vessels and the aortic bifurcation, according to the template defined by the European Association of Urology [3].

Pathology analysis
Pathological review was centralized and performed by a specialized uropathologist (C.R.). Each LN was isolated from the fresh sample and processed according to accepted standards [14]. LNs were fixed overnight in 4% formalin and embedded in paraffin. Samples were sectioned into 3-mm-thick tissue slices and stained with hematoxylin, eosin and saffron.
Step sections of 2.5-µm thickness were examined for evidence of cancer involvement. The number of LNs positive for metastatic disease and the total number of LNs examined microscopically were reported.

Image analysis
All PET/CT images were independently reread by two senior nuclear medicine physicians (A.G. and O.D.) who were blinded to the pathological results. They subjectively assessed metastatic involvement of the LNs in each pelvic area based on both metabolic and morphological criteria. Additional metabolic and morphological features of the bladder tumor, LNs and urine were also described (Supplementary Material 1). In the case of disagreement regarding LN involvement between the two readers, consensus was reached through joint review.

Statistical analysis
Patients were randomly assigned to the training (n=115) or the validation (n=58) set. Statistical analyses were performed using R version 4.0.4 (The R Foundation for Statistical Computing) and MedCalc® version 12.5.0.0 (MedCalc Software, Ostend, Belgium). The random forest signature was generated with the "randomForest" package using a forest of 500 decision trees, and explained with the "randomForestExplainer" package in R. Twenty-one [18F]-FDG PET and CT features were initially assessed (Supplementary Material 1).
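The random assignment described above can be illustrated with a minimal sketch. It is not the authors' code (the study used R); the function name `split_patients`, the seed, and the integer patient identifiers are hypothetical, and only the set sizes (115 and 58 out of 173) come from the text.

```python
import random


def split_patients(patient_ids, n_train, seed=0):
    """Shuffle patient identifiers and split them into two disjoint sets.

    The seed is an illustrative assumption; it only makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers 1..173 standing in for the 173 included patients.
patient_ids = list(range(1, 174))
train_ids, validation_ids = split_patients(patient_ids, n_train=115)
print(len(train_ids), len(validation_ids))  # 115 58
```

Fixing the seed is a design choice that lets the same partition be regenerated when the signature is rebuilt or audited.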
Areas under the receiver operating characteristic curves (AUCs) were calculated for each parameter to predict pathologic pelvic LN involvement. Pearson's correlation coefficients were calculated between each pair of features. Statistical significance was set at p<0.05 (two-tailed). In the training and the validation sets, the diagnostic performance (AUC) for detection of pelvic LN involvement provided by ML-based criteria was compared with that obtained by a consensus of experts.
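For readers less familiar with these two statistics, the sketch below computes an AUC as the proportion of positive/negative pairs that are correctly ranked (ties count one half) and a Pearson correlation coefficient between two features. It is an illustration in Python rather than the R and MedCalc workflow used in the study, and all feature values and labels are invented toy data.

```python
from math import sqrt


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = pathologically involved (pN+), 0 = not involved (pN-).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): LN SUVmax, LN short axis in mm, and pathology status.
ln_suvmax = [1.2, 2.0, 3.5, 1.8, 4.1, 2.0]
ln_short_axis = [5, 7, 12, 6, 15, 8]
pn_status = [0, 0, 1, 0, 1, 1]

print(round(auc(ln_suvmax, pn_status), 3))  # 0.944
print(pearson(ln_suvmax, ln_short_axis))
```

The pairwise formulation is quadratic in the number of patients, which is acceptable at this cohort size and keeps the link between AUC and ranking explicit.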

Results
Patients' characteristics
There were no statistically significant differences in preoperative characteristics between patients in the training and validation sets (Table 1). The mean age of the complete cohort (n=173) was 70 years (standard deviation [SD] 10 years), with a male-to-female sex ratio of 7.2. The mean time between [18F]-FDG PET/CT and RC plus ePLND was 31 ± 21 days. The mean number of LNs retrieved per patient was 16 ± 8. LN metastases from bladder cancer were pathologically identified in 42 (24.3%) patients.

Signature predicting pathological LN status
Model building in the training set
Several models were created utilizing up to 21 quantitative and qualitative imaging features related to the morphological and metabolic characteristics of the LNs. There was no significant improvement in performance by using more than 5 variables (Fig. 1), presumably due to collinearity between variables (Fig. 2). The 5 most important variables for the diagnosis of LN involvement as ranked by the ML algorithm were: 1) the product of diameters of the largest pelvic LN; 2) the sum of the products of diameters of all pelvic LNs; 3) the SUVmax of pelvic LNs; 4) the SUVmax of urine; and 5) the largest diameter of the residual primary tumor (Table 2; Supplementary Material 2). The final signature was a regression random forest of 500 trees that evaluated the 5 variables mentioned above at each split; it explained 63.3% of the variance, with a mean of squared residuals of 0.0092. Using these 5 variables, the signature reached an AUC of 0.79 (95%CI 0.71-0.88) for the prediction of LN status. This was not significantly different (p=0.24) from the performance of the consensus of experts, which resulted in an AUC of 0.72 (95%CI 0.62-0.81).

Performance evaluation in the validation set
In the validation set, the signature reached an AUC of 0.67 (95%CI 0.50-0.84), which was not significantly different (p=0.70) from the consensus of experts, with an AUC of 0.62 (95%CI 0.48-0.77). The performance of each variable included in the signature, as well as optimal thresholds, is reported in Table 3. An example of discordance between the subjective and the ML-based conclusion for LN staging is described in Fig. 3.
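The text does not state how the "optimal thresholds" were selected. One common convention, assumed here purely for illustration, is to maximise Youden's index J = sensitivity + specificity - 1 over the observed values of a biomarker. The function name and the toy data below are hypothetical.

```python
def youden_threshold(values, labels):
    """Return (threshold, sensitivity, specificity) maximising Youden's J.

    A case is called positive when its value is strictly higher than the
    threshold. Among equally good thresholds, the lowest one is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for threshold in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v > threshold)
        true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v <= threshold)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, threshold, sensitivity, specificity)
    return best[1], best[2], best[3]


# Toy data (hypothetical): one biomarker value per patient and pathology status.
biomarker = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 3.5]
pn_status = [0, 0, 0, 1, 0, 1, 1]

print(youden_threshold(biomarker, pn_status))  # (1.5, 1.0, 0.75)
```

A clinician who prioritises sensitivity or specificity would choose a different operating point along the same ROC curve, which is why the AUC rather than a single threshold is the primary endpoint.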

Signature predicting the experts' conclusion regarding LN status
As an ancillary study, we evaluated the performance of ML algorithms for the prediction of pelvic LN status using the consensus of the two expert readers as the reference standard. In doing so, we essentially trained the ML algorithm to mimic the interpretation practices of the expert readers. The model achieving the highest performance in the training set was selected as the signature. The machine-learning algorithm ranked the importance of each variable for the prediction of LN status (Supplementary Material 3). In the validation set, the ML-based signature reached an AUC of 1.00 (0.99-1.00) for mimicking the conclusion of the consensus of experts (Supplementary Material 4).

Discussion
In this study, we created an ML-based signature to predict pelvic LN metastases using preoperative [18F]-FDG PET/CT. Bladder cancer is an aggressive disease, and LN involvement confers a particularly poor prognosis even when only one LN is affected [15]. In clinically localized tumors (pT2-4 pN0 cM0), 10-year cancer-specific survival after RC without NAC or adjuvant chemotherapy may reach 77%, whereas LN involvement (pN1-3) dramatically reduces it to 17% [15]. Therefore, accurate detection of LN metastases at diagnosis has a major impact on treatment decisions and prognosis. Current guideline recommendations [3] favor targeting micrometastases by administering NAC to all cisplatin-eligible patients who have clinically localized MIBC. However, adherence to these guidelines remains poor in routine clinical practice [16]. This non-compliance with recommended practice is mostly the result of the reported low survival benefit of NAC for ≤pT2N0 tumors [17], as well as the limited use of highly reliable imaging tools for nodal staging. Due to the modest ability of morphological imaging to detect LN involvement, [18F]-FDG PET/CT may play a decisive role in patient management [5]. However, heterogeneous interpretation criteria among institutions hamper the reproducibility of [18F]-FDG PET/CT and thus limit its widespread use.
In the present study, the diagnostic performance of an ML-based signature and that of an expert consensus did not differ significantly for LN staging using [18F]-FDG PET/CT. These data suggest that ML-based signatures may improve the reproducibility of [18F]-FDG PET/CT image interpretation without loss of performance. Moreover, the developed ML-based signature could increase accuracy in non-expert centers, where it can complement subjective human interpretation.
This signature relied on five selected top variables measured from pelvic LNs (the SUVmax of pelvic LNs, the product of diameters of the largest pelvic LN, and the sum of the products of diameters of all pelvic LNs), the primary bladder tumor (the largest diameter of the residual primary tumor after TURBT), and the urine (SUVmax of urine). Metabolic (SUVmax) and morphological (product of diameters of the largest LN) characteristics of pelvic LNs have already been reported in several studies as objective criteria for LN staging with [18F]-FDG PET/CT [6,18]. To our knowledge, the sum of the products of diameters of pelvic LNs has not previously been reported in the literature as a criterion of interest. It nevertheless seems logical that this variable, which is related to the pelvic LN tumor volume, was included in the top 5 features. Interestingly, the two remaining top 5 features are not directly measured from pelvic LNs. First, the largest diameter of the primary bladder tumor was a selected predictor, and patients with a primary tumor larger than 49-57 mm had a higher risk of pelvic LN involvement. This is in line with previously published studies that reported the primary bladder tumor size to be associated with the risk of LN involvement [19] and distant metastases [20]. Of note, the largest diameter of the primary tumor was measured when visible on PET and/or non-contrast-enhanced CT from [18F]-FDG PET/CT performed after TURBT. Thus, it may slightly differ from the total tumor size before resection. Finally, the radioactivity of urine (SUVmax of urine) was unexpectedly one of the top five features. It does not have an obvious pathophysiological explanation but might be explained by the PET protocol itself. The protocol included diuretic administration for all patients with no urinary incontinence (45% of patients in this study), to lower urine radioactivity due to physiological urinary excretion of [18F]-FDG and thereby optimize detection of the primary tumor and pelvic LN involvement. Thus, patients with continence problems did not receive a diuretic and, as a consequence, had a higher [18F]-FDG concentration in urine. One hypothesis is that patients with advanced disease might more often be subject to continence disorders, did not receive a diuretic before [18F]-FDG PET/CT, and thus might have had a higher SUVmax in urine.
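The three LN-derived features of the signature are simple to compute once each visible pelvic node has been measured. The sketch below is an illustration, not the authors' implementation: the tuple layout, the function name, the convention that the "largest" node is the one with the largest product of diameters, and the zero values returned when no node is visible are all assumptions. The first example uses the single node reported in the Fig. 3 legend (10 mm x 7 mm, SUVmax 2.0); the second node in the later example is invented.

```python
def pelvic_ln_features(nodes):
    """Compute the LN features used by the signature for one patient.

    nodes: list of (long_axis_mm, short_axis_mm, suvmax), one tuple per
    visible pelvic LN. Products of diameters are expressed in mm^2.
    """
    if not nodes:
        # Assumption: a patient with no visible pelvic LN gets zero-valued features.
        return {"largest_ln_product": 0.0, "sum_of_products": 0.0, "ln_suvmax": 0.0}
    products = [long_axis * short_axis for long_axis, short_axis, _ in nodes]
    return {
        "largest_ln_product": max(products),      # product of diameters of the largest LN
        "sum_of_products": sum(products),         # sum over all visible pelvic LNs
        "ln_suvmax": max(s for _, _, s in nodes), # highest uptake among pelvic LNs
    }


# Node described in the Fig. 3 legend: 10 mm x 7 mm, SUVmax 2.0.
print(pelvic_ln_features([(10, 7, 2.0)]))
# {'largest_ln_product': 70, 'sum_of_products': 70, 'ln_suvmax': 2.0}
```

With several nodes, the sum of products grows with the overall nodal burden while the largest product reflects only the dominant node, which is one way to read why both features were retained.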
Additionally, we developed another ML-based signature to mimic the conclusion of a consensus of two expert readers. The top 5 selected variables differed from the ones used to predict pathological pelvic LN involvement. Metabolic characteristics of pelvic LNs (pelvic LN [18F]-FDG uptake higher than background, and their SUVmax) were deemed most important by the experts, followed by the size of the tumor (small diameter of the residual primary tumor, in mm) and of the pelvic LNs (small diameter of the largest pelvic LN, and the product of diameters of the largest pelvic LN). Interestingly, metabolic characteristics of pelvic LNs were by far the most important factors in the experts' conclusion.
This study has several limitations. First, the retrospective design of the study and the moderate size of the sample may hamper the statistical power of the comparison between the ML signature and the expert consensus. However, this is the largest cohort of MIBC patients studied with [18F]-FDG PET/CT who did not receive NAC, which allowed accurate matching of LNs between imaging and pathological analysis. Second, emphasis should be given to the imbalance between positive (pN+) and negative (pN-) subjects in both datasets, related to the small number of events (n=42). This has led us to underfit the ML model by using a limited subset of features. The prevalence of pathological pelvic LN involvement in this sample (24%) was in the same range as those reported in the literature [15]. Finally, the size of the tumor has been described as one of the most relevant factors for predicting LN positivity. However, the tumor size measurement included only the remaining tumor visible on PET/CT performed after TURBT, rather than the whole tumor. This probably had a limited impact on the results since the exophytic volume of the tumor is not related to its degree of invasion. Additionally, measuring only the remaining tumor accurately reflects real clinical practice, wherein the tumor is initially resected partially or completely.

Conclusions
In this study, we developed an ML-based signature for the diagnosis of pelvic LN metastasis with [18F]-FDG PET/CT imaging in patients with MIBC. This signature relied on five key metabolic and morphological features related to pelvic LNs, primary tumor and urine. In an independent validation set, the diagnostic performance of the signature was found to be non-inferior to a consensus review by experts. Such an objective ML-based interpretation tool provides good accuracy and may enhance reproducibility of [18F]-FDG PET/CT interpretation for pelvic LN staging in MIBC.

Declarations
Funding: Not applicable
Conflicts of interest/Competing interests: The authors do not have any conflict of interest to disclose
Availability of data and material: Available by request to the corresponding author
Code availability: Not applicable
Ethics approval: This retrospective study was approved by an institutional review board (IRB00012437).
Consent to participate/Consent for publication: Since this study was retrospective, each participant received written information about the study and publications and was offered the opportunity to refuse to participate at any time.

Table 2 legend. An important variable has a lower mean minimal depth, a higher number of nodes, a higher mean squared error increase, a higher node purity increase, a higher number of trees, and a higher number of times at root.

Table 3. Performance of the signature and of its features for the prediction of the pathological lymph node status

Table 3 legend. Our primary endpoint was to evaluate the performance of the signature using AUC. The interest of imaging biomarkers is that they are continuous values; hence, they rank patients along a continuum of probability of being positive on pathology. The aim of this study was not to determine a specific decision threshold since, in practice, some clinicians may emphasize the importance of high sensitivity vs. high specificity. We compared the optimal thresholds to guide clinical decisions. Using the optimal threshold in both datasets, we see that while some values are "clearly positive" and "clearly negative", there is a "grey area" for most biomarkers such as LN SUVmax (1.
* Values higher than the "optimal thresholds" were associated with a higher risk of pathological LN involvement.

Correlogram (Pearson coefficient). Paired correlations with p-value > 0.05 are hidden.

[18F]-FDG PET/CT images from a patient who was falsely classified cN+ based on subjective human interpretation, whereas he was classified cN0 with machine-learning-based pelvic LN staging. Primary tumor uptake was SUVmax 7.6 with a largest axial diameter of 116 mm, and one pelvic lymph node was visible in the left external iliac area (red arrowhead) with moderate uptake of SUVmax 2.0 and axial diameters of 10 mm x 7 mm. SUVmax of urine was 5.8.
