We conducted a retrospective pilot study analyzing 18F-FDG PET/CT images of patients with metastatic melanoma treated with ICI (anti-CTLA-4 or anti-PD-1) at the Institute of Oncology Ljubljana (OIL), Slovenia (January 2016-January 2019), or at the University of Wisconsin Carbone Cancer Center (UWCCC), Madison, WI, USA (June 2012-June 2019). All available 18F-FDG PET/CT data acquired before and during ICI treatment were collected for review. The date of clinical irAE detection was determined by chart review. When the irAE grade was not explicitly documented in the chart, it was assigned retrospectively, where possible, from the available documentation of the clinical course according to the Common Terminology Criteria for Adverse Events (CTCAE, v5.0) [22]. Clinical and demographic data were collected from both hospital databases. Clinical and imaging data were anonymized and stored on a secure LabKey database server [23].
The study was approved by the institutional review boards of both institutions (approval numbers 2016-0418 in Madison, USA, and ERIDKE-0005/2020 in Ljubljana, Slovenia) and was conducted in accordance with the ethical standards of the Declaration of Helsinki. At OIL, patients signed informed consent for treatment as well as consent for the use of their data for scientific purposes. At UWCCC, the study was approved with a waiver of informed consent.
PET Acquisition
PET scans were performed primarily to evaluate response to immunotherapy in melanoma patients. Images were acquired on five PET/CT scanners: GE Discovery 710, GE Discovery STE, GE Discovery IQ, GE Discovery MI (General Electric, Waukesha, WI), and mCT (Siemens, Knoxville, TN). In all cases, the imaging protocol required patients to fast for 6 hours before radiotracer injection and to have a blood glucose level below 200 mg/dl (UWCCC) or 6-10 mmol/l (OIL) at the time of the scan. Patients were required to withhold diabetes medication for 6 hours before radiotracer injection. On the GE Discovery IQ, patients were injected with 259±52 MBq of 18F-FDG; on the other scanners, patients received a weight-based dose of 5 MBq/kg (OIL) or 5.2 MBq/kg with a minimum of 370 MBq (UWCCC). Scans were acquired 60±10 minutes after injection. For UWCCC patients, the CT used for segmentation was the low-dose CT acquired for attenuation correction. At OIL, a CT suitable for RECIST analysis was acquired with an adjusted protocol that included SAFIRE iterative reconstruction to minimize dose. After reconstruction, images were normalized by patient weight and injected dose to compute standardized uptake values (SUV). Time-of-flight (TOF) reconstruction was used when available.
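The body-weight SUV normalization and the UWCCC dosing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: activity concentration in Bq/ml, injected dose in Bq, a tissue density of 1 g/ml, and the UWCCC rule read as "the larger of 5.2 MBq/kg times body weight and 370 MBq". The decay correction of the injected dose to scan time (18F half-life of about 109.8 min) is an assumed step that the protocol text does not state, and all function names are hypothetical.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of 18F in minutes


def uwccc_injected_dose_mbq(weight_kg: float,
                            mbq_per_kg: float = 5.2,
                            minimum_mbq: float = 370.0) -> float:
    """Weight-based 18F-FDG dose with a floor (assumed reading of the UWCCC rule)."""
    return max(mbq_per_kg * weight_kg, minimum_mbq)


def decay_corrected_dose_bq(injected_bq: float, minutes_elapsed: float) -> float:
    """Decay the injected activity to scan time (assumed step, not stated in the protocol)."""
    return injected_bq * math.exp(-math.log(2) * minutes_elapsed / F18_HALF_LIFE_MIN)


def suv_bw(activity_bq_per_ml: float, dose_bq: float, weight_kg: float) -> float:
    """Body-weight SUV: tissue concentration divided by (dose / body weight).

    Weight is converted to grams and tissue density is taken as 1 g/ml,
    so the result is dimensionless.
    """
    return activity_bq_per_ml / (dose_bq / (weight_kg * 1000.0))


if __name__ == "__main__":
    print(uwccc_injected_dose_mbq(60.0))  # 312 MBq by weight, raised to the 370 MBq floor
    print(uwccc_injected_dose_mbq(80.0))  # weight-based dose exceeds the floor
    dose_at_scan = decay_corrected_dose_bq(370e6, 60.0)  # 60 min uptake period
    print(suv_bw(5000.0, dose_at_scan, 74.0))
```

Whether the injected dose is decay-corrected to injection or scan time is vendor- and site-specific, so this sketch keeps that step separate from the SUV formula itself.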
18F-FDG PET/CT Image Analysis
To quantify organ 18F-FDG uptake, a CNN was trained to segment the thyroid, lungs, and bowel on the low-dose CT component of the patients' PET/CT imaging data. A CNN was chosen because of its ability to segment irregular and variable structures and to segment multiple target structures of very different sizes (e.g., thyroid versus lung) [21]. The network architecture was DeepMedic, a 3-D, patch-based CNN with multi-resolution pathways [24]. The loss function was based on the Dice similarity coefficient (DSC), and the optimizer was RMSprop [25]. Sixty sets of manual contours of the bowel, lungs, and thyroid were produced by an experienced graduate student using 3D Slicer [27], drawing on a public dataset of N=20 patients from the VISCERAL.eu Anatomy3 benchmark [26] and a private institutional dataset of N=40 patients. The labelled data were split 80%/20% (n=48/n=12) into CNN training and validation sets. Images were resampled to an isotropic 2 mm grid and normalized to zero mean and unit variance within each patient. Data augmentation by histogram shifting, histogram scaling, and random rotation was used to increase the effective size of the training dataset. The CNN was trained on a workstation with one NVIDIA Titan Xp GPU with 12 GB of memory.
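The preprocessing, augmentation, loss, and data-split steps above can be sketched on flattened voxel lists. This is a hedged illustration, not DeepMedic's own code: "histogram shifting and scaling" is assumed here to mean a global affine intensity transform, the soft Dice loss (1 minus soft DSC) is one common DSC-based loss, random rotation and 3-D resampling are omitted, and every function name is hypothetical.

```python
import random
import statistics


def zscore(values):
    """Normalize intensities to zero mean and unit variance within one patient."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def augment_intensity(values, shift, scale):
    """Assumed form of histogram scaling then shifting: v * scale + shift per voxel."""
    return [v * scale + shift for v in values]


def soft_dice_loss(pred, target, eps=1e-6):
    """1 - soft DSC between predicted foreground probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def split_train_val(case_ids, train_fraction=0.8, seed=0):
    """Shuffle labelled cases reproducibly and split them into training/validation."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    train_ids, val_ids = split_train_val(range(60))
    print(len(train_ids), len(val_ids))  # 48 12
    ct = [-1000.0, -50.0, 40.0, 300.0]  # toy CT intensities (HU)
    z = zscore(ct)
    print(augment_intensity(z, shift=0.1, scale=1.05))
    print(soft_dice_loss([0.9, 0.1, 0.8], [1, 0, 1]))
```

Splitting by case rather than by patch keeps all patches from one patient on the same side of the split, which avoids leakage between training and validation.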
The trained CNN was then applied to the CT component of each 18F-FDG PET/CT scan to produce contours of the thyroid, lungs, and bowel. These contours were applied to the PET image to quantify 18F-FDG uptake within the three target organs. To determine the ability of PET to detect irAE, percentiles of the distribution of SUV within each target organ (SUVX%) were extracted. Percentiles of the organ SUV distribution were investigated as potential biomarkers of irAE because they are more reliable than SUVmax [28].
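Extracting SUVX% from a CT-derived organ mask can be sketched as below. It assumes the mask has already been resampled onto the PET voxel grid and that both are flattened to lists; the helper names are hypothetical. The percentile uses linear interpolation between order statistics (NumPy's default convention); MATLAB's prctile function uses a different default interpolation convention, so values from small masks may differ slightly.

```python
import math


def percentile(sorted_values, q):
    """Return the q-th percentile (0-100) of an ascending list, interpolating
    linearly between order statistics."""
    if not sorted_values:
        raise ValueError("organ mask contains no voxels")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def organ_suv_percentiles(suv_voxels, organ_mask, qs=range(5, 100, 5)):
    """Keep PET SUV voxels inside the organ mask and return {X: SUV_X%}.

    Assumes the mask and the SUV image share one voxel grid.
    """
    organ = sorted(s for s, inside in zip(suv_voxels, organ_mask) if inside)
    return {q: percentile(organ, q) for q in qs}


if __name__ == "__main__":
    suv = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]  # toy SUV values, flattened
    mask = [1, 1, 1, 1, 1, 0]  # last voxel lies outside the organ
    print(organ_suv_percentiles(suv, mask, qs=[50, 90]))
```

In the toy example the SUV of 10.0 is excluded because it falls outside the mask, which is the point of applying organ contours before computing percentiles.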
Receiver operating characteristic (ROC) analysis was performed to assess the value of organ SUV percentiles as potential quantitative imaging biomarkers of irAE development, by comparing organ SUV percentiles with clinical irAE status as determined by chart review. For patients with multiple 18F-FDG PET/CT exams during ICI treatment, the maximum organ SUV percentile value was used as the predictor of irAE. The optimal organ SUV percentile (SUVOPT%) was defined as the percentile that maximized the area under the ROC curve (AUROC) for predicting irAE status (Eq 1).
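$$\mathrm{SUV}_{\mathrm{OPT}\%} = \underset{\mathrm{SUV}_{X\%}}{\arg\max}\; \mathrm{AUROC}\left(\mathrm{SUV}_{X\%}\right) \tag{1}$$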
where SUVX% denotes the set of candidate percentiles of the organ SUV distribution. SUVOPT% was measured on all available 18F-FDG PET scans and tracked longitudinally to assess whether changes in target-organ 18F-FDG uptake precede clinical identification of irAE.
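The selection rule of Eq 1, together with the per-patient maximum over on-treatment exams, can be sketched as follows. AUROC is computed here through the Mann-Whitney formulation (the probability that a randomly chosen irAE patient scores higher than a non-irAE patient, with ties counted as one half), which equals the trapezoidal area under the empirical ROC curve. The data layout, patient IDs, and function names are hypothetical toy choices.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both irAE and non-irAE patients")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def optimal_percentile(exams_by_patient, irae_by_patient):
    """Take each patient's maximum SUV_X% over on-treatment exams, then return
    (X, AUROC) for the percentile X with the largest AUROC against irAE status."""
    patients = sorted(exams_by_patient)
    percentiles = sorted(next(iter(exams_by_patient.values()))[0])
    labels = [irae_by_patient[p] for p in patients]
    auc_by_x = {}
    for x in percentiles:
        per_patient_max = [max(exam[x] for exam in exams_by_patient[p]) for p in patients]
        auc_by_x[x] = auroc(per_patient_max, labels)
    best = max(percentiles, key=lambda x: auc_by_x[x])
    return best, auc_by_x[best]


if __name__ == "__main__":
    # Toy data: each exam maps percentile X to SUV_X% for one organ.
    exams = {
        "A": [{50: 1.0, 90: 3.0}, {50: 1.2, 90: 2.5}],
        "B": [{50: 0.9, 90: 2.8}],
        "C": [{50: 1.5, 90: 2.0}],
        "D": [{50: 0.8, 90: 1.9}],
    }
    irae = {"A": 1, "B": 1, "C": 0, "D": 0}
    print(optimal_percentile(exams, irae))
```

With many candidate percentiles and a small cohort, the maximizing percentile is chosen on the same data it is evaluated on, so its AUROC is an optimistic estimate.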
Target-organ 18F-FDG uptake was also assessed in patients who did not experience irAE, to establish normal ranges of organ SUVOPT% against which values from patients with irAE could be compared. For each target organ, the 95% reference interval for SUVOPT% (denoted CI95) was determined from the baseline PET images of N=15 patients who did not experience any irAE (Eq 2).
$$\mathrm{CI}_{95} = \left[\mu - 1.96\sigma,\; \mu + 1.96\sigma\right] \tag{2}$$
where μ and σ are the mean and standard deviation, respectively, of the baseline SUVOPT% values of patients who did not experience irAE.
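Eq 2 and its use as a normal range can be sketched as follows. The choice of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify it, and the 1.96 multiplier presumes approximately normally distributed baseline values; the function names are hypothetical.

```python
import statistics


def normal_range_95(baseline_values):
    """[mu - 1.96 sigma, mu + 1.96 sigma] from baseline SUV_OPT% of patients without irAE."""
    mu = statistics.mean(baseline_values)
    sigma = statistics.stdev(baseline_values)  # sample SD (n - 1): an assumption
    return mu - 1.96 * sigma, mu + 1.96 * sigma


def outside_range(value, interval):
    """Flag an SUV_OPT% value that falls outside the normal range."""
    lo, hi = interval
    return value < lo or value > hi


if __name__ == "__main__":
    baseline = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy baseline SUV_OPT% values
    interval = normal_range_95(baseline)
    print(interval)
    print(outside_range(7.0, interval), outside_range(3.0, interval))
```

With only N=15 baseline values, the estimated bounds are themselves uncertain, so values near either bound are best interpreted cautiously.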
Statistical Analysis
For each target organ (bowel, lung, thyroid), patients were divided into two groups: those who experienced irAE and those who did not. Differences in SUV metrics by irAE status were assessed with Wilcoxon rank-sum tests, with P<0.05 considered statistically significant. ROC analysis was performed to determine the ability of SUV metrics to detect irAE. Optimal cutoff values for detecting irAE were chosen to maximize Youden's index (sensitivity + specificity - 1). Image analysis and statistical testing were performed in MATLAB R2020b (The MathWorks, Inc., Natick, MA, USA).
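The Youden-index cutoff search can be sketched as an exhaustive scan over observed SUV values, calling a patient irAE-positive when the metric is at or above the cutoff. This is a minimal sketch with hypothetical names; the rank-sum test itself is available in standard libraries (for example MATLAB's ranksum or SciPy's scipy.stats.mannwhitneyu) and is not reimplemented here.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1,
    with a patient called irAE-positive when score >= cutoff."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both irAE and non-irAE patients")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # on ties, keep the lowest cutoff
            best_cut, best_j = cut, j
    return best_cut, best_j


if __name__ == "__main__":
    suv_metric = [0.2, 0.3, 0.6, 0.7, 0.9]  # toy per-patient SUV metric
    irae_status = [0, 0, 1, 1, 1]
    print(youden_cutoff(suv_metric, irae_status))
```

Scanning only observed values is sufficient because sensitivity and specificity change only when the cutoff crosses a data point.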