Patients
Training cohort
All 156 patients with biopsy-proven HL who had undergone a staging PET/CT examination at Sahlgrenska University Hospital between 2011 and 2016 were retrospectively included. All were newly diagnosed and untreated. Two patients were excluded due to incomplete image sets, and one patient was excluded due to failed skeletal segmentation. The final group consisted of 153 patients (Table 1). These examinations were used to develop the AI-based method.
Test cohort
All 49 patients with biopsy-proven HL who had undergone staging with FDG-PET/CT at Sahlgrenska University Hospital between 2017 and 2018 were retrospectively included. All were newly diagnosed and untreated. One patient was excluded due to a falsely reported injection time, leaving 48 patients as the test cohort (Table 1). These examinations were used to test the AI-based method and for the inter-observer classifications.
Image acquisition
PET/CT data were obtained using an integrated PET/CT system (Siemens Biograph 64 Truepoint). Adult patients fasted for at least 6 hours before being injected with 4 MBq/kg 18F-FDG (maximum 400 MBq). The injected activity for children followed the EANM Dosage Card (Version 5.7.2016). The standard accumulation time was 60 minutes. Images were acquired from the base of the skull to the mid-thigh at 3 minutes per bed position. PET images were reconstructed with a slice thickness of 5 mm and slice spacing of 3 mm using an iterative OSEM 3D algorithm (4 iterations and 8 subsets) and a matrix size of 168 × 168. CT-based attenuation and scatter corrections were applied. A low-dose CT scan (64-slice helical, 120 kV, 30 mAs, 512 × 512 matrix) was obtained covering the same part of the patient as the PET scan. The CT was reconstructed using a filtered back-projection algorithm with a slice thickness and spacing matching those of the PET scan [10].
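As a small worked example of the adult dosing rule stated above (4 MBq/kg with a 400 MBq ceiling), the following sketch computes the injected activity from body weight. The function name and the example weights are illustrative assumptions; the paediatric EANM Dosage Card rule is not modelled.

```python
def adult_fdg_activity_mbq(weight_kg, mbq_per_kg=4.0, max_mbq=400.0):
    """Injected 18F-FDG activity for an adult: 4 MBq/kg, capped at 400 MBq."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return min(mbq_per_kg * weight_kg, max_mbq)


# Hypothetical weights: the cap applies above 100 kg.
print(adult_fdg_activity_mbq(70.0))
print(adult_fdg_activity_mbq(120.0))
```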
Artificial intelligence-based classification
A convolutional neural network (CNN) was used by Lindgren et al. to segment the skeletal anatomy [11]. Based on this CNN, the bone marrow was defined by excluding the edges from each individual bone; more precisely, 7 mm was excluded from the humeri and femora, 5 mm was excluded from the vertebrae and hip bones, and 3 mm was excluded from the remaining bones.
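The edge exclusion can be illustrated with a minimal sketch. It assumes a 2D binary bone mask stored as nested lists, treats everything outside the grid as non-bone, and keeps a pixel only if no non-bone pixel centre lies within the margin. The names `EDGE_MARGIN_MM`, `edge_margin_mm` and `erode_to_marrow`, the bone labels, and the brute-force search are illustrative assumptions and not the implementation used in the study, which operated on 3D CNN segmentations.

```python
import math

# Margins stated in the text; the bone labels used as keys are hypothetical.
EDGE_MARGIN_MM = {"humerus": 7.0, "femur": 7.0, "vertebra": 5.0, "hip bone": 5.0}
DEFAULT_MARGIN_MM = 3.0  # "remaining bones"


def edge_margin_mm(bone_label):
    """Return the edge margin (mm) to exclude for a given bone label."""
    return EDGE_MARGIN_MM.get(bone_label, DEFAULT_MARGIN_MM)


def erode_to_marrow(bone_mask, spacing_mm, margin_mm):
    """Keep bone pixels with no non-bone pixel centre within margin_mm.

    bone_mask  : 2D list of booleans (True = bone)
    spacing_mm : (row spacing, column spacing) in mm
    Brute force, intended for small illustrative grids only.
    """
    rows, cols = len(bone_mask), len(bone_mask[0])
    reach_r = int(margin_mm // spacing_mm[0])
    reach_c = int(margin_mm // spacing_mm[1])
    marrow = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not bone_mask[r][c]:
                continue
            keep = True
            for dr in range(-reach_r, reach_r + 1):
                for dc in range(-reach_c, reach_c + 1):
                    dist = math.hypot(dr * spacing_mm[0], dc * spacing_mm[1])
                    if dist > margin_mm:
                        continue  # neighbour lies beyond the margin
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if not inside or not bone_mask[rr][cc]:
                        keep = False
            marrow[r][c] = keep
    return marrow


# Hypothetical 9 x 9 grid (1 mm pixels) with a 7 x 7 block of bone.
mask = [[1 <= r <= 7 and 1 <= c <= 7 for c in range(9)] for r in range(9)]
marrow = erode_to_marrow(mask, (1.0, 1.0), edge_margin_mm("rib"))
print(sum(sum(row) for row in marrow))
```

For full 3D volumes, a Euclidean distance transform that accounts for voxel spacing would be the more practical way to apply the same rule.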
Focal skeleton/bone marrow uptake
In the training group, certain bones were observed to be much more likely to show diffuse BMU. To address this issue without decreasing the sensitivity in other bones, the set of segmented bones was divided into two groups that were analysed separately:
- “spine” - defined as the vertebrae, sacrum, and coccyx as well as regions in the hip bones within 50 mm of these structures, i.e., including the sacroiliac joints.
- “other bones” - defined as the humeri, scapulae, clavicles, ribs, sternum, femora, and the remaining parts of the hip bones.
For each group, the focal standardized uptake values (SUVs) were quantified using the following steps:
- Threshold computation. A threshold (THR) was computed using the mean and standard deviation (SD) of the SUV inside the bone marrow. The threshold was set to
THR = SUVmean + 2 SD.
- Abnormal bone region. The abnormal bone region was defined in the following way:
First, only pixels that were segmented as bone and had SUV > THR were considered. Next, a watershed transform was used to assign each of these pixels to a local maximum in the PET image. If this maximum was outside the bone mask, the uptake was assumed to be leaking into the bone from other tissues and was removed. Finally, uptake regions smaller than 0.1 mL were removed.
- Abnormal bone SUV quantification. The mean squared abnormal uptake (MSAU) was first calculated as
MSAU = mean of (SUV - THR)² over the abnormal bone region.
Then, the total amount of abnormal uptake was quantified using the total squared abnormal uptake (TSAU)
TSAU = MSAU × (volume of the abnormal bone region).
This calculation yields two TSAU values: one for the “spine” and one for the “other bones”. As the TSAU value can be nonzero even for patients without focal uptake, cut-off values were tuned using the training cohort. The AI method was adjusted in the training group to have a positive predictive value of 65% and a negative predictive value of 98%. For the “spine”, a cut-off of 0.5 was used, and for the “other bones”, a cut-off of 3.0 was used. If either TSAU value was higher than the corresponding cut-off, the patient was considered to have focal uptake.
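The threshold, MSAU, TSAU and cut-off steps can be sketched as follows. This is a simplified illustration, not the study's implementation: it treats every bone voxel with SUV > THR as abnormal and omits the watershed leak check and the 0.1 mL size filter. The function names, the use of the population standard deviation, and the example values are assumptions.

```python
from statistics import mean, pstdev

# Cut-offs reported in the text; the dictionary itself is an illustrative construct.
TSAU_CUTOFF = {"spine": 0.5, "other bones": 3.0}


def focal_threshold(marrow_suv):
    """THR = SUVmean + 2 SD over the bone-marrow voxels.

    Assumption: population SD (pstdev); the text does not say which SD is used.
    """
    return mean(marrow_suv) + 2 * pstdev(marrow_suv)


def tsau(bone_suv, thr, voxel_volume_ml):
    """Total squared abnormal uptake for one bone group.

    Simplification: all bone voxels with SUV > THR form the abnormal region;
    the watershed step and the 0.1 mL minimum region size are not applied.
    """
    squared_excess = [(s - thr) ** 2 for s in bone_suv if s > thr]
    if not squared_excess:
        return 0.0
    msau = mean(squared_excess)                     # mean squared abnormal uptake
    volume_ml = len(squared_excess) * voxel_volume_ml
    return msau * volume_ml


def has_focal_uptake(tsau_by_group):
    """Positive if either group's TSAU exceeds its cut-off."""
    return any(tsau_by_group[g] > TSAU_CUTOFF[g] for g in TSAU_CUTOFF)


# Hypothetical SUV samples and voxel volume.
thr = focal_threshold([1.0, 1.0, 3.0, 3.0])
spine_tsau = tsau([1.0, 5.0, 6.0, 3.0], thr, voxel_volume_ml=0.5)
print(thr, spine_tsau, has_focal_uptake({"spine": spine_tsau, "other bones": 0.0}))
```

Squaring the excess over THR weights intense uptake more heavily than a large volume of marginally elevated uptake, which is consistent with the aim of separating focal lesions from mildly raised background.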
Diffuse bone marrow uptake
The SUVmedian in the vertebral bone marrow was automatically computed and compared to the median uptake in the liver. The latter was also segmented using a CNN according to [10]. If the ratio between the median “spine” BMU and liver uptake was greater than 1.0, the patient was considered to have diffuse BMU [1].
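The diffuse BMU rule reduces to a ratio of two medians, which the following sketch illustrates. It assumes the SUVs of the segmented vertebral marrow and liver voxels are already available as flat sequences; the function name and the example values are hypothetical.

```python
from statistics import median


def has_diffuse_bmu(vertebral_marrow_suv, liver_suv, ratio_cutoff=1.0):
    """True if median marrow SUV / median liver SUV exceeds the cut-off (1.0)."""
    ratio = median(vertebral_marrow_suv) / median(liver_suv)
    return ratio > ratio_cutoff


# Hypothetical voxel samples: marrow median 3.0, liver median 2.5.
print(has_diffuse_bmu([2.0, 3.0, 4.0], [2.0, 2.5, 3.0]))
```

Using the median rather than the mean makes the ratio less sensitive to a small number of very high voxels, such as those in a focal lesion within an otherwise normal vertebra.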
Quality control of the automated AI calculations of diffuse BMU was performed in the test patients. An experienced technologist manually placed a region of interest (ROI) in the marrow area of lumbar vertebrae L3/L4 and in the upper homogeneous right part of the liver (excluding the edges). These regions were chosen according to the study by Pedersen et al. [1].
Image interpretations
Training
The original interpretation of the PET/CT examinations was performed by a nuclear medicine physician and a radiologist, who wrote the final report sent to the referring department. A trained technologist extracted information regarding skeleton and/or bone marrow involvement from these PET/CT reports and from the digital medical records. All cases with focal or suspected focal uptake in the skeleton and/or bone marrow were reviewed again by a nuclear medicine specialist.
Test
Ten nuclear medicine physicians with 2-12 years of experience in PET/CT, working in three different hospitals (two in Sweden, Malmö/Lund and Gothenburg, and one in India, Chandigarh), were invited to participate. They independently classified the 48 FDG-PET/CT examinations with regard to diffuse uptake in the bone marrow and focal uptake in the skeleton/bone marrow into the following four categories [1]:
- Low diffuse bone marrow uptake and no focal lesion(s)
- Low diffuse bone marrow uptake and focal lesion(s)
- High diffuse bone marrow uptake and no focal lesion(s)
- High diffuse bone marrow uptake and focal lesion(s)
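The four categories are the combinations of two binary findings, so a reading (or the output of the AI method) can be mapped to a category from two flags. The sketch below is only an illustration of that mapping; the function name is hypothetical.

```python
def bmu_category(high_diffuse, focal):
    """Map the two binary findings to one of the four category labels."""
    diffuse = "High" if high_diffuse else "Low"
    lesions = "focal lesion(s)" if focal else "no focal lesion(s)"
    return f"{diffuse} diffuse bone marrow uptake and {lesions}"


print(bmu_category(high_diffuse=False, focal=True))
```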
The cases were presented to each physician in a different computer-generated randomized order. The physicians were given each patient's sex and age and were informed that the examinations were staging investigations of untreated HL patients. They were instructed to classify the cases as they normally would in the clinical setting. The review was performed using RECOMIA software (recomia.org), and every case was presented with CT images, PET images, fused PET/CT images, and MIP images. The interpreter could switch between the sagittal, coronal and transverse planes. The PET images could be displayed in different colour scales and were initially scaled to an upper SUV threshold of 5, which the interpreter could change. The CT images could be switched to the skeleton window.
The study was approved by the ethics committee at Gothenburg University, and the need for written informed consent was waived (#2019-01274). We certify that the study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.
Statistical analyses
The percentage agreement (PA) and Kappa were used in the inter-observer comparisons between physicians for the classifications of both focal skeletal/BMU and diffuse BMU. Kappa takes chance agreement into account, and suggested interpretation guidelines are as follows: values < 0 indicate no agreement, values between 0 and 0.20 indicate slight agreement, values between 0.21 and 0.40 indicate fair agreement, values between 0.41 and 0.60 indicate moderate agreement, values between 0.61 and 0.80 indicate substantial agreement, and values between 0.81 and 1 indicate almost perfect agreement [12].
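For illustration, PA and Kappa for one pair of readers can be computed as in the sketch below. It assumes Cohen's kappa for two raters, since the text does not state which kappa variant was used. The function names and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def percentage_agreement(a, b):
    """Percentage of cases on which two readers gave the same label."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n             # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    labels = set(a) | set(b)
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)  # chance agreement
    if p_e == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def interpret_kappa(kappa):
    """Interpretation bands quoted in the text [12]."""
    if kappa < 0:
        return "no agreement"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical focal-uptake readings (1 = focal, 0 = no focal) for ten cases.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
reader_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
k = cohen_kappa(reader_1, reader_2)
print(percentage_agreement(reader_1, reader_2), round(k, 3), interpret_kappa(k))
```

In this example the readers agree on 8 of 10 cases (PA 80%), but because 52% agreement would be expected by chance given their label frequencies, Kappa is considerably lower than the raw agreement suggests.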