2.1 Study population
This retrospective study was conducted at the First Affiliated Hospital of Xi'an Jiaotong University (NCT05826197), and the study protocol was approved by the Ethics Committee of Xi'an Jiaotong University (IRB-SOP-AF-16).
Between November 2016 and June 2020, a total of 129 female patients with BC who underwent 18F-FDG PET/CT examinations were retrospectively studied. Inclusion criteria were: 1) BC confirmed by preoperative puncture biopsy or postoperative pathology; 2) 18F-FDG PET/CT examination performed; and 3) Ki67 expression available. Exclusion criteria were: 1) a primary lesion too small to be detected by 18F-FDG PET/CT, or occult BC (n = 5); 2) a diffuse unilateral lesion or bilateral multifocal lesions (n = 4); 3) benign breast lesions (n = 1); and 4) neoadjuvant chemotherapy or anti-tumor treatment performed before imaging (n = 4).
A flowchart of this process is shown in Fig. 1. A total of 114 patients were enrolled in the study. Patients were randomly assigned to a training group (79 patients) and a test group (35 patients) at a ratio of 7:3 using stratified sampling, so that the balance of positive and negative samples was preserved in both groups. The patient's age, the side of the BC (left or right), and menopausal status (premenopausal, or perimenopausal or postmenopausal) were recorded.
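The stratified 7:3 assignment described above can be illustrated with a minimal sketch. The source does not state which tool or random seed was used, so the function name `stratified_split`, the seed, and the label counts below are all hypothetical; the sketch only shows the general idea of splitting within each Ki67 class separately.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with a similar class mix in both parts."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)           # random assignment within the class
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels (1 = Ki67-positive, 0 = Ki67-negative); counts are made up.
labels = [1] * 70 + [0] * 30
train_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the proportion of positive cases nearly identical in the training and test groups, which a simple random split does not guarantee for small cohorts.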
2.2 Immunohistochemistry
Formalin-fixed paraffin-embedded tissue samples from BC cases were used for Ki67 assessment performed by an experienced pathologist who was blinded to the PET/CT results. Ki67 levels were divided into the 0.5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% groups according to the degree of expression, and an expression index ≥ 20% was considered positive.
2.3 PET/CT data acquisition
All examinations were performed using a 64-detector scanner (Gemini TF PET/CT, Philips, Netherlands). 18F-FDG was synthesized using a small cyclotron (GE MINItrace) and an FDG synthesis module. Radiochemical purity was > 99%. Both endotoxin and bacteriological tests were negative, which met the radiopharmaceutical requirements.
The patients fasted for more than 6 h before intravenous injection of 18F-FDG (3.7 MBq/kg). Fasting blood glucose was required to be below 12.0 mmol/L. After resting for 60 min, the patients underwent whole-body PET/CT. The scan range extended from the top of the skull or the level of the first thoracic vertebra to the upper femur. PET data were acquired over 6–10 bed positions at 1.5 min per bed position. CT scans (tube voltage, 120 kV; automatic milliampere-seconds; matrix, 512×512; slice thickness, 5 mm) were performed for lesion localization and attenuation correction of the PET images. Maximum intensity projection (MIP), PET, CT, and fusion images were displayed on the Extended Brilliance Workstation (EBW).
2.4 Image analysis
Manually defined features, including tumor morphology (regular or irregular), necrosis (present or absent), and calcification (present or absent), as well as the N (N0 or N1) and M (M0 or M1) stages, were determined in a double-blind manner by two experts (Hui Ding, reader 1; Cong Shen, reader 2) with more than 10 years of PET/CT interpretation experience. Any disagreement between the two readers was resolved by another experienced radiologist.
The longest diameter (mm) of the cancer was measured at the maximal horizontal position. The volume of interest (VOI) was automatically delineated on the EBW workstation using 40% of the maximum standardized uptake value (SUVmax) as the threshold. The VOIs were reviewed and manually corrected when the tumor border was unsatisfactory. SUVmax, mean SUV (SUVmean), standard deviation (SD) of SUV, and metabolic tumor volume (MTV) were calculated (Fig. 2).
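A rough sketch of how a 40% SUVmax threshold yields the reported VOI metrics is given below. It is not the EBW workstation's algorithm: it assumes the candidate region is already available as a flat list of voxel SUVs, ignores spatial connectivity and manual correction, and uses the population SD (whether the workstation reports population or sample SD is not stated). The function name, voxel values, and voxel volume are hypothetical.

```python
from statistics import mean, pstdev

def threshold_voi_metrics(suv_values, voxel_volume_ml, fraction=0.40):
    """Summarize voxels at or above `fraction` of SUVmax."""
    suv_max = max(suv_values)
    cutoff = fraction * suv_max
    inside = [v for v in suv_values if v >= cutoff]   # voxels kept in the VOI
    return {
        "SUVmax": suv_max,
        "SUVmean": mean(inside),
        "SUVsd": pstdev(inside),                      # assumption: population SD
        "MTV_ml": len(inside) * voxel_volume_ml,      # volume of retained voxels
    }

# Hypothetical voxel SUVs and voxel volume (0.064 mL = 4 x 4 x 4 mm).
voxels = [1.0, 2.0, 3.0, 4.5, 5.0, 8.0, 10.0]
metrics = threshold_voi_metrics(voxels, voxel_volume_ml=0.064)
```

Because the cutoff is relative to SUVmax, the delineated volume shrinks for lesions with a single very hot voxel, which is one reason the VOIs were reviewed manually.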
2.5 Feature extraction
The VOIs were saved as NIfTI files. Radiomics feature extraction was implemented using the Philips Radiomics Tool (Philips Healthcare, China), with the core feature calculation based on PyRadiomics (v3.0.1) [16]. A total of 704 three-dimensional (3D) radiomics features were extracted: original (n = 83), wavelet-transformed (n = 552), and logarithm-transformed (n = 69) features. Details of the feature definitions are available at https://pyradiomics.readthedocs.io/en/latest/features.html#. There were no missing data for clinical or radiomics features.
2.6 Feature reduction
In the training group, univariate logistic regression analysis was applied to all thirteen clinical features (general characteristics and manually measured features), and features with a P value < 0.1 were retained for subsequent analysis. For the radiomics features, the maximum relevance minimum redundancy (mRMR) method (retaining the top 30 features) and the least absolute shrinkage and selection operator (LASSO) with the optimal λ were applied to select the features most discriminative for Ki67 status. Spearman correlation analysis with a threshold of 0.8 was then applied to the clinical features and the radiomics features separately to remove collinear features. Finally, the selected clinical and radiomics features were combined, and LASSO regression was applied again to remove any remaining collinear features.
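The Spearman-based collinearity step can be sketched as follows. The source gives only the 0.8 threshold, so two details here are assumptions: a pair counts as collinear when |rho| >= 0.8, and the feature that appears later in the list is the one dropped. The feature names and values are hypothetical, and the correlation is implemented by hand (Pearson correlation of average ranks) to keep the example dependency-free.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    # Note: fails with ZeroDivisionError for a constant feature.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def drop_collinear(features, threshold=0.8):
    """Keep features in order; skip any with |rho| >= threshold to a kept one."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature table: feat_b increases monotonically with feat_a.
features = {
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "feat_b": [2.1, 3.9, 6.2, 8.1, 9.8, 12.5],
    "feat_c": [3.0, 1.0, 6.0, 2.0, 5.0, 4.0],
}
selected = drop_collinear(features)
```

In this toy table `feat_b` is removed because its rank order matches `feat_a` exactly, while `feat_c` is only weakly correlated with `feat_a` and is kept.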
2.7 Statistical analysis, model construction, and evaluation
Data were analyzed using SPSS v. 25.0 (IBM Corp., Armonk, NY, USA) and Python v. 3.9 (https://www.python.org/). Continuous variables with non-normal distributions were expressed as median [25th percentile, 75th percentile] and compared using the Mann–Whitney U test. Categorical variables were compared using the χ2 test or Fisher’s exact test. Statistical significance was set at P < 0.05.
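For illustration, the summary and comparison of a non-normally distributed variable might look like the sketch below. It is not a reproduction of the SPSS procedure: the P value uses the plain normal approximation without tie or continuity correction, the quartile convention is the Python default and may differ from SPSS, and the two groups of values are hypothetical.

```python
from statistics import NormalDist, median, quantiles

def median_iqr(values):
    """Return (median, 25th percentile, 75th percentile)."""
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q1, q3

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n1, n2 = len(x), len(y)
    # U for x: pairs where x exceeds y, with ties counted as one half.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p

# Hypothetical SUVmax values for Ki67-positive and Ki67-negative cases.
ki67_pos = [8.1, 9.4, 10.2, 12.5, 13.0, 15.3]
ki67_neg = [3.2, 4.1, 5.0, 5.6, 6.3, 7.7]
u_stat, p_value = mann_whitney_u(ki67_pos, ki67_neg)
summary_neg = median_iqr(ki67_neg)
```

For samples this small an exact test is generally preferable to the normal approximation; the approximation is used here only to keep the sketch short.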
Three models (the clinical model, the radiomics model, and the combined clinical-radiomics model) were constructed using logistic regression. The models were assessed using receiver operating characteristic (ROC) curves for discriminating Ki67-positive from Ki67-negative cases. The area under the curve (AUC), sensitivity, specificity, and accuracy (ACC) were calculated. The DeLong test was used to compare ROC curves.
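The evaluation indexes named above can be computed from predicted probabilities as in this minimal sketch. The AUC is calculated through its rank interpretation (the probability that a positive case receives a higher score than a negative one). The cutoff of 0.5 used for sensitivity, specificity, and accuracy is an assumption, since the source does not state how the operating point was chosen, and the labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(y_true, scores, cutoff=0.5):
    """Sensitivity, specificity, and accuracy at a fixed probability cutoff."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= cutoff
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

# Hypothetical Ki67 labels (1 = positive) and model probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(y_true, scores)
metrics = confusion_metrics(y_true, scores)
```

The AUC does not depend on the cutoff, whereas sensitivity, specificity, and accuracy all change with it, so the chosen operating point should be reported alongside those three values.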
Calibration curves and the Hosmer-Lemeshow test were used to assess the agreement between predicted and observed probabilities for each model; a P value > 0.05 indicated good agreement. A nomogram was constructed to visualize the predictive model for Ki67 expression. Decision curve analysis (DCA) was conducted to determine the clinical usefulness of the nomogram by quantifying the net benefit at different threshold probabilities.
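The net benefit used in decision curve analysis is commonly defined as TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability and n the number of patients. The sketch below applies that standard definition; the function names, labels, and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical Ki67 labels (1 = positive) and model probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

# One point per threshold; plotting these values gives the decision curve.
thresholds = [0.1, 0.3, 0.5, 0.7]
model_curve = [net_benefit(y_true, probs, t) for t in thresholds]
treat_all_curve = [net_benefit_treat_all(y_true, t) for t in thresholds]
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all curve and the treat-none strategy, whose net benefit is zero.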