Radiomics Methods to Differentiate Lung Metastasis from Primary Lung Cancer in Breast Cancer Patients on PET/CT

Purpose: Positron emission tomography (PET) with integrated computed tomography (PET/CT) is a whole-body imaging method. When PET/CT is used to stage breast cancer, quite a few patients are found to have a second primary lung cancer (PLC), which has few features distinguishing it from breast cancer metastasis (MBC). Therefore, based on CT, low-dose CT (LDCT), and PET images combined with pathological features, we established radiomics models to distinguish between MBC and PLC. Methods: We retrospectively collected CT, LDCT, and PET images and pathological features of 100 breast cancer patients, including 60 with metastases of breast cancer (MBC) and 40 with primary lung cancers (PLC). Two radiologists manually drew, in consensus, a region of interest around the whole visible tumor. Python 3.8 and the Pyradiomics toolkit were used to extract features from CT, LDCT, and PET. A linear discriminant analysis (LDA) classifier was used to build the radiomics models. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were used to evaluate classification performance. Results: In total, 12, 13, and 9 features were selected from CT, LDCT, and PET, respectively. The LDCT and PET models obtained the same, highest single-modality AUC (0.9479). The model combining CT and pathological features achieved the highest AUC overall (0.9583), with a sensitivity of 1.000 and a specificity of 0.8333. Conclusion: Overall, the results suggest that radiomics models based on CT, LDCT, and PET can differentiate between MBC and PLC, and that adding pathological features can substantially improve the AUC and accuracy (ACC) of the CT model.


Introduction
Breast cancer is the most common malignant tumor in women and a major threat to women's health.
Treatment of breast cancer has progressed considerably with the introduction of increasingly personalized therapies [1]. Although major advances have been made in the diagnosis and treatment of primary breast cancer, metastasis, typically to bone, lung, liver, and brain, is still the leading cause of death in cancer patients [2]. The lung is often the first site of metastasis of breast cancer, especially in the clinically aggressive basal subtype [3]. As survival improves, the opportunity for second cancers to develop naturally increases.
By 10 years after diagnosis, approximately 10% of breast cancer survivors have developed a subsequent malignancy [4]. Lung cancer accounts for approximately 5% of second primary cancers among breast cancer survivors. Therefore, distinguishing metastatic breast cancer (MBC) from primary lung cancer (PLC) is crucial in clinical practice, because a lung metastasis indicates advanced-stage disease [5]. Several studies have reported significant differences between PLC and MBC with respect to overall survival, disease-free survival, or recurrence control [6,]. Lung lesions in patients with breast cancer therefore warrant a comprehensive evaluation for both primary lung cancer and metastatic breast cancer, because diagnosis and resection at an early stage are associated with improved survival [Error! Bookmark not defined.]. Imaging techniques that can distinguish PLC from MBC in breast cancer patients are therefore needed.
Fluorine-18-fluorodeoxyglucose (18F-FDG) positron emission tomography with integrated computed tomography (PET/CT) combines functional and anatomical imaging in a single hybrid examination and provides information on the entire body. It has become a very sensitive and accurate technique for tumor diagnosis and staging. Low-dose CT (LDCT) has emerged as a promising mass-screening method for the early diagnosis of lung neoplasms [8,9] owing to its simplicity and high sensitivity [10,11]. Thus, PET/CT and CT are the preferred modalities for evaluating lung nodules in breast cancer patients, whether the nodule is a metastasis or a primary cancer. Current guidelines on breast cancer recommend that the staging evaluation of women presenting with breast cancer include PET/CT and LDCT [12,13].
In practice, however, the morphological features on CT and PET/CT are not specific enough to differentiate PLC from MBC, which limits diagnosis. Several studies have reported that radiomics based on CT and PET may help distinguish benign from malignant pulmonary nodules [Error! Bookmark not defined., 14] or detect lung metastases in patients with cancer [15,16]. Other studies have examined the ability of PET and CT radiomics features to differentiate between primary and metastatic lung lesions [17,18]. However, the radiomics approach has not yet been tested in the differential diagnosis between PLC and MBC. The main aim of our study was therefore to evaluate the ability of PET- and CT-based radiomics to predict whether a lung lesion in a breast cancer patient is a primary lung cancer or a metastasis.

Patients
In this retrospective single-center investigation, we collected 158 patients with breast cancer or a history of breast cancer who had lung lesions and were examined by FDG PET/CT during the 14 days before
Tumors were considered HER2 positive if they had a score of 3+ on immunohistochemistry. If HER2 status was equivocal (score 2+) on immunohistochemistry, fluorescence in situ hybridization analysis was performed to confirm the diagnosis [19].
FDG PET/CT and low-dose CT
All patients fasted for 6 h, and blood glucose levels were required to be <10 mmol/L before scanning on an integrated PET/CT scanner (Discovery ST; GE Medical Systems, Milwaukee, WI, USA). All PET images were corrected for attenuation using the acquired CT data. The CT tube voltage was 120 kV, the tube current was 200 mAs, and the slice thickness was 3.75 mm. PET data were acquired in 3D mode at 2.5 min per bed position, typically over 6-8 bed positions. Images were reconstructed using ordered-subset expectation maximization (OSEM). Low-dose CT (LDCT) was performed directly after the PET/CT scan with a tube voltage of 120 kV, a tube current of 50 mAs, and a slice thickness of 1.00 mm. All 18F-FDG PET/CT and LDCT images were reviewed by two experienced senior physicians who were aware of the patient's clinical history and laboratory results.

The volume of interest (VOI)
Data management was performed by two independent radiologists (radiologists A and B) with 6 and 12 years of experience in PET/CT diagnosis, respectively. Both radiologists had access to all PET/CT images to help locate the lesions and verify the lesion boundaries. The two radiologists manually drew, in consensus, a region of interest around the whole visible tumor using ITK-SNAP 3.8.0 (Figure 1). If the lesions were multifocal or multicentric, the region of interest was drawn on the largest tumor. Intraclass correlation coefficients (ICCs) were used to evaluate the intra- and inter-observer agreement of the extracted features. An ICC greater than 0.6 was considered to meet the criterion for feature reproducibility, and an ICC greater than 0.8 was considered to indicate almost perfect agreement. Radiologist A drew the ROIs twice, two months apart, and radiologist B drew the ROIs independently according to the same tumor boundary definition. Good intra- and inter-observer agreement was achieved, with ICCs ranging from 0.691 to 0.995.

Feature Extraction
Python 3.8 and the Pyradiomics toolkit [20] were used to extract features from the CT, LDCT, and PET images. For each modality, a total of 851 features, including first-order, shape, texture, and wavelet features, were extracted. First-order features describe the distribution of voxel intensities, shape features reflect the morphological characteristics of the lesions, and texture features describe the texture variation within the lesions. The wavelet features comprise all of the above features except shape features, computed on wavelet-filtered versions of the original image. The clinical features were the ER, PR, HER2, and Ki67 status of the primary breast cancer. All the image features are shown in Table 1.
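The reproducibility check can be illustrated with a small, dependency-free ICC computation. The text does not state which ICC form was used; the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in Shrout and Fleiss' notation), and the function name `icc_2_1` is hypothetical. Each row holds one feature's value for one lesion across segmentations, e.g. radiologist A's two sessions, or radiologist A versus radiologist B.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    ratings[i][j] is the value of one radiomics feature for lesion i as
    measured on segmentation j (e.g. reader A session 1, reader A session 2).
    """
    n = len(ratings)      # number of lesions (targets)
    k = len(ratings[0])   # number of segmentations (raters or sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: lesions, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Under this sketch, a feature would be kept as reproducible when its ICC exceeds 0.6, the threshold stated in the text. Because the absolute-agreement form penalizes systematic offsets between readers, a reader who always measures one unit higher lowers the ICC even though the ranking of lesions is identical.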

Feature Selection
To avoid overfitting, a coarse-to-fine feature selection strategy was adopted. First, highly correlated features (Spearman |ρ| ≥ 0.95) were grouped, and from each group the feature with the highest AUC was kept as its representative. Second, the least absolute shrinkage and selection operator (LASSO) [21] with a logistic regression model was used to select the most discriminative features, using the largest AUC obtained in 10 repetitions of 10-fold cross-validation on the training set.
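The first, correlation-based step can be sketched as follows. The text does not specify how the groups were formed; this sketch assumes a greedy variant that visits features in order of decreasing AUC and discards any feature whose Spearman |ρ| with an already-kept feature reaches 0.95, so that each retained feature is the highest-AUC member of its correlated cluster. Scoring AUC without regard to direction (max(AUC, 1 - AUC)) is also our assumption, and all function names are hypothetical. The LASSO step is not shown; it would be applied afterwards to the surviving features, within the training set only.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def correlation_filter(features: Dict[str, Sequence[float]],
                       labels: Sequence[int],
                       threshold: float = 0.95) -> List[str]:
    """Greedy redundancy filter: visit features from highest to lowest AUC
    and keep one only if |rho| < threshold with every feature already kept."""
    # Direction-free AUC (assumption): lower-in-positives is equally useful.
    score = {name: max(auc(v, labels), 1 - auc(v, labels))
             for name, v in features.items()}
    kept: List[str] = []
    for name in sorted(features, key=lambda n: -score[n]):
        if all(abs(spearman(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept
```

With toy features `a`, `b = 2 * a` (perfectly rank-correlated with `a`), and an unrelated `c`, the filter keeps `a` and `c` and drops the redundant `b`.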

Radiomics model
The linear discriminant analysis (LDA) classifier [22] was used to build the radiomics models. The LDA classifier was trained on the selected features in the training set and evaluated on the independent testing set. Model performance was assessed by the AUC, accuracy, sensitivity, and specificity. We used the features from each modality (CT, LDCT, PET, and clinical information) to build single-modality models. We also built fusion models combining each imaging modality with the clinical features: the CT+Clinical, LDCT+Clinical, and PET+Clinical models.
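For readers unfamiliar with LDA, the two-class case reduces to a linear score: with a covariance matrix shared (pooled) across classes, the weight vector is w = S_w^{-1}(μ1 - μ0). The sketch below is a minimal pure-Python version of that textbook formulation, with class priors estimated from the training class sizes; this is an assumption, since the study does not report its LDA implementation or settings. `fit_lda` and `lda_score` are hypothetical names, and the toy points are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _mean(rows: Sequence[Sequence[float]]) -> Vector:
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def _solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lda(x0: Sequence[Sequence[float]],
            x1: Sequence[Sequence[float]]) -> Tuple[Vector, float]:
    """Two-class LDA with pooled covariance; score > 0 predicts class 1.
    Fails on a singular pooled covariance (e.g. perfectly collinear features)."""
    mu0, mu1 = _mean(x0), _mean(x1)
    d = len(mu0)
    sw = [[0.0] * d for _ in range(d)]  # within-class scatter
    for rows, mu in ((x0, mu0), (x1, mu1)):
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    dof = len(x0) + len(x1) - 2
    sw = [[v / dof for v in row] for row in sw]  # pooled covariance
    w = _solve(sw, [mu1[j] - mu0[j] for j in range(d)])
    midpoint = [(mu0[j] + mu1[j]) / 2 for j in range(d)]
    # Bias places the boundary at the midpoint, shifted by the log prior ratio.
    b = -sum(w[j] * midpoint[j] for j in range(d)) + math.log(len(x1) / len(x0))
    return w, b


def lda_score(w: Vector, b: float, x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

Usage: `w, b = fit_lda(class0_rows, class1_rows)` on the training features, then `lda_score(w, b, x)` on each test case; the continuous score can feed the ROC analysis, and its sign gives the predicted class.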

Statistical analysis
SPSS software, version 21 (SPSS, Chicago, IL, USA), was used for the statistical analyses. Normality of the variable distributions was assessed with the Kolmogorov-Smirnov test. Quantitative variables were compared with Student's t-test or the Mann-Whitney U test, as appropriate, and qualitative variables with the chi-square test. All statistical tests were two-sided, and p values less than 0.05 were considered statistically significant.
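The chi-square comparison of a qualitative variable can be reproduced without SPSS. The sketch below applies Pearson's test for a 2x2 table to the ER-positive counts reported in the Discussion (29 of 40 PLC and 34 of 60 MBC patients). Whether Yates' continuity correction was applied is not stated, so the `yates` flag is our assumption, and `chi_square_2x2` is a hypothetical name. For one degree of freedom the p value follows from the complementary error function, since a chi-square variable with df = 1 is the square of a standard normal.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int,
                   yates: bool = False) -> Tuple[float, float]:
    """Pearson chi-square test for the table [[a, b], [c, d]] (df = 1)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:  # optional continuity correction
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # df = 1: P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: PLC, MBC; columns: ER-positive, ER-negative.
chi2, p = chi_square_2x2(29, 11, 34, 26)
```

Without continuity correction this gives χ² ≈ 2.58 and p ≈ 0.11, consistent with the Discussion's statement that the ER difference was not statistically significant.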

Patient characteristics
The study cohort included 100 breast cancer patients: 60 with metastases of breast cancer (MBC) and 40 with primary lung cancers (PLC). The mean age of the MBC group was 53.7 ± 7.2 years (range, 32-77), and that of the PLC group was 60.5 ± 5.4 years (range, 39-82).

Pathology Characteristics
The primary breast cancers of the 60 MBC patients comprised 57 invasive ductal carcinomas, one metaplastic squamous cell carcinoma, one mucinous carcinoma, and one invasive
Table 1.

Selected Features
A total of 851 features, including first-order, shape, texture, and wavelet features, were extracted from each of CT, LDCT, and PET, as listed in Table 2. For the optimal radiomics signatures, 34 features were selected in total: 1 first-order feature and 33 wavelet features (Table 3). Specifically, 12, 13, and 9 features were selected from CT, LDCT, and PET, respectively. Apart from one first-order feature selected for LDCT, no non-wavelet features were selected in any model. The mean, median, and variance of the 34 selected features are listed in Table 4. Furthermore, 10 of 12 features from CT, 10 of 13 features from LDCT, and all 9 features from PET differed significantly between metastasis of breast cancer and primary lung cancer (P < 0.05).
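One way to see where 851 features per modality come from is the arithmetic below. It assumes default PyRadiomics feature classes (the counts are version-dependent, and the study's parameter file is not reported): 14 3D shape features computed once on the original mask, plus 18 first-order and 75 texture features (GLCM 24, GLRLM 16, GLSZM 16, GLDM 14, NGTDM 5) computed on the original image and on each of the 8 sub-bands of a single-level 3D wavelet decomposition.

```python
# Assumed default feature counts per PyRadiomics feature class
# (not confirmed by the study, which does not report its settings).
FIRST_ORDER = 18
SHAPE_3D = 14
TEXTURE = {"glcm": 24, "glrlm": 16, "glszm": 16, "gldm": 14, "ngtdm": 5}

# Assumed image types: original image plus 8 wavelet sub-bands (LLL ... HHH).
N_WAVELET_BANDS = 8
N_IMAGE_TYPES = 1 + N_WAVELET_BANDS

per_image = FIRST_ORDER + sum(TEXTURE.values())  # intensity + texture per image
total = SHAPE_3D + N_IMAGE_TYPES * per_image     # shape counted once only
wavelet_only = N_WAVELET_BANDS * per_image

print(per_image, total, wavelet_only)  # 93 851 744
```

Under these assumptions 744 of the 851 features are wavelet features, which may partly explain why wavelet features dominate the selected set.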
The accuracy, sensitivity, specificity, and ROC analysis results of the 7 models are compared in Figure 2 and Table 5. Among the four single-modality models, the LDCT and PET models achieved the same, highest AUC (0.9479). Each of the three imaging feature sets was then combined with the pathology features to build two-modality models. The CT+clinical model achieved the highest AUC overall (0.9583), with a sensitivity of 1.000 and a specificity of 0.8333. The LDCT+pathology model also achieved a high AUC of 0.9375, with a sensitivity of 0.75 and a specificity of 1.0 (Figure 3).
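The metrics reported for each model can be computed from its continuous scores on the testing set. In the sketch below, AUC is computed as the Mann-Whitney statistic, while sensitivity, specificity, and accuracy depend on a decision threshold and on which class is treated as positive; neither choice is stated in the text, so both are parameters here. The function name and the toy scores are hypothetical.

```python
from typing import Dict, Sequence


def evaluate(scores: Sequence[float], labels: Sequence[int],
             threshold: float) -> Dict[str, float]:
    """AUC plus threshold-dependent accuracy, sensitivity, and specificity.
    labels: 1 = positive class, 0 = negative class (an assumed convention)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # AUC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    tp = sum(s >= threshold for s in pos)  # positives called positive
    tn = sum(s < threshold for s in neg)   # negatives called negative
    return {
        "auc": wins / (len(pos) * len(neg)),
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / len(pos),
        "specificity": tn / len(neg),
    }


# Toy example (illustrative numbers, not study data).
metrics = evaluate([0.9, 0.8, 0.4, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0], 0.5)
```

Note that AUC is threshold-free, whereas sensitivity and specificity trade off against each other as the threshold moves, which is why two models with similar AUCs can report quite different sensitivity/specificity pairs.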

Discussion
This study attempted to noninvasively distinguish lung metastasis from primary lung cancer in breast cancer patients using CT, LDCT, and PET radiomics models combined with pathology features of the primary breast cancer, which demonstrated promising diagnostic performance in an independent validation set.
Judging by the AUCs, all three single-modality radiomics models showed reasonable diagnostic performance, and the features extracted from LDCT and PET contributed most to the differentiation of lung lesions. This is consistent with previous research showing that LDCT performs well in identifying lung lesions [Error! Bookmark not defined., 23,24]. In our study, LDCT also had a thinner slice thickness (1.00 mm versus 3.75 mm for CT) and thus higher resolution than the CT and PET images, so it could display more detail, which may benefit radiomics feature extraction and modeling. 18F-FDG PET, quantified by standardized uptake values (SUVs), is the most widely applied form of functional imaging in clinical practice [25,26]. FDG uptake is generally known to be strongly related to tumor expression of the glucose transporter (GLUT) family, which mediates glucose uptake into cells. GLUT-1 expression correlates moderately with SUVs derived from 18F-FDG PET [27-]. GLUT-1 expression is heterogeneous across breast cancer subtypes [30-].
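For reference, the body-weight-normalized SUV is conventionally defined as the activity concentration measured in tissue divided by the injected activity (decay-corrected to scan time) per unit body weight, and SUVmax is its maximum over the voxels of the VOI. The study does not state which normalization its scanner software used, so this is the common convention rather than the authors' specification:

```latex
\mathrm{SUV}_{\mathrm{bw}}(v) = \frac{C_{\mathrm{tissue}}(v)\ [\mathrm{kBq/mL}]}{A_{\mathrm{inj}}\ [\mathrm{MBq}] \,/\, W\ [\mathrm{kg}]},
\qquad
\mathrm{SUV}_{\max} = \max_{v \in \mathrm{VOI}} \mathrm{SUV}_{\mathrm{bw}}(v)
```

With these units the ratio has units of g/mL, which is treated as dimensionless under the usual assumption of a tissue density of about 1 g/mL.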
Consequently, 18F-FDG uptake in breast cancer appears to be more heterogeneous than in many other cancers, which has motivated ongoing investigation of the mechanisms that determine 18F-FDG uptake in breast cancer [33]. The available literature indicates that 18F-FDG PET and PET/CT lead to significant modification of staging and treatment in patients diagnosed with breast cancer [34][35][36]. Therefore, PET imaging may be used routinely in the initial staging of breast cancer. In addition, it can reveal unexpected second primary cancers [37-], of which lung cancer is one of the most common in breast cancer patients [40,41]. Lung cancer, especially lung adenocarcinoma (the most common lung cancer type in our study), also shows some heterogeneity in FDG uptake and SUVmax [42]. In lung cancer, SUVmax has also been shown to be related to tissue vessel density [43,44]. Moreover, both MBC and second PLC are malignant tumors, although studies have shown that SUVmax is slightly higher in lung metastases than in primary lung cancer [45,46]. In our breast cancer cohort, the SUVmax values of MBC and PLC were 8.7 ± 5.6 and 7.9 ± 6.0, respectively (Table 1), and the difference was not statistically significant (p = 0.568). For the example lesions shown in Figure 4, the SUVmax values of MBC and PLC were 3.2 and 2.8, respectively. Therefore, although 18F-FDG PET is necessary for breast cancer patients, it cannot be used alone to distinguish between MBC and PLC.
Furthermore, because of the heterogeneity of breast cancer, its lung metastases vary widely in CT appearance. For example, as shown in Figure 4, metastases may have rough margins or ground-glass density. These findings resemble the imaging findings of primary lung adenocarcinoma [47]. Moreover, no tumor size or shape features, and no texture features of the original image, were retained by feature selection, suggesting that these features were not helpful for our models. Therefore, distinguishing MBC from PLC on CT images alone may lead to diagnostic bias.
Radiomics has the potential to uncover disease characteristics by using mathematical algorithms to extract numerous features from tomographic images within a region of interest [48]. In our study, all selected features except one first-order feature in LDCT (33 of 34) were wavelet features. Wavelet transformation decomposes an image into frequency sub-bands, yielding features that are difficult to obtain from the original image [49-]. In previous studies, radiomics analysis based on wavelet features has been widely used for the identification, prognostic analysis, and treatment planning of lung lesions [52-], with good predictive ability. Consistently, in our study the single-modality models based on wavelet features of CT, LDCT, and PET also performed well, with AUCs of 0.8854, 0.9479, and 0.9479, respectively. These results are comparable to those reported by other researchers (AUCs from 0.71 to 0.98) and even better than some of those models [56-].
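To make the idea of wavelet sub-bands concrete, the sketch below applies one level of the Haar wavelet, chosen here only because it is the simplest wavelet, to a 1D signal, splitting it into a low-pass approximation and a high-pass detail. In 3D, the same low/high split along each of the three axes yields the eight sub-band images on which wavelet features are computed. PyRadiomics' own wavelet filter uses a different default wavelet (coif1), so this illustrates the principle rather than the study's exact filter; the function name is hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def haar_level1(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal 1D Haar transform (even-length input).
    Returns (approximation = low-pass, detail = high-pass) coefficients."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]  # local averages
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]  # local differences
    return approx, detail


# Choosing the low-pass (L) or high-pass (H) filter along each of the three
# image axes gives 2**3 = 8 sub-bands for a single-level 3D decomposition.
SUBBANDS = ["".join(p) for p in itertools.product("LH", repeat=3)]

approx, detail = haar_level1([4.0, 4.0, 2.0, 0.0])
print(SUBBANDS)  # ['LLL', 'LLH', 'LHL', 'LHH', 'HLL', 'HLH', 'HHL', 'HHH']
```

Because the transform is orthonormal, the signal's energy is split between the two outputs without loss; a flat region (4, 4) produces zero detail, while an edge (2, 0) shows up in the high-pass coefficients, which is the kind of information wavelet texture features summarize.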
Among the pathology features, PR and Ki67 showed some value in differentiating the lung lesions.
In the MBC group, PR-negative and Ki-67-positive tumors predominated (65.00% and 81.67%, respectively; Figure 5), which is consistent with previous reports that PR-negative and Ki-67-positive tumors have a high propensity to metastasize [60]. However, it remains unclear which types or subtypes of breast cancer predispose patients to second primary lung cancer. For example, earlier studies suggested that radiotherapy for breast cancer can significantly increase the incidence of lung cancer [61,62]. Other studies found that patients with ER-negative or triple-negative breast cancer are more likely to develop lung cancer, particularly during the first 5 years after breast cancer diagnosis [63,64]. In our study, by contrast, ER-positive tumors were proportionally more frequent in the PLC group (29/40, 72.50%) than in the MBC group (34/60, 56.67%), although the difference was not statistically significant (Table 1, Figure 6). These results appear contradictory, but they are not directly comparable because the study populations differ. Because our cohort consisted only of breast cancer patients with lung lesions, we cannot conclude that patients with PR-positive and Ki-67-negative tumors are more likely to develop second primary lung cancer. The clinical, pathological, and genetic characteristics that influence the occurrence of PLC therefore require further study.
Adding pathological features to the CT model clearly improved both the AUC (0.9583) and the accuracy (0.9). In contrast, adding them to the LDCT and PET models did not increase the AUC or accuracy, and even decreased them slightly.
A possible reason is that, for the LDCT and PET models, whose AUCs were already very high, the pathological features added redundant information that increased the noise in the data and thus the error of the learning algorithm. The single-modality CT model had a lower AUC; after pathological features were added, its diagnostic performance improved markedly and its AUC exceeded those of the single-modality LDCT and PET models. We therefore believe that pathological information can help optimize the model. However, the correlation between pathology and radiomics features was not the focus of this study, and further research is needed to clarify the relationship between the pathological and radiomics features of lung lesions in breast cancer patients.

Limitations
This study has several limitations. First, there may be selection bias, and the cohort comprised only 100 patients; a radiomics study with a larger sample size is still needed. Second, only four pathological features of the primary breast cancer were included to distinguish between MBC and PLC. Although a good model was obtained, more clinical and pathological information is needed to optimize the models in future research. Third, segmentation was neither automatic nor semi-automatic; however, the VOIs were segmented manually three times by two radiologists to minimize observer subjectivity.

Conclusion
In this study, a radiomics method based on pathological features, LDCT and PET/CT data was