The Characteristics Comparison of Breast Cancer Patients in Southern China and America: A Multicentre Study in China Versus SEER Database


Background and Objective: The morbidity and mortality of breast cancer have increased in recent years, and breast cancer has become the second leading cause of cancer death in women. However, the characteristics of breast cancer differ between developing and developed countries, and no comparison of breast cancer between southern China and the United States has been reported. We compared age, tumor stage and grade, and treatment methods in order to study the factors that influence the survival and prognosis of breast cancer patients. Methods: We studied two groups of patients diagnosed with breast cancer: one in southern China from 2001 to 2016 and one from the SEER database from 1975 to 2016. Clinicopathological features and treatment information were registered, collected and analyzed, and patients diagnosed before 2016 were followed up. The Kaplan-Meier method was used to evaluate disease-free survival (DFS) and overall survival (OS). Results: Young breast cancer patients accounted for 19.8% and 6.14% of the southern China and SEER cohorts, respectively. The early diagnosis rate of breast cancer was high in southern China, but still lower than in the SEER cohort. Tumor size and positive lymph node status differed significantly between southern China and the SEER cohort (P=0.000), and both notably affected the OS of breast cancer patients (P=0.018 and P=0.000). Furthermore, KI-67 was also an important prognostic factor in southern China and affected the OS of patients (P=0.034). Treatment also differed significantly between the two regions. In southern China, 4.91% of breast cancer patients underwent breast-conserving surgery and 95.09% underwent mastectomy, whereas in the SEER cohort only 4.91% of patients underwent breast surgery.
Conclusions: Age, tumor size, positive lymph nodes and KI-67 may account for the differences in morbidity and mortality of breast cancer between southern China and the SEER cohort. Overall, the prognosis of breast cancer patients in the SEER cohort is better than in southern China.

that the mortality of BC in urban China was higher than that in rural areas, and that the average annual change was a decrease for urban women but an increase for rural women [9]. Moreover, Wu Z et al found that the participation rate in breast cancer screening among women in eastern China was higher than that in western China [10]. Wang X et al suggested that the risk factors for BC in northern and eastern China are associated with body size, especially in premenopausal women [11]. Furthermore, many studies suggest that BC differs considerably between China and other regions, which may be due to differences in clinicopathological features, as detailed below. Jarzab M et al considered that, in BC in China, the prognosis of luminal-type G1 tumors is good, while that of G3 tumors is poor [12]. Meanwhile, Zhang X et al reported that lymph node metastasis seriously affects the prognosis of BC patients; in addition, KI-67 expression and prognosis were closely related to pTNM stage and PR expression [13]. It has been indicated that Ki-67 positivity can lead to a higher histological grade of BC, and Wang J et al showed that Ki-67 expression < 40% is beneficial and can significantly increase DFS in BC patients [14]. However, HER2 overexpression can lead to invasive breast cancer and was negatively correlated with the expression of PR and ER. Wang X et al found that the frequency of BRCA1 mutations in triple-negative BC patients is higher than that in non-triple-negative BC patients in China [15].
In contrast, about 12% of women in the United States are diagnosed with BC in their lifetime; nevertheless, there are an estimated 3.1 million BC survivors each year [16,17]. With the development of treatment strategies, BC mortality has decreased in the United States, where the 5-year survival rate after treatment is about 90% [18]. A large number of studies have shown that BC also has its own characteristics in the United States. DeSantis CE et al reported that, among BC patients in the United States from 2004 to 2014, tumors in young women were more invasive and had specific genomic characteristics; meanwhile, the incidence of HR-positive (ER-positive or PR-positive) breast cancer increased, while that of HR-negative tumors decreased [19].
Researchers around the world have compared BC in China and other regions and found differences between them. Sung H et al suggested that, with the widespread assimilation of western lifestyles, the gap in the incidence of BC between China and western countries is gradually narrowing [20]. Wu AH et al reported that the incidence of BC in Filipino people is significantly higher than that in Japan and China [21]. Zhang G et al found that, compared with white patients, the expression of TP53 and AKT1 is higher in Chinese patients, which may be a potential factor affecting the incidence of BC [22]. Yang SY et al found that the morbidity of BC differs between China and other Asian populations and that the mutation frequency of BRCA2 was much higher than that of BRCA1; by comparison, BRCA1 mutations are more common than BRCA2 mutations in Caucasian populations [23]. Chen L et al studied Asian and African-American populations and found that more than 50% of BC cases in Asia were of the luminal A (lumA) subtype, while the basal-like subtype accounted for only 5%; in African-American populations, however, the basal-like subtype accounted for more than 30% [24].
Additionally, BC has different characteristics in different regions of China. A previous study showed that the incidence of BC is highest in eastern China, followed by central China and then western China [25]. BRCA1 mutations are more frequent than BRCA2 mutations in patients with familial breast cancer in Henan, central China [26]. Co M et al reported that the age of onset of BC in mainland China is younger than in Hong Kong [27]. Moreover, it has been reported that the incidence of BC in Taiwan, China (similar to that in Hong Kong) has even been higher than that in the United States in recent years [28]. However, no comparison of BC patients between southern China and the United States has been reported. This study aims to investigate the differences in BC patients between southern China and the population-based Surveillance, Epidemiology, and End Results (SEER) cohort.
Research has shown that age at diagnosis, tumor stage and grade, and treatment methods may be prognostic factors of BC [29][30][31]. Based on this, our study examines age, tumor stage and grade, ER, PR, HER2, KI-67 and treatment methods in order to report and compare the age distribution, clinical characteristics, treatment and prognosis of BC patients in southern China and the SEER database.

Patients and ethics
We retrospectively analyzed and compared patients diagnosed with primary breast cancer in southern China (2001-2016) and in the SEER database. In total, 525 breast cancer patients were diagnosed in southern China; 15 of them were excluded from this study because age information was lacking. In addition, 129 patients without tumor stage, ER, PR, HER2, KI-67 or treatment information were removed, and about 95 patients were lost to follow-up. Finally, a total of 286 patients were included in the study (Fig. 1). The study was approved by the Institutional Review Boards of Yunnan Cancer Hospital (Number: KY201944), the Cancer Hospital Affiliated to Guangxi Medical University (Number: LW2020061) and Foshan First People's Hospital (Number: L[2020] 13). Informed consent was obtained from all individual participants included in the study. All procedures implemented in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee, and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The SEER cohort was derived from the SEER database (November 2018 submission) using the SEER*Stat software provided by the National Cancer Institute (NCI). There were 65535 breast cancer patients, of whom 26277 were lost to follow-up. After excluding patients without complete information (age, tumor stage and grade, ER, PR, HER2, KI-67 and treatment), about 38662 cases, 596 patients were included in this study (Fig. 1).
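The cohort selection described above can be checked with simple arithmetic. The following is a minimal sketch, assuming that the exclusions are applied one after another and that the excluded groups do not overlap (the text does not state this explicitly). The function name `remaining_after_exclusions` and the reason labels are hypothetical and used only for illustration.

```python
def remaining_after_exclusions(total, exclusions):
    """Subtract each exclusion count from the starting total, in order.

    `exclusions` is a list of (reason, count) pairs; the reasons are
    illustrative labels, not terms taken from the study protocol.
    """
    remaining = total
    for _reason, count in exclusions:
        remaining -= count
    return remaining


# Counts as reported in the text for the southern China cohort.
southern_china = remaining_after_exclusions(
    525,
    [
        ("missing age", 15),
        ("missing stage, receptor, KI-67 or treatment data", 129),
        ("lost to follow-up", 95),
    ],
)

# Counts as reported in the text for the SEER cohort.
seer = remaining_after_exclusions(
    65535,
    [
        ("lost to follow-up", 26277),
        ("incomplete information", 38662),
    ],
)

print(southern_china, seer)
```

Under these assumptions the two results match the cohort sizes stated in the text (286 and 596).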

Clinical data collection
A retrospective review of medical records and pathology reports was conducted. Staging was performed according to the American Joint Committee on Cancer (AJCC) guidelines [32]. Patients were classified by age into a young adult group (<40), a middle-aged group (40-70) and an aged+ group (>70); the median age of each group was then calculated (35, 48 and 75), and statistical analysis was carried out for each group. A cutoff of 14% for KI-67 was used, as recommended by the 2011 St Gallen consensus panel [33], and the KI-67≥14% group was then divided into two subgroups according to the median (51.7% in southern China). Patients in southern China were advised to undergo examination and treatment according to the guidelines of the breast cancer center and were followed up by telephone to collect information on survival and treatment, including the date of progression or metastasis, the date of relapse, and the date and cause of death.
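The grouping rules above can be written as two small functions. This is only an illustrative sketch: the function names `age_group` and `ki67_group` are hypothetical, and the handling of values that fall exactly on a boundary (for example a KI-67 value of exactly 51.7%) is an assumption, since the text does not specify it.

```python
def age_group(age):
    """Assign a patient to one of the three age groups used in the study."""
    if age < 40:
        return "young adult"
    if age <= 70:
        return "middle aged"
    return "aged+"


def ki67_group(ki67_percent, cutoff=14.0, median_high=51.7):
    """Classify KI-67 expression using the 14% cutoff and the median split.

    Assumption: a value equal to `median_high` is placed in the lower of
    the two KI-67 >= 14% subgroups.
    """
    if ki67_percent < cutoff:
        return "<14%"
    if ki67_percent <= median_high:
        return "14%-51.7%"
    return ">51.7%"


# The group medians reported in the text fall into their own groups.
for median_age in (35, 48, 75):
    print(median_age, age_group(median_age))
```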

Statistical methods
IBM SPSS Statistics (Version 21.0; IBM Corp., New York, USA) and GraphPad Prism (Version 6.0; GraphPad Software, Inc., La Jolla, CA, USA) were used for statistical analysis. Disease-free survival (DFS) was measured from the date of operation to the first recurrence or metastasis of the tumor or death from any cause (patients lost to follow-up were censored at the last follow-up time, and patients still alive at the end of the study were censored at the end of follow-up). Overall survival (OS) was measured from the date of operation to death from any cause. Univariate and multivariate analyses were performed by Cox regression, comparing age (<40 and ≥40), tumor size (≤2cm and >2cm), node status, ER, PR, HER2, KI-67, surgery and radiation. The Kaplan-Meier method was used to estimate DFS and OS, and the log-rank test was used to compare patients with different clinicopathologic characteristics. Count data were tested by the χ² test; Fisher's exact probability method was used when the number of cases was less than 6. Statistical significance was set at P < 0.05, and P < 0.01 was considered highly significant.
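To illustrate how the Kaplan-Meier method handles censored follow-up, the following is a minimal pure-Python sketch of the product-limit estimator. It is not the SPSS or GraphPad procedure used in the study; the function name `kaplan_meier` and the toy follow-up times are hypothetical and are not study data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up durations (e.g. months from operation)
    events : 1 if the event (death or recurrence) was observed,
             0 if the patient was censored at that time
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Toy example (made-up values): five patients, two of them censored.
toy_curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
for t, s in toy_curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up but never lower the survival estimate themselves, which is why the curve steps down only at observed events.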

Patients
Between 2001 and 2016, a total of 525 women received surgical treatment in southern China. Of these, 15 patients lacked age information and were excluded from the study. A further 51 patients lacked tumor size information and 6 lacked node status; of the remaining 453 cases, 75 patients lacked pathological stage and treatment information and 95 patients were lost to follow-up. In all, 286 patients were included in the southern China cohort. Between 1975 and 2016, a total of 65535 patients were included in the SEER database; of these, 26277 patients were lost to follow-up, 35956 lacked lymph node stage information, 2678 had no tumor size information, and 28 had incomplete information on tumor staging and treatment, so about 596 patients were included in the SEER cohort (Fig. 1).

Comparison of clinicopathological features between southern China and SEER cohort
All patients were divided into three subgroups: a young adult group (<40), a middle-aged group (40-70) and an aged+ group (>70). Among 510 breast cancer patients in southern China, 101 (19.8%) were under 40, and middle-aged patients made up the largest share, 396 (77.65%). Between 1975 and 2016, 65535 BC patients were included in the SEER database; the proportion of young patients, 4024 (6.14%), was lower than that in southern China, but the difference between the two was not statistically significant (P=0.923). Middle-aged patients accounted for 41256 (62.95%) of the SEER cohort (P=0.000). The proportion of the aged+ group was higher in the SEER cohort, about 13104 (20.00%), than in southern China, 5 patients (0.98%), and the difference was statistically significant (P=0.048) (Tab. 1). When BC patients in southern China and the SEER cohort were compared, newly diagnosed cases in both groups were mainly 40 to 70 years old. However, the two groups differed: in southern China, the number of BC patients in the subgroup aged 40 to 48 years (48 years was the median age of the 40-70 group) was roughly similar to that in the subgroup aged 48 to 70 years, with 194 (38.04%) and 202 (39.61%) patients, respectively. In contrast, BC patients in the SEER cohort were slightly older, with 9105 (13.89%) patients in the subgroup aged 40 to 48 years and 32151 (49.06%) patients in the subgroup aged 48 to 70 years (Tab. 1).
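The percentages quoted above follow directly from the reported counts and cohort totals. As a small sketch, the helper below recomputes them; the name `percent` and the dictionary layout are hypothetical, and the denominators (510 and 65535) are the totals given in the text.

```python
def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


SOUTHERN_CHINA_TOTAL = 510
SEER_TOTAL = 65535

# (southern China count, SEER count) for each age group, as reported.
age_counts = {
    "young adult (<40)": (101, 4024),
    "middle aged (40-70)": (396, 41256),
    "aged+ (>70)": (5, 13104),
}

shares = {
    group: (percent(sc, SOUTHERN_CHINA_TOTAL), percent(seer, SEER_TOTAL))
    for group, (sc, seer) in age_counts.items()
}

for group, (sc_share, seer_share) in shares.items():
    print(f"{group}: southern China {sc_share}%, SEER {seer_share}%")
```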
According to TNM stage, BC patients were divided into Tx, Tis and invasive groups (the invasive group comprising ≤2 cm, 2-5 cm and >5 cm); the invasive group was the largest, with 448 (89.96%) patients in southern China and 1364 (2.99%) patients in the SEER cohort (P=0.000). In southern China, tumors of 2 to 5 cm were the most common, with 323 (64.86%) patients. However, tumor size data were missing for a large number of cases, 64071 (97.8%), and the Tis subgroup contained 1 case in each cohort (Tab. 1). There were 257 (50.39%) patients with node metastasis in southern China and 296 (0.45%) in the SEER cohort. In both groups, patients without lymph node metastasis formed the largest category, about 206 (40.39%) and 1145 (1.75%) patients, respectively. However, lymph node status data were missing for a large number of patients, 29269 (44.7%) (Tab. 1). Tumor stage differed significantly between southern China and the SEER database (P=0.000). Stage 2 accounted for the highest proportion, with 262 (51.37%) and 14936 (22.81%) cases, respectively, followed by stage 3, with 133 (26.08%) and 12993 (19.75%) cases. However, tumor stage data were missing for many cases in the SEER database, about 29269 (44.7%) (Tab. 1). ER expression was counted in both southern China and the SEER cohort (P=0.000). ER (+) accounted for the higher proportion, with 290 (56.86%) and 20233 (65.34%) cases, respectively, while ER (-) accounted for 178 (34.9%) and 5563 (17.96%) cases (Tab. 1). Similarly, the difference in PR expression between southern China and the SEER cohort was statistically significant (P=0.000). PR (+) was more common in both southern China and the SEER cohort, with 270 (52.94%) and 179194 (55.55%) cases, and PR (-) accounted for 182 (35.69%) and 8110 (26.2%) cases (Tab. 1). HER2 expression also differed between the two groups (P=0.000).
HER2 (+) accounted for the largest share in southern China, 283 (55.49%) cases, but only 169 (9.46%) cases in the SEER cohort, where the proportion of PR (-) was high, 1530 (85.62%) cases (Tab. 1). Additionally, KI-67 expression differed between the two groups: in southern China there were 365 (71.57%) KI-67 (+) cases, most of which fell between 14% and 51.7%, 170 (16.58%) cases. However, there were no data on KI-67 expression in the SEER cohort (Tab. 1).
DFS and OS of all included breast cancer patients were compared: DFS did not differ significantly between southern China and the SEER cohort (P=0.133), but OS did (P=0.000), and OS in the SEER cohort was significantly higher than in southern China (Fig. 2A-B). Secondly, because the southern China data covered only 2001 to 2016, a stratified analysis was used to compare DFS and OS in southern China and the SEER database from 2001 to 2016. In the first 70 months of follow-up, DFS in southern China was higher than in the SEER cohort; thereafter, DFS in the SEER cohort was significantly higher than in southern China (P=0.035). OS in this period also differed significantly (P=0.000), being significantly higher in the SEER cohort than in southern China (Fig. 2C-D). (Fig. 3A-D). Secondly, the effects of different tumor sizes on the survival of BC patients in each cohort were analyzed separately. Tumor size had little effect on DFS in southern China (P=0.487), but OS differed significantly: OS in the T>2cm group was significantly lower than in the T≤2cm group (P=0.012) (Fig. 3E-F). For the SEER cohort, however, DFS and OS in the T>2cm group were slightly lower than in the T≤2cm group, but the differences were not statistically significant (P=0.738 and P=0.299) (Fig. 3G-H). The effect of node stage on survival was then analyzed and compared across the BC cohorts.
Positive node status affected DFS and OS in both southern China and the SEER cohort (P=0.000 and P=0.044). Meanwhile, negative node status also affected DFS and OS in the two groups (P=0.000 and P=0.000). OS in the SEER cohort was higher than in southern China regardless of lymph node status (Fig. 4A-D). When southern China and the SEER cohort were analyzed separately, DFS and OS of node-positive patients in southern China were lower than those of node-negative patients; the difference in OS by lymph node status was statistically significant (P=0.000), but the difference in DFS was not (P=0.448) (Fig. 4E-F). For the SEER cohort, however, DFS and OS of node-positive patients were slightly higher than those of node-negative patients, while

Changes in morbidity with years
We further analyzed and compared the age distribution of BC patients across different decades in the SEER cohort. Except for the 1990s, the proportion of BC patients in the young adult group (<40 years old) increased gradually over time: 6% in the 1970s, 6.31% in the 1980s, 5.76% in the 1990s and 7.62% in the 2000s, with median ages of 35, 36, 36 and 36 years, respectively. Among all age groups, the proportion of the middle-aged group (40-70 years old) was the highest, at 63.89%, 61.74%, 62.67% and 66.42%, with median ages of 57, 56, 56 and 56 years. In addition, the proportion of the aged+ group (>70) tended to decrease over time, at 30.11%, 31.95%, 31.57% and 25.96%, with median ages of 78, 78, 78 and 77 years (Fig. 6).

Discussion
To our knowledge, this study is the first to analyze and compare the factors related to survival and prognosis of BC patients in southern China and the SEER cohort. It is also the first to analyze the following factors together: age, tumor stage and grade, ER, PR, HER2, KI-67, surgery and radiotherapy, which may influence the survival and prognosis of BC. It is a multi-regional, big-data clinical study.
In this study, by comparing the age of patients in southern China and the SEER cohort, we found that about 19.8% of BC patients in southern China from 2001 to 2016 were under 40 years old, which is consistent with the results of Wang K, who reported that the proportion of young BC patients in China is about 21.97% [34]. The proportion of young BC patients in China is significantly higher than that in western countries (about 4-6%) [35][36][37][38], and the western figure is similar to our finding of 6.14% in the SEER cohort from 1975 to 2016. These findings suggest that BC patients in China are younger than those in western countries, which indicates that age may be a factor affecting the survival and prognosis of BC patients in southern China and the United States.
To further study DFS and OS, we focused on T stage, positive lymph node status, and ER, PR, HER2 and KI-67 expression in BC patients, and concluded that T stage, positive lymph node status and KI-67 expression could all be regarded as factors affecting the survival and prognosis of BC patients. Other scholars have also studied tumor stage and found that about 60-70% of BC patients were diagnosed at stage 1, which was higher than in Asian countries, while only about 10% of women were at stage 4 [39]. This is similar to our results: most BC patients in southern China from 2001 to 2016 were diagnosed early, with about 64.86% of patients having T2 BC, 40.39% N0 and 51.37% stage 2. However, the early diagnosis rate of BC in China is far lower than that in the United States: in the SEER cohort, about 65.96% of patients had T1, 79.46% N0 and 41.18% stage 2. Cox regression analysis showed that T stage and positive lymph node status were important factors affecting OS in BC. We therefore further confirmed that tumor stage and grade had a significant impact on the survival and prognosis of BC. Meanwhile, China should further strengthen the early diagnosis and treatment of BC in order to improve the prognosis of BC patients in China.
In addition, we explored the effect of ER, PR, HER2 and KI-67 expression on survival and prognosis of BC. The proportion of ER (+) BC patients was similar in southern China and the SEER cohort: 56.86% in southern China and 65.34% in the SEER cohort, slightly lower than previously reported (about 70%) [40]. This may be related to the large amount of missing data in the SEER cohort; about 16.7% of ER data were missing in this study. Studies have shown that the most important factors affecting the prognosis of BC are tumor grade and ER status [41]. In our study, however, ER was not an indicator of survival and prognosis of BC. Additionally, BC is treated differently according to hormone receptor (HR) status. Endocrine therapy can be used for ER- or PR-positive patients, but chemotherapy is less effective in these patients than in HR-negative patients, and the different treatment methods can significantly affect the prognosis of BC. Therefore, PR is also considered an important factor affecting the prognosis of BC. In our study, 52.94% of patients in southern China and 55.55% of those in the SEER cohort were PR positive, and the difference between the two groups was statistically significant (P = 0.000). As with ER, Cox regression analysis showed that PR was not an indicator of survival and prognosis of BC, which may be related to the large amount of missing data. In addition, Ding L et al found that BC with HER2 and KI-67 overexpression had a higher lymph node metastasis rate and a higher AJCC tumor stage [42,43], which is consistent with our findings. In this study, HER2 positivity was 55.49% and 9.46% in southern China and the SEER cohort, respectively (P = 0.000), which is consistent with reports that BC cells from young patients are more likely to show HER2-positive expression [34].
Besides, KI-67 positivity was 70.01% in southern China, and high KI-67 expression was also an important factor affecting OS in BC patients. When KI-67 was treated as an independent factor, the χ² test for DFS gave P = 0.05; we believe this was mainly due to the small sample size and that the trend remains valid, since the P value may gradually decrease as the sample size increases. In summary, our results showed that T stage, positive node status and KI-67 expression were all important factors affecting the prognosis of BC, which also reflects that BC in southern China and the United States differs in biological behavior and pathogenesis.
Additionally, treatment methods are also important factors affecting the prognosis of BC. At present, the main treatments for BC are surgery, radiotherapy, chemotherapy, targeted therapy and hormone therapy [44,45]. Surgery can significantly reduce mortality and is the most critical step in the treatment of breast cancer. There are five common surgical methods: breast-conserving surgery (BCS), simple mastectomy (SM), modified radical mastectomy (MRM), radical mastectomy (RM) and extensive radical mastectomy (ERM) [46]. Bartelink H et al reported that BCS is equivalent to mastectomy [47]. However, treatment methods in southern China and the United States have not previously been compared, and we studied this for the first time. In this study, all BC patients enrolled in southern China underwent surgery, either BCS or mastectomy (including SM and MRM). About 95.09% of BC patients underwent mastectomy and 4.91% underwent BCS; the BCS rate was significantly lower than that in developed countries, which is similar to the results of Gupta A, who reported that the highest BCS rate in China is only 8.6% [48]. In the SEER cohort, however, only 18.24% of BC patients underwent mastectomy and 81.76% did not. Whether surgery was performed had no significant effect on DFS or OS, which suggests that surgery had little effect on the survival and prognosis of BC. Because the SEER cohort did not include details of surgical procedures, their effects on survival and prognosis of BC, which may be a potential factor, were not analyzed. In recent years, a number of large randomized trials have shown that radiotherapy can significantly reduce the local recurrence of BC, thereby improving the breast preservation rate and achieving good survival [48].
However, in this study, the radiotherapy rates of BC patients in southern China and the SEER cohort were both low, 27.18% and 34.22%, respectively, and the difference between the two was statistically significant (P = 0.001). Nevertheless, Cox regression analysis showed that radiotherapy had no significant effect on either DFS or OS, which may be related to the low radiotherapy rate and should be studied further. Chemotherapy is also an important treatment for BC; we found that most BC patients in southern China, approximately 97.01%, were treated with chemotherapy, whereas only about 24.08% of BC patients in the SEER cohort received chemotherapy (P = 0.000), which indicates that BC treatment differs greatly between China and the United States.
Additionally, over time (from the 1970s to the 1990s), the proportion of young BC patients in the SEER cohort increased gradually, while the proportion of elderly patients decreased gradually. Moreover, the median age at diagnosis in each age group stabilized, at 36, 56 and 77 years, respectively, which may be related to the early diagnosis of BC.
There are some limitations to this study. The data for the southern China cohort came from Yunnan Cancer Hospital, the Affiliated Tumor Hospital of Guangxi Medical University and The First People's Hospital of Foshan, so these data may differ slightly from those of the National Cancer Registry System. In addition, the presence of missing data and the limited follow-up time can be considered weaknesses of this study. Other limitations include the lack of data on surgical procedures, KI-67 status and endocrine therapy in the SEER cohort, which limited the analysis of their influence on patient survival.

Conclusion
In conclusion, our study suggested that T stage, positive lymph node status and KI-67 expression rate were important factors affecting the prognosis of BC and played an important role in the development of BC.

Figure: Comparison of the ages of all patients across different years in the SEER database.
