1. Clinical samples and immunohistochemistry
From 2002 to 2014, we collected paraffin-embedded specimens of cancer tissue and paracancerous tissue from breast cancer patients at Shanghai Changhai Hospital, Shanghai Ruijin Hospital, Shanghai Xinhua Hospital and Shanghai Huangpu District Central Hospital. These comprised 573 primary breast cancer tissues and 29 paracancerous normal tissues. From these specimens we constructed seven tissue microarrays (tissue chips): six cancer tissue chips containing a total of 303 cases, and one paracancerous normal tissue chip containing 29 cases. All cases were confirmed as breast cancer by comprehensive pathological diagnosis. All patients received local and/or systemic treatment, including surgery, radiotherapy, chemotherapy and endocrine therapy. We retrieved hospitalization numbers and pathology numbers from the medical records room and obtained the corresponding original medical records through the hospitals' internal databases. We then collated the data and tabulated them according to predefined indicators, including clinical characteristics, lymph node metastasis and TNM stage. The distribution of antigens in tissues and cells was determined by immunohistochemistry using the streptavidin-biotin-peroxidase (HRP) complex method. Staining was evaluated in a blinded manner. Two experienced pathologists, unaware of the patients' clinical data, scored each specimen independently, and any discordant results were re-reviewed.
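The independent-reading step can be sketched as a simple discordance check. This is a minimal illustration that assumes each pathologist's final scores are stored as a mapping from a case identifier to a total score. The function name `discordant_cases` and the case IDs are hypothetical, because the study does not describe how its scores were recorded.

```python
from typing import Dict, List


def discordant_cases(reader_a: Dict[str, int], reader_b: Dict[str, int]) -> List[str]:
    """Return the IDs of cases that the two pathologists scored differently.

    Hypothetical data layout: {case_id: total_score} for each reader.
    """
    # Both readers are expected to have scored exactly the same cases.
    if reader_a.keys() != reader_b.keys():
        raise ValueError("Both readers must score the same set of cases")
    # Cases with differing scores are flagged for joint re-review.
    return sorted(case for case in reader_a if reader_a[case] != reader_b[case])
```

For example, `discordant_cases({"T001": 6, "T002": 2}, {"T001": 6, "T002": 4})` flags only `"T002"` for re-review.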
2. Scoring criteria for immunohistochemistry
Brachyury-positive staining appeared light yellow, brownish-yellow or brown and was localized in the nucleus. Immunohistochemical results were evaluated with a semi-quantitative scoring system that combined two parameters. For the percentage of positive cells, ≤5% was scored 0, 6%-25% was scored 1, 26%-50% was scored 2, 51%-75% was scored 3, and >75% was scored 4. For staining intensity, no staining was scored 0 (negative), light brown was scored 1 (weak positive, +), staining intermediate between weak and strong positive was scored 2 (2+), and dark brown was scored 3 (strong positive, 3+). The total score was the product of the intensity score and the percentage score (range 0-12). It was graded as follows: 0, (-); 1-4, (+); 5-8, (2+); 9-12, (3+). For analysis, a total score of 0-4 was considered negative and a total score of 5-12 was considered positive.
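The scoring rules can be expressed as a few small functions. This is a minimal sketch, not the authors' code. It assumes that percentages are given as numbers from 0 to 100 and treats the band limits as ≤5, ≤25, ≤50 and ≤75, so that a non-integer value such as 5.5% falls into the next band. The intensity labels and function names are hypothetical.

```python
# Hypothetical intensity labels mapped to the 0-3 intensity score.
INTENSITY_SCORES = {"none": 0, "weak": 1, "intermediate": 2, "strong": 3}


def percentage_score(pct_positive: float) -> int:
    """Map the percentage of Brachyury-positive cells to the 0-4 extent score."""
    if not 0 <= pct_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumed band limits: <=5 -> 0, <=25 -> 1, <=50 -> 2, <=75 -> 3, >75 -> 4.
    for upper, score in ((5, 0), (25, 1), (50, 2), (75, 3)):
        if pct_positive <= upper:
            return score
    return 4


def total_score(intensity: str, pct_positive: float) -> int:
    """Total score = intensity score x percentage score (range 0-12)."""
    return INTENSITY_SCORES[intensity] * percentage_score(pct_positive)


def grade(total: int) -> str:
    """Grade a total score as -, +, 2+ or 3+."""
    if not 0 <= total <= 12:
        raise ValueError("total score must be between 0 and 12")
    if total == 0:
        return "-"
    if total <= 4:
        return "+"
    if total <= 8:
        return "2+"
    return "3+"


def is_positive(total: int) -> bool:
    """Dichotomize: 0-4 is negative, 5-12 is positive."""
    return total >= 5
```

For instance, strong staining in 30% of cells scores 3 × 2 = 6, which is graded (2+) and counted as positive. Weak staining in 80% of cells scores 1 × 4 = 4, which is graded (+) and counted as negative.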
3. Data analysis
Missing data were handled by multiple imputation with the mice package in R. Univariate analyses were first performed with SPSS 21.0, using the test appropriate to each variable. The Mann-Whitney U test was used for the association between Brachyury protein expression and age. The Pearson χ² test or Fisher's exact test was used to compare Brachyury expression between cancer tissues and paracancerous tissues. McNemar's test was used for Brachyury expression in matched pairs of cancer and paracancerous tissue. A two-sided P < 0.05 was considered statistically significant. We then calculated Pearson correlation coefficients between the variables, assessed the association of each variable with patient prognosis, and selected the variables suitable for modeling. Clinical prediction models were built with logistic regression, random forest, decision tree and neural network algorithms, all implemented in R.
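For the matched comparison, McNemar's test uses only the discordant pairs. In this setting, those are patients whose cancer tissue and paracancerous tissue received different dichotomized Brachyury results. The sketch below assumes that each patient contributes one (cancer, paracancerous) pair of positive/negative results. It uses the continuity-corrected statistic (|b - c| - 1)² / (b + c) on 1 degree of freedom. The study ran its analyses in SPSS and R, so this Python version is only illustrative, and the function name is hypothetical.

```python
import math
from typing import Iterable, Tuple


def mcnemar_test(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Continuity-corrected McNemar chi-square test for paired binary results.

    Each pair is (cancer_positive, paracancerous_positive) for one patient.
    Returns (statistic, two-sided p-value) on 1 degree of freedom.
    """
    b = c = 0  # counts of discordant pairs
    for cancer_pos, normal_pos in pairs:
        if cancer_pos and not normal_pos:
            b += 1  # positive in cancer tissue only
        elif normal_pos and not cancer_pos:
            c += 1  # positive in paracancerous tissue only
    if b + c == 0:
        raise ValueError("No discordant pairs; McNemar's test is undefined")
    statistic = max(0.0, abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With 10 pairs positive only in cancer tissue and 2 positive only in paracancerous tissue, the statistic is (8 - 1)² / 12 ≈ 4.08 (P ≈ 0.043). Concordant pairs do not affect the result.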