An Anaplastic Lymphoma Kinase Pathway Signature is associated with Cell De-differentiation, Neoadjuvant Response and Recurrence Risk in Breast Cancer

Abstract
The role of ALK signaling in the pathogenesis of breast cancer (BC) is not clear. Previously, we generated a gene signature for the ALK pathway based on the difference in gene expression profiles between tumor cells with an activated and an inactive ALK pathway. Here, this signature was used to compute ALK pathway activity in BC samples from 42 microarray datasets, and the associations between ALK pathway score and clinical outcome were examined by logistic regression and survival analysis. Our results indicated that high ALK pathway activity was a significant risk factor for the presence of higher-grade breast cancer with loss of ER and PR expression in the 42 datasets (n=6381). ALK pathway activity was also positively associated with pathological complete response (pCR) in 15 datasets annotated with patients' neoadjuvant response information (n=2093, overall OR 1.67, p=2.00E-12), and with recurrence risk in 30 datasets annotated with patients' survival information (n=4678, overall HR 1.21, p=3.31E-08). The associations of ALK pathway activity with pCR and recurrence were more significant in HER2-negative and grade 1&2 tumors than in HER2-positive and grade 3 tumors. Notably, the association between ALK and tumor recurrence was statistically significant in patients aged >50 but not ≤50 years, in patients with positive but not negative lymph nodes, and in patients with residual disease but not pCR following neoadjuvant chemotherapy. These data indicate that ALK may be involved in BC tumorigenesis and that the ALK pathway signature represents a novel biomarker for BC clinical management.

Background
Breast cancer (BC) is the most common cancer among women, accounting for approximately a quarter of all new cancer cases diagnosed in women worldwide [1].
Recent genomic studies demonstrate that BCs are highly heterogeneous in their molecular biology. In addition to those well-known genetically altered oncogenes and tumor suppressor genes, including BRCA1, EGFR, FGFR, MET, PI3K, TP53 and RB1, many lesser-known or lesser-characterized genes in the case of BC, such as anaplastic lymphoma kinase (ALK), were found to be mutated or amplified in BC [2,3].
ALK is a transmembrane tyrosine kinase receptor belonging to the insulin receptor superfamily. Aberrant activation of ALK is involved in the tumorigenesis of a subset of haematopoietic, epithelial and mesenchymal neoplasms such as anaplastic large cell lymphoma, lung cancer, inflammatory myofibroblastic tumors and neuroblastoma [4][5][6][7].
The role of ALK signaling in the pathogenesis of BC is not clear.
Conventionally, pathway activation is assessed by methods including immunohistochemistry, which detects the levels of pathway-related protein expression, or fluorescence in situ hybridization and quantitative PCR, which detect amplification/overexpression of related oncogenes. The disadvantage of these methods is that they may sometimes be unreliable because most pathways can be activated at multiple points. The latest advances in high-throughput genomic technologies provide alternative strategies for semi-quantifying pathway activity by analyzing the expression profile of a pathway-specific gene signature, using approaches such as Bayesian binary regression (BinReg), which was developed by the Nevins group [8,9]. We previously generated a gene signature for the ALK pathway based on the difference in gene expression profiles between tumor cells with an activated and an inactive ALK pathway [10]. In the current study, ALK pathway activity in BC samples from 42 microarray datasets was computed and its associations with clinical outcome were examined. Our results indicate that the ALK pathway is associated with cell de-differentiation, neoadjuvant response and recurrence risk in BC.

Microarray datasets and patient cohorts
Microarray datasets from Affymetrix U133 GeneChip microarray platforms (including HU133A, HU133A 2.0 and HU133 Plus 2.0) were analyzed because the ALK pathway signature used in this study was generated from datasets of these platforms. To avoid selection bias, only datasets with more than 50 samples were chosen in this study. At the time of this study, we found 42 publicly available datasets annotated with neoadjuvant response or cancer recurrence information that met the above-mentioned requirements. The raw CEL files of the datasets were downloaded from Gene Expression Omnibus (GEO), except for dataset MDA133, which was downloaded from MD Anderson (https://bioinformatics.mdanderson.org/public-datasets/). The CEL files were normalized using the Microarray Suite 5.0 (MAS5.0) and Robust Multi-array Average (RMA) approaches, respectively, in the R environment. Array quality was assessed with the R simpleaffy package [11].
Batch effects for arrays from different hybridization dates were estimated using principal component analysis. The ComBat program was used to eliminate batch effects when multiple datasets were merged or when apparent batch effects were observed in a single dataset [12].
For datasets GSE3494, GSE2990, GSE6532 and GSE7390, the patient survival data were updated and duplicate samples were removed according to the curated clinical data from Dr. Jonas Bergh that are available in the GEO database (accession number: GSE83232).

Pathway activity prediction by BinReg
The use of the BinReg approach to generate pathway signatures and predict pathway activities of individual samples has been described in detail before [8,9]. Briefly, the gene expression patterns of two sets of samples (with the pathway "on" and "off", respectively) were analyzed, and the pathway-specific informative genes (signature genes) were identified. Principal components were then used to compute weights for each of the signature genes, such that the weighted average of expression levels showed a clear ability to distinguish the pathway "on" and "off" groups. By applying binary regression on the principal components to the gene expression dataset of an unknown sample, a probability score of pathway activity for that sample was produced.
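The scoring step described above can be illustrated with a minimal Python sketch. It is not the BinReg implementation: the gene names, weights, intercept and slope are hypothetical, and a plain logistic link stands in for the Bayesian binary regression on principal components described in [8,9]. It only shows how a weighted combination of signature-gene expression values can be mapped to a probability-like pathway score.

```python
import math

def metagene_score(expression, weights):
    """Weighted sum of (already centred) expression values over the signature genes."""
    return sum(weights[gene] * expression[gene] for gene in weights)

def pathway_probability(score, intercept, slope):
    """Map a metagene score to the interval (0, 1) with a logistic link (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

# Hypothetical signature weights and two hypothetical samples.
weights = {"GENE_A": 0.6, "GENE_B": 0.3, "GENE_C": -0.7}
on_like_sample = {"GENE_A": 1.2, "GENE_B": 0.8, "GENE_C": -1.0}
off_like_sample = {"GENE_A": -1.0, "GENE_B": -0.5, "GENE_C": 0.9}

# Hypothetical regression coefficients (intercept 0, slope 2).
p_on = pathway_probability(metagene_score(on_like_sample, weights), 0.0, 2.0)
p_off = pathway_probability(metagene_score(off_like_sample, weights), 0.0, 2.0)
print(f"on-like sample: {p_on:.3f}, off-like sample: {p_off:.3f}")
```

In this toy setting the sample whose expression pattern follows the sign of the weights receives a score close to 1, and the opposite pattern a score close to 0.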

ALK Pathway signature
Two microarray datasets were used to generate the ALK pathway signature, as we described previously [10]. Briefly, the gene expression data of the anaplastic large-cell lymphoma cell line TS treated with or without ALK inhibitors A2 or A3 (GSE6184) was used as the training set to generate the signature. This signature was then validated with the gene expression data of TS cells with or without knock-down of ALK (GSE6184) and the expression data of the lung cancer cell line NCI-H2228 treated with or without the ALK inhibitor CH5424802 (GSE2511817).

Statistics
Statistical analyses were performed using R packages including Metafor [13], Survival [14], and Survminer. Odds ratios for the associations of neoadjuvant response with risk scores were calculated using logistic regression. Kaplan-Meier survival curves with the log-rank test and Cox proportional hazards regression were used to analyze the association between disease-free survival and risk scores. Disease-free survival (DFS) was defined as the time from surgery to the first confirmed relapse or metastasis. In this study, distant metastasis-free survival (DMFS) was preferred for survival analysis; when DMFS data were not available, relapse-free survival (RFS) data were used. The overall hazard ratio (HR) of a variable of interest was calculated using a random-effects model. The significance of the overall effects across multiple datasets was estimated by the Z test.
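As an illustration of the pooling step, the following Python sketch combines per-dataset log hazard ratios with a random-effects model and tests the pooled effect with a Z test. The DerSimonian-Laird estimator of between-study variance is an assumption made here for simplicity; the text does not state which estimator was used in Metafor. The function name and the input values are hypothetical.

```python
import math

def pool_random_effects(log_effects, std_errors):
    """DerSimonian-Laird random-effects pooling of log effect sizes (log HR or log OR)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_effects)) / sum_w
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-weight each study by the inverse of (within + between) variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, log_effects)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))
    z = pooled / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"effect": math.exp(pooled), "se_log": se_pooled,
            "tau2": tau2, "z": z, "p": p_value}

# Hypothetical per-dataset hazard ratios and standard errors of log(HR).
hazard_ratios = [1.10, 1.30, 1.25]
standard_errors = [0.10, 0.12, 0.08]
result = pool_random_effects([math.log(hr) for hr in hazard_ratios], standard_errors)
print(f"pooled HR = {result['effect']:.2f}, z = {result['z']:.2f}, p = {result['p']:.2g}")
```

The same function applies to odds ratios from logistic regression, since both are pooled on the log scale.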
To examine the association of the ALK pathway with neoadjuvant response and cancer recurrence in different subgroups of BC, the 42 microarray datasets were merged into 4 cohorts. Cohort 1 was merged from the 15 datasets (MDA133, GSE16446 in which only RFS information is available and the longest follow-up time is > 8 years. The remaining 6 datasets (GSE16391, GSE16446, GSE17907, GSE19615, GSE25055 and GSE25065) that have either RFS or DMFS information were merged as cohort 4, since these datasets have a short follow-up time. The ComBat program was used to remove the batch effects in the merged cohorts. It is worth noting that 3 datasets (GSE16446, GSE25055 and GSE25065) were present in both cohort 1 and cohort 4; therefore, when both of these cohorts were used in the same statistical analysis, these 3 datasets were removed from cohort 1. The disease-free survival time was censored at 8 years in cohort 2 and cohort 3, and at 5 years in cohort 4.
The datasets used in this study were generated by different research groups and contain different clinical variables. The sample size would be too small if all the variables were included together for adjustment in multivariate regression analysis. Therefore, in this study, the ALK pathway score was tested with adjustment for one variable at a time in multivariate regression analysis. All statistical analyses were two-sided and considered significant when p < 0.05.

ALK pathway signature score is associated with de-differentiation of BC
The associations between ALK pathway signature score and seven clinical characteristics of BC were analyzed in the 42 Affymetrix datasets individually by univariate regression analysis. Figure 1 showed that in most of the datasets, ALK pathway signature score was a significant unfavorable factor for ER-positive (overall OR 0. 30 (Fig. 2).
The 42 datasets were further merged into four cohorts, and similar associations of ALK pathway scores with the seven clinical characteristics were observed in the four merged cohorts (Supplementary Fig. 1). Multivariate logistic regression was also performed in these merged cohorts to assess whether the associations of ALK pathway scores with estrogen receptor (ER) status, progesterone receptor (PR) status and tumor grade remained significant after adjustment for other covariates. Overall, the ORs for the association of ALK pathway score with ER status stayed constant when the pathway score was tested separately with correction for age, tumor grade, tumor size, or status of PR, HER2 or lymph node (Fig. 3A). A similar trend was also observed for the associations of ALK pathway scores with tumor grade or PR status, except that the association with PR status was almost abolished when adjusted for ER status (Fig. 3B and C). These results indicate that ALK pathway score is an independent risk factor for the presence of ER-negative and high-grade BC; in other words, ALK pathway score is an independent factor associated with de-differentiation of BC, since loss of ER expression and high pathological grade are features of de-differentiated BC [15].
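The odds ratios in this section are expressed per one-SD increment of the pathway score. A minimal sketch of how such an estimate can be obtained is given below: the score is standardized, a univariate logistic regression is fitted by Newton-Raphson iterations, and the slope is exponentiated. The data and the function name are hypothetical, and this is not the R code used in the study; it assumes the outcome is not perfectly separated by the score.

```python
import math

def per_sd_odds_ratio(scores, outcomes, iterations=25):
    """Univariate logistic regression of a binary outcome on a standardized score.

    Returns exp(slope), i.e. the odds ratio per one-SD increase of the score.
    """
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    x = [(s - mean) / sd for s in scores]
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(y - pi for y, pi in zip(outcomes, p))
        g1 = sum((y - pi) * xi for y, pi, xi in zip(outcomes, p, x))
        # Entries of the 2x2 information matrix X'WX.
        h00 = sum(pi * (1 - pi) for pi in p)
        h01 = sum(pi * (1 - pi) * xi for pi, xi in zip(p, x))
        h11 = sum(pi * (1 - pi) * xi * xi for pi, xi in zip(p, x))
        det = h00 * h11 - h01 ** 2
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)

# Hypothetical pathway scores and binary outcomes (1 = event, 0 = non-event).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
events = [0, 0, 1, 0, 1, 0, 1, 1]
odds_ratio = per_sd_odds_ratio(scores, events)
print(f"OR per one-SD increment: {odds_ratio:.2f}")
```

Adjusting for one covariate at a time, as done in the multivariate analyses, would add a second predictor to the same model; that extension is omitted here to keep the sketch short.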

ALK pathway signature score is associated with pathological complete response (pCR) of BC
Logistic regression analysis showed that ALK pathway score was positively associated with the pCR rate of BC in 15 individual datasets that contain patients' neoadjuvant response information (overall OR 1.67, 95% CI 1.45-1.93, p=2.00E-12) (Fig. 4A). This association reached statistical significance (p<0.05) in 8 of the 15 datasets, and approached the borderline of significance (p=0.08) in 2 of the datasets (Fig. 4A). Of the remaining 5 datasets with p > 0.08, two (GSE16446 and GSE18864) contain only ER-negative BC samples, and another two (GSE37946 and GSE50948) are mainly composed of HER2-positive samples (Supplementary Table 1), suggesting that ER and HER2 status affect the association between ALK pathway score and pCR rate. This effect was also observed in the cohort merged from the 15 datasets. As shown in Fig. 4B, an increase of one standard deviation (SD) in ALK pathway score is associated with a 62% and 85% increase of pCR rate in ER-positive (95% CI 1.28-2.04, p=5.31E-5) and HER2-negative (95% CI 1.63-2.10, p=1.80E-21) BC respectively, but only with a 17% and 28% increase in ER-negative (95% CI 1.02-1.34, p=0.03) and HER2-positive (95% CI 1.03-1.58, p=0.02) BC respectively. In addition, ALK pathway score is also more strongly associated with pCR rate in grade 1&2 tumors (OR 2.16, 95% CI 1.63-2.87, p=8.09E-08) than in grade 3 tumors (OR 1.36, 95% CI 1.17-1.59, p=9.59E-05).
Multivariate logistic regression analysis on the merged cohort showed that the association of ALK pathway scores with pCR was hardly affected by adjustment for other clinical variables, except that the OR values decreased moderately when adjusted for ER or PR status (Fig. 4C).
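The relationship between a reported odds ratio, its 95% confidence interval and the Z-test p-value can be checked with a short calculation. The sketch below recovers the standard error of log(OR) from the interval width and derives the Z statistic, assuming a normal approximation on the log scale; the function name is hypothetical. Applied to the overall estimate quoted above (OR 1.67, 95% CI 1.45-1.93), it gives a Z statistic of about 7 and a two-sided p-value of the same order as the reported p=2.00E-12.

```python
import math

def z_and_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the Z statistic and two-sided p-value from a ratio and its 95% CI."""
    # On the log scale the CI is symmetric: log(estimate) +/- z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

z, p = z_and_p_from_ci(1.67, 1.45, 1.93)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

Because the published interval is rounded to two decimals, small discrepancies from the reported p-value are expected.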

ALK pathway signature score is associated with recurrence risk of BC
Cox regression analysis showed that ALK pathway score was positively associated with recurrence risk of BC in 30 individual datasets in which patients' survival information is available (overall HR 1.21, 95% CI 1.13-1.29, p=3.31E-08) (Fig. 5A). Among the 30 datasets, GSE16446, GSE31519 and GSE58812 contain only ER-negative samples, and GSE17907 is mainly composed of HER2-positive samples (Supplementary Table 1). ALK pathway score was not significantly associated with recurrence risk in any of these 4 datasets. Four additional datasets (GSE20711, GSE2603, GSE2990 and GSE7378) have a sample size <100. Among the remaining 22 datasets, the association of the ALK pathway with recurrence risk reached statistical significance in 11 datasets (Fig. 5A).

ALK pathway signature score is associated with recurrence risk only in patients with residual disease (RD) following neoadjuvant chemotherapy
Our data showed that ALK pathway score is positively associated with both pCR rate and recurrence risk of BC. These results seem contradictory, since BC patients with pCR generally have a better prognosis than patients with RD [16]. A potential explanation is that ALK pathway signature score is associated with recurrence risk only in RD but not in pCR patients. To test this hypothesis, samples in cohort 4 that were annotated with both pCR and survival information were tested. Kaplan-Meier analysis showed that patients with pCR had a much lower relapse rate than patients with RD (black vs grey, 5-year DFS rate: 0.93 vs 0.69, p=1.96E-05) (Fig. 6D). When RD patients were further stratified into two subgroups based on ALK pathway scores, the subgroup with low ALK pathway score had a better DFS rate than that with high ALK score (blue vs violet, 5-year DFS rate: 0.60 vs 0.77, p=1.83E-04) (Fig. 6D). By contrast, the two subgroups formed by further stratifying the pCR patients based on ALK pathway scores showed similar DFS rates (green vs red, p = 0.67) (Fig. 6D).
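The 5-year DFS rates compared above are Kaplan-Meier estimates. A minimal sketch of the product-limit estimator with administrative censoring at a fixed horizon is shown below; the follow-up times and event indicators are hypothetical, and the study itself used the R Survival and Survminer packages rather than this code.

```python
def kaplan_meier(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon`.

    times   -- follow-up times (same unit as horizon, e.g. years)
    events  -- 1 if relapse was observed at that time, 0 if censored
    horizon -- follow-up is administratively censored at this time
    """
    # Observations beyond the horizon become censored at the horizon.
    data = sorted((min(t, horizon), e if t <= horizon else 0)
                  for t, e in zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        relapses = sum(1 for tt, e in data if tt == t and e == 1)
        if relapses:
            # Product-limit step: fraction of at-risk patients surviving time t.
            survival *= 1.0 - relapses / at_risk
        at_risk -= n_at_t
        i += n_at_t
    return survival

# Hypothetical follow-up (years) for six patients, censored at 5 years.
follow_up = [1, 2, 3, 4, 6, 7]
relapsed = [1, 0, 1, 0, 1, 0]
dfs_5y = kaplan_meier(follow_up, relapsed, horizon=5)
print(f"5-year DFS estimate: {dfs_5y:.3f}")
```

In this toy example the relapse at year 6 falls after the horizon and is treated as censored, so only the relapses at years 1 and 3 reduce the estimate.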
It is controversial whether alterations of the ALK gene exist in BC. While Perez-Pinera et al. [17] reported that activated ALK was strongly expressed in different histological subtypes of BC, Fukuyoshi et al. [18] and Lerebours et al. [19] failed to observe such an abnormality in BC patients. Recently, Kim et al. [20] demonstrated that ALK copy number was significantly increased in inflammatory BC (IBC) and was associated with higher recurrence risk in IBC patients. In addition, TCGA (The Cancer Genome Atlas Network) genomic analysis showed amplification of the ALK gene in 43 of 476 BC samples [3]. In this study, we found that a higher ALK pathway score was associated with cell de-differentiation and higher recurrence risk in BC. These results support the notion that active ALK signaling has a functional role in the pathogenesis of BC.
An interesting finding in this study is that the association between high ALK pathway activity and tumor recurrence differs by age group (>50 vs. ≤50 years), with ALK significantly associated with risk of recurrence only in the middle-aged and elderly patients. This characteristic of ALK has not been reported in other cancer studies. The reason for this is unclear, although it might be due to the complex hormonal environment of young women. A retrospective analysis performed on 260 elderly and 294 middle-aged patients with primary BC showed that negative lymph node status, small tumor size and positive ER status are favorable indicators of survival in both the elderly and the middle-aged patients [24]. In addition, another study showed that ER/PR status and HER2 gene amplification or overexpression are prognostic factors in elderly patients with BC [25]. ALK, ER/PR and HER2 share common downstream activation pathways, such as Ras/MEK/ERK and PI3K/Akt/mTOR, which ultimately lead to increased transcription, cell proliferation, growth and survival [26][27][28][29][30]. This may also explain why they have similar prognostic effects in the same type of patients.
Another interesting finding in this study is that the relationship between ALK and tumor recurrence exists in patients with lymph node metastasis, but not in node-negative patients. A potential explanation is that the growth/survival of BC with metastatic capabilities (such as BC with positive lymph nodes) will benefit more from active ALK signaling than that of BC without the capability (such as BC with negative lymph nodes). To establish a distant tumor, a metastatic cancer cell needs to survive and adapt to a new environment, rebuild cell-matrix interactions, form a micrometastasis and finally restart an unlimited growth process. Metabolic rewiring is a key factor for tumor cells to complete this metastatic cascade, and several pathways including TGF-β and hypoxic signaling are involved in this rewiring process [31]. Recent studies revealed that the upregulation of hypoxia-inducible factors under hypoxic conditions, as well as the induction of VEGF secretion by TGF-β, were both ALK-dependent in different tumor cells [32,33], suggesting a potential connection between ALK signaling and metabolic rewiring. Probably through this connection, ALK signaling plays a more active role in the molecular pathogenesis of metastatic cells than in that of primary tumor cells, which may not benefit from metabolic rewiring.
A number of prognostic gene signatures have been developed for prediction of neoadjuvant response or recurrence risk in BC [34][35][36][37][38][39][40]. In this study, an ALK pathway gene signature was found to predict both the neoadjuvant response and recurrence risk in multiple datasets encompassing >5000 BC cases, suggesting it as a potentially promising biomarker for BC prognosis and management. Compared with the gene signatures reported previously, the unique feature of the ALK pathway signature is that it is specifically associated with recurrence in BC patients aged >50, with lymph node metastasis, or with RD after neoadjuvant chemotherapy, which indicates that these patients may particularly benefit from using this signature for risk estimation. Further validation of the ALK pathway signature in additional independent and clinically relevant analyses will be necessary before it enters clinical trials.
In summary, this study highlights that the ALK pathway gene signature represents a potentially promising biomarker for guiding clinical management of BC. Our results also support the notion that ALK signaling may have an oncogenic role in the pathogenesis of BC and therefore may be a potential molecular target for BC therapy.

Associations of ALK pathway signature score with age, tumor size and status of HER2 and node in 42 individual breast cancer datasets. A) Patient age (>50 as event and ≤50 as non-event); B) HER2 status (HER2 positive as event and HER2 negative as non-event); C) Lymph node status (node positive as event and node negative as non-event); D) Tumor size (>2cm as event and ≤2cm as non-event). The OR (per one standard deviation increment) with the increase of ALK pathway signature score (used as continuous variable) was analyzed using univariate logistic regression in 42 individual breast cancer datasets. Only datasets with event case >5, non-event case >5 and total case >30 were included for regression analysis.

Associations of ALK pathway signature score with recurrence risk in breast cancer. The associations of ALK pathway score with recurrence risk were analyzed by univariate Cox regression in 30 individual breast cancer datasets (A), and were also compared among different subgroups of breast cancer in three independent cohorts that were merged from the 30 datasets, including cohort 2 (B), cohort 3 (C) and cohort 4 (D). The details of these 3 breast cancer cohorts are described in Material and Methods. ALK pathway score was used as a continuous variable and recurrence risk (indicated by HR) is presented per one-SD increment.

Figure 6
Kaplan-Meier analysis of the associations of ALK pathway signature score with recurrence risk in breast cancer. The patients were stratified into three groups based on ALK pathway signature score (high: ≥ 2/3 percentile; low: ≤ 1/3 percentile; intermediate: < 2/3 percentile and > 1/3 percentile) in cohort 2 (A), cohort 3 (B) and cohort 4 (C) respectively, and Kaplan-Meier analysis was performed to compare the probabilities of DFS among the three groups. To compare the associations of ALK pathway score with recurrence risk in breast cancer patients with pCR or RD following neoadjuvant chemotherapy, the samples