Metastasis progression through the interplay between the immune system and Epithelial-Mesenchymal-Transition in circulating breast tumor cells

Background Circulating tumor cells (CTCs) are the critical initiator of systemic dissemination of cancer, contributing to distant metastasis formation. The metastatic cascades rely on the fundamental roles of different types of CTCs. In which the dual immune responses and epithelial-mesenchymal-transition (EMT) are of two metastasis-driving phenomena and require more molecular assessments. Methods In this study, we investigated the transcriptomic modular pattern of single/cluster circulating tumor cells (CTCs). The co-expression analysis implemented, and we could detect two metastatic subnetworks indicating the immune responses and EMT in CTCs. Furthermore, a directed subnetwork identied in the KEGG database. The metastatic potential of subnetworks assessed and validated by classication methods on primary tumors. And, we could t risk models to distant-metastasis survival of patients. Our results show the crosstalk among EMT, immune system, menstrual cycles, and stemness in CTCs. In which, uctuation of menstrual cycles (hormone-related signals) is a new detected pathway in CTCs in breast cancer. The immune SVM model showed high metastatic potential in classifying patients metastatic/non-metastatic groups (accuracy, sensitivity, and specicity scores are 78%). The distant-metastasis free survival model could be used to stratify patients into low, medium, and high-risk groups. Finally, PTCRA, F13A1, LAT, ICAM2, and SNRPC are novel detected biomarkers in breast cancer.

metastasis free survival model could be used to stratify patients into low, medium, and high-risk groups. Finally, PTCRA, F13A1, LAT, ICAM2, and SNRPC are novel detected biomarkers in breast cancer.

Conclusion
In conclusion, different types of CTCs, including cluster/single cells, are metastasis-leading elements in breast cancer. In which, individual assessment of their intrinsic biological properties may assist elucidating metastasis-related mechanisms. These ndings may apply to develop superior treatments in the clinic.

Background
Metastasis is the common cancer-associated cause of deaths that accounts for a remarkable 90% deaths [1]. Cancer progression and metastasis are of the critical and even controversial aspects of cancer biology. There are two arguable metastasis models, including parallel progression and linear progression, which try to explain the dark side of the cancer metastasis [2]. In the linear model, the tumor initiates by genetic and epigenetic alternations, grow, spread, and gain metastasis potentials and disseminate to ectopic sites. Still, in the parallel model, the metastasis ability initiates early-onset and separately evolve [2,3]. Apart from multiple metastasis models and molecular mechanisms which indicate a different aspect of cancer initiation and progression, the circulating tumor cells (CTCs) in patients' bloodstream and likewise their physical characteristics of single CTC or clustered CTC play a crucial role in metastasis propensity [4]. Of note, CTCs are rare disseminated tumor cells in the peripheral blood of patients that are negatively related to the high rise of mortality rates in cancer. They borrow the morphologic features of their primary tumor and gain new features to survive and metastasize secondary organs [5]. The cluster CTCs consist of 2-50 cancer cells within the circulation of patients, and they have a 23-to 50-fold increase in metastasis potential [4].
Circulating tumor cells overcome many hurdles to colonize distant organs [6] and includes intravasation into blood or lymph, evading immune bulwarks, extravasation to distant sites, and eventually replacing with host tissue microenvironment [6,7]. The CTCs mirror their primary tumor characteristics and even metastases they initiate, with which strong abilities to survive under shear forces in circulation and eventually overtake host tissue [6,7].
The two important power of malignant cells is the reversal phenotypic of Epithelial-mesenchymal transition (EMT) and even immune system suppression. EMT is a cellular transition in which cells acquire mesenchymal characteristics that can accelerate metastasis through immunosuppression [8,9]. The epithelial cells that isolated the primary site undergo immediately anoikis, a fate likely to meet most CTCs in blood circulation. Accordingly, such mesenchymal transformation may provide longer survival signals to reduce apoptotic outcome [4,10].
Metastasis is the leading cause of death among women with breast cancer, and CTCs are prominent and leading components that appear even in early stages [11,12]. Detection of CTCs in metastatic and nonmetastatic breast cancer patients implies its role in cancer progression. Therefore, fully realize the CTCs' molecular characteristics will guide us to unknown metastasis concepts and more precise therapeutic decisions.
In this study, we implemented the co-expression network reconstruction for CTCs isolated from advanced patients. We extracted metastasis relevant subnetworks that enriched in the immune system and EMT.
The metastasis-free survival of genes assessed in GSE7390. Concerning a better understanding of signaling inside CTCs, we also extracted an induced directed subnetwork from the KEGG database. We also carried out the support vector machine (SVM) classi cation for selected subnetwork on GSE7390 to prepare a predictive model to classify metastatic and non-metastatic primary tumors. The subnetworks preservations were validated on another dataset.

| data sets and metadata information
The single-cell RNA-seq data related to advanced ER+ breast cancer patients were downloaded from the NCBI data repository (GSE86978). The data consist of 77 cells in which 47 of them are clustered CTCs, 22 CTCs are single cells, and the rest of the cells are not categorized. The GSE51827 was used for subnetwork preservation analysis that consists of 29 cells (single CTCs and cluster CTCs), in which 15 cells are single CTCs, and 14 cells are cluster CTCs. The GSE7390 consists of 198 breast cancer patients' expression extracted from the primary tumors. The patients are not treated. The GSE9195 consists of 77 breast cancer treated with tamoxifen, which is used for classi cation and metastasis-free survival validation.
2.2 | Pre-processing, normalization, and differential analysis To have more precise downstream analysis and remove non-biological variations, we implemented several pre-process steps on genes and also cells. At the rst step, we ltered out low abundance genes, then the small count cells omitted subsequently. In the last step, the expression data were normalized to reduce technical effects using the scatter package [13]. The differentially expressed analysis (DEA) was completed using the limma package in R [14]. The DEA was implemented to compare clustered cells expression to the single cells' expression (FDR < 0.05).

| Co-expression network reconstruction (CNNR) and subnetwork extraction
The co-expression network reconstructed using a weighted correlation network analysis (WGCNA) method [15]. The pairwise relation among genes was estimated using the Pearson correlation among genes. Concerning having more connected subnetworks, we carried out the topological overlap matrix (TOM) and, consequently, connectivity gene ltering (connectivity values less 0.1 are omitted). Eventually, we used hierarchical clustering to extract subnetworks. The trait used in this study is clustered and single status of the cells captured in blood. The subnetworks in which have strong correlations between their rst principle component and the biological trait are selected as trait related subnetworks. The gene signi cance and module membership were used to lter out essential genes in selected subnetworks. The gene signi cance is the correlation between gene expression and the trait. The module membership is the correlation between gene expression and module representative ( rst principle component in the principal component analysis (PCA)).

| Directed network reconstruction
We downloaded all homo sapiens pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database resource [16]. All the KEGG Mark-up Language ( KGML) pathways parsed and converted to graph objects, and nally, these graphs merged [17]. Furthermore, the KEGG ids annotated to gene symbols. At the last step, we extracted KEGG induced subnetwork using the most trait associated subnetworks (correlation > 0.5) resulted from the CNNR step. The genes were categorized by their biological process feature using ClueGO plug-in in Cytoscape [18,19]. The network visualization implemented by the Cytoscape and the Gephi software [19,20].

| Gene set enrichment analysis and subnetwork preservation analysis
The most trait-related subnetworks ( ) extracted from CNNR, were enriched using ConsensusPathDB webserver (q-value < 0.05) [21]. The hierarchical clustering applied to selected subnetworks to check the genes' potential to separate cluster CTCs and single CTCs.
The GSE51827 downloaded from NCBI to implement preservation analysis of subnetworks in another dataset in R. The data pre-processed, normalized, and merged using scatter package, which is suitable for single-cell RNA-seq data. The preservation combined statistics including and were used to check the reproducibility of subnetworks [22]. These two combined statistics consist of 12 statistics, which all of them are calculated.
The includes connectivity and density statistics, which shows the interaction pattern among genes in subnetworks. The subnetworks with are not preserved. if , the subnetwork is semi preserved, and if , the subnetwork is preserved. Also, a higher indicates more preservation of subnetworks in the second data set [22].

| Distant metastasis classi cation
To evaluate the importance of selected subnetworks and the potential of genes in metastasis, we implemented the classi cation algorithms on two individual datasets. The expression data refer to primary tumors. The GSE7390 (Affymetrix platform, HG-U133A) downloaded using the GEOquerry package in R [23]. The ER+ patients (134 patients out of 198 ones) ltered, and the expression data normalized using the RMA method [24]. In this section, we learned three classi ers, including support vector machine (SVM), arti cial neural network (ANN), and decision tree on metastatic and nonmetastatic patients [25]. The classi cation algorithms ran with and without feature selection algorithms, including the genetic algorithm (GA) and the world competitive contest (WCC) algorithm. The SVM ran with 5-fold cross-validation and 80 percent of cells as the training set. Finally, the accuracy, precision, and speci city were checked to select a better classi er for metastasis prediction, furthermore to extract the most metastatic features too. The selected model was assessed in another dataset (GSE9195).

| Distant metastasis-free survival analysis
The Kaplan-Meire distant metastasis-free survival estimate and overall survival analysis were implemented using GSE7390 in R [26]. The patients were strati ed due to quantiles. The expression values lower than the second quantile were labeled low expression, and expression values higher than the fourth quantile were labeled high expression. The stepwise Cox proportional hazard ratio (Cox-PH) modeling was implemented on selected subnetworks [26]. The Variance In ation Factor (VIF) lower than two was used as the variable selection criteria. The second and fourth quantiles of the predicted hazard ratio were used for stratifying patients into three groups, including low-risk, medium-risk, and high-risk patients.

| Pre-processing of CTCs and DEA
There are 74 out of 77 cells after pre-processing for downstream analyses. The excluded cells had low quality; therefore, we omited them. The differential expression analysis was implemented after the normalization step. The expression difference between cluster CTCs and single CTCs groups were assessed (FDR < 0.05). The genes of immune subnetwork are downregulated (light purple color in the heatmap) in cluster CTCs compare to single CTCs. Furthermore, the genes of the EMT subnetworks imply upregulation (dark purple color in the heatmap) in cluster CTCs (Fig. 1). The immune and EMT subnetworks were used independently to cluster all cells. As illustrated in Fig. 1, the single and cluster CTC cells separated well with selected subnetworks. The immune-related subnetwork represents a stronger expression difference between cluster cells and single cells.

| Metastasis associated subnetworks
Metastasis associated subnetworks were determined by co-expression analysis and hierarchical clustering. We  Table S2.
The preservation of all subnetworks was assessed in another dataset (GSE51827). The two combined statistics and calculated to assess subnetworks preservation in the second data set (immune subnetwork: , and EMT subnetwork: , ). The values for both subnetworks are more than 10, and values are high too (min= 7, max= 32). Subsequently, these statistics represent our immune and EMT subnetworks preservation in the second data set (Table S3).

| Directed network reconstruction
The signaling pathways inside the CTCs are not well recognized. The 336 homo sapiens signaling pathways out of 537 ones in the KEGG database downloaded and merged. The association among two selected subnetworks (immune and EMT) were investigated by extracting induced subnetwork from KEGG. A directed subnetwork of size 255 genes extracted and illustrated in Fig. 3. There are 12 gene categories obtained from biological processes, including Hormonal regulation, Immune responses, Ion metabolism, Nucleobase metabolism, Oxidative responses, Protein localization, Protein topology response, STAT singling pathway, Vitamin metabolism, cell differentiation, circulation in blood regulation, Energy metabolism. These categories are illustrated by colors on network nodes, and the genes with no category remained grey. The nodes with multiple colors were detected in different biological processes. The node size is illustrated by the node degrees. PLCG1 and ENTPD8 are two hub nodes in the network that participate in energy and nucleobase metabolisms, respectively.

| Distant metastasis classi cation model
The distant metastasis potential of two nominated subnetworks for classifying patients into primary tumors was assessed using SVM, neural network, and decision tree methods. The 5-fold cross-validation SVM accuracy, sensitivity, and speci city scores for EMT related subnetwork are 79%, 78%, and 21%, respectively. The neural network accuracy, sensitivity, and speci city scores are 18%, 18%, and 80%, respectively. Eventually, the decision tree accuracy, sensitivity, and speci city scores are 60%, 60%, and 30%, respectively. These results refer to a full model (all genes in the subnetwork included). Compare to these three models, the SVM model is the strongest method in classifying metastatic and non-metastatic patients, but the speci city score is too low. The SVM model validated in GSE9195.
The SVM accuracy, sensitivity, and speci city scores for immune-related subnetwork are 78%, 78%, and 78%, respectively. The neural network accuracy, sensitivity, and speci city scores are 85%, 85%, and 14%, respectively. Eventually, the decision tree accuracy, sensitivity, and speci city scores are 71%, 71%, and 36%, respectively. These results refer to a full model (all subnetwork genes included). The speci city of the neural network and decision tree method is low compare to the SVM. Due to results, the SVM model is the most powerful method in classifying metastatic and non-metastatic patients for the immune-related subnetwork. The immune-related subnetwork accuracy, sensitivity, and speci city for the SVM model are superior to EMT related subnetwork.
The feature selection methods were implemented. The accuracy, sensitivity, and speci city of feature The accuracy, sensitivity, and speci city are 0.868, which is superior to GSE7390. The results con rm that the immune-related genes detected in this study can classify metastatic and non-metastatic samples more precisely compared to the neural network and decision tree models in two data sets. We implemented the classi cation methods to assess the metastasis potential of two nominated subnetworks which extracted from CTCs' data. We tted the Cox-PH as a predictive model to discriminate patients to low, medium, and high metastasis risk groups.

| Distant metastasis and overall survival analysis
The association between gene expression and distant metastasis-free survival /overall survival was implemented to detect metastasis potential genes in selected subnetworks. The JUP, KRT18, and KRT19 overall survival and distant metastasis-free survival are signi cant by log-rank test (p-value < 0.05) (Fig.  4a, b, c, d, e, and f). These three genes belong to EMT subnetwork. The high expression level of JUP, KRT18, and KRT19 associate with low overall survival and further lower metastasis progression. The distant metastasis-free survival and overall survival curves indicate the same pattern.
We tted a metastasis free-survival Cox-PH regression model for EMT and Immune subnetworks to assess metastasis progression and patients' predictive models. The Immune Cox-PH model includes of RUFY1 and P2RX1 variables (Likelihood ratio test p-value = 0.0295). The EMT Cox-PH model includes of RAN, PEBP1, KRT8, DSP, DDR1, and CLDN4 variables (Likelihood ratio test p-value = 0.0001016). The variables' coe cients and p-values reported in Table S4 and S5. All the signi cant genes in the model have VIF < 2 to avoid multicollinearity (Table S6 and S7). The proportional hazard assumption for two model variables was assessed by the Schoenfeld residuals ( Fig. S2 and S3). The predictive Cox-PH models for distant metastasis-free survival for two subnetworks are illustrated in Fig. 4g and h. The concordance index, as a model evaluation measure, for EMT and Immune predictive Cox-PH models are 0.7 and 0.6, respectively. Therefore, the EMT model is more powerful in discriminating patients into low, medium, and high metastasis risk groups compare to the Immune model (higher concordance index indicates more power in discrimination).

Discussions
Whereas multiple studies on circulating tumor cells (CTCs) as single CTCs or metastatic microemboli (CTC clusters) have been conducted, the molecular mechanisms of such rare cells are insu ciently characterized. Several metastatic potentials of CTC to overcome many restrictions on extravasation of the primary tumor microenvironment, survival in the bloodstream, and successfully colonize secondary organs are still a mystery. Therefore, a better understanding of the molecular features of CTCs' types is needed.
This study aims to explore metastasis clues. We have implemented the co-expression analysis to detect subnetworks discriminating single/cluster CTCs (Fig. 1). Two of subnetworks indicated a high correlation to the trait (single/cluster status of CTCs). These two illustrate immune-and EMT-related pathways that mediate bloodborne dissemination of cancer cells (Figs. 2a and b). Due to previous studies, the immuneassociated mechanisms and EMT are of two major arms in breast cancer progression and metastasis, but investigating them in CTCs is not studied well [27]. To prepare cancer cells for intravasation, the keratin family, claudins, and cadherins must be downregulated through the EMT process in primary tumors. But, due to the surviving urgency of CTCs in the bloodstream, and also avoiding anoikis, a small number of tumor cells must be attached and break off from the primary site [28,29]. Therefore, the keratins, claudins, and cadherins should be upregulated in CTC clusters to survive shear forces in blood circulation. The plakoglobin (JUP), KRT8, KRT18, KRT19, CLDN4, and EPCAM are such essential genes that their role in breast cancer metastasis demonstrated in previous studies [4,29,30].
The KRT8, KRT18, KRT19 are a group of cytoskeleton genes within the cellular cytoplasm called keratins. Although they extensively used as diagnostic tumor markers, several studies have demonstrated their involvement in cancer cell invasion and metastasis, as well as in treatment responsiveness. Keratins are the intermediate lament-forming proteins of epithelial cells that organize the internal three-dimensional cellular structure also act in cell shape maintenance by bearing tension. [31]. Therefore, they may play an essential role in CTC clusters in the bloodstream shear forces. The plakoglobin is one of the cell junction genes that hold tumor cells together in CTC clusters. And its upregulation in breast cancers CTC clusters compare to single cells demonstrated in Aceto, Nicola, et al. study [4]. In our study, the overexpression of JUP, KRT18, and KRT19, as well as the overall survival and metastasis-free survival, are signi cant either (Fig. 4a or d). The EPCAM and cytokeratins are reported as detection markers in the Enrichment of CTCs [32]. These markers guide scientists to detect metastatic patients. But, the rest of the genes are not investigated in CTC studies, including CTC detection or molecular mechanisms. So, they may be a good alternative in cluster CTCs studies. There are several types of immune cells that ambiguously reveal antiand pro-tumor behaviors. The immunosuppressive microenvironment of tumors protects the primary tumor cells. But, while tumor cells extravasate and enter circulation, they lose their tumor protection. Therefore, they must adapt themselves to escape immune surveillance [1,33]. The interplay between immune cells and cytokeratins contributes to CTCs evasion from immune surveillance. The cytotoxic T lymphocytes (CTLs) recruited by recognizing tumor antigens presented by major histocompatibility class I (MHCI) [34,35]. The under-expression of MHCI in tumor cell surface guides them to hide from CTLs, and thereby survive in circulation. Moreover, the overexpression of cytokeratins such as KRT8, and together with heterodimeric partners KRT18 and KRT19 inhibit MHCI interactions with CTLS [33,34]. All these foundings are consistent with our results (overexpression of KRT8, KRT18, and KRT19; under-expression of HLA-E) for CTC clusters, which show the CTC cluster potential to evade the immune system ( Fig. 1a   and b). Several studies support the association between EMT and immune cell escape [36,37]. Moreover, a plethora of genes and signals support stem-cell support pathways such as Wnt, TGF-β, and NOTCH [6]. Downregulation of DAB2 (putative tumor suppressor) is reported in breast cancer, which promotes EMT and also involves in the TGF-β pathway [38,39]. DAB2 downregulation might be related to the stemness phenotype acquired from EMT, which helps CTC clusters to escape the immune system. DAB2 is founded in our immune subnetwork, and it is downregulated in CTC clusters (Fig. 1b). Of note, although the CTC clusters have higher metastatic potential due to less frequency in metastatic patients, the single cells contribute metastasis either. Several studies such as Szczerba, Barbara Maria, et al. indicate more singlecell detection in metastatic patients (about 88.0%) [40]. Therefore, biological mechanisms of metastasis in the single and clusters CTCs need to be investigated in more detail.
The coordination among different cancer-associated pathways is reported in various studies. Atashgaran, Vahid, et al. investigated the crosstalk between the immune response pathway and hormonal regulation pathways, such as the uctuation of menstrual cycles. The dis-regulation of hormonal factors may affect on genome instability and decrease of immune surveillance [41]. Accordingly, such pro-metastatic crosstalks in breast cancer contribute to progression and metastasis. The crosstalk among immune responses, EMT, and stemness pathways such as embryonic pathways are reported in Takebe, et al. study. The cancer stemness signaling pathway includes ErbB, Wnt, and transforming growth factor (TGF)-β pathways are essential during embryogenesis, moreover, in the stimulation of self-renewing in tumor cells [42]. Such pro-metastatic pathways and the interplay among all of them may happen in circulating tumor cells (Fig. 2c and Fig. 3). These pro-metastatic cells may show different signaling pathways and interrelationship among them to act like multi-role players with great metastatic potentials. The CTCs are tumor cells that show primary tumor characteristics and likewise more additional metastatic propensity to survive in blood-stream and extravasation capabilities. As a result, characterizing multiple aspects of CTCs' involvement in cancer progression is essential and useful in patients' treatment strategies.

Conclusions
Circulating tumor cells as single CTCs or CTC clusters indicate metastatic potentials in ER+ breast cancer patients. However, the CTC clusters show much higher pro-metastatic potentials. They bene t several metastatic capabilities to extravasate, surviving in blood-stream, and nally extravasate to secondary tissues.
CTCs include considerable dis-regulated pathways such as hormonal pathways, EMT, embryonic pathways, and also immune responses. Identi cation of several cancer-associated pathways in CTCs offers a continuum of potential therapeutic targets. Exclusively, the uctuation of menstrual cycles and their effect on ER+ breast cancer pathways are less considered in the clinic. The interplay among all pathways and key genes that mediate such bridges among metastatic pathways are essential in cancer biology study and accordingly in therapeutic guidelines.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.