Peripheral Blood Monocyte Abundance Predicts Outcomes in Breast Cancer Patients

Biomarkers of response are needed in breast cancer to stratify patients to appropriate therapies and avoid unnecessary toxicity. Peripheral blood gene expression and cell type abundance were used to identify biomarkers of response and recurrence in breast cancer patients treated with neoadjuvant chemotherapy. Higher peripheral blood monocyte abundance after neoadjuvant chemotherapy was associated with improved prognosis in multiple independent cohorts of breast cancer patients.


Main
Neoadjuvant chemotherapy (NAC), the standard of care for many breast cancer patients, is known to have systemic immunologic effects and is increasingly being used in clinical trials in combination with immunotherapeutics. Currently, there are few biomarkers to predict NAC or immunotherapy response, although response to NAC is known to be associated with long-term outcome in breast cancer 1 . Thus, biomarkers are needed to identify patients who will benefit from combination therapy compared to those who are likely to respond to NAC alone, and thereby avoid the added risk of toxicity and financial burden.
Peripheral blood is an attractive site of biomarker development due to the relative ease of longitudinal sampling. We have previously shown that high expression of a cytotoxicity gene signature in the blood following NAC is associated with the presence of residual disease (RD) and future breast cancer recurrence, demonstrating the feasibility of using blood-based transcriptional biomarkers 2 .
RNA sequencing was performed on whole blood of 53 breast cancer patients after completion of NAC (if received) and prior to definitive surgery (Fig. 1a; n=23 RD, 9 pathologic complete response (pCR), 21 no NAC; Table 1). We stratified patients with RD by whether they experienced a breast cancer recurrence within three years of surgery (RD-R) or remained free of recurrence for three years (RD-nR). Using DESeq2, we identified 1,238 differentially expressed genes (FDR-corrected q-value <0.1) between pCR and RD samples (Supplemental Table 1) 3 . Using gene set enrichment analysis (GSEA), we collapsed differentially expressed genes into pathways using the Molecular Signatures Database hallmark gene sets 4,5 . Hallmark Interferon (IFN) Gamma Response (q-value <0.0001; normalized enrichment score (NES)=3.32), Hallmark Interferon Alpha Response (q-value <0.0001; NES=3.14), and Hallmark Complement (q-value=0.000111; NES=2.29) pathways were significantly enriched in the blood of patients experiencing pCR compared to those with RD (Fig. 1b). No pathways were statistically significantly upregulated in RD samples relative to pCR samples. To evaluate the genes involved in these pathways, we identified the leading-edge genes from each pathway (IFN gamma = 49 genes; IFN alpha = 26 genes; complement = 15 genes) and selected only the unique genes (n=60 genes). Many of these genes were strongly and uniformly upregulated in many of the pCR samples, regardless of triple-negative breast cancer (TNBC) status (Supplementary Fig. 1).
We combined expression of these genes into an IFN/complement score, calculated as the sum of z-scores divided by the number of genes in the signature (n=60 genes). We compared the IFN/complement score to a previously published 8-gene cytotoxic score (FGFBP2 + GNLY + GZMB + GZMH + NKG7 + LAG3 + PDCD1 - HLA-G) 2 . No genes overlapped between the two signatures. Samples with the highest IFN/complement score had low cytotoxic scores and tended to be pCR samples. Conversely, those with the highest cytotoxic score tended to have low IFN/complement scores and to be RD samples (Fig. 1c). A combination peripheral immunologic response score (PIRS), defined as the IFN/complement score minus the cytotoxic score, had improved predictive power compared to either signature alone (Fig. 1d).
To examine which cell types predominantly express each signature, we used single-cell RNA sequencing data from whole peripheral blood mononuclear cells (PBMCs) from two breast cancer patients post-NAC, prior to surgery 2 . Expression of the cytotoxic score was highest in CD8+ T cells and natural killer cells, while the IFN/complement score was highest in a subset of monocytes. There was very little co-expression of the signatures across cells (Fig. 1e).
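The score arithmetic described above (per-gene z-scores, summed and divided by the number of genes, with PIRS as the IFN/complement score minus the cytotoxic score) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (their analyses were performed in R). The expression values and the gene names IFN_GENE_A and IFN_GENE_B are hypothetical placeholders, and computing the cytotoxic score as a signed mean of z-scores is an assumption made here for illustration; the text does not specify its normalization.

```python
from statistics import mean, stdev

def zscore(values):
    """Z-score one gene's expression values across samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def signature_score(expr, signed_genes):
    """Per-sample signed sum of z-scores divided by the number of genes.

    expr: dict mapping gene -> list of per-sample expression values
    signed_genes: dict mapping gene -> +1 (added) or -1 (subtracted)
    """
    z = {gene: zscore(expr[gene]) for gene in signed_genes}
    n_samples = len(next(iter(z.values())))
    n_genes = len(signed_genes)
    return [
        sum(sign * z[gene][i] for gene, sign in signed_genes.items()) / n_genes
        for i in range(n_samples)
    ]

# Hypothetical toy data: 5 genes measured in 4 samples (not study data).
expr = {
    "IFN_GENE_A": [1.0, 2.0, 3.0, 4.0],   # placeholder IFN/complement gene
    "IFN_GENE_B": [2.0, 2.5, 3.5, 5.0],   # placeholder IFN/complement gene
    "GZMB":       [4.0, 3.0, 2.0, 1.0],
    "NKG7":       [5.0, 4.5, 2.5, 1.5],
    "HLA-G":      [1.0, 1.5, 2.0, 3.0],   # subtracted in the cytotoxic score
}

ifn_score = signature_score(expr, {"IFN_GENE_A": 1, "IFN_GENE_B": 1})
# Assumption: the cytotoxic score is treated the same way, with HLA-G negated.
cyto_score = signature_score(expr, {"GZMB": 1, "NKG7": 1, "HLA-G": -1})
pirs = [i - c for i, c in zip(ifn_score, cyto_score)]
```

In this toy example the samples are ordered so that IFN-like genes rise while cytotoxic genes fall, so PIRS increases from the first sample to the last.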
Given the partitioning of the gene expression scores into cell types, we next aimed to identify whether there were differences in cell type abundances between the outcome groups. CIBERSORTx was used to deconvolute relative cell type abundance from the RNA sequencing data 6 (Fig. 2a). Relative monocyte abundance was highest in samples with pCR, intermediate in those with RD, and lowest in samples from patients not receiving NAC (Fig. 2b). Naïve B cells were also statistically significantly different across groups, being highest in no NAC samples and lowest in pCR samples (Supplementary Fig. 2a). However, only monocytes followed the trend of increases from no NAC to RD-R to RD-nR to pCR (Supplementary Fig. 2b). Additionally, because monocytes are routinely measured in clinical practice, monocyte values were an intriguing metric for further study. We reviewed electronic medical records and extracted monocyte values from complete blood counts for patients receiving NAC in this cohort. Twenty-three of 32 (72%) NAC-treated patients (n=7 pCR, 12 RD-nR, 4 RD-R) had a complete blood count with differential (which includes monocyte values) in the 30-day interval prior to surgery (following completion of NAC), indicating how commonly this information is collected clinically. Clinically measured monocyte values were significantly positively correlated with monocyte values inferred by CIBERSORTx (R=0.51, p=0.012; Supplementary Fig. 2c), even though monocyte values were not always collected on the same day as the blood for RNA sequencing (though in the same 30-day window). Post-NAC, but not pre-NAC, monocytes were significantly higher in patients with pCR compared to those with RD (Fig. 2c; Supplementary Fig. 2d).
Next, we sought to assess whether monocyte abundance was associated with outcome in independent cohorts. Higher monocytes were also seen with pCR in an additional cohort of 41 TNBC patients (n=18 RD, 23 pCR; placebo arm of the GeparNuevo study; Table 1), though this association was not statistically significant (p=0.0638 for absolute monocyte counts, p=0.186 for relative monocyte frequencies, one-tailed Wilcoxon; Supplementary Fig. 2e) 7 . In this TNBC-only dataset, PIRS, measured by NanoString, was not associated with outcome, indicating independence of monocytes and PIRS measurements (Supplementary Fig. 2f). In an additional independent cohort of 14 hormone receptor positive (HR+) HER2-negative breast cancer patients from the Instituto Valenciano de Oncología, monocytes tended to be higher in patients without metastatic recurrence, with at least four years of follow-up time for each patient (p=0.0949, one-tailed Wilcoxon; n=5 with metastasis, 9 without metastasis; Supplementary Fig. 2g; Table 1). Using a de-identified medical record database called the synthetic derivative (SD), we identified 110 breast cancer patients (VICC-SD; n=35 pCR, 75 RD; Table 1) who had been treated with NAC, had a breast surgery, and had a monocyte value within 30 days prior to surgery. In the VICC-SD cohort, relative frequencies of monocytes were statistically significantly higher in patients with pCR compared to those with RD (Fig. 2d). This effect was more pronounced when considering only the TNBC patients (n=50), which may be reflective of underlying TNBC-specific biology, or the more uniform treatment options for TNBC (chemotherapy rather than targeted therapy agents) (Supplementary Fig. 2h). In all cohorts, patients who had received GM-CSF products in the 30-day window prior to surgery were excluded from analysis as this may affect monocyte counts. Taken together, these data suggest that higher blood monocyte levels post-NAC may be indicative of superior outcomes in breast cancer patients.
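For readers unfamiliar with the one-tailed Wilcoxon comparisons reported above, the following Python sketch computes an exact one-sided rank-sum p-value by enumerating all possible group assignments. It is illustrative only: the text does not state whether an exact or approximate p-value was used, the authors worked in R, and the monocyte fractions below are hypothetical values rather than study data. Exhaustive enumeration is practical only for small groups such as the 14-patient cohort.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def exact_one_tailed_ranksum(x, y):
    """Exact p-value for the alternative that x tends to be larger than y."""
    ranks = midranks(list(x) + list(y))
    observed = sum(ranks[:len(x)])
    n_extreme = 0
    n_total = 0
    # Under the null, every choice of len(x) ranks is equally likely.
    for idx in combinations(range(len(ranks)), len(x)):
        n_total += 1
        if sum(ranks[i] for i in idx) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total

# Hypothetical relative monocyte fractions (not study data).
pcr_monocytes = [0.9, 0.8, 0.7]
rd_monocytes = [0.1, 0.2, 0.3]
p_value = exact_one_tailed_ranksum(pcr_monocytes, rd_monocytes)
```

With three values per group and complete separation, only 1 of the 20 possible assignments is as extreme as the observed one, so the smallest attainable one-sided p-value is 0.05. This illustrates why small cohorts can show a consistent trend without reaching conventional significance.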
Peripheral blood gene expression scores and cell type abundance may be useful biomarkers of NAC response and outcomes in breast cancer. We identified an immunologic gene signature (PIRS) that was highest in patients with the best outcomes (pCR) and lowest in those with the worst outcome (RD with recurrence). However, PIRS was not associated with outcome in a separate cohort of TNBC-only patients.
Additional studies are needed to test whether PIRS or other gene expression scores may be useful biomarkers in breast cancer patients. A higher peripheral monocyte level, measured by a standard clinical assay performed on most breast cancer patients, was associated with improved patient outcomes (pCR or lack of recurrence) in four independent breast cancer patient cohorts. Additional efforts are needed to explore whether there might be a causal link between chemotherapy-induced monocyte mobilization and improved response. Taken together, these results suggest that peripheral blood biomarkers following NAC may be useful in predicting long-term outcome. Future work will explore the utility of peripheral blood biomarkers in predicting immunotherapy response.

Materials & Methods
Patients. For all cohorts, data use was approved by the relevant ethics committee, institutional review board, and national competent authority and adheres to the ethical principles of the Declaration of Helsinki. For the VICC-1 cohort, clinical and pathologic data were retrieved from medical records under an institutionally approved protocol (VICC IRB 030747). For the VICC-SD cohort, clinical and pathologic data were retrieved from the de-identified synthetic derivative medical record under an institutionally approved protocol (VICC IRB 202207). First, all female patients with International Classification of Diseases, 9th/10th Revision, Clinical Modification (ICD-9/10-CM) billing codes for malignant neoplasm of breast were identified. Next, from this cohort, patients with at least one breast cancer surgical procedure using Current Procedural Terminology (CPT) codes and exposure to cyclophosphamide and doxorubicin within 12 months preceding the surgery date were automatically selected. The ICD and CPT codes used for this purpose are listed in Supplementary Tables 2 and 3, respectively. Finally, the electronic health records of the selected patients were manually reviewed to extract the VICC-SD cohort used in this study. To guide the manual chart review process, additional clinical information was automatically extracted from the SD.
This includes breast cancer surgery dates, dates when patients received cyclophosphamide and doxorubicin, laboratory measurements, other chemotherapy or cytokine-related medications and dates of administration, pathology reports, and operative notes. The GeparNuevo cohort consisted of patients from the placebo arm of the GeparNuevo trial (registration number NCT02685059) 7 . The Instituto Valenciano de Oncología cohort consisted of HR+ HER2-negative patients treated with neoadjuvant chemotherapy. All data use was approved by the relevant ethics committees.
RNA Sequencing. RNA was extracted from 0.5-2 mL of whole blood or processed PBMCs using the Promega Maxwell RSC simplyRNA Blood kit (AS1380, Promega). Total RNA quality was assessed using the 2200 TapeStation (Agilent). Library preparation was done with a ribo-depletion total RNA library preparation kit, and 150 bp paired-end sequencing was performed on the Illumina NovaSeq 6000, targeting an average of 10M reads per sample. Quality control was evaluated at different levels, including RNA quality, raw read data, alignment, and gene expression. Raw paired-end RNA-seq reads were mapped to the human reference genome hg19 using STAR 2.7.3. The raw read count matrix was calculated with featureCounts and used for downstream analysis. Twenty-five genes were removed from the RNA sequencing analysis and transcripts-per-million (TPM) calculation due to overrepresentation. These genes represent red blood cell contamination of the PBMCs. The removed genes are: "RN7SL1", "RN7SL2", "HBA1", "HBA2", "HBB", "HBQ1", "HBZ", "HBD", "HBG2", "HBE1", "HBG1", "HBM", "MIR3648-1", "MIR3648-2", "AC104389.6", "AC010507.1", "SLC25A37", "SLC4A1", "NRGN", "SNCA", "BNIP3L", "EPB42", "ALAS2", "BPGM", "OSBP2". DESeq2 was used to identify differentially expressed genes and apeglm was used for log fold change shrinkage 3,8 . GSEA was used to identify pathways using the Molecular Signatures Database hallmark gene sets 4,5 .
CIBERSORTx was used in relative mode with 500 permutations and the LM22 reference matrix 6 . Simplified cell type categories were collapsed as follows: CD4 T cells = memory activated CD4 T cells + memory resting CD4 T cells + naïve CD4 T cells + T regulatory cells. B cells = naïve B cells + memory B cells + plasma cells. NK cells = activated NK cells + resting NK cells. Other myeloid = M0 macrophages + M1 macrophages + M2 macrophages + activated dendritic cells + resting dendritic cells. Notably, these populations were very low as these are all cell types not commonly seen in the peripheral blood. Granulocytes = activated mast cells + resting mast cells + eosinophils + neutrophils.
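The collapsing scheme above amounts to summing the relative fractions of the member cell types within each simplified category. A minimal Python sketch is shown below, assuming the input keys follow the standard LM22 cell type labels. Passing any cell type not named in the scheme (for example monocytes or CD8 T cells) through unchanged is an assumption made here, and the sample fractions are hypothetical.

```python
# Simplified categories as described in the text; member names are assumed
# to match the standard LM22 labels reported by CIBERSORTx.
COLLAPSE = {
    "CD4 T cells": [
        "T cells CD4 memory activated",
        "T cells CD4 memory resting",
        "T cells CD4 naive",
        "T cells regulatory (Tregs)",
    ],
    "B cells": ["B cells naive", "B cells memory", "Plasma cells"],
    "NK cells": ["NK cells activated", "NK cells resting"],
    "Other myeloid": [
        "Macrophages M0",
        "Macrophages M1",
        "Macrophages M2",
        "Dendritic cells activated",
        "Dendritic cells resting",
    ],
    "Granulocytes": [
        "Mast cells activated",
        "Mast cells resting",
        "Eosinophils",
        "Neutrophils",
    ],
}

def collapse_fractions(sample):
    """Sum relative fractions within each simplified category.

    sample: dict mapping LM22 cell type -> relative fraction for one sample.
    Cell types outside the scheme are kept under their original name
    (an assumption; the text does not say how they were handled).
    """
    grouped = {name for members in COLLAPSE.values() for name in members}
    collapsed = {
        category: sum(sample.get(name, 0.0) for name in members)
        for category, members in COLLAPSE.items()
    }
    for name, fraction in sample.items():
        if name not in grouped:
            collapsed[name] = fraction
    return collapsed

# Hypothetical relative fractions for one sample (not study data).
sample = {
    "Monocytes": 0.30,
    "T cells CD8": 0.20,
    "T cells CD4 naive": 0.10,
    "T cells CD4 memory resting": 0.15,
    "B cells naive": 0.05,
    "NK cells resting": 0.10,
    "Neutrophils": 0.10,
}
collapsed = collapse_fractions(sample)
```

Because relative-mode fractions sum to one per sample, the collapsed fractions also sum to one, which gives a quick sanity check after regrouping.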
NanoString nCounter Analysis. Gene expression was assessed on the GeparNuevo cohort using a custom NanoString Elements panel to measure PIRS genes according to the manufacturer's standard protocol. Briefly, RNA was extracted from processed PBMC pellets using the Promega Maxwell RSC simplyRNA Blood kit and 50 ng of total RNA was used for input into nCounter hybridizations. Data were normalized according to positive and negative spike-in controls, then endogenous housekeeper controls, and transcript counts were log transformed for downstream analyses.
Statistical analysis. All statistical analyses were performed in R. Single-cell analyses were performed in R using the Seurat package 9,10 . Shared nearest neighbors were calculated using the Harmony reduction, and clusters were identified at a resolution of 0.3. UMAP was performed for visualization, and missing values were imputed using ALRA 11 . Cell types were assigned to individual cells using SingleR 12 , with BlueprintEncodeData as the reference 13,14 . Some heatmaps were generated using the R package ComplexHeatmap 15 . Significance annotations displayed on plots correspond to the following p-value cut-offs: "ns", p>0.05; *, 0.01<p<0.05; **, 0.001<p<0.01; ***, 0.0001<p<0.001; ****, p<0.0001.
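The significance annotation scheme can be expressed as a small lookup function. The Python sketch below is illustrative and is not taken from the authors' plotting code. The function name is hypothetical, and the handling of p-values that fall exactly on a cut-off is an assumption, since the text states only open intervals.

```python
def p_to_annotation(p):
    """Map a p-value to the plot annotation described in the text.

    Assumption: a p-value exactly equal to a cut-off (e.g. 0.05) is given
    the less significant label; the text does not specify this case.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"

# P-values reported in the text, used here only as example inputs.
labels = {p: p_to_annotation(p) for p in (0.012, 0.0638, 0.186)}
```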
Code used to generate figures is available on request.
Declarations
Data Availability: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
fees from Novartis, and is listed as a coinventor on a provisional patent application on methods to predict therapeutic outcome using blood-based gene expression patterns, that is owned by Vanderbilt University Medical Center and is currently unlicensed. No potential conflicts of interest were disclosed by the other authors.
Figure 1. Expression of immune-related genes in the peripheral blood is associated with good outcome following NAC. A) Schematic describing blood collection timing and downstream analyses. B) GSEA plots for gene sets enriched in blood of patients with pCR relative to RD. C) Expression of IFN/complement and cytotoxic scores is shown for each sample. D) Cytotoxic, IFN/complement, and combined peripheral immunologic response scores (PIRS; IFN/complement score minus cytotoxic score) are shown for each sample, stratified by outcome. Box plots show the median and the first and third quartiles. P-values represent FDR-corrected Wilcoxon tests. E) Single-cell sequencing data show cell type-specific expression of each score. Cytotoxic score is shown in red. IFN/complement score is shown in blue. Co-expression would be shown in pink.
