In this study, we conducted a comprehensive characterization of proteomic differences between benign and malignant breast tumors within a small cohort of Mexican women. Aided by multivariate statistical analysis, we identified a distinctive four-protein signature capable of differentiating between these two tumor types. Subsequent bioinformatic analysis revealed that these signature proteins are closely involved in the extracellular matrix. To validate the significance of our findings, we reanalyzed published proteomic and transcriptomic datasets (encompassing diverse ethnic groups, larger sample sizes, and stratification by molecular breast cancer subtype), which enabled us to assess the capacity of our signature to discriminate malignant breast tumors from nonmalignant breast tissues. Notably, our investigations revealed consistent dysregulation (i.e., downmodulation) of all four signature proteins in breast cancer cases. Interestingly, the signature proteins presented a more pronounced downmodulation in highly aggressive breast cancer subtypes (such as Basal-like and HER2) compared to less aggressive subtypes (Luminal A).
Mammography has traditionally served as the primary screening tool for early detection of breast tumors. Breast ultrasound and magnetic resonance imaging (MRI) are used when mammograms prove insufficient or as an additional diagnostic test7. Molecular testing has become invaluable for cancer diagnosis, prognosis, and therapy selection. However, these tests are cancer type-specific, leading to disparities in research focus. Immunohistochemical biomarkers, such as estrogen receptor, progesterone receptor, HER2, and Ki67, are commonly employed for breast cancer molecular subtyping36. New diagnostic approaches have emerged to address the limitations of immunohistochemical classification associated with the heterogeneous biology of breast cancer. In 2000, Perou et al. proposed novel molecular subtypes, including Luminal A, HER2-enriched, and Basal-like, based on gene expression profiling using hierarchical clustering analysis37. Subsequently, a 50-gene subtype predictor (known as the PAM50 classifier) emerged based on microarray and quantitative PCR data11. This supervised risk predictor showed high agreement with the classification reported by Perou et al. Over the past decade, the PAM50 classifier has proven valuable in predicting disease prognosis and response to drug treatments, surpassing clinical factors and immunohistochemical classification38–44. Recent studies have employed quantitative proteomics to gain insights into the subtyping of breast cancer, considering that genetic and RNA-level alterations do not always translate into changes in protein abundance. These studies, while aligning with PAM50 subtyping, have identified new subtypes within PAM50 groups12,45–48, some of which have been correlated with distinct clinical outcomes12. However, despite extensive research on breast cancer subtyping, distinguishing between benign and malignant tumors at the molecular level remains understudied.
Therefore, our study involved a rigorous quantitative proteomic analysis of surgically derived benign and malignant breast tumors aimed at identifying markers of malignancy. Employing SWATH-based quantification, renowned for its high reproducibility49,50, and following stringent filtering criteria, we retained 1,221 proteins without missing values for all bioinformatic analyses. It is worth mentioning that we opted against sample subfractionation, a method that could have expanded the proteome coverage. Our aim was to keep data generation simple and the overall strategy straightforward.
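The "no missing values" criterion mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical abundance table stored as a dictionary of protein identifiers mapped to per-sample values; the protein names, values, and the `retain_complete_proteins` helper are illustrative only and are not the pipeline used in this study.

```python
import math


def retain_complete_proteins(abundance):
    """Keep only proteins quantified (non-missing) in every sample.

    `abundance` maps a protein identifier to a list of per-sample values;
    a value counts as missing when it is None or NaN.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        protein: values
        for protein, values in abundance.items()
        if not any(is_missing(v) for v in values)
    }


# Hypothetical toy table (three samples per protein); not real study data.
toy = {
    "PROT_A": [10.2, 11.0, 9.8],
    "PROT_B": [8.1, None, 7.9],          # missing in one sample, dropped
    "PROT_C": [float("nan"), 6.5, 6.7],  # NaN in one sample, dropped
    "PROT_D": [12.4, 12.1, 12.9],
}
complete = retain_complete_proteins(toy)
print(sorted(complete))
```

Restricting the analysis to complete cases avoids imputation at the cost of discarding proteins that are only sporadically detected.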
Upon initial analysis, a distinct separation between benign and malignant tumors became evident through unsupervised analysis (PCA and hierarchical clustering). To comprehensively understand these differences at a systems level, we focused on the sets of dysregulated proteins in malignant tissue. Their statistical enrichment (or overrepresentation) was assessed against diverse pathway databases and the GO database. The biological interpretation of the molecular differences revealed that upregulated proteins in malignant tumors were primarily linked to proliferation, specifically the metabolism of DNA/RNA, a phenomenon extensively studied51 and associated with the Warburg effect in cancer cells52. However, for a deeper understanding of the altered biochemistry in malignancy, we broadened the analysis to include both upregulated and downregulated proteins in malignant tumors. This expanded protein set was used in the enrichment analysis against the Reactome database21. Remarkably, pathways such as extracellular matrix organization, platelet degranulation, and innate immune system emerged as top enriched terms, integrating both upmodulated and downmodulated proteins. In contrast, the metabolism of RNA was predominantly represented by upregulated proteins.
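For readers unfamiliar with overrepresentation analysis, the following sketch shows one common way such enrichment is scored: a one-sided hypergeometric test. This is an assumption for illustration, since the specific statistic applied by the enrichment tools is not stated here. The function name and all counts in the example are hypothetical, except that the background of 1,221 quantified proteins is taken from the text above.

```python
from math import comb


def overrepresentation_pvalue(background_size, pathway_size, query_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background_size: number of proteins in the background (e.g. all quantified)
    pathway_size:    background proteins annotated to the pathway
    query_size:      dysregulated proteins submitted to the analysis
    overlap:         dysregulated proteins that are annotated to the pathway
    """
    max_overlap = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 40 pathway members among 1,221 background proteins,
# 100 dysregulated proteins, 10 of which belong to the pathway.
p_value = overrepresentation_pvalue(1221, 40, 100, 10)
print(f"p = {p_value:.3g}")
```

In practice, p-values obtained this way are computed for many pathway terms and then corrected for multiple testing before terms are reported as enriched.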
Upon examining the specific protein members within each pathway term, we predominantly observed a non-redundant composition (i.e., different proteins per term), as illustrated by PPI network visualizations. Notably, we observed significant enrichment in the platelet degranulation pathway, which is thought to contribute to tumor progression and metastasis53. Previous studies by Brunoro et al.54, Braakman et al.55, and Tang et al.24 had independently identified dysregulation of proteins in this pathway in various breast cancer samples, including nipple aspirate fluid, laser capture microdissected tissue, and fresh-frozen tissue, in comparison to non-cancerous breast tissue. In our comparative analysis, the dysregulation of specific proteins (KNG1, CFL1, ALDOA, HRG, TF, and A2M) remained consistent across all studies, regardless of the sample type or collection technique.
Furthermore, we observed enrichment of the innate immune system pathway, with a majority of dysregulated proteins falling within this category. Consistent with findings from Brunoro et al.54, Braakman et al.55, and Tang et al.24, our study identified dysregulation of proteins associated with the immune system in malignant breast tissue compared to non-cancerous tissue. Specifically, proteins such as ANPEP, C3, HSP90AA1, and TTR exhibited consistent dysregulation across all studies, indicating their potential significance in breast cancer pathology. Additionally, we found an increased abundance of complement system proteins (C2, C6, C9, C8A, C8G, among others) in malignant tissue, a pattern corroborated by previous research highlighting elevated levels of complement system proteins in malignant breast tumors56. Based on animal models, activation of the complement system is thought to promote tumor growth and metastasis57.
Another relevant enriched pathway was extracellular matrix organization, which was primarily composed of downmodulated proteins, including collagen and laminin subunits, along with cell adhesion proteins. These results aligned with previous studies by Tang et al.24, Cha et al.58, and Braakman et al.55, which reported dysregulation of extracellular matrix-associated proteins in malignant breast tissue compared to non-cancerous counterparts. The extracellular matrix, known to regulate critical cellular processes such as proliferation, differentiation, and migration, has established connections with tumorigenesis and metastasis59.
In our effort to move beyond mere molecular or pathway descriptions associated with malignant tumors, we mined our data more deeply, aiming to discern a distinct signature, or set of proteins, capable of differentiating between malignant and benign tumors. Employing unsupervised and intuitive analyses, we identified a cluster of proteins linked to the extracellular matrix, namely DCN, LUM, OGN, and COL14A1, all of which exhibited significant downregulation in malignant tumors.
To validate the relevance of our findings and understand the roles of these proteins in breast malignancy, we conducted a rigorous reanalysis of published proteomics24,26 and transcriptomics28 datasets. Across these datasets, we consistently observed the dysregulation of all identified proteins. Through manual scrutiny of reported differential abundance analyses in comparable proteomic studies55,58, we further confirmed the dysregulation of these proteins in malignant breast tissue, reinforcing their potential significance in breast malignancy. Moreover, inspecting the results from other studies, we observed the dysregulation of LUM and DCN (OGN and COL14A1 were not measured) in malignant breast tissue compared to their non-cancerous counterparts, as demonstrated through RT-PCR analyses60. It is important to note that these results emerged from the analysis of diverse types of breast tissue, including fresh-frozen, laser capture microdissected, and formalin-fixed paraffin-embedded samples. This diversity underscores the robustness of our findings, indicating that these dysregulations persist regardless of the sample type and enhancing the reliability and generalizability of our conclusions.
Considering the well-documented variation in mortality rates among different breast cancer molecular subtypes35,61,62, we aimed to explore potential differences in the abundance of signature-associated proteins across these subtypes. Although the limited size of our cohort precluded an in-depth investigation, we conducted a reanalysis of larger proteomic studies encompassing breast cancer subtypes according to the PAM50 classification11. Interestingly, in two studies12,24, we observed consistently lower abundance levels of the four identified proteins in the Basal-Like subtype (with a trend toward lower levels in the HER2 subtype) compared to the Luminal A subtype. Another study, while not statistically significant, also indicated a trend toward lower abundance levels45. Additionally, a recent study reported lower mRNA abundance levels of COL14A1 (DCN, OGN, and LUM were not quantified) in HER2 and Basal-Like subtypes compared to the Luminal subtype63. It is noteworthy that Basal-Like and HER2 subtypes are associated with worse survival rates compared to the Luminal A subtype24,61,62.
The proteins COL14A1, DCN, OGN, and LUM play crucial roles in extracellular matrix organization. DCN, OGN, and LUM belong to the group of small leucine-rich proteoglycans64 and have been linked to tumor growth suppression in both in vitro and in vivo models65–68. These proteoglycans are thought to exert their effects by modulating intracellular signaling pathways, extending beyond their conventional role in extracellular matrix organization. For instance, DCN is believed to act as a defense mechanism, countering abnormal cancer cell growth by downregulating EGFR activity and inducing the expression of p21 and p27, inhibitors of cyclin-dependent kinases that lead to cell cycle arrest65,69. Similarly, in vitro and in vivo studies have hinted at the involvement of TGF-β270 and p2171, as well as the PI3K/Akt/mTOR68 and EGFR72 pathways in the tumor suppression effects exerted by LUM and OGN, respectively. Considering this evidence, it has been proposed that these proteins act as oncosuppressive proteoglycans. Their reduced abundance in breast tumors raises the hypothesis that this decrease may facilitate tumor growth and metastasis and lead to higher rates of mortality. Supporting this notion, Muraoka et al. demonstrated that the protein levels of LUM, OGN, and COL14A1 were lower in subjects with breast cancer classified as low-risk compared to high-risk, as determined by the MammaPrint diagnostic test73. Likewise, other studies have associated reduced levels of DCN and LUM with worse prognosis74. However, the dynamic behavior of these proteins in breast cancer remains unknown. Further research is needed to determine whether breast cancer actively promotes the dysregulation of these proteins or whether breast cancer develops because of factors influencing their expression or abundance levels.
A limitation of our study is that subjects underwent chemotherapy before the collection of breast samples. This may have affected the molecular integrity of malignant tumors, potentially masking molecular distinctions compared to benign tumors. Ideally, samples should have been obtained prior to chemotherapy. However, despite this limitation, we observed clear differences between tumors, and our findings were corroborated using diverse datasets, thereby strengthening the robustness of, and confidence in, our results.
In conclusion, our study focused on identifying a distinctive protein signature for differentiating malignant and benign breast tumors through quantitative proteomics and comprehensive bioinformatics (Fig. 6). Despite our limited sample size, our meticulous reanalysis of published datasets provided robust validation for our findings. The consistent downregulation of COL14A1, DCN, OGN, and LUM (all members of the extracellular matrix) in malignant tumors, especially in the aggressive Basal-Like and HER2 subtypes, was evident across diverse datasets. This approach not only highlighted the potential of these proteins as reliable markers for malignancy but also hinted at their role in breast tumor progression. Our study demonstrates the power of integrative analyses to compensate for sample size constraints and offers valuable insights for precise diagnostics and targeted therapeutic interventions in breast cancer.