Breast cancer treatment is based on tumor biology and tumor stage. Therefore, early detection has been an important step toward improving the cure rates observed over the last several decades. Today, it is common practice in industrialized countries to screen the female population for breast cancer on a regular basis in national programs based on mammography. Improved mammography approaches using machine learning for deeper and more accurate image analysis are therefore the next logical step in the effort to detect breast cancer as early as possible and thereby improve treatment options and the chances of cure (14, 15). Nevertheless, the considerable technical effort and time required, the physical discomfort during the procedure, and the cost of this technique could motivate the use of an easy, fast, and cost-effective prescreening method, which in the case of a positive finding would be followed by an additional imaging method.
Furthermore, early information about tumor biology would likely be useful for stratifying subsequent imaging and work-up procedures.
Stratifying cancer patients based on noninvasive methods is currently a tremendous challenge. In breast cancer especially, a diagnosis of TNBC has much more severe implications for the patient than a diagnosis of a luminal A type tumor. Therefore, detecting this disease noninvasively and obtaining further information on the type of tumor would be extremely beneficial. This approach would give the treating physician a distinct advantage for subsequent work-up and treatment decisions.
Here, we present the first tightly controlled miRNA sequencing effort of urine samples from breast cancer patients to gain insight into how the miRNA genome is regulated in this disease and its intrinsic subtypes. Earlier efforts from our group focused on specific miRNAs known to be regulated in breast cancer using a proprietary miRNA amplification paradigm (9). In the current approach, by contrast, we implemented miRNA sequencing as an innovative method for urinary analysis to understand how many miRNAs in the currently known genome are regulated in breast cancer and whether the subsequently identified signatures might represent specific subclasses of BC, allowing their detection from noninvasive urine samples.
We found the let-7 miRNA family to be strongly represented in the cancer cohort, as would be expected from studies on other cancer entities using different methods of detection. The let-7 miRNAs are dysregulated in lung (16), pancreatic (17), colorectal (18), and papillary thyroid (19) cancers and, as recently described, in breast cancer (20). Furthermore, let-7 was shown to regulate cancer stemness (21).
Apart from these initial findings, we also detected considerable variability among the top regulated miRNAs in some samples (e.g., variability of let-7c expression in healthy individuals [Figure 2A]), making an individual diagnosis of breast cancer or its subclasses less reliable. We therefore applied a machine learning approach to the sequencing data to investigate whether the patterns of multiple miRNAs would be more informative than those of several strongly differentially regulated miRNAs. Interestingly, the random forest approach clearly outperformed the decision tree, logistic regression, and SVM, making it the method of choice for future analysis of miRNA sequencing data from urine samples.
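The kind of classifier comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic stand-in for a samples-by-miRNA expression matrix; the data, the hyperparameters, the feature scaling, and the 5-fold cross-validation scheme are all illustrative assumptions and do not reproduce the analysis or the results of this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a (samples x miRNAs) expression matrix.
# This is NOT study data; sizes and signal strength are arbitrary assumptions.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=15, random_state=0
)

# The four model families named in the text, with assumed default-like settings.
# Logistic regression and the SVM are scale-sensitive, so they get a scaler.
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean accuracy over 5 stratified cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

Evaluating all models on the same cross-validation folds keeps the comparison fair; which model ranks first on synthetic data says nothing about its ranking on real urinary miRNA profiles.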
An increase or decrease in a single given miRNA did not seem to have as much impact as the whole “signature” of miRNA expression changes (Figure 2 vs. Figure 4). The detection of very specific miRNA patterns that identify both breast cancer patients and their specific intrinsic subtypes is novel and, to our knowledge, has not been reported so far. More surprisingly, these miRNA patterns overlap very little with each other; on average, only 10–15% of miRNAs are commonly regulated, whereas most miRNAs clearly identify either a subgroup or breast cancer in general. This raises the question of whether previous data should be reanalyzed with a more unbiased approach to possibly identify as yet unknown patterns. However, this issue can likely only be unraveled with a machine learning approach, as has been shown in other fields of research (22–24).
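How such an overlap between signatures could be quantified is sketched below. The overlap measure used here (shared miRNAs divided by all miRNAs in either signature, i.e., the Jaccard index) is an assumption, since the exact definition is not given in this section, and the signature names and miRNA identifiers are placeholders rather than findings of this study.

```python
from itertools import combinations


def shared_fraction(a, b):
    """Jaccard index: miRNAs present in both signatures / miRNAs in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder signatures with placeholder identifiers (NOT study results).
signatures = {
    "subtype_1": {"miR-a", "miR-b", "miR-c", "miR-d"},
    "subtype_2": {"miR-d", "miR-e", "miR-f", "miR-g"},
    "subtype_3": {"miR-a", "miR-h", "miR-i", "miR-j"},
}

# Overlap for every pair of signatures, then the average across pairs.
pairwise = {
    (x, y): shared_fraction(signatures[x], signatures[y])
    for x, y in combinations(signatures, 2)
}
mean_overlap = sum(pairwise.values()) / len(pairwise)

for pair, fraction in pairwise.items():
    print(pair, f"{fraction:.1%}")
print(f"mean pairwise overlap: {mean_overlap:.1%}")
```

A low mean value indicates that each signature consists mostly of miRNAs specific to one group, which is the situation described above.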
Consequently, an important focus of further research should be reducing the number of miRNAs included in the distinct miRNA patterns we identified. The applicability of our technology for screening or early detection also depends on its sensitivity, specificity, and false positive and false negative rates. Optimizing these parameters requires large cohorts of patient and healthy control samples analyzed for this purpose.
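For reference, the four performance parameters named above follow directly from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name and the example counts are illustrative assumptions and not results of this study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts.

    tp: cancer samples classified as cancer
    fn: cancer samples classified as healthy
    tn: healthy samples classified as healthy
    fp: healthy samples classified as cancer
    """
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # equals 1 - sensitivity
    }


# Illustrative counts only (NOT study data): 50 patients, 100 healthy controls.
metrics = screening_metrics(tp=45, fp=10, tn=90, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the false positive and false negative rates are the complements of specificity and sensitivity, respectively, a prescreening test is in practice tuned by trading these two pairs against each other.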