Machine learning approach identifies miRNA signatures for breast cancer detection and classification from patient urine samples

doi:10.21203/rs.3.rs-3993094/v1

Research Article

This work is licensed under a CC BY 4.0 License

Abstract

Introduction

Breast cancer is the most common cancer in women, with one in eight women suffering from this disease in her lifetime. The implementation of centrally organized mammography screening for women between 50 and 69 years of age was a major step toward early detection and led to a significant improvement in cure rates. Within the screening program, women undergo a mammogram every two years, with a centralized, quality-controlled review process. However, the participation rate reached only approximately 50% of the eligible women. Among several other reasons, the technical aspects of mammography, including painful compression of the breast, are cited as a reason for not participating in this very important program. Therefore, focusing current research on less painful and less invasive techniques for the detection of breast cancer seems to be highly clinically useful. Liquid biopsies, which analyze distinct molecules or cells in body fluids, offer this option. Blood-based tests of circulating tumor cells (CTCs), circulating tumor DNA (ctDNA) and cell-free miRNA have been performed in a variety of studies and tumor entities.

Methods

We performed miRNA sequencing on 82 urine samples: 32 samples from breast cancer patients (9× luminal A, 8× luminal B, 9× triple-negative and 6× HER2) and 50 healthy control samples. Data were analyzed and interpreted using random forest analysis.

Results

We identified a signature of 275 miRNAs that allows the detection of invasive breast cancer in urine from breast cancer patients. Furthermore, we identified distinct miRNA expression patterns for the major intrinsic subtypes of breast cancer, specifically luminal A, luminal B, HER2-enriched and triple-negative breast cancer.

Conclusions

Here, we present the first approach for sequencing miRNAs in female urine to detect breast cancer and, subsequently, intrinsic subtype-specific miRNA patterns. This experimental approach specifically validates miRNA sequencing as a technique for breast cancer detection in urine samples and opens the door to a new, easy and painless procedure for regular breast cancer screening.

Keywords: Breast cancer, miRNA sequencing, urine, luminal A, luminal B, HER2, TNBC, screening, patient classification

Background

Breast cancer (BC) is still the leading cause of cancer-related deaths among the female population, affecting 1 in 8 women in their lifetime (1). Moreover, this disease still places an enormous burden on healthcare systems worldwide. In addition to tumor biology, which guides treatment options and therapeutic needs in general, tumor stage is still an important risk factor for treatment decisions. The implementation of mammography-based early detection programs subsequently led to a decreased tumor size (T stage) and a decreased rate of axillary lymph node involvement (N stage), thereby improving survival rates in patients with breast cancer (2). Almost 100% of German women aged 50 to 69 years are invited to undergo mammography screening every 2 years. However, since its implementation in 2009, the participation rate has stagnated at approximately 50% (3).

Therefore, it appears reasonable that these programs could dramatically profit from new early detection methodologies, which are less invasive than the current standards.

MicroRNAs (miRNAs) are small noncoding RNA molecules that regulate gene expression by binding to specific target mRNAs, affecting their translation. They are found in many different cell types and tissues and have been shown to play a role in a wide range of biological processes, including cell growth and proliferation, differentiation, and apoptosis. Recent studies have shown that miRNAs can also be found in body fluids, such as blood and urine (4), where they can be easily isolated and quantified and can act as biomarkers for various diseases, including cancer (5, 6).

As biomarkers, miRNAs possess high potential for cancer detection because they are stable in body fluids, which allows easy collection and storage of samples (7). They are also present and detectable in very small samples, which makes them ideal biomarkers (8). Additionally, miRNA-based diagnostic tests can be noninvasive, which is especially important for the early detection of cancer.

Several studies have investigated the use of miRNAs as biomarkers for breast cancer and other gynecological malignancies. In breast cancer, several miRNAs, including miR-21, miR-155, and miR-205, have been identified as biomarkers for this disease (9–11). Studies have shown that the levels of these miRNAs are elevated in the blood of breast cancer patients compared to healthy individuals. Furthermore, an earlier study using qPCR analysis identified miR-424 and miR-423 as BC markers with urine as the source material (12, 13). Researchers have also shown that miRNAs can be used to distinguish between different types of breast cancer, such as invasive ductal carcinoma and invasive lobular carcinoma.

While these studies have been promising, additional research is needed to fully establish the clinical utility of miRNA-based diagnostic tests for breast and ovarian cancer. In particular, the biology of miRNAs and their regulation in cancer cells are not yet fully understood. Moreover, larger, multicenter studies are needed to validate miRNA-based diagnostic tests in clinical settings and to establish their specificity and sensitivity.

To investigate the feasibility of miRNA detection as a diagnostic tool for breast cancer, we investigated a cohort of 82 urine samples from BC patients and healthy individuals using miRNA sequencing for the first time. The goal was to identify all currently known miRNAs regulated in BC in a completely unbiased setting without prior selection of BC-specific miRNAs from the literature. Here, we present the first step toward the clinical use of miRNA sequencing in BC detection and provide insight into the stratification of BC patients utilizing this information.

Methods

Patient Selection and Urine Sample Preparation

Samples were collected at the University Hospital Aachen (ethics vote 206/09) and at the University Hospital Erlangen. Participants in Erlangen were recruited within the iMODE-B study (Imaging and Molecular Detection of Breast Cancer; ethical approval by the ethics committee of the Friedrich-Alexander-Universität Erlangen-Nuremberg; #325_19 B). Patients were eligible for inclusion if they had an indication for a diagnostic biopsy due to a suspicious breast lesion. The main aim of the iMODE-B study was to identify molecular markers that are predictive of patient prognosis and treatment response at the time of the first diagnosis of breast cancer. After the participants provided written informed consent in accordance with the Declaration of Helsinki, biospecimen sampling was performed (blood draw and urine).

A total of 355 urine samples were collected between November 2019 and July 2020. The urine samples were centrifuged at 944 × g for 10 minutes at room temperature to separate the cell pellet and cell-free supernatant, which were stored separately at -80°C until further use. At least 7 ml of cell-free supernatant was available for miRNA extraction; 4 ml was used for the extraction, and the remaining volume was kept as a backup. From these 282 participants, 82 were randomly selected (n = 50 healthy controls and n = 32 cancer patients) for final analysis.

RNA isolation

RNA isolation was performed with the miRNeasy Mini Kit by Qiagen (#217004) following the instructions of the user manual. One 5 ml aliquot was isolated to ensure the same volume for each sample. After isolation, the RNA was stored at -80°C until cDNA transcription.

Quantitative PCR

Transcribed miRNA samples were analyzed using a TaqMan Advanced miRNA Assay (#A25576 Applied Biosystems) in combination with TaqMan Fast Advanced Master Mix (#4444557 Applied Biosystems). A Roche LightCycler 480 Instrument II (#05015243001) was used for detection. The samples and master mix were added to 384-well plates (#04729749001 LightCycler 480 Multiwell Plate white, Roche). For correct preparation following the manufacturer’s instructions, the samples were diluted with 0.1x TE. Samples were pipetted in triplicate on each plate per assay as well as the exogenous control ath-mir-159a.

qPCR statistics

The LightCycler 480 data were exported as Microsoft Excel files. The resulting Ct values were analyzed using the ΔΔCt method. The exogenous control ath-mir-159a was used as the reference to normalize the data (ΔCt), and samples from healthy donors served as the second reference to calculate the miRNA fold change.
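
The ΔΔCt calculation described above can be sketched as follows. This is a minimal illustration, not the authors' spreadsheet workflow: the function name `fold_change_ddct` and all Ct values are hypothetical, and the common 2^(-ΔΔCt) form assumes roughly 100% amplification efficiency.

```python
from statistics import mean


def fold_change_ddct(ct_mirna_sample, ct_ref_sample, ct_mirna_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    Each argument is a list of Ct values, e.g. technical triplicates.
    """
    # dCt: normalize the miRNA of interest to the reference (here a spike-in control)
    dct_sample = mean(ct_mirna_sample) - mean(ct_ref_sample)
    dct_control = mean(ct_mirna_control) - mean(ct_ref_control)
    # ddCt: compare the patient sample with the healthy reference group
    ddct = dct_sample - dct_control
    return 2 ** (-ddct)


# Made-up Ct values for illustration only
fold = fold_change_ddct(
    [24.1, 23.9, 24.0],  # miRNA of interest, patient sample
    [20.0, 20.1, 19.9],  # reference, patient sample
    [26.0, 26.1, 25.9],  # miRNA of interest, healthy controls
    [20.0, 20.0, 20.0],  # reference, healthy controls
)
print(round(fold, 2))
```

A fold change above 1 indicates higher expression in the patient sample than in the healthy reference group; a value below 1 indicates lower expression.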

GraphPad Prism software was used for statistical evaluation. Student's t test was used to determine significant differences.

miRNA sequencing and statistical analysis

Sequencing libraries were prepared with the QIASeq miRNA UDI Library Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. To the recommended 4 µl sample input for biofluids, 1 µl of synthetic miRNAs from the QIASeq miRNA Library QC Kit was added as additional quality control. The quality of the libraries was checked on a Bioanalyzer or Tapestation (both Agilent, Waldbronn, Germany), and the libraries were quantified by a Quantus fluorometer (Promega, Madison, WI, USA). All the samples were sequenced on an Illumina NextSeq 500 instrument (Illumina, San Diego, CA, USA) in 72 bp single-end mode. Sequencing yielded a mean coverage of approximately five million reads per sample.

FASTQ files were generated using bcl2fastq (Illumina). To facilitate reproducible analysis, samples were processed using the publicly available nf-core/smRNAseq pipeline version 2.2.1 (Ewels et al. 2020) implemented in Nextflow 23.04 (Di Tommaso et al. 2017) using Docker 24.0.2 (Merkel 2014) with the minimal command. All analyses were performed using custom scripts in R version 4.2.2 using the DESeq2 v.1.38.3 framework (Love, Huber, and Anders 2014).

One sample was removed due to poor quality, and the remaining samples (from 32 cancer patients and 49 healthy individuals) were used in the downstream analyses. After normalizing the read counts using DESeq2, we applied cross-validation with five data folds for robust evaluation. Our approach comprises two strategies: first, the random forest algorithm was used on all 4039 miRNAs to capture broad interactions, and second, random forest-based feature selection was employed to narrow down relevant miRNAs. We utilized Python tools such as scikit-learn, numpy, pandas, matplotlib, and seaborn for analysis and visualization, ensuring a comprehensive exploration of miRNA‒target dynamics.
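
The two strategies described above (random forest on all features versus random forest after forest-based feature selection, each evaluated with five-fold cross-validation) might look roughly like this in scikit-learn. This is a sketch under stated assumptions: the data are synthetic stand-ins generated with `make_classification`, the hyperparameters and the `threshold="mean"` selection rule are illustrative choices, and placing the selection step inside the cross-validation pipeline is our assumption, not a detail confirmed by the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a normalized count matrix: 81 samples, 300 features
# (the real data have 4039 miRNAs; the sizes here are chosen only for speed).
X, y = make_classification(
    n_samples=81, n_features=300, n_informative=20, weights=[0.6, 0.4], random_state=0
)

# Five stratified folds keep the case/control ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Strategy 1: random forest trained on all features.
rf_all = RandomForestClassifier(n_estimators=100, random_state=0)
auc_all = cross_val_score(rf_all, X, y, cv=cv, scoring="roc_auc")

# Strategy 2: keep only features whose importance exceeds the mean importance,
# then train a second forest. Selection is refit within each training fold,
# so the held-out fold never influences which features are kept.
rf_selected = make_pipeline(
    SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0), threshold="mean"
    ),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
auc_selected = cross_val_score(rf_selected, X, y, cv=cv, scoring="roc_auc")

print(f"all features:      AUC {auc_all.mean():.2f} ± {auc_all.std():.2f}")
print(f"selected features: AUC {auc_selected.mean():.2f} ± {auc_selected.std():.2f}")
```

Reporting the mean and standard deviation across folds mirrors the "mean AUC ± spread" values given in the Results; with small cohorts the per-fold spread is typically large.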

Results

Analysis of 82 urine samples from breast cancer patients and healthy female individuals

We implemented miRNA sequencing as an unbiased method to evaluate the currently known miRNA genome from human urine samples. The first aim was to implement reliable and consistent detection of miRNAs in urine. Since this study should evaluate the feasibility of miRNA sequencing from reasonably small sample sizes, which can be obtained during a regular visit in an outpatient setting, we tested a sample size of 4 ml of urine in this sequencing approach.

Eighty-two individual patient samples were utilized in this study, and the miRNAs were extracted from 4 ml urine samples as described in the Methods section. The 82 samples were classified into 50 healthy tumor-free control, 6 HER2-enriched, 9 luminal A, 8 luminal B and 9 TNBC tumor-bearing patient urine samples (Table 1).

Table 1

Description of the patient cohort used for analysis. Values are given as mean (sd) or n (%).
Description	All (N = 82)	Healthy (N = 50)	Cancer (N = 32)
age at urine sampling	54.5 (13.2)	50.7 (11.3)	60.4 (13.9)
age at urine sampling by group
< 50 years	30 (36.6)	23 (46.0)	7 (21.9)
> 50 years	52 (63.4)	27 (54.0)	25 (78.1)
age at first diagnosis	na	na	59.7 (14.1)
BMI	26.5 (6.6)	25.6 (5.9)	27.6 (7.6)
Tumor size
T1	na	na	16 (50.0)
T2-4	na	na	16 (50.0)
Tumor grade
G1/2	na	na	12 (37.5)
G3	na	na	20 (62.5)
Distant metastasis status
cM0	na	na	29 (90.6)
cM1	na	na	3 (9.4)
Histology
ductal	na	na	25 (78.1)
lobular	na	na	6 (18.8)
others	na	na	1 (3.1)
molecular-like subtype
Luminal A-like	na	na	9 (28.1)
Luminal B-like	na	na	8 (25.0)
HER2 positive	na	na	6 (18.8)
TNBC	na	na	9 (28.1)

miRNA sequencing of urine samples enables the detection of more than 4000 target miRNAs

We analyzed the expression of more than 4000 miRNAs and detected the consistent expression of 4039 distinct miRNAs. The mean absolute miRNA expression (normalized read counts) of all the samples varied between 6.5 and 14.1, with the majority of the samples showing an average expression of 7–8 +/- 1 (Fig. 1A). Using a 1.5-fold log change up- or downregulation of expression with an adjusted p value of 0.05 or less as a cutoff, we identified several miRNAs exhibiting differential expression between healthy and tumor-bearing individuals (Fig. 1B). Further stratification of patient groups yielded different numbers of differentially expressed miRNAs for luminal A vs healthy (161 miRNAs), HER2 vs healthy (30 miRNAs), luminal B vs healthy (19 miRNAs) and TNBC vs healthy (12 miRNAs) patients. Several of these differentially expressed miRNAs were found in more than one of the subgroup comparisons and could therefore not be used as biomarkers to stratify individual patients.

Confirmation of miRNA expression profiles by qPCR

To validate our initial results, we analyzed the expression of the most highly differentially expressed miRNA markers using qPCR and confirmed the differential regulation of some (Fig. 2A-I) but not others (Supplemental Fig. 1A-H). Overall, the variability in the qPCR results far exceeded the variability found in the sequencing results. Nevertheless, some subgroup-specific expression patterns, such as high expression of miR-30a-5p in TNBC, were validated (Fig. 2F). In general, the distinction between urine from cancer patients and urine from healthy individuals was more consistent than the subgroup-specific patterns, even though it varied considerably across the whole patient population.

The random forest approach for data modeling

The miRNA expression across the groups determined by qPCR analysis varied strongly, and there was no consistent pattern within each cancer subtype that would allow patients to be stratified using this method. This limited the usefulness of individual differentially expressed miRNAs as potential biomarkers for stratifying cancer subtypes.

An analysis based on the miRNA sequencing data showed that unsupervised clustering of the whole dataset did not reveal distinct patterns within the cancer group or a subtype (Fig. 3A). Moreover, PCA did not reveal any distinct patterns (Fig. 3B). To investigate the data in greater detail, we applied several machine learning algorithms to detect hidden patterns of miRNA expression. Figure 3C shows the initial benchmark of the shallow learning methods. The random forest (RF) outperformed logistic regression, decision tree and SVM, likely owing to its ensemble approach, which reduces overfitting, captures complex relationships, handles high-dimensional data, and balances the bias-variance trade-off by aggregating diverse decision trees.
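
A benchmark of the four shallow learning methods named above could be set up along these lines. The sketch uses synthetic data and default-like hyperparameters of our own choosing (including feature scaling for logistic regression and the SVM), so it illustrates the comparison procedure only and says nothing about which model wins on the real cohort.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: few samples, many features (as in expression profiling)
X, y = make_classification(
    n_samples=81, n_features=200, n_informative=15, random_state=1
)

# Candidate models; scaling is added for the two models that are sensitive to it.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(random_state=1),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=1),
}

# The same folds are used for every model so that the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
mean_auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, auc in sorted(mean_auc.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} mean AUC = {auc:.2f}")
```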

Sequencing data indicating the ability of 275 individual miRNAs to distinguish BC patients from healthy women

We trained the RF model with two approaches: one with all 4039 miRNAs and the other with feature selection via random forest. The prediction with 275 miRNAs selected by the RF algorithm (mean AUC = 0.67) (Fig. 4A, right) performed better than the prediction with the whole miRNA dataset (mean AUC = 0.58) (Fig. 4A, left).

Figure 4B shows a heatmap of the expression of the filtered miRNAs. The ensemble approach involving the random forest algorithm produced better results than did the generic statistical approach. This is probably due to the complicated or combinatorial expression patterns of miRNAs in urine. The miRNAs in urine are a mixture from different tissues and organs at their final stop, so their expression patterns are no longer obvious and are not readily detectable by generic statistical methods.

Random forest algorithms with filtered miRNAs identify all intrinsic subtypes of breast cancer

Following the approach to distinguish healthy controls from women with breast cancer, we applied random forest analysis to identify miRNA patterns that would allow us to substratify patients into the distinct breast cancer subtypes of our patient cohort: luminal A, luminal B, HER2-enriched and TNBC.

For HER2-enriched BCs, 175 of the 4039 miRNAs were sufficient to increase the mean AUC from 0.55 ± 0.38 to 0.68 ± 0.32 (Fig. 5A, compare left to right). In the case of luminal A-type cancer, differential expression of 195 miRNAs was associated with an increase in the mean AUC from 0.70 ± 0.32 to 0.78 ± 0.26 (Fig. 5B, compare left to right). The ROC curves in these figures suggest that additional runs with more samples could further increase the true positive rate.

For luminal B-type breast cancer, we found 191 miRNAs that distinguish luminal B patients from healthy individuals; their selection increased the area under the curve (AUC) from 0.57 ± 0.2 to 0.71 ± 0.24 (Fig. 6A, left/right). Finally, TNBC was detected by RF analysis of 189 miRNAs, for which the AUC was 0.65 ± 0.31, compared with 0.39 ± 0.22 for all the detectable miRNAs (Fig. 6B). This was the largest increase in AUC among all the subgroups.

The filtered miRNA subgroups exhibited no overlap

Among the filtered miRNAs above, there were very few overlapping miRNAs (Fig. 7A). No miRNA was common to all four subtypes. This finding suggests that the associated miRNAs might be distinct across these four subtypes. Most of these filtered miRNAs were not significantly differentially expressed according to differential gene expression analysis (DGEA).
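
The overlap between subtype signatures can be checked with plain set operations, as in the following sketch. The miRNA identifiers are hypothetical placeholders, not the signatures reported here; in practice each set would hold the miRNAs selected by the random forest for one subtype.

```python
from itertools import combinations

# Hypothetical placeholder signatures (not the real selected miRNAs)
signatures = {
    "LumA": {"miR-A", "miR-B", "miR-C", "miR-D"},
    "LumB": {"miR-C", "miR-E", "miR-F"},
    "HER2": {"miR-G", "miR-H", "miR-A"},
    "TNBC": {"miR-I", "miR-J", "miR-E"},
}

# miRNAs shared by each pair of subtypes
pairwise_overlap = {
    (a, b): signatures[a] & signatures[b] for a, b in combinations(signatures, 2)
}

# miRNAs shared by all four subtypes
shared_by_all = set.intersection(*signatures.values())

for (a, b), shared in pairwise_overlap.items():
    print(f"{a} vs {b}: {sorted(shared)}")
print("shared by all four:", sorted(shared_by_all))
```

Pairwise intersections and the four-way intersection answer different questions: a few miRNAs can be shared between two subtypes while none is common to all of them.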

Discussion

Breast cancer treatment is based on tumor biology and tumor stage. Therefore, early detection has been an important step toward the improved cure rates observed over the last several decades. Today, it is common practice in industrialized countries to screen the female population for breast cancer on a regular basis in national programs based on mammography. Improved mammography approaches using machine learning for deeper and more accurate image analysis are therefore the next logical step in an effort to detect breast cancer as early as possible and to improve treatment options and the chance of cure (14, 15). Nevertheless, the considerable technical effort and time required, the physical discomfort during the procedure and the costs of this technique argue for an easy, fast, and cost-effective prescreening method, which in the case of a positive finding would be followed by an additional imaging method.

Furthermore, early information about tumor biology would likely be useful for stratifying subsequent imaging and work-up procedures.

Stratifying cancer patients based on noninvasive methods is currently a tremendous challenge. Especially in breast cancer, the diagnosis of TNBC has much more severe implications for the patient than a luminal A type tumor. Therefore, detecting this disease noninvasively and obtaining further information on the type of tumor would be extremely beneficial. This approach would give the treating physician a distinct advantage for subsequent work-up and treatment decisions.

Here, we present the first tightly controlled miRNA sequencing effort of urine samples from breast cancer patients to gain insight into how the miRNA genome is regulated in this disease and its intrinsic subtypes. Earlier efforts from our group focused on specific miRNAs known to be regulated in breast cancer using a proprietary miRNA amplification paradigm (9). Nevertheless, in the current approach, we implemented miRNA sequencing as an innovative approach for urinary analysis to understand how many miRNAs in the currently known genome are regulated in breast cancer and whether consecutively identified signatures might represent specific subclasses of BC, allowing their detection from noninvasive urine samples.

We found the let-7 miRNA family to be strongly represented in the cancer cohort, as would be expected from studies on other cancer entities using different methods of detection. The let-7 miRNAs are dysregulated in lung (16), pancreatic (17), colorectal (18), and papillary thyroid (19) cancers and, as recently described, in breast cancer (20). Let-7 was further shown to regulate cancer stemness (21).

Apart from these initial findings, we also detected considerable variability among the top regulated miRNAs in some samples (e.g., variability of let-7c expression in healthy individuals; Fig. 2A), making an individual diagnosis of breast cancer or its subclasses less reliable. We therefore applied a machine learning approach to the sequencing data to investigate whether the patterns of multiple miRNAs would be more informative than those of several strongly differentially regulated miRNAs. Interestingly, the random forest approach clearly outperformed the decision tree, logistic regression and SVM, making it our method of choice for future analysis of miRNA sequencing data from urine samples.

An increase or decrease in a single given miRNA did not seem to have as much impact as the whole “signature” of miRNA expression changes (Fig. 2 vs. Fig. 4). The detection of very specific subsets of miRNA patterns, identifying both breast cancer patients and even their specific intrinsic subtypes, is novel and has not been described thus far. More surprisingly, these patterns of miRNAs overlap very little with each other; on average, only 10–15% of miRNAs are commonly regulated, whereas most miRNAs clearly identify a subgroup or breast cancer in general. This, to our knowledge, has not been shown before and raises the question of whether previous data should be reanalyzed with a more unbiased approach to possibly identify yet unknown patterns. A machine learning approach appears well suited to unravel this issue, as has been shown in other fields of research (22–24).

Consequently, an important focus of further research should be the reduction and minimization of the number of miRNAs included in our identified distinct miRNA patterns. The applicability of our technology for screening or early detection also relies on its sensitivity, specificity, and false positive and false negative rates. The optimization of these parameters relies on large cohorts of patient and healthy control samples, which still need to be analyzed for this purpose.

Conclusions

In summary, our study represents an innovative approach and “proof of principle” concept for a sensitive noninvasive, urine-based, liquid biopsy test to detect breast cancer and its distinct intrinsic subtypes with a wide variety of application options.

Declarations

Ethics approval and consent to participate

Samples were collected at the University Hospital Aachen (ethics vote 206/09) and at the University Hospital Erlangen (ethics vote #325_19 B). Participants in Erlangen were recruited within the iMODE-B study (Imaging and Molecular Detection of Breast Cancer; ethical approval by the ethics committee of the Friedrich-Alexander-Universität Erlangen-Nuremberg; #325_19 B). Patients were eligible for inclusion if they had an indication for a diagnostic biopsy due to a suspicious breast lesion. The main aim of the iMODE-B study was to identify molecular markers that are predictive of patient prognosis and treatment response at the time of the first diagnosis of breast cancer. After the participants provided written informed consent in accordance with the Declaration of Helsinki, biospecimen sampling was performed (blood draw and urine).

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available due to an ongoing licensing process but are available from the corresponding author on reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was strongly supported by Dr. Pommer-Jung-Stiftung.

Authors' contributions

JM analyzed and interpreted patient data and was a major contributor to the writing and editing of the manuscript. MR provided patient urine samples and information on the patients. CK worked on the machine learning algorithms and provided the random forest analysis. BK performed the qPCR analysis. JF provided sequencing data. JW interpreted patient data. TK analyzed the correlations of patient data with subtypes of breast cancer. LN edited the manuscript and provided writing support. PF contributed to writing and editing the manuscript. ES provided funding and the original idea for the research and supported the execution and writing and editing of the manuscript. All the authors read and approved the final manuscript.

Acknowledgments

This work was supported by the Genomics Facility, a core facility of the Interdisciplinary Center for Clinical Research (IZKF) Aachen within the Faculty of Medicine at RWTH Aachen University. We thank Lothar Häberle for advice and support.

References

1. Nolan E, Lindeman GJ, Visvader JE. Deciphering breast cancer: from biology to the clinic. Cell. 2023;186(8):1708–28.
2. Luu XQ, Lee K, Jun JK, Suh M, Jung KW, Choi KS. Effect of mammography screening on the long-term survival of breast cancer patients: results from the National Cancer Screening Program in Korea. Epidemiol Health. 2022;44:e2022094.
3. Lowry KP, Callaway KA, Lee JM, Zhang F, Ross-Degnan D, Wharam JF, et al. Trends in Annual Surveillance Mammography Participation Among Breast Cancer Survivors From 2004 to 2016. J Natl Compr Canc Netw. 2022;20(4):379–86.e9.
4. Kupec T, Bleilevens A, Klein B, Hansen T, Najjari L, Wittenborn J, et al. Comparison of Serum and Urine as Sources of miRNA Markers for the Detection of Ovarian Cancer. Biomedicines. 2023;11(9).
5. Duque G, Manterola C, Otzen T, Arias C, Palacios D, Mora M, et al. Cancer Biomarkers in Liquid Biopsy for Early Detection of Breast Cancer: A Systematic Review. Clin Med Insights Oncol. 2022;16:11795549221134831.
6. Shiao MS, Chang JM, Lertkhachonsuk AA, Rermluk N, Jinawath N. Circulating Exosomal miRNAs as Biomarkers in Epithelial Ovarian Cancer. Biomedicines. 2021;9(10).
7. Kupec T, Bleilevens A, Iborra S, Najjari L, Wittenborn J, Maurer J, Stickeler E. Stability of circulating microRNAs in serum. PLoS ONE. 2022;17(8):e0268958.
8. Hulstaert E, Morlion A, Levanon K, Vandesompele J, Mestdagh P. Candidate RNA biomarkers in biofluids for early diagnosis of ovarian cancer: A systematic review. Gynecol Oncol. 2021;160(2):633–42.
9. Erbes T, Hirschfeld M, Rucker G, Jaeger M, Boas J, Iborra S, et al. Feasibility of urinary microRNA detection in breast cancer patients and its potential as an innovative noninvasive biomarker. BMC Cancer. 2015;15:193.
10. Kumar S, Keerthana R, Pazhanimuthu A, Perumal P. Overexpression of circulating miRNA-21 and miRNA-146a in plasma samples of breast cancer patients. Indian J Biochem Biophys. 2013;50(3):210–4.
11. Rama K, Bitla AR, Hulikal N, Yootla M, Yadagiri LA, Asha T, et al. Assessment of serum microRNA-21 and miRNA-205 as diagnostic markers for stage I and II breast cancer in Indian population. Indian J Cancer. 2023.
12. Zhang L, Xu Y, Jin X, Wang Z, Wu Y, Zhao D, et al. A circulating miRNA signature as a diagnostic biomarker for noninvasive early detection of breast cancer. Breast Cancer Res Treat. 2015;154(2):423–34.
13. Zhao H, Gao A, Zhang Z, Tian R, Luo A, Li M, et al. Genetic analysis and preliminary function study of miR-423 in breast cancer. Tumor Biol. 2015;36(6):4763–71.
14. Houssami N, Marinovich ML. AI for mammography screening: enter evidence from prospective trials. Lancet Digit Health. 2023;5(10):e641–e2.
15. Ng AY, Oberije CJG, Ambrozay E, Szabo E, Serfozo O, Karpati E, et al. Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer. Nat Med. 2023;29(12):3044–9.
16. Takamizawa J, Konishi H, Yanagisawa K, Tomida S, Osada H, Endoh H, et al. Reduced expression of the let-7 microRNAs in human lung cancers in association with shortened postoperative survival. Cancer Res. 2004;64(11):3753–6.
17. Xiong G, Liu C, Yang G, Feng M, Xu J, Zhao F, et al. Long noncoding RNA GSTM3TV2 upregulates LAT2 and OLR1 by competitively sponging let-7 to promote gemcitabine resistance in pancreatic cancer. J Hematol Oncol. 2019;12(1):97.
18. Langevin SM, Christensen BC. Let-7 microRNA-binding-site polymorphism in the 3'UTR of KRAS and colorectal cancer outcome: a systematic review and meta-analysis. Cancer Med. 2014;3(5):1385–95.
19. Perdas E, Stawski R, Kaczka K, Zubrzycka M. Analysis of Let-7 Family miRNA in Plasma as Potential Predictive Biomarkers of Diagnosis for Papillary Thyroid Cancer. Diagnostics (Basel). 2020;10(3).
20. Chiu SC, Chung HY, Cho DY, Chan TM, Liu MC, Huang HM, et al. Therapeutic potential of microRNA let-7: tumor suppression or impeding normal stemness. Cell Transpl. 2014;23(4–5):459–69.
21. Ma Y, Shen N, Wicha MS, Luo M. The Roles of the Let-7 Family of MicroRNAs in the Regulation of Cancer Stemness. Cells. 2021;10(9).
22. Lee E, Jung SY, Hwang HJ, Jung J. Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation. JMIR Med Inf. 2021;9(8):e29807.
23. Zhang K, Liu C, Sha X, Yao S, Li Z, Yu Y, et al. Development and validation of a prediction model to predict major adverse cardiovascular events in elderly patients undergoing noncardiac surgery: A retrospective cohort study. Atherosclerosis. 2023;376:71–9.
24. Awad A, Bader-El-Den M, McNicholas J, Briggs J. Early hospital mortality prediction of intensive care unit patients using an ensemble learning approach. Int J Med Inf. 2017;108:185–95.

Supplemental Figure 1 Quantitative PCR of several differentially expressed (healthy vs. tumor) miRNAs. qPCR-based expression of the depicted miRNAs in healthy individuals (healthy), in all tumor entities combined (tumor) and in the subclasses luminal A (Lum A), luminal B (Lum B), triple-negative breast cancer (TNBC) and HER2-enriched tumors (Her2). The expression of let-7e-5p (A), let-7f-5p (B), 451b (C), 21-3p (D), 21-5p (E), 451a (F), 125b-5p (G) and 26a-5p (H) was analyzed.

Reviews received at journal
23 Apr, 2024
Reviews received at journal
19 Mar, 2024
Reviewers agreed at journal
13 Mar, 2024
Reviewers agreed at journal
13 Mar, 2024
Reviewers invited by journal
13 Mar, 2024
Editor assigned by journal
28 Feb, 2024
Submission checks completed at journal
27 Feb, 2024
First submitted to journal
27 Feb, 2024
