Development of an accurate breast cancer detection classifier based on platelet RNA

doi:10.21203/rs.3.rs-4152659/v1

This work is licensed under a CC BY 4.0 License

Abstract

Platelets undergo cancer-induced reprogramming that alters their RNA profiles and contributes to cancer progression; these alterations also make platelets a promising biosource for cancer detection. Tumor-educated platelets (TEPs) are therefore considered a prospective novel tool for early breast cancer (BC) screening. Our study integrated data from 276 patients with untreated BC, 95 benign disease controls, 214 healthy controls, and 2 patients who had undergone mastectomy, drawn from Chinese and European cohorts, to develop a 10-biomarker diagnostic model. In an independent test set (n = 177), the model showed high diagnostic performance for BC, with an area under the curve of 0.957. Sensitivity for BC was 89.2%, with 100% specificity in asymptomatic controls and 62.1% specificity in the symptomatic group, which included benign tumors and inflammatory diseases. Accuracy was high across stages 0–III (80% for stage 0 [n = 5], 83.3% for stage I [n = 12], 94.6% for stage II [n = 37], and 88.9% for stage III [n = 9]), and the model correctly determined the presence or absence of residual cancer in two patients who had undergone mastectomy. Moreover, the classifiers we developed distinguished between BC subtypes. In summary, we created and tested a new TEP-RNA-based BC diagnostic model that was confirmed valid and showed high efficiency in detecting early-stage BC and heterogeneous subtypes, including recurrent tumors. These results nevertheless warrant further validation in larger population-based prospective studies before clinical implementation.

Introduction

One of the primary aims of cancer research is early tumor diagnosis, which is essential for preventing metastases that worsen the prognosis after surgical treatment, especially in highly recurrent malignancies such as breast cancer (BC). BC has been reported to be the most prevalent cancer in women globally and is increasingly observed in young women [1]. Mammography and ultrasound are currently the most widely used screening methods, but their accuracy depends on patient age, breast density, and operator experience. More accurate BC screening approaches are therefore needed. We aim to identify BC in its early stages so that patients can be offered personalized treatment options, overall survival can be improved, and curative treatment can potentially be achieved.

Several studies have revealed the crucial role of platelets in inflammation, tumor progression, and metastasis [2, 3]. Conversely, tumors have been shown to modulate platelet behavior, quantity, and RNA content through so-called “education” [4]. In an experimental ovarian cancer model, for example, interleukin (IL)-6 triggered the liver to produce thrombopoietin, boosting platelet production by megakaryocytes [5]. Platelets internalize factors released by breast tumor cells in vitro, as well as by glioma and prostate tumor cells in vivo [6-8]. Notably, platelets sequester tumor-derived RNA, which changes their RNA profile and provides a basis for TEP-based cancer detection [7]. TEP-derived RNA profiles have successfully differentiated various cancer types from healthy controls, with consistent results across diverse ethnicities and subtypes [9-11].

Our research revealed substantial differences in platelet RNA expression profiles between patients with BC and healthy individuals, suggesting that platelet RNA is a potentially effective tool for adjunctive early BC diagnosis. Using machine learning, we developed a classifier based on these differential platelet RNA expression profiles that accurately detects early-stage BC. Overall, our study demonstrates that platelet RNA expression profiles can serve as reliable biomarkers for BC diagnosis, and it holds promise for convenient BC screening with a small set of platelet biomarkers.

Results

1. Platelet collection for BC diagnostics and data processing

Circulating platelets have unique RNA splicing capabilities, which make them a valuable source of biomarkers. We focused on identifying the most relevant and distinctive spliced RNAs that differ between BC and non-BC cases. To this end, we systematically collected blood samples from 499 subjects who had not undergone tumor resection surgery, as well as two post-mastectomy blood specimens. All individuals in the study cohort underwent rigorous clinical diagnostics, including comprehensive pathological analyses. Before sequencing, we excluded samples with significant genomic DNA contamination or severe RNA degradation. The final analysis included 280 samples: 183 BC cases, 95 benign breast diseases, and 2 post-mastectomy samples. The benign breast diseases comprised a range of non-malignant conditions, including benign breast tumors (n = 44), breast inflammation (n = 17), and sclerosing adenosis (n = 34). Although these conditions differ in type and severity, together they form the symptomatic control group in our study. Additionally, we incorporated platelet RNA-sequencing data from the GEO database generated in the study by Best et al. [10]. This dataset comprises 217 samples from healthy women, referred to as the asymptomatic control group because no signs of cancer or other severe illness were reported, and 93 samples from patients with BC.

Following the approach described by Best et al. [12], we applied a quality control procedure that identified 5111 high-confidence transcripts (>30 reads in more than 10% of samples). Three platelet samples with low correlation (<0.5) to all other samples were excluded. We also required each sample to contain >1500 detected transcripts, yielding a total cohort of 587 samples (Supplementary Fig. 1, Supplementary Table 3). Data processing was performed to mitigate batch effects both between and within the two data sources. Figure 1 shows the study design. In the pre-surgery dataset (n = 585), BC cases, asymptomatic controls, and symptomatic controls accounted for 47.18% (n = 276), 36.58% (n = 214), and 16.24% (n = 95) of samples, respectively. These proportions were maintained when the samples were divided into training, validation, and test cohorts. Tumor stage and molecular subtype were unknown for some patients. Table 1 shows the detailed baseline characteristics of the participants.
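A proportion-preserving (stratified) division of this kind can be sketched in Python as below. This is a minimal illustration, not the authors' code: the published split used `sample.split` from the R package caTools, and the 50/20/30 fractions are an assumption inferred from the cohort sizes in Table 1 (292, 116, and 177 of 585 samples). The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.5, 0.2, 0.3), seed=42):
    """Split sample indices into training/validation/test sets.

    Each group (e.g. BC, asymptomatic, symptomatic) is shuffled and cut
    separately, so group proportions stay roughly the same in all three
    sets. `fractions` is an assumed 50/20/30 split.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)

    train, val, test = [], [], []
    for indices in by_group.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Group sizes of the pre-surgery dataset (n = 585).
labels = ["BC"] * 276 + ["asymptomatic"] * 214 + ["symptomatic"] * 95
train, val, test = stratified_split(labels)
for name, subset in (("train", train), ("validation", val), ("test", test)):
    n_bc = sum(labels[i] == "BC" for i in subset)
    print(f"{name}: n = {len(subset)}, BC = {n_bc} ({100 * n_bc / len(subset):.1f}%)")
```

With these fractions the BC group splits 138/55/83, as in Table 1; the control groups can differ from Table 1 by one sample because of rounding.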

2. Differential platelet RNA profiles in BC and healthy individuals

A comparison of BC cases with healthy controls in the training dataset revealed substantial changes in the TEP transcriptome. Of the 5111 RNAs assessed, 1181 were upregulated and 1207 were downregulated in BC cases (logCPM ≥ 3, P < 0.001; Figure 2a). In addition, the number of high-confidence RNAs detected in BC samples (median, 5001) exceeded that in healthy women (median, 4791; P = 1.06e-07). Unsupervised hierarchical clustering of the differentially expressed RNAs (heatmap, Figure 2b) showed a clear separation of TEP-RNA profiles between BC and healthy donors.
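The expression and significance thresholds used above can be illustrated with a small Python sketch. It is a simplification, not edgeR: `log_cpm` adds a fixed prior count instead of edgeR's library-size-scaled prior and ignores TMM normalization, and `split_de_genes` assumes an edgeR-style results table with `logFC`, `logCPM`, and `PValue` columns (the column names of edgeR's `topTags` output). Both function names and the gene names are hypothetical.

```python
import math


def log_cpm(counts, prior_count=2.0):
    """Log2 counts per million for one sample (a list of raw gene counts).

    Simplified: a fixed prior count is added to every gene and twice the
    prior to the library size, without edgeR's library-size scaling.
    """
    lib_size = sum(counts) + 2 * prior_count
    return [math.log2((c + prior_count) / lib_size * 1e6) for c in counts]


def split_de_genes(results, min_logcpm=3.0, max_p=1e-3):
    """Return (upregulated, downregulated) gene names passing both filters.

    `results` is a list of dicts with keys gene, logFC, logCPM and PValue;
    logFC > 0 means higher expression in BC than in controls.
    """
    up, down = [], []
    for row in results:
        if row["logCPM"] >= min_logcpm and row["PValue"] < max_p:
            if row["logFC"] > 0:
                up.append(row["gene"])
            elif row["logFC"] < 0:
                down.append(row["gene"])
    return up, down


# Toy example with hypothetical genes.
results = [
    {"gene": "GENE_A", "logFC": 1.2, "logCPM": 5.0, "PValue": 1e-5},   # up, passes
    {"gene": "GENE_B", "logFC": -0.8, "logCPM": 4.0, "PValue": 1e-4},  # down, passes
    {"gene": "GENE_C", "logFC": 2.0, "logCPM": 2.0, "PValue": 1e-8},   # too lowly expressed
    {"gene": "GENE_D", "logFC": -1.0, "logCPM": 6.0, "PValue": 0.01},  # P too large
]
print(split_de_genes(results))  # -> (['GENE_A'], ['GENE_B'])
```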

We performed a Gene Ontology enrichment analysis of these differentially expressed gene sets to better understand their biological significance [13]. Upregulated TEP RNAs were enriched in biological processes such as regulation of protein-containing complex assembly, blood coagulation, and platelet activation. Conversely, downregulated RNAs were predominantly associated with processes such as ribonucleoprotein complex biogenesis, RNA splicing, and cytoplasmic translation (Supplementary Table 1).

In contrast, only 120 RNAs were differentially expressed (logCPM ≥ 3, P < 0.05) between BC and the symptomatic control group with non-BC conditions. This may reflect the heterogeneity of the benign breast conditions and their severity.

3. Model development and validation for BC diagnostics

Subsequently, we constructed a model using a support vector machine (SVM) algorithm. Figure 3 outlines the model development and optimization workflow. We included symptomatic controls in model development because benign breast diseases are common: approximately 50% of women aged >30 years experience mastalgia and fibrocystic changes [14]. To obtain the input gene list for the classifier, we first performed differential expression analysis on the training dataset between the BC group (n = 138) and the non-BC group (n = 154) using the R package edgeR, which yielded 1761 genes (logCPM ≥ 3 and P < 0.001). To minimize the number of markers in the final classifier, we narrowed this list to 57 genes by LASSO modeling. Single-factor logistic regression on these 57 genes identified four genes with an area under the curve (AUC) of >0.8 (Supplementary Table 2), which formed the 4-BC-TEP-RNA panel. This panel showed high diagnostic performance in the training cohort (AUC = 0.894, 95% confidence interval [CI]: 0.857–0.931; sensitivity, 94.2%; specificity, 74.0%; Figure 4a). In the validation cohort, the classifier distinguished BC (n = 55) from non-BC (n = 61) with high sensitivity (AUC = 0.861, 95% CI: 0.787–0.935; sensitivity, 96.4%; specificity, 73.8%; Figure 4a). However, whereas specificity was 100% for asymptomatic controls, it dropped to only 15.79% for symptomatic controls, reducing the model’s overall specificity (Figure 4b).
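The single-gene screening step can be sketched as follows. For a one-variable logistic regression the fitted probability is a monotone function of the gene's expression, so its AUC equals the rank-based (Mann–Whitney) AUC of the raw values when the fitted slope is positive, and 1 − AUC when it is negative. The sketch therefore scores each gene as max(AUC, 1 − AUC), which assumes the fitted slope points in the direction that separates the groups. The function names and toy data are hypothetical.

```python
def mann_whitney_auc(values, labels):
    """Probability that a random BC sample (label 1) has a higher value than a
    random non-BC sample (label 0); ties count one half."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_genes(expression, labels, threshold=0.8):
    """Keep genes whose single-gene AUC exceeds `threshold`.

    `expression` maps gene name -> list of expression values (one per sample).
    Up- and downregulated genes are treated alike via max(AUC, 1 - AUC).
    """
    selected = {}
    for gene, values in expression.items():
        auc = mann_whitney_auc(values, labels)
        score = max(auc, 1.0 - auc)
        if score > threshold:
            selected[gene] = score
    return dict(sorted(selected.items(), key=lambda kv: kv[1], reverse=True))


labels = [0, 0, 0, 1, 1, 1]
expression = {
    "GENE_UP": [1, 2, 3, 4, 5, 6],    # higher in BC
    "GENE_DOWN": [6, 5, 4, 3, 2, 1],  # lower in BC
    "GENE_FLAT": [1, 2, 3, 1, 2, 3],  # uninformative, AUC = 0.5
}
print(screen_genes(expression, labels))  # -> {'GENE_UP': 1.0, 'GENE_DOWN': 1.0}
```

Passing `threshold=0.65` corresponds to the looser cut-off used for the seven genes that separate BC from symptomatic controls.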

To improve overall classifier performance, we then selected the top seven genes that differed between BC and the symptomatic control group (single-gene AUC > 0.65; Supplementary Table 2) and added them to the 4-BC-TEP-RNA panel. The diagnostic model built on the resulting optimized 10-BC-TEP-RNA panel achieved an AUC, sensitivity, and specificity of 0.970 (95% CI: 0.954–0.986), 88.4%, and 92.9%, respectively, in the training cohort, and 0.941 (95% CI: 0.910–0.979), 87.3%, and 86.9%, respectively, in the validation cohort (Figure 4c, 4d). Its AUC was significantly higher than that of the 4-BC-TEP-RNA panel in both the training (P = 2.685e-06) and validation (P = 6.139e-04) cohorts.

4. Test of the TEP-derived RNA panel model in an independent cohort

We evaluated both TEP-derived RNA panels in the independent test cohort, which comprised 83 BC and 94 non-BC samples. The 10-BC-TEP-RNA panel yielded an AUC of 0.957 (95% CI: 0.931–0.983), a significantly better diagnostic performance than the 4-BC-TEP-RNA panel (P = 1.578e-05; Figure 5a). The additional markers were intended to sharpen the distinction between BC and benign breast lesions by correcting samples that the 4-BC-TEP-RNA panel had initially classified as positive. Specificity was 100% (n = 65) in the asymptomatic control group and 62.1% (n = 29) in the symptomatic control group (Figure 5b). Detection accuracy across known tumor stages ranged from 80% to 94.6% (80% [n = 5], 83.3% [n = 12], 94.6% [n = 37], 88.9% [n = 9], 83.3% [n = 18], and 100% [n = 2] for stages 0, I, II, III, IV, and unknown, respectively; Figure 5c). Table 2 lists additional performance parameters.
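The reported test-cohort figures are internally consistent, which a short Python check makes explicit. The confusion-matrix counts below are reconstructed from the fractions reported in the text (74/83 BC samples detected; 65/65 asymptomatic and 18/29 symptomatic controls correctly classified), not taken from the raw predictions, and `diagnostic_metrics` is a hypothetical helper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic-test metrics, returned as percentages."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "PPV": 100 * tp / (tp + fp),
        "NPV": 100 * tn / (tn + fn),
        "accuracy": 100 * (tp + tn) / total,
    }


# Test cohort, 10-marker panel: 83 BC and 94 non-BC samples.
tp, fn = 74, 83 - 74   # BC samples called BC / missed
tn = 65 + 18           # asymptomatic + symptomatic controls called non-BC
fp = (65 + 29) - tn    # controls called BC
for name, value in diagnostic_metrics(tp, fn, tn, fp).items():
    print(f"{name}: {value:.1f}%")
```

The printed values (89.2%, 88.3%, 87.1%, 90.2%, and 88.7%) match the 10-marker test-set row of Table 2. The pooled specificity of 88.3% is simply (65 + 18)/94, so it depends on the mix of asymptomatic and symptomatic controls in the test set.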

Of interest, two patients in the full cohort provided blood samples approximately one week after breast tumor resection. In one patient, initially diagnosed with invasive breast carcinoma, pathologic examination a week later showed no evidence of residual cancer; the 10-BC-TEP-RNA panel assigned this patient a BC probability of 0.216 (<0.5). In the second patient, the post-surgical pathologic assessment showed moderate-grade ductal carcinoma in situ, and the classifier assigned a BC probability of 0.858 (>0.5). In both cases, the classifier’s results agreed with the pathologic findings.

5. Development of classifiers for various receptor subtypes in BC

The optimal therapy for each patient with BC depends on tumor subtype, cancer stage, and patient preference. For example, treatment of non-metastatic disease is guided by the tumor receptor subtype: patients with hormone receptor (HR)-positive tumors are recommended endocrine therapy; patients with human epidermal growth factor receptor 2 (HER2)-amplified tumors receive targeted therapy or a combination of small-molecule inhibitors and chemotherapy; combined HER2/estrogen receptor (ER) blockade is effective for HR-positive/HER2-amplified BC; and patients with triple-negative tumors are eligible only for chemotherapy [15, 16].

We therefore assessed whether TEP-RNA profiles can distinguish receptor subtypes. Despite the limited sample size and other potential confounding factors, the panels we built on TEP-RNA profiles detected HER2-amplified and HR-positive BC subtypes significantly better than a random classifier (P < 0.01; Figure 6).
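The text does not state how the comparison with a random classifier was performed. One common approach is a label-permutation test, sketched below under that assumption: the subtype labels are shuffled many times, and the P-value is the fraction of shuffles whose AUC is at least as large as the observed one. The scores should be held-out predictions; the function names and toy scores are hypothetical.

```python
import random


def mann_whitney_auc(scores, labels):
    """Rank-based AUC; label 1 = subtype of interest, ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """One-sided permutation test of AUC against a label-independent classifier.

    Returns the observed AUC and (hits + 1) / (n_perm + 1), where hits counts
    shuffles with an AUC at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mann_whitney_auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mann_whitney_auc(scores, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: hypothetical scores for one subtype (1) versus all others (0).
labels = [1] * 10 + [0] * 10
scores = [0.9, 0.8, 0.85, 0.7, 0.6, 0.75, 0.95, 0.65, 0.55, 0.5,
          0.4, 0.45, 0.3, 0.6, 0.2, 0.35, 0.1, 0.5, 0.25, 0.15]
auc, p = permutation_p_value(scores, labels)
print(f"AUC = {auc:.3f}, permutation P = {p:.4f}")
```

Adding one to both numerator and denominator keeps the estimated P-value above zero, so the smallest reportable value is 1/(n_perm + 1).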

Discussion

Current BC diagnosis relies primarily on mammography and ultrasound. However, mammography is considered more effective for women aged ≥40 years, because younger women have denser breast tissue, which lowers its sensitivity in this group [17]. Ultrasound can complement screening in women with dense breast tissue, and its main advantage is the absence of ionizing radiation [18]. However, ultrasound has been criticized for its relatively low specificity, because many benign lesions share overlapping features with malignancies. Many current developments therefore aim to increase the specificity of ultrasound and enhance its value as a screening modality [19]. In practice, results also depend on the experience and skill of the ultrasound physician, and different operators may interpret the same findings differently.

Liquid biopsy is attracting increasing interest for advancing BC diagnosis and prognosis because it is minimally invasive and has high potential for cancer monitoring [20-22]. For example, a classifier comprising 26 cfDNA methylation markers discriminated patients with BC from healthy women with a sensitivity of 89.37% and a specificity of 100%, although patients with benign lesions were excluded [23]. The combined expression of urinary small extracellular vesicle (sEV) miR-21 and matrix metalloproteinase-1 (MMP-1) showed high sensitivity (95%) and specificity (79%) for early-stage BC [24]. Platelet RNA is an emerging liquid-biopsy biosource that can provide valuable information for cancer diagnosis. A recent intercontinental TEP biomarker study confirmed that TEPs perform well across populations of different ethnicities, across histological subtypes, and in early-stage ovarian cancer [9, 10, 25]. Furthermore, platelets are abundant and easy to isolate from peripheral blood, whereas cfDNA and exosomal miRNA are present at low concentrations [26]. TEPs may therefore be easier to implement in routine laboratory practice.

Here, we report significant differences in platelet RNA expression profiles between patients with and without BC. Our results indicate that a TEP-RNA-based classifier containing 10 markers can accurately identify patients with BC, with a sensitivity of 89.2% (74/83) in the test cohort and a specificity of 100% (65/65) in asymptomatic controls. Specificity in symptomatic controls was lower, at 62.1% (18/29), similar to the results of the pan-cancer study by In 't Veld et al. [10]. Further investigation of platelet RNA expression profiles in symptomatic controls is therefore warranted, including the collection of more blood samples and an appropriate subgroup design.

Our classifier showed high sensitivity for early-stage BC (80% [n = 5], 83.3% [n = 12], 94.6% [n = 37], and 88.9% [n = 9] for stages 0, I, II, and III, respectively), likely because patients with non-metastatic BC made up most of our study cohort. Early detection allows personalized therapeutic options to be offered promptly, ultimately improving patients’ quality of life. Interestingly, our classifier correctly determined the presence or absence of residual cancer in the two patients who had undergone breast surgery. This is consistent with the role of platelets as responders to signals produced by cancer cells and suggests that platelet RNA responds rapidly to changes in tumor burden, which could theoretically explain the high sensitivity of TEPs in early cancer detection [3, 27, 28]. In the future, BC recurrence monitoring based on platelet RNA may become feasible through the analysis of large numbers of follow-up platelet samples.

Furthermore, we showed that TEP-RNA can potentially differentiate BC subtypes, which may result from distinct BC subtypes “educating” platelet RNA in different ways. Collecting more platelet samples from each subtype could reveal more precise subtype-specific “education” patterns, enabling applications in companion diagnostics and in selecting the most suitable treatment plan for each patient.

In conclusion, we confirmed that platelet RNA is an important biosource for BC diagnosis and developed a TEP-RNA-based model that enables accurate BC detection. Notably, our study included a high proportion of patients with early-stage BC, and we showed that TEPs can be effectively used for early BC detection, creating opportunities for timely treatment and improved survival. Platelet-based methods are more convenient, minimally invasive, radiation-free, and do not require specialized personnel for result interpretation, making them applicable to BC screening, companion diagnostics, and recurrence monitoring.

Methods

Participants

Research subjects from The First Affiliated Hospital of Xiamen University were selected according to the following criteria: 1) patients diagnosed with BC without a history of other malignancies; 2) patients without BC who had benign breast tumors, such as fibroadenoma and papilloma, sclerosing adenosis, or mastitis; 3) patients who underwent mastectomy followed by pathologic examination for residual tumor one week after surgery. All blood samples were obtained from the Department of Breast Surgery at The First Affiliated Hospital of Xiamen University. All enrolled participants signed informed consent for blood collection and platelet analysis.

Processing of blood samples and raw RNA-sequencing data

Blood samples collected at The First Affiliated Hospital of Xiamen University were processed using the standardized protocol described by Best et al. This study was conducted in accordance with the principles of the Declaration of Helsinki and was approved by the institutional review board and ethics committee of the participating hospital. Details of the procedure are given in the supplementary material. RNA extraction and library construction were performed by BGI Genomics Co., Ltd. Library quality was assessed on an Agilent 2100 Bioanalyzer, and sequencing was performed on the BGISEQ platform following the standardized approach described in the supplementary methods. Raw platelet RNA-sequencing data from the public dataset of Best et al. and our own FASTQ files were processed together through the same RNA-sequencing pipeline. Briefly, clean reads were mapped to the human reference genome (hg38) using STAR (v2.7.8), and gene-level counts were obtained with HTSeq (v2.0) using Ensembl gene annotation release 104. All statistical analyses were conducted in R (v4.1.2) with RStudio.

Ethics approval and consent to participate

The research conformed to the principles of the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Xiamen University (XMYY-2021KYSB005). All participants received and signed informed consent forms for blood collection and platelet analysis.

Filtering and data normalization

First, genes with <30 reads in >90% of the samples were excluded. We ensured that each sample contained >2000 unique high-confidence genes with at least one mapped read. Leave-one-sample-out cross-correlation analysis identified three samples with inter-sample correlations of <0.5, which were excluded. We then performed principal component analysis with the FactoMineR (v2.8) package in R and used the ComBat function from the sva package (v3.48.0) to mitigate batch effects.
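The two count-based filters can be sketched in Python. This illustrates the rules stated above under assumptions, not the pipeline of Best et al.: Pearson correlation on raw counts is assumed (the text does not specify the correlation measure or any transformation), and the helper names and toy data are hypothetical.

```python
import math
import statistics


def filter_genes(counts, min_reads=30, min_fraction=0.10):
    """Drop genes with < min_reads reads in more than 90% of samples.

    `counts` maps gene -> list of raw counts (one per sample); a gene is kept
    if at least `min_fraction` of samples have >= `min_reads` reads.
    """
    kept = []
    for gene, values in counts.items():
        frac = sum(v >= min_reads for v in values) / len(values)
        if frac >= min_fraction:
            kept.append(gene)
    return kept


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def low_correlation_samples(samples, min_r=0.5):
    """Leave-one-sample-out check: correlate each sample with the gene-wise
    median of all other samples and return the indices with r < min_r.

    `samples` is a list of per-sample count lists in the same gene order.
    """
    flagged = []
    for i, sample in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        reference = [statistics.median(gene_vals) for gene_vals in zip(*others)]
        if pearson(sample, reference) < min_r:
            flagged.append(i)
    return flagged


counts = {"GENE_A": [50] + [0] * 9, "GENE_B": [0] * 10, "GENE_C": [31] * 10}
print(filter_genes(counts))  # -> ['GENE_A', 'GENE_C']

samples = [[10, 20, 30, 40], [12, 19, 33, 41], [9, 22, 28, 39], [40, 30, 20, 10]]
print(low_correlation_samples(samples))  # -> [3]
```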

Feature selection and model construction

We used the training set to extract features and reduced the gene list used as model input through a series of steps aimed at reducing the number of features. First, we excluded genes that were not substantially expressed in platelets. Second, we performed an initial selection of genes in the training set by differential expression analysis (edgeR), requiring logCPM ≥ 3 and P < 0.001. Third, we applied the least absolute shrinkage and selection operator (LASSO), selecting genes at the optimal lambda value determined by 10-fold cross-validation with the R packages glmnet (v4.1) and caret (v6.0). Finally, we assessed each candidate gene’s contribution to classification by single-factor logistic regression and selected the most informative genes for the final diagnostic panel.

We developed the SVM-based BC diagnostic prediction model with the gene panel derived from the training set, optimizing the cost and sigma parameters. The final model showed high diagnostic accuracy.
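For reference, a radial-basis SVM tuned by "cost" and "sigma", as in caret's `svmRadial` method (which wraps kernlab), can be written as below. That the authors used this particular caret method is an assumption based on the parameter names and the packages listed above. The probability mapping shown is a Platt-type sigmoid, one common way to obtain the class probabilities that the Results compare with a 0.5 cut-off.

```latex
\begin{aligned}
k(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\sigma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right) \\
f(\mathbf{x}) &= \sum_{i=1}^{N} \alpha_i y_i \, k(\mathbf{x}_i, \mathbf{x}) + b,
  \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad y_i \in \{-1, +1\} \\
\hat{P}(\mathrm{BC} \mid \mathbf{x}) &= \frac{1}{1 + \exp\!\left(A f(\mathbf{x}) + B\right)}
\end{aligned}
```

Here x is the vector of panel-gene expression values for one sample, C is the cost, σ is the kernel width parameter, and A and B are fitted to the SVM decision values; a sample is called BC when the estimated probability is at least 0.5. A larger C penalizes training errors more heavily, and a larger σ makes the decision boundary more local.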

Statistical analysis

All analyses were conducted in R unless stated otherwise. The dataset was divided using the sample.split function from the R package caTools (v1.18.2). AUCs and 95% CIs were computed using the R package pROC (v1.18.4). AUCs of the detection tests were compared with a two-sided DeLong test. Pairwise comparisons of gene expression levels among groups were conducted with the Games-Howell method, which is suitable for unequal variances and sample sizes. For other comparisons, P-values were obtained with a two-sided Student’s t-test. Single-factor logistic regression was used to estimate the diagnostic efficacy of each marker, and the resulting probabilities were used to classify test results.
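The DeLong comparison of two correlated AUCs (two panels scored on the same samples) and the DeLong confidence interval can be sketched in dependency-free Python. It follows the structural-component (placement value) formulation of the DeLong method, the same estimator pROC offers through `roc.test(..., method = "delong")` and `ci.auc(..., method = "delong")`, but it is an illustrative re-implementation, not pROC itself. Clipping the interval to [0, 1] is a choice of this sketch, and all names and scores are hypothetical.

```python
import math
from statistics import NormalDist


def _placements(scores, labels):
    """DeLong structural components for one classifier.

    v10[i]: fraction of negatives scored below positive i (ties count 1/2).
    v01[j]: fraction of positives scored above negative j.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(a, b):
        return 1.0 if a > b else 0.5 if a == b else 0.0

    v10 = [sum(psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def _cov(a, b):
    # sample covariance with n - 1 in the denominator
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_ci(scores, labels, level=0.95):
    """AUC with a DeLong (normal-approximation) confidence interval."""
    v10, v01 = _placements(scores, labels)
    auc = sum(v10) / len(v10)
    se = math.sqrt(_cov(v10, v10) / len(v10) + _cov(v01, v01) / len(v01))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two AUCs computed on the same samples."""
    a10, a01 = _placements(scores_a, labels)
    b10, b01 = _placements(scores_b, labels)
    m, n = len(a10), len(a01)
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    if var <= 0:  # identical rankings: no evidence of a difference
        return auc_a, auc_b, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))


labels = [1, 1, 1, 0, 0, 0]
panel_4 = [0.9, 0.8, 0.7, 0.6, 0.5, 0.85]    # hypothetical scores
panel_10 = [0.95, 0.9, 0.8, 0.3, 0.2, 0.4]   # hypothetical scores
print(delong_ci(panel_4, labels))
print(delong_test(panel_10, panel_4, labels))
```

The covariance terms between the two panels are what distinguish this paired test from comparing two independent AUCs; with a toy example this small, the P-value is not meaningful.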

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Author contributions

W.X., J.H., Z.Z., B.L. and Z.O. conceived the study. W.X., Z.Z., H.L. and Y.H. designed and/or conducted the experiments. W.X., J.H. and Z.Z. designed and/or carried out the bioinformatic data analysis. W.X. and J.H. generated the figures and tables. W.X. and J.H. wrote the manuscript. All authors read and approved the final manuscript.

Competing interests statement

The authors declare no competing interests.

Acknowledgements

We are grateful for the support of the patients and their families. We thank all our team members for their critical review of the manuscript. We also thank Liyuan Fan and Qianying Li for their suggestions on the bioinformatics analysis. This work was supported by AigenBri (Xiamen) Medical Laboratory Co., Ltd.

References

1. Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2022. CA Cancer J. Clin. 72, 7-33 (2022).
2. Haemmerle, M., Stone, R. L., Menter, D. G., Afshar-Kharghan, V. & Sood, A. K. The platelet lifeline to cancer: challenges and opportunities. Cancer Cell 33, 965-983 (2018).
3. McAllister, S. S. & Weinberg, R. A. The tumour-induced systemic environment as a critical regulator of cancer progression and metastasis. Nat. Cell Biol. 16, 717-727 (2014).
4. In 't Veld, S. & Wurdinger, T. Tumor-educated platelets. Blood 133, 2359-2364 (2019).
5. Stone, R. L. et al. Paraneoplastic thrombocytosis in ovarian cancer. N. Engl. J. Med. 366, 610-618 (2012).
6. Battinelli, E. M., Markens, B. A. & Italiano, J. E., Jr. Release of angiogenesis regulatory proteins from platelet alpha granules: modulation of physiologic and pathologic angiogenesis. Blood 118, 1359-1369 (2011).
7. Nilsson, R. J. et al. Blood platelets contain tumor-derived RNA biomarkers. Blood 118, 3680-3683 (2011).
8. Kuznetsov, H. S. et al. Identification of luminal breast cancers that establish a tumor-supportive macroenvironment defined by proangiogenic platelets and bone marrow-derived cells. Cancer Discov. 2, 1150-1165 (2012).
9. Gao, Y. et al. Platelet RNA enables accurate detection of ovarian cancer: an intercontinental, biomarker identification study. Protein Cell 14, 579-590 (2023).
10. In 't Veld, S. et al. Detection and localization of early- and late-stage cancers using platelet RNA. Cancer Cell 40, 999-1009.e6 (2022).
11. Best, M. G. et al. Swarm intelligence-enhanced detection of non-small-cell lung cancer using tumor-educated platelets. Cancer Cell 32, 238-252.e9 (2017).
12. Best, M. G., In 't Veld, S. G. J. G., Sol, N. & Wurdinger, T. RNA sequencing and swarm intelligence-enhanced classification algorithm development for blood-based disease diagnostics using spliced blood platelet RNA. Nat. Protoc. 14, 1206-1234 (2019).
13. Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb.) 2, 100141 (2021).
14. Stachs, A., Stubert, J., Reimer, T. & Hartmann, S. Benign breast disease in women. Dtsch. Arztebl. Int. 116, 565-574 (2019).
15. Waks, A. G. & Winer, E. P. Breast cancer treatment: A review. JAMA 321, 288-300 (2019).
16. Pegram, M., Jackisch, C. & Johnston, S. R. D. Estrogen/HER2 receptor crosstalk in breast cancer: combination therapies to improve outcomes for patients with hormone receptor-positive/HER2-positive breast cancer. NPJ Breast Cancer 9, 45 (2023).
17. Pace, L. E. & Keating, N. L. A systematic assessment of benefits and risks to guide breast cancer screening decisions. JAMA 311, 1327-1335 (2014).
18. Mann, R. M., Hooley, R., Barr, R. G. & Moy, L. Novel approaches to screening for breast cancer. Radiology 297, 266-285 (2020).
19. Weigert, J. M. The Connecticut experiment; the third installment: 4 years of screening women with dense breasts with bilateral ultrasound. Breast J. 23, 34-39 (2017).
20. Chen, M. & Zhao, H. Next-generation sequencing in liquid biopsy: cancer screening and early detection. Hum. Genomics 13, 34 (2019).
21. Klein, E. A. et al. Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Ann. Oncol. 32, 1167-1177 (2021).
22. Liu, J. et al. Genome-wide cell-free DNA methylation analyses improve accuracy of non-invasive diagnostic imaging for early-stage breast cancer. Mol. Cancer 20, 36 (2021).
23. Zhang, X. et al. Circulating cell-free DNA-based methylation patterns for breast cancer diagnosis. NPJ Breast Cancer 7, 106 (2021).
24. Ando, W. et al. Novel breast cancer screening: combined expression of miR-21 and MMP-1 in urinary exosomes detects 95% of breast cancer without metastasis. Sci. Rep. 9, 13595 (2019).
25. Best, M. G. et al. RNA-Seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics. Cancer Cell 28, 666-676 (2015).
26. Denis, M. M. et al. Escaping the nuclear confines: signal-dependent pre-mRNA splicing in anucleate platelets. Cell 122, 379-391 (2005).
27. Calverley, D. C. et al. Significant downregulation of platelet gene expression in metastatic lung cancer. Clin. Transl. Sci. 3, 227-232 (2010).
28. Antunes-Ferreira, M. et al. Tumor-educated platelet blood tests for non-small cell lung cancer detection and management. Sci. Rep. 13, 9359 (2023).

Table 1. Characteristics of asymptomatic control, symptomatic control, and patients with breast cancer.

Characteristics	Subgroup	Whole set (n = 585)	Training set (n = 292)	Validation set (n = 116)	Testing set (n = 177)
Number of samples	Asymptomatic control	214	107	42	65
	Symptomatic control	95	47	19	29
	Patients with BC	276	138	55	83
Age (asymptomatic control/symptomatic control/BC)	<40	58/39/32	34/20/20	10/8/4	14/11/8
	40–49	31/35/81	14/15/40	4/7/14	13/13/27
	50–59	53/18/95	25/10/48	9/3/19	19/5/28
	≥60	72/3/68	34/2/30	18/1/18	19/0/20
	Mean	49.6/42.6/52.4	48.2/42.8/51.3	52.4/43.1/54.1	50.0/42.0/53.0
Subtype	Luminal A	21	10	5	6
	Luminal B (HER2+)	22	12	7	3
	Luminal B (HER2-)	56	30	8	18
	HER2+	27	14	3	10
	TNBC	20	8	5	7
	NA	130	64	27	39
Stage	0	12	7	0	5
	I	51	31	8	12
	II	117	60	20	37
	III	23	10	4	9
	IV	64	26	20	18
	NA	9	4	3	2
Symptomatic control	Benign tumor	45	22	8	15
	Sclerosing adenosis	30	15	5	10
	Inflammation	20	10	6	4

Abbreviations: BC: breast cancer; HER2: human epidermal growth factor receptor 2; TNBC: triple-negative breast cancer; NA: not available.

Table 2. Performance of the 4-marker and 10-marker TEPBC panels in detecting breast cancer in the training, validation, and testing cohorts.

Panel	Cohort	AUC (95% CI)	ACC (95% CI), %	Sensitivity, %	Specificity, %	PPV, %	NPV, %	AUC P-value
4 markers	Training set	0.894(0.857–0.931)	83.7(78.8–87.6)	94.2	74.0	76.5	93.4	/
	Validation set	0.861(0.787–0.935)	84.5(76.6–90.5)	96.4	73.8	76.8	95.7	/
	Testing set	0.828(0.764–0.891)	81.4(74.8–86.8)	95.2	69.2	73.2	94.2	/
10 markers	Training set	0.970(0.954–0.986)	90.8(86.8–93.8)	88.4	92.9	91.7	89.9	2.685e-06
	Validation set	0.941(0.910–0.979)	87.1(79.6–92.6)	87.3	86.9	85.7	88.3	6.139e-04
	Testing set	0.957(0.931–0.983)	88.7(83.1–93.0)	89.2	88.3	87.1	90.2	1.578e-05

Predictions of the 4-marker TEPBC panel were compared with those of the 10-marker panel using a two-sided DeLong test (AUC P-value). Abbreviations: AUC: area under the curve; ACC: accuracy; PPV: positive predictive value; NPV: negative predictive value; CI: confidence interval.


Supplementary Files

supplementarymethods.docx
SupplementaryTable1.xlsx
SupplementaryTable2.xlsx
SupplementaryTable3.xlsx
SupplementaryFigure1.pdf
Supplementary Figure 1. Performance of quality control. (a) Leave-one-sample-out cross-correlation filtering, in which the counts of each sample were correlated with the median counts of all other samples; three samples with a correlation coefficient below 0.5 were excluded. (b) Number of transcripts reliably detected in platelet RNA samples. (c) Boxplot of the age distribution of the samples. (d) Only samples with >1500 detected transcripts were retained for further analysis.
SupplementaryFigure2.pdf
Supplementary Figure 2. Violin plot showing the ten markers’ expression levels in all platelet samples.

Reviewers invited by journal
15 May, 2024
Editor assigned by journal
15 May, 2024
Editor invited by journal
10 Apr, 2024
Submission checks completed at journal
05 Apr, 2024
First submitted to journal
22 Mar, 2024

Status: Version 1
