Transcriptome profiling of chronic liver disease-educated platelets (CLDEPs) reveals unique mRNA signatures in affected patients compared with those in healthy volunteers
Peripheral blood samples were collected and circulating platelets were isolated from healthy donors (n = 50), patients with CHB (n = 39), patients with LC (n = 40), and patients with HCC (n = 34); all patients were diagnosed based on clinical presentation and conventional pathological analysis of liver tissue (Fig. 1A-B). In total, 16,968 genes were identified for subsequent analyses. Of these, 4,700 were differentially expressed among the four groups (false discovery rate [FDR] < 0.001; Table 2, Fig. 1C). GO functional enrichment analysis revealed that the transcriptome data were enriched for transcripts associated with platelet functions (FDR < 0.05; Table S1; Fig. 1D). Furthermore, KEGG enrichment analysis identified endocytosis as the most enriched pathway (FDR < 0.05; Table S2; Fig. 1E).
Table 1
Summary of study participants
Group | Total (n) | Training | Validation | Male | Female | Age (SD)
Healthy | 50 | 35 | 15 | 22 | 28 | 34 (10)
CHB | 39 | 27 | 12 | 29 | 10 | 39 (13)
LC | 40 | 28 | 12 | 35 | 4 | 44 (9)
HCC | 34 | 24 | 10 | 29 | 4 | 52 (11)
CHB, chronic hepatitis B; LC, liver cirrhosis; HCC, hepatocellular carcinoma.
Table 2
Differentially expressed genes among the study groups
Group | Healthy | CHB | LC | HCC
Healthy (n = 50) | 0 | 144 | 3582 | 2676
CHB (n = 39) | | 0 | 103 | 530
LC (n = 40) | | | 0 | 362
HCC (n = 34) | | | | 0
CHB, chronic hepatitis B; LC, liver cirrhosis; HCC, hepatocellular carcinoma.
Signature identification and diagnosis between HCC and healthy or non-HCC groups
Comparison of the transcriptome profiles of patients with HCC and healthy volunteers revealed 2,676 DEGs, whereas comparison with patients with CHB revealed 530 DEGs and comparison with patients with LC revealed 362 DEGs. Notably, the number of DEGs between HCC and the other groups declined with disease progression (healthy, then CHB, then LC), supporting a biological rationale for platelet transcriptome profiling as a surrogate marker for CLD (Table 2). GO and KEGG enrichment analyses of the compared groups identified specific pathways and functional groups (Fig. 2A-B; Table S1–S2).
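The pairwise counts in Table 2 can be held in a small lookup structure to make this trend explicit. The following is a minimal Python sketch; the dictionary layout and the helper name `deg_count` are illustrative assumptions, and only the numbers are taken from Table 2.

```python
# Pairwise DEG counts transcribed from Table 2 (upper triangle only).
DEG_COUNTS = {
    ("Healthy", "CHB"): 144,
    ("Healthy", "LC"): 3582,
    ("Healthy", "HCC"): 2676,
    ("CHB", "LC"): 103,
    ("CHB", "HCC"): 530,
    ("LC", "HCC"): 362,
}


def deg_count(group_a: str, group_b: str) -> int:
    """Return the DEG count for an unordered pair of groups (hypothetical helper)."""
    if group_a == group_b:
        return 0  # diagonal of Table 2
    if (group_a, group_b) in DEG_COUNTS:
        return DEG_COUNTS[(group_a, group_b)]
    return DEG_COUNTS[(group_b, group_a)]


# DEG counts between HCC and each other group, ordered by disease progression.
hcc_vs_others = [deg_count("HCC", group) for group in ("Healthy", "CHB", "LC")]
print(hcc_vs_others)
```

Listing the comparison groups in order of disease progression shows the monotonic decrease in DEG counts described above.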
For the HCC-specific SVM algorithm, we selected DEGs between the healthy and HCC groups in the training cohort (n = 59); within the training cohort, the algorithm reached 100% sensitivity, specificity, and accuracy (Fig. 3A). Subsequent validation (n = 25) of the DEG-trained SVM algorithm yielded 90% sensitivity, 93% specificity, and 92% accuracy with an area under the curve (AUC) of 0.993, illustrating the high predictive strength of the algorithm in differentiating patients with HCC from healthy donors (Fig. 3A). One sample in the validation cohort was misdiagnosed (6.6%). A total of 100 random class-proportional subsampling runs over the entire transcriptome dataset, combining the training and validation cohorts, produced similar accuracy rates, with a mean overall accuracy of 92.4% (SD: ±4.95%), indicating that the classification algorithm is reproducible within this dataset.
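For orientation, the reported validation percentages can be related to sample counts. The sketch below is a worked arithmetic example in Python, not the authors' code. The confusion-matrix counts are an assumption: they are inferred from the validation sizes in Table 1 (10 HCC, 15 healthy) and are not reported in the text. The function name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # HCC samples called HCC
        "specificity": tn / (tn + fp),  # healthy samples called healthy
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# ASSUMED counts, inferred rather than reported: 9 of 10 HCC and
# 14 of 15 healthy validation samples classified correctly.
metrics = binary_metrics(tp=9, fn=1, tn=14, fp=1)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumed counts, the rounded values agree with the reported 90% sensitivity, 93% specificity, and 92% accuracy, and one misclassified healthy donor out of 15 would correspond to roughly the 6.6% quoted above.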
In addition, we used the SVM classifier to distinguish cancer from non-cancer samples. In the training cohort (n = 114), an accuracy of 100% and an AUC of 0.99 were obtained. In the validation cohort (n = 49), the classifier yielded a sensitivity of 80%, a specificity of 92.3%, and an AUC of 0.94, with 44 of 49 participants (89.80%) correctly classified, indicating high predictive strength. A total of 100 random proportional subsamplings of the entire dataset into training and validation sets (ratio: 70:30) yielded similar accuracy rates (mean overall accuracy: 89.92% ± 3.46%), confirming reproducible classification accuracy in this dataset (Fig. 3B).
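The repeated class-proportional 70:30 subsampling can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' pipeline: the function name `stratified_split`, the rounding rule, and the fixed seed are all assumptions, and the classifier fitting that would follow each split is omitted. Only the group sizes are taken from Table 1.

```python
import random
from collections import Counter


def stratified_split(labels, train_fraction=0.7, rng=None):
    """Split sample indices into training and validation sets per class."""
    rng = rng or random.Random()
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, validation = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # ASSUMED rounding rule: nearest integer to the training fraction.
        cut = round(train_fraction * len(shuffled))
        train.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return train, validation


# Group sizes from Table 1.
labels = ["Healthy"] * 50 + ["CHB"] * 39 + ["LC"] * 40 + ["HCC"] * 34
rng = random.Random(0)  # arbitrary seed, for reproducibility only
splits = [stratified_split(labels, 0.7, rng) for _ in range(100)]

train, validation = splits[0]
train_counts = Counter(labels[i] for i in train)
print(len(train), len(validation), dict(train_counts))
```

With this rounding rule, each split has the same per-group training sizes as the Training column of Table 1. In a full analysis, a classifier would be refitted on every split and the accuracies averaged.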
Signature identification and diagnosis between CLD and healthy groups
A total of 248 DEGs were identified between patients with CLD (CHB, LC, and HCC; collectively considered CLD) and healthy donors. The most significantly enriched GO terms were ribosome (25 genes; adjusted P value = 1.11E-24) and blood microparticle (33 genes; adjusted P value = 3.97E-24). In the KEGG pathway analysis, focal adhesion was the most significant pathway, with 16 genes enriched (adjusted P value = 0.001) (Fig. 2C-H).
Next, we assessed the diagnostic accuracy of the CLDEP-based broad classification algorithm, in which each participant is classified either as having a CLD (CHB, LC, or HCC) or as a healthy donor. The SVM algorithm was again optimized on the training cohort (n = 114) to reach 100% sensitivity, specificity, and accuracy. Subsequent validation (n = 49) yielded 73% sensitivity, 88% specificity, and 84% accuracy with an AUC of 0.900, illustrating high predictive strength in differentiating patients with liver disease from healthy volunteers (Fig. 3C); however, the model’s misdiagnosis rate was 11.8%. Across 100 random repetitions, the algorithm produced similar accuracy rates, with a mean overall accuracy of 83.2% ± 4.30%, establishing a reproducible classification algorithm within this dataset and indicating that CLDEPs are a potential surrogate marker for CLD screening.
Diagnostic signature in four groups
The same dataset was then used to assess platelets as an all-in-one biosource for blood-based liquid biopsies in patients with CLD. All samples were categorized into four groups (healthy, CHB, LC, and HCC). The training set showed excellent separation of the groups. The classification capacity of the multiclass SVM-based classifier was then assessed in the validation cohort of 49 samples. In this classifier, the sensitivity for healthy donors was 73.33%, and the proportions of patients with CHB, LC, and HCC who were correctly diagnosed were 58.33%, 58.33%, and 70%, respectively, with an AUC of 0.8588. The multiclass CLD diagnostic test achieved an accuracy of 65.31% (mean overall accuracy of random classifiers: 66.35% ± 6.19%; P < 0.01), demonstrating significant discriminative power of platelet mRNA profiles in the multiclass setting (Fig. 3D).
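The per-class rates and the overall accuracy can be checked against each other with a short calculation. In the Python sketch below, the validation group sizes come from Table 1. The numbers of correct calls are an assumption, inferred from the reported percentages rather than stated in the text.

```python
# Validation group sizes from Table 1.
validation_sizes = {"Healthy": 15, "CHB": 12, "LC": 12, "HCC": 10}

# ASSUMED numbers of correct calls, inferred from the reported per-class rates.
correct_calls = {"Healthy": 11, "CHB": 7, "LC": 7, "HCC": 7}

# Percentage of each group that was classified correctly.
per_class_rate = {
    group: 100 * correct_calls[group] / validation_sizes[group]
    for group in validation_sizes
}

# Overall accuracy: all correct calls over all validation samples.
overall_accuracy = 100 * sum(correct_calls.values()) / sum(validation_sizes.values())

for group, rate in per_class_rate.items():
    print(f"{group}: {rate:.2f}%")
print(f"overall: {overall_accuracy:.2f}%")
```

Under these assumed counts, 32 of 49 samples are classified correctly, which is consistent with the reported per-class rates and the 65.31% accuracy.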
Notably, 14 DEGs were shared across all pairwise comparisons of the CLD groups (TGM2, EPAS1, HAPA12B, H19, DOCK6, CARF10, KANK3, CASKIN2, RELN, IGFBP4, SLC9A3R2, LIMS2, PPFIBP1, and A2M; Fig. 3E). We then constructed an SVM/LOOCV (leave-one-out cross-validation) discriminator algorithm based on these 14 DEGs in all CHB, LC, and HCC cases to test the feasibility of a targeted panel-based diagnostic assay for liver diseases. A small targeted-panel NGS assay is an attractive alternative to comprehensive assays (e.g., whole-genome and whole-transcriptome sequencing) in most clinical laboratory settings owing to its higher throughput (i.e., more patient samples per flow cell) and greater cost-effectiveness, especially for disease-specific assays such as the CLDEP diagnostic assay used herein. In the training cohort (n = 80), this simplified approach yielded sensitivities of 92% for patients with HCC, 81% for patients with CHB, and 75% for patients with LC (Fig. 3F). In the validation cohort, it yielded sensitivities of 80% for patients with HCC, 92% for patients with CHB, and 42% for patients with LC. Across 100 random class-proportional subsampling runs, the mean overall accuracy was 68.2% ± 6.89%.
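The leave-one-out scheme behind an SVM/LOOCV discriminator can be illustrated with a short Python sketch. Several assumptions apply. A nearest-centroid rule stands in for the SVM so that the example needs no external library, and it is not the classifier used in the study. The two-feature profiles are synthetic values invented for illustration. The function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean profile."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        totals = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            totals[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], sample))

    return min(sums, key=squared_distance)


def leave_one_out_accuracy(x, y, predict):
    """Hold out each sample once, train on the rest, and score the held-out call."""
    hits = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_x, train_y, x[i]) == y[i]
    return hits / len(x)


# SYNTHETIC toy profiles (two features, two classes), not study data.
x = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [3.0, 3.2], [3.1, 2.9], [2.9, 3.0]]
y = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
loocv_accuracy = leave_one_out_accuracy(x, y, nearest_centroid_predict)
print(loocv_accuracy)
```

In a panel-based setting, each profile would hold the expression values of the selected genes, and the stand-in rule would be replaced by the trained SVM.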