Profiling differential methylation of peripheral blood mononuclear cells in canine mammary gland tumor
We first made genome-wide differential methylation profiles of PBMCs in CMT. To evaluate the genome-wide effects of mammary tumors on PBMC DNA methylation, PBMCs were collected from 15 healthy dogs (Normal; N), 31 dogs with mammary adenoma (Benign; B), and 30 dogs with mammary carcinoma (Carcinoma; C) (Fig. 1A). The donor’s information is listed in Table S1. The healthy samples consist of six dog breeds, aged 1 to 12, and 13 females, including eight spayed females and two neutered males. Patient specimens comprise 16 dog breeds aged 5 to 16 and six significant subtypes of canine mammary tumors (ductal, simple, complex, mixed, inflammatory, and comedo). All patient dogs were females or spayed females.
Global CpG methylomes have enriched and analyzed by methyl-CpG-binding domain sequencing (MBD-seq) that has high coverage in highly methylated CpG and CpG-rich regions (Fig. 1A). The quality check for NGS data has also been performed (Table S2). Sequencing reads more than 5X depth (considered as signal peaks) show about 50% CpG coverage, indicating that the MBD-seq data was successfully produced and informative (Figure S1A). The R Bioconductor MEDIPS (v.1.46.0) [16] was mainly employed to calculate methylation levels and identify differentially methylated regions (DMRs) (Figure S1B). DMRs were further subjected to ML for modeling an immune classifier for CMT. Of the total, 4,655,287 bins (referred to as ‘Total bins’ in Fig. 1C-E) were generated at 500 bp size, and 1,220,164 bins (referred to as ‘Bins_used’ in Fig. 1C-E) with reading counts of 25 or more were used for further analysis.
Together with pair-wise comparisons (Normal vs. Benign (NB), Normal vs. Carcinoma (NC), and Benign vs. Carcinoma (BC)), we also compared Normal vs. Tumor (NT), in which tumor includes benign and carcinoma. From each comparison, 2840, 3373, 1876, and 168 DMRs were identified with significance (log2FC ≥ ± 0.585 (|Fold Change| ≥ 1.5), adjusted p-value (FDR) < 0.1) for NT, NB, NC, BC, respectively (Fig. 1B). The statistics and genomic features of each DMR group are listed in Table S3-S6 for NT_DMR, NB_DMR, NC_DMR, and BC_DMR, respectively. Interestingly, the NB comparison shows the highest number of DMRs, followed by NT. As expected, NT comparison shares more than half of DMRs (1514) with NB and NC comparisons. Of note, DMRs from NB and NC comparisons share 636 DMRs and methylation directions (that is, B-hyper = C-hyper, B-hypo = C-hypo), indicating the methylation status of immune cells against tumors are similar in benign and carcinoma (Figure S2). Most of all, we focused if DMR profiles of PBMC can distinguish corresponding tumor types (benign or carcinoma) as well as Normal. However, only a small number of DMRs were identified from BC, and most BC_DMRs were unique across all DMRs, indicating that they are not explicitly associated with tumor states.
The uniqueness of BC_DMRs was shown in the genomic and CpG regional distribution and gene types linked to DMRs (Fig. 1C-E). Total bins consist of five genomic regions. Compared with the ‘Total bins’, the intron region was increased when the intergenic region was decreased in the ‘Bins used’. Moreover, more numbers of the CpG island, Shore, and Shelf regions were enriched in the ‘Bins used’ compared to the ‘Total bins’. Interestingly, BC_DMRs were enriched in the promoter and exon regions and the CpG island regions, which are more associated with the protein-coding region.
We then analyzed the direction of DMRs using volcano plots and 100% stacked bar charts in eight genomic regions (Fig. 1F). Overall, methylation increased in tumors compared to Normal. In BC, the Carcinoma group was more methylated than the Benign group. Regionally, changes in methylation status were highly dynamic according to the comparison group. In the NB comparison, there were more hypomethylated DMRs in CpG islands, promoter, and exon compared to other regions. Although these characteristics were similarly shown in the NT comparison, hypermethylated DMRs are prominent across all eight regions in the NC comparison.
Nevertheless, exon, promoter, and CpG island regions were highly hypomethylated in the BC comparison. Most of BC_DMRs, indeed, were hypermethylated in carcinoma. It is an essential feature because hypermethylation of certain groups of genes and DMRs might be a cancer-specific signature.
We then tested if DMRs separate each comparison group. The pair-wise hierarchical clustering separated the Normal group from the Benign, Carcinoma, and Tumors groups (Fig. 1G, Figure S3). However, the Benign and Carcinoma groups were not entirely separated from each other, suggesting a new clustering algorithm for PBMC methylome classification for these group differentiation.
Differential methylation accompanies changes in immune cell populations and proliferation in malignant tumor patients.
Several studies have investigated the methylation patterns of blood immune cells, limited to specific target genes and not on a genome-wide scale [17–20]. Since PBMC is a mixture of a wide variety of immune cells, there is a limit to the regulation or role of various immune cells. To this end, single-cell bisulfite sequencing technology has been attempted, but several limitations exist in diagnosing cancer or defining the immune status. We analyzed the whole genome-wide methylation profile obtained from bulk PBMC samples and attempted to confirm various immune status changes in different tumors.
We defined DMGs using DMRs existing in promotor, exon, intron, and TTS and performed gene set enforcement analysis (Fig. 2, Figure S4, and Table S7). Figure 2 shows that the immunocyte-related terms are significantly enriched in Gene Ontology (GO), Mammalian Phenotype Ontology in Mouse Genome Informatics (MGI), and Human Gene Atlas (HGA) databases [21–23]. In all comparative groups, genes involved in signal pathways directly related to cell activity, receptor activity, and cytokine modulation are hypomethylated in tumors (both benign and carcinoma), whereas there is no significant term or pathway found in hypermethylated in carcinoma (Part of ‘GO’ and ‘KEGG’ in Fig. 2).
The MGI and HGA databases, which focus on the function of immune cells, provide clues to infer the immune status in the blood (Part of ‘MGI Mammalian Phenotype’ and ‘Human Gene Atlas’ in Fig. 2). Comparing the normal with the overall tumor, the terms associated with the increase or abnormal function of T-cells, B-cells, and NK cells were high. The comparison between normal and cancer showed that the gene group with higher methylation in cancer PBMC was involved in the increasing or decreasing of B-cells or T-cells. Among T-cell types, the genes associated with the increase in CD8 + T-cells were most highly associated. On the other hand, compared with benign and normal the highly methylated genes in the benign group showed abnormalities in NK and B-cells. The primary immune cell types responding to benign and carcinoma differ. As for the DMR of BC comparison, there was no significant difference in the gene enrichment analysis, as the number was minimal, as shown in Fig. 1B. Through the PBMC DMRs associated with immune responses to tumors, it is expected to find methylation biomarkers that can distinguish the presence or absence of tumors and the malignancy of tumors.
Immune cell markers functionally involved in cell proliferation and activation of B, T, and NK cells are hypermethylated in tumor PBMCs.
Through gene enrichment analysis (Fig. 2), we could expect that methylation of immune cells in tumor patient dogs is involved in the population or activity of specific cell types. The gene enrichment analysis mapped the highest terms. Using text mining for meaningful GO terms in adj. p < 0.1, words containing ‘receptor’, ‘signal’, ‘activity’, ‘pathway’, ‘T cell’, and ‘B cell’ were prominent in all comparisons (Fig. 3A). These enrichments suggest that hypermethylation occurs in immune cells responding to tumors and is involved in signal transduction of immune cells. To confirm whether the methylation change in PBMC is due to the alteration of immune cell populations and or the cell activity, we investigated the DMR distribution on the immune cell type marker genes in PanglaoDB (Fig. 3B). DMGs included in 11 types of immune cell markers are listed in Table S8. First, NB_DMRs was found increasingly on the marker genes of naive B-cells, T-cells, and T helper (Th) cells. Instead, NC_DMRs were found more in B-cells, NK cells, and many subtypes of T-cells. NT_DMRs were found more in naive B-cells, NK cells, and T, Th, and T memory cells, combined with NB and NC. On the contrary, it is of note that myeloid lineage cells, such as monocytes are decreased in tumors.
We then identified the most influenced genes by altered methylation among the cell type markers. Figure 3C shows the cell type marker genes highly enriched in the immune-related GO terms considering the gene expression levels. IL4 was most frequently altered in the GO terms, and the expression decreased significantly. The list of genes, including TBX21, BCL11B, UHRF1, BACH2, SH2D1A, COL4A6, PRDM11, LBH, and TXK, showed tumor-associated hypermethylation and a significant negative correlation to gene expression. The fold-change of methylation and expression and correlation coefficient calculated for every immune-related DMGs are also listed with corresponding DMRs’ genomic features in Table S9. We integrated RNA-seq data to show an association between methylation and gene expression in representative marker genes (Fig. 3D). Among them, BACH2, a B-cell marker; SH2D1A, a T-cell marker; TXK, an NK cell marker; and UHRF1, known to be related to NK cell number, showed a significant negative correlation between the RNA expression and overall gene methylation. These results showed that the well-enriched immune cell markers in the genome-wide methylation changes are closely linked to gene expression and affect overall tumor immune cell activity.
Bisulfite-sequencing validated the tumor-associated differential methylation in immune cell marker genes.
We showed that hypermethylation and gene expression of cell-specific gene markers are inversely correlated with integration analysis of MBD-seq and RNA-seq (Fig. 3C & 3D). Representative DMRs, which have a reverse correlation with the gene expression, verified the methylation status in vitro by the targeted bisulfite-sequencing (BS-seq). BACH2, an active marker gene of B cells, has hypermethylated DMRs consisting of 11 CpGs on the second intron out of six introns in tumors (benign and carcinoma). The SH2D1A gene, a T-cell activity-related marker, has a hypermethylated DMR consisting of seven CpGs in the TTS region in tumors. A representative carcinoma-related hypermethylated DMR was identified from the CpG shore location, consisting of nine CpG promotor-TSS regions of the TXK gene. A DMR harboring 22 CpGs, which were hypermethylated in carcinoma, was identified from the CpG shore region located in the second exon among 17 exons of the UHRF1 gene (Fig. 4A). The four pairs of primers targeting the flanking regions of DMRs used for BS-seq are described in Table S10.
Overall, the DMRs from the MBD-seq analysis were confirmed in the targeted BS-seq. However, the methylation frequency varied from each CpG (Fig. 4B). The targeted DMR of BACH2 was the most hypermethylated in benign, followed by carcinoma. DMR on the UHRF1 was most highly methylated in carcinoma, followed by benign. The methylation levels of TXK were similarly high in benign and carcinoma. In the case of SH2D1A gene sites, only the 5th CpG site was a differentially methylated CpG in tumors. This can still be sufficiently meaningful because studies have reported that even the presence or absence of methylation of a single CpG can affect transcription level and cell type specificity [24]. Figure 4C shows the distribution of methylation percentage across samples calculated as the number of methylated CpGs/total number of clones * 100 (%). The RNA-sequencing results performed on PBMCs of CMTs and normal dogs showed a significant decrease in the expression of these four genes (Fig. 4D). When compared between the methylation (Fig. 4C) and gene expression (Fig. 4D), overall methylation levels on the targeted regions by BS-seq were significantly opposite to RNA expression data. Our targeted BS-seq results confirmed that the high-throughput sequencing analysis after methylated CpG enrichment showed relevant genome-wide methylation status in PBMC samples. It then identified DMRs that may directly link to gene expressions that have crucial roles in cell activity and populations in PBMCs. Validation of MBD-seq results through BS-seq increases the likelihood that they can be developed for clinical tumor diagnosis.
Computational modeling of a PBMC methylome-based two-step classifier distinguished benign and malignant as well as healthy conditions.
Methylome-based classification is a potential diagnostic method that reflects the stage or subtype of tumors. Previous studies have reported the usefulness of tissue methylation-based classifiers in diagnosing CNS tumors [25], bone sarcoma [26], and renal cell carcinoma [27]. Recently, a model using DNA methylation for discriminating cancer from para-cancerous tissue has been developed [28]. To develop a liquid biopsy-based diagnosis, we attempted to establish a model for diagnosing mammary gland tumors using our genome-wide methylome data. Our results thus far showed immune methylome dynamics between normal and tumor PBMCs. However, it was difficult to define specific DMRs or functional terms that differentiate between benign and malignant tumors by PBMC DMRs. For efficient modeling, we devised a method to classify normal and tumor in step 1 (NT classifier), then classify benign and carcinoma in step 2 (BC classifier) and named it a two-step classifier (Fig. 5A). The process for modeling and performance evaluation is depicted in Fig. 5B.
First, NT classifier modeling was performed using 636 common DMRs with FDR-adjusted p-value < 0.1 and log2FC ≥ ± 0.585 in NB DMR and NC DMR (Fig. 5C-E). To overcome the problem that arising from the limited number of samples, 10-fold cross-validation (10-fold CV) was applied. The classifiers were modeled with six ML algorithms (Support Vector Machine with the linear kernel (SVM_L) or the radial kernel (SVM_R), Random Forest (RF), K-Nearest Neighbor (KNN), Gradient Boosting Machines (GBM), and Logistic Regression), and the performance of each was evaluated with the ROC curve (Fig. 5C). NT classifier shows strong performance with AUC = 1 in SVM_L, SVM_R, GBM, and KNN models except for RF (AUC = 0.99) and logistic regression (AUC = 0.7). In both the representative SVM_L confusion matrix and the 10-fold validation result, it is confirmed that benign and carcinoma are classified as T (Tumor) and normal as N (Normal) (Fig. 5D). The accuracy of each model is shown in Figure S5A. The high accuracy and AUC values of NT classifiers remind us that the PBMC methylome profile in tumors is entirely different from normal. To evaluate the predictive ability of the NT classifiers, PBMC MBD-seq data from 6 dogs with mammary gland tumors that were not used for methylome profiling due to uncertain diagnosis were validated in the six NT classifier models (Fig. 5E, the information of 6 unknown donors is listed in Table S11). Except for logistic regression, which had the lowest AUC value, six unknown CMT samples were diagnosed as T (Tumor) in other models.
Next, a BC classifier was developed using significant DMRs with FDR-adjusted p-value < 0.1 and log2FC ≥ ± 0.585 only in NB_DMR and NC_DMR and additional BC_DMR (NB only + NC only + BC DMR = total of 4,122 DMRs). Since the original BC_DMRs with FDR-adjusted p-value < 0.1 failed to cluster benign and carcinoma (Fig. 1G), the same modeling process was performed using 2,911 DMRs with FDR-adjusted p-value < 0.05 (Fig. 5F-H, Figure S5D-E). The BC classifier trained with the 2,911 DMRs showed the highest performance when using SVM_L (AUC = 0.95), followed by GBM (AUC = 0.92). However, the accuracy of SVM_L and GBM was 0.867 and 0.886, respectively, lower than that of the NT classifier (Fig. 5F). The accuracy was about 0.85, which was inferior to that of the NT classifier (Figure S5B). To improve the performance of the BC classifier, the modeling process was repeated one more time with DMRs of high importance in the initially selected model to increase the discrimination between benign and carcinoma (depicted in Fig. 5B). The performance of the models was measured using 127 DMRs, which showed high relative importance in GBM and the highest accuracy in the primary BC classifier (see the bar graph in Fig. 5F). The degree of feature importance and genomic features of 127 DMRs are shown in Table S12. It shows improved accuracy and performance than the first-order classifier using 2,911 DMRs (Fig. 5G-H, Figure S5B-C). As mentioned above, a parallel analysis was also executed with 4,122 DMRs with an FDR-adjusted p-value < 0.1 (Figure S5D-E). The performance of the primary classifier was similar to that using 2,911 DMRs. However, the remodeled classifier using 102 DMRs of high importance in GBM showed slightly lower accuracy than the previous classifier in the confusion matrix of Supplementary Fig. 6E. Both BC classifiers developed with important DMRs have the highest AUC values and accuracy in the SVM_L model. BC_DMR did not differentiate between benign and carcinoma (Fig. 1G). We performed PCA analysis to evaluate whether the DMRs selected for the classifier modeling discriminate between benign and carcinoma (Figure S6). DMRs with higher importance divide the two groups better, indicating that the GBM-based feature importance is relevant. We designed an optimal two-step classifier by utilizing various ML methods and comparing the performance of predictive models. Our result suggests a new diagnostic strategy using the PBMC methylome that can differentiate between normal, benign, and malignant tumors by liquid biopsy.