Methylation states
The overall methylation distribution was studied using methylation states defined in the Methods section. Separate bar plots were made to illustrate the distribution by both methylation states and samples for all CG sites, TSG CG sites, and HK CG sites, see Fig. 1. This figure showed the following patterns. First, when comparing based on methylation states (see the top two plots on the right panel), TSG sites and all CG sites had a very similar distribution. The main difference between the two groups were the percentages of “High-Median” and “Low,” as the “Low” state had a higher percentage (slightly above 30%) in the TSG plot. Second, the methylation states of Alive Tumor and Dead Tumor were dominated by CG sites classified as “Rest” (around 40%) in both TSG sites and all CG sites (see the two tall bars above “Rest” at the right side of the right panels of Fig. 1). This means that the methylation signals of these CG sites have more variation in tumors than in normal tissues (40% vs. <10%). This makes sense considering the heterogeneous nature of tumor tissues. Third, CG sites classified as “Median” (with methylation signals ranging from 0.2 to 0.8) were much fewer in tumors (far less than 10%) than in normal tissues (at least 10%). That is, normal tissues’ partial methylation (i.e., “Median”) patterns were lost or removed in tumor tissues. Finally, there were noticeable differences between the methylation states of the TSG and HK CG sites (second and third panels of Fig. 1). For the TSG sites, the most prevalent methylation state in the Alive Normal and Dead Normal datasets were “Low” (see green bars on the left panel’s middle plot), meaning that the methylation signals across these CG sites are generally low or unmethylated in normal tissues. In the Alive Tumor and Dead Tumor datasets, however, CG sites that were defined as the “Rest” state made up almost 40% of the TSG sites (see purple bars on the left panel’s middle plot). In the HK data, however, over 40% of all four datasets were made up of “Low” methylation states. That is, for HK CG sites, more than 40% of them remained unmethylated in both alive and dead patients’ normal and tumor tissues. This shows that most HK CG sites’ methylation patterns are relatively stable except that a small percentage of sites have changed to be more heterogeneous ones (i.e., the “Rest” state).
DM patterns
DM CG sites were identified by performing paired T-tests using M-values. A CG site was identified as a DM site if its p-value was < 0.05 and its absolute beta-value mean difference was ≥ 0.2. DM sites were summarized in Table 1. This table showed that Dead samples tended to have more DM sites than Alive samples when analyzing all 391,459 CG sites (8.68% vs. 6.94%) and all TSG sites (9.34% vs. 7.55%). The difference was statistically significant, see p-values shown in the last column of Table 1. As shown in Fig. 2, DM CG sites were then split into three different categories: alive-only DM (DM sites only found in the 53-Alive data), dead-only DM (DM sites only found in the 32-Dead data), and overlap DM (DM sites found in both the Alive and Dead data). There were 230 alive-only DM sites, 694 dead-only DM sites, and 1,731 overlap DM sites as shown in Fig. 2.
Table 1
Summary of DM sites and percentages between normal and tumor tissues.
Samples
|
DM sites (count)
|
DM rates (%)
|
p-value
|
Alive (for all 391,459 CG sites)
|
27160
|
6.94%
|
2.2e-16
(Compare: 6.94% vs. 8.68%)
|
Dead (for all 391,459 CG sites)
|
33974
|
8.68%
|
Alive (for all 25,977 TSG CG sites)
|
1961
|
7.55%
|
2.746e-13
(Compare: 7.55% vs. 9.34%)
|
Dead (for all 25,977 TSG CG sites)
|
2425
|
9.34%
|
Unique genes that were associated with DM sites were then found: 162 genes for alive-only DM sites, 483 genes for overlap DM sites, and 310 genes for dead-only DM sites. The shared or common genes between each combination of two DM sites were also identified, and the percentage of shared genes from each individual gene list was calculated. 116 genes were shared between alive-only DM and overlap DM (71.6% of Alive and 24% of overlap), 67 genes were shared between alive-only DM and dead-only DM (41.4% of Alive and 21.6% of Dead), and 234 genes were shared between dead-only DM and overlap DM (48.4% of overlap and 75.5% of Dead). Finally, 61 genes were common in all three DM groups (alive-only DM, dead-only DM, and overlap DM). When studying the number of CG sites on these genes, the authors found that the median and mean number of CG sites in these 61 common TSGs were 38 and 59 respectively. However, the median and mean number of CG sites in all TSGs were only 17 and 22 respectively. The 61 shared or common genes had more CG sites, and they could be relatively long genes. These CG sites might have different functions or have been playing different roles due to promoter usages.
Co-methylation
Co-methylation analysis part 1: high correlation counts
In order to study the co-methylation patterns between DM sites and all CG sites, Spearman correlation coefficients between each DM site and all the 391,459 CG sites were calculated for each of the four datasets: 53 Alive-Tumor, 53 Alive-Normal, 32 Dead-Tumor, and 32 Dead-Normal. There were three categories for the DM sites: alive-only, dead-only, and overlap. In total,12 different correlation matrices were calculated because there were 3 DM categories and 4 types of samples. The CG pairs with absolute correlation coefficients ≥ 0.8 were then further analyzed. These CG pairs were considered as highly correlated sites. For each DM CG site, the following four numbers were counted: the number of CG sites that were highly positively correlated with it, the number CG sites that were highly negatively correlated with it, the number of highly correlated CG sites that were on the same chromosome, and the number of highly correlated CG sites that were on different chromosomes. The authors then compared the co-methylation patterns between normal and tumor tissues as well as between Alive and Dead samples using different figures and tables.
The first 4 columns of Table 2 showed the total number and proportion of highly co-methylated sites for each of the 3 DM categories (overlap, alive-only, and dead-only) for both 53-Alive and 32-Dead breast cancer patients. The last 4 columns of Table 2 showed the total number and proportion of highly co-methylated pairs that were composed of sites on the same or different chromosomes, positively co-methylated, or negatively co-methylated. To see the patterns clearly, the total number of highly correlated sites was plotted, see Fig. 3. Table 2 and Fig. 3 showed that normal tissues had more highly co-methylated pairs than tumor tissues (the green bars are taller than red bars), and Alive Normal had more highly co-methylated pairs than Dead Normal (2 of the 3 left green bars are taller than the corresponding 2 of 3 right green bars). However, in tumor tissues, there were more highly correlated pairs in Dead Tumor than in Alive Tumor (at least 2 of the 3 right red bars are taller than 2 left red bars). For each combination of DM category and sample type, there were more positive than negative co-methylations (≥ 80% vs. ≤20%) and more co-methylations on different chromosomes than on same chromosomes (> 85% vs. <15%). These differences may indicate co-methylation plays a critical role in cancer development.
Table 2
Highly co-methylated pairs on the same and different chromosomes and with positive and negative correlations.
DM sites
|
Samples
|
total.pairs
|
high.cor (%)
|
same.chr (%)
|
diff.chr (%)
|
neg.cor (%)
|
pos.cor (%)
|
Overlap DM
(1731)
|
Alive.Normal
|
676116483
|
1108956 (0.164%)
|
65337 (5.89%)
|
1043619 (94.11%)
|
173580 (15.65%)
|
935376 (84.35%)
|
Alive.Tumor
|
676116483
|
61217 (0.00905%)
|
9068 (14.81%)
|
52149 (85.19%)
|
5368 (8.77%)
|
55849 (91.23%)
|
Dead.Normal
|
676116483
|
828833 (0.1226%)
|
48966 (5.91%)
|
779867 (94.09%)
|
50911 (6.14%)
|
777922 (93.86%)
|
Dead.Tumor
|
676116483
|
343043 (0.0507%)
|
27970 (8.15%)
|
315073 (91.85%)
|
27812 (8.11%)
|
315231 (91.89%)
|
Alive-only DM
(230)
|
Alive.Normal
|
90009005
|
126721 (0.1408%)
|
8728 (6.89%)
|
117993 (93.11%)
|
7618 (6.01%)
|
119103 (93.99%)
|
Alive.Tumor
|
90009005
|
5760 (0.0064%)
|
859 (14.91%)
|
4901 (85.09%)
|
716 (12.43%)
|
5044 (87.57%)
|
Dead.Normal
|
90009005
|
186396 (0.2071%)
|
11456 (6.15%)
|
174940 (93.85%)
|
5734 (3.08%)
|
180662 (96.92%)
|
Dead.Tumor
|
90009005
|
16785 (0.0186%)
|
1552 (9.25%)
|
15233 (90.75%)
|
1102 (6.57%)
|
15683 (93.43%)
|
Dead-only DM
(694)
|
Alive.Normal
|
271431381
|
973059 (0.3585%)
|
62739 (6.45%)
|
910320 (93.55%)
|
194705 (20.01%)
|
778354 (79.99%)
|
Alive.Tumor
|
271431381
|
23288 (0.00858%)
|
3291 (14.13%)
|
19997 (85.87%)
|
3636 (15.61%)
|
19652 (84.39%)
|
Dead.Normal
|
271431381
|
502902 (0.1853%)
|
32766 (6.52%)
|
470136 (93.48%)
|
32206 (6.40%)
|
470696 (93.60%)
|
Dead.Tumor
|
271431381
|
124846 (0.046%)
|
10087 (8.08%)
|
114759 (91.92%)
|
9638 (7.72%)
|
115208 (92.28%)
|
Using statistical tests, the following samples were then further compared: “normal vs. tumor” and “Alive vs. Dead” for the proportions on the same (or different) chromosome and with positive (or negative) correlation. This comparison was done for the 3 groups of DM sites: alive-only DM, dead-only DM, and overlap DM, separately. The test results are listed in Table 3. Only the test p-values of comparing Alive-Normal with Dead-Normal (for dead-only DM and overlap DM sites) were > 0.05 (0.64 and 0.11 respectively). All the other test results were statistically significant with extremely small p-values. In many rows, the percentage differences were at least 3%. For the rows with the percentage difference < 3%, if they still had small p-values in the proportion tests, it could be due to the impact of large counts. The percentage difference of > 5% was considered as biologically significant and meaningful. Figure 4 showed that the co-methylation patterns of three DM groups and various samples (tumor vs. normal, Alive vs. Dead) were very different. For example, Fig. 4’s top left plot showed that the red bars (for tumors) were taller than the green bars (for normal tissues). This means that tumor tissues tend to have larger percentages of highly correlated sites on the same chromosome than the normal tissues, although the counts are not as large as the normal ones. Figure 4’s top left plot showed that the percentage difference between normal and tumor is more obvious in Alive than in Dead samples. Figure 4’s bottom right plot showed two (out of three) blue bars (for Alive) were taller than two corresponding black bars (for Dead). This means that there are notably greater counts and percentages of highly negative co-methylation pairs in Alive samples than in Dead samples. All the above differences suggest that differential methylation leads to the change of co-methylation in various samples.
Table 3
Proportion tests for highly co-methylated pairs of different groups.
Alive vs. Dead
|
Same.Chr
p-value
|
Same.Chr
%
|
Neg.Cor
p-value
|
Neg.Cor
%
|
Alive Normal vs. Dead Normal – Overlap DM
|
0.64079
|
5.89% vs. 5.91%
|
0.00
|
15.65% vs. 6.14%
|
Alive Tumor vs. Dead Tumor – Overlap DM
|
0.00
|
14.81% vs. 8.15%
|
4.16E-08
|
8.77% vs. 8.11%
|
Alive Normal vs. Dead Normal – Alive-only DM
|
1.17E-16
|
6.89% vs. 6.15%
|
0
|
6.01% vs. 3.08%
|
Alive Tumor vs. Dead Tumor – Alive-only DM
|
4.34E-33
|
14.91% vs. 9.25%
|
5.17E-45
|
12.43% vs. 6.57%
|
Alive Normal vs. Dead Normal – Dead-only DM
|
0.11344
|
6.45% vs. 6.52%
|
0.00
|
20.01% vs. 6.4%
|
Alive Tumor vs. Dead Tumor – Dead-only DM
|
3.71E-192
|
14.13% vs. 8.08%
|
0.00
|
15.61% vs. 7.72%
|
Normal vs. Tumor
|
Same.Chr
p-value
|
Same.Chr
%
|
Neg.Cor
p-value
|
Neg.Cor
%
|
Alive Normal vs. Alive Tumor – Overlap DM
|
0.00
|
5.89% vs. 14.81%
|
0.00
|
15.65% vs. 8.77%
|
Dead Normal vs. Dead Tumor – Overlap DM
|
0.00
|
5.91% vs. 8.15%
|
0.00
|
6.14% vs. 8.11%
|
Alive Normal vs. Alive Tumor – Alive-only DM
|
0.00
|
6.89% vs. 14.91%
|
0.00
|
6.01% vs. 12.43%
|
Dead Normal vs. Dead Tumor – Alive-only DM
|
1.52E-55
|
6.15% vs. 9.25%
|
3.57E-127
|
3.08% vs. 6.57%
|
Alive Normal vs. Alive Tumor -- Dead-only DM
|
0.00
|
6.45% vs. 14.13%
|
7.42E-62
|
20.01% vs. 15.61%
|
Dead Normal vs. Dead Tumor -- Dead-only DM
|
1.32E-85
|
6.52% vs. 8.08%
|
1.90E-62
|
6.4% vs. 7.72%
|
Although Table 2 and Fig. 3 showed the total number (i.e., sum) of highly correlated pairs, they did not show the overall distribution for the co-methylation patterns of all DM sites. To thoroughly compare the co-methylation patterns of different datasets (i.e., normal vs. tumor, Alive vs. Dead), the authors counted the number of highly correlated (with correlation |r| ≥0.8) partners (CG sites) per DM site in each dataset. They then drew the density plots for the counts of all DM CG sites in Alive Normal, Alive Tumor, Dead Normal, and Dead Tumor data, as shown in Fig. 5. This figure showed the distribution of these counts. For the 3 types of DM sites, different line types were used: solid lines for overlap DM, dashed lines for alive-only DM, and dashed-dotted lines for dead-only DM. Because the counts were extremely large for some CG sites and the distributions were very skewed, the root 4 transformation was used to show a clear pattern. Below is a summary of the key patterns of Fig. 5.
First, the top left plot showed the comparison of Alive Normal with Dead Normal (i.e., green vs. blue lines). There was a peak around “0” in the Alive Normal (green lines) but not in Dead Normal (blue lines). That is, a larger number of CG sites in Alive Normal than in Dead Normal were not highly correlated with other sites. However, some CG sites were more highly correlated with many sites in Alive Normal (long green tails) than in Dead Normal (no long blue tails). Second, the top right plot showed the comparison of Alive Tumor vs. Dead Tumor (i.e., red vs. black lines). This plot showed the Alive Tumor had a much smaller number of highly correlated sites than the Dead Tumor (see the two red peaks near 0). On the other hand, the Dead Tumor samples had more CG sites with a high number of correlated partners (black lines). Third, the bottom left plot showed the comparison of normal and tumor tissues in the Alive data (i.e., Alive Normal vs. Alive Tumor). It showed that tumor tissues had a larger number of CG sites not highly co-methylated with others (see the two tall red peaks near 0, which were much taller than the green peak around 0). Again, normal tissues had some CG sites highly co-methylated with many CG sites (see the long green tail at the right end). Fourth, the bottom right plot showed the comparison of normal and tumor tissues in the Dead data (i.e., Dead Normal vs. Dead Tumor). This plot showed that Dead Tumor samples had more CG sites with certain correlations, but the correlations were not that strong (see the taller black lines between 0 and 5 on the left side). However, Dead Normal samples had more CG sites with strong co-methylations (see the longer blue tails at the right end). Finally, when comparing the three types of lines (solid, dashed, and dashed-dotted), their differences were much less notable than the differences between the two different colors (e.g., Alive Normal green lines vs. Dead Normal blue lines). This meant that the co-methylation difference among the three types of DM sites (i.e., different types of lines) were less obvious than the difference between various samples (i.e., lines of different colors).
In summary, in normal tissues, some CG sites tended to be highly co-methylated with many other CG sites. In tumor tissues, some of these co-methylations were lost and some new co-methylation relationships were developed. These patterns can be seen in both Alive and Dead samples. Meanwhile, the Alive and Dead samples have different co-methylation patterns. The above patterns may indicate that genes tend to maintain certain functions and interactions through co-methylation in normal tissues or in the early stage of cancers (as shown in Alive samples). However, with differential methylation, these co-methylation patterns were lost or weakened in tumor tissues or in the later stage of cancers (as shown in Dead samples).
Co-methylation analysis part 2: Investigating co-methylation changes based on 8x8 matrices
Following previous examples [15, 16], the authors calculated several 8x8 matrices as explained below. These matrices showed in detail how co-methylation patterns of DM sites changed from normal to tumor tissues in both Alive and Dead samples. For both normal and tumor data, the correlation coefficient values were divided into 8 intervals: [− 1, − 0.75), [− 0.75, − 0.5), [− 0.5, − 0.25), [− 0.25, 0), [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1). Each CG pair in the TSG correlation matrix was mapped to its corresponding category. For example, if a CG pair had a correlation coefficient of 0.4 in the normal tissue and 0.8 in the tumor tissue, it would be counted in the cell with the row name normal [0.25, 0.5) and column name tumor [0.75, 1). For both Alive and Dead samples, 8x8 matrices were calculated for each of the 3 DM categories: alive-only DM, dead-only DM, and overlap DM. That is, in total there were 6 of these 8x8 correlation matrices, which were shown in the Supplemental File 1. Figure 5 showed that the co-methylation differences among the 3 types of DM sites were less notable than the differences among various samples (tumor vs. normal, Alive vs. Dead samples). Therefore, to save some space, only 2 of these 8x8 matrices calculated based on the overlap DM sites were shown in the main text, see Table 4. Each number in the 8x8 matrix was a count. The percentage of CG sites for each combination was the count divided by the total number of CG sites in a specific row of the normal correlation range. For example, the count in the first cell was 4,297 (0.757%), which meant 4,297 CG pairs (i.e., 0.757% of the total pairs) had correlation coefficients in the range of [-1, 0.75) in the Alive Normal data, and they remained in [-1, 0.75) in the Alive Tumor data.
Table 4
Two 8x8 matrices showing co-methylation changes between normal and tumor tissues
Normal
|
8 correlation intervals for tumor tissues (Overlap DM sites)
|
Alive
|
T[-1, -0.75)
|
T[-0.75, -0.5)
|
T[-0.5, -0.25)
|
T[-0.25, 0)
|
T[0, 0.25)
|
T[0.25, 0.5)
|
T[0.5, 0.75)
|
T[0.75, 1)
|
N[-1, -0.75)
|
4297
|
73818
|
182298
|
201921
|
88579
|
16019
|
785
|
4
|
|
0.7569%
|
13.0025%
|
32.1105%
|
35.5669%
|
15.6026%
|
2.8216%
|
0.1383%
|
0.0007%
|
N[-0.75, -0.5)
|
9387
|
446831
|
2224635
|
4629139
|
3733207
|
1167473
|
96376
|
496
|
|
0.0763%
|
3.6305%
|
18.0754%
|
37.6122%
|
30.3327%
|
9.4858%
|
0.7831%
|
0.0040%
|
N[-0.5, -0.25)
|
6650
|
564390
|
6011579
|
24309202
|
26269230
|
7781659
|
605428
|
4656
|
|
0.0101%
|
0.8610%
|
9.1706%
|
37.0834%
|
40.0734%
|
11.8708%
|
0.9236%
|
0.0071%
|
N[-0.25, 0)
|
3867
|
526811
|
10131385
|
62857780
|
86421033
|
26476788
|
1905978
|
15584
|
|
0.0021%
|
0.2797%
|
5.3793%
|
33.3748%
|
45.8858%
|
14.0580%
|
1.0120%
|
0.0083%
|
N[0, 0.25)
|
2414
|
350385
|
8278526
|
66488860
|
1.20E + 08
|
47972035
|
4210138
|
39078
|
|
0.0010%
|
0.1415%
|
3.3427%
|
26.8471%
|
48.5817%
|
19.3703%
|
1.7000%
|
0.0158%
|
N[0.25, 0.5)
|
897
|
152662
|
3177399
|
26650227
|
59823925
|
35385440
|
4881785
|
61851
|
|
0.0007%
|
0.1173%
|
2.4416%
|
20.4790%
|
45.9710%
|
27.1915%
|
3.7513%
|
0.0475%
|
N[0.5, 0.75)
|
255
|
38374
|
627733
|
4235769
|
11121962
|
10632349
|
2638530
|
65056
|
|
0.0009%
|
0.1307%
|
2.1381%
|
14.4270%
|
37.8813%
|
36.2137%
|
8.9868%
|
0.2216%
|
N[0.75, 1)
|
4
|
905
|
30043
|
196838
|
611727
|
895235
|
432303
|
30042
|
|
0.0002%
|
0.0412%
|
1.3674%
|
8.9590%
|
27.8425%
|
40.7463%
|
19.6761%
|
1.3673%
|
Dead
|
T[-1, -0.75)
|
T[-0.75, -0.5)
|
T[-0.5, -0.25)
|
T[-0.25, 0)
|
T[0, 0.25)
|
T[0.25, 0.5)
|
T[0.5, 0.75)
|
T[0.75, 1)
|
N[-1, -0.75)
|
2157
|
17975
|
34195
|
45471
|
38048
|
16498
|
2191
|
25
|
|
1.3777%
|
11.4812%
|
21.8415%
|
29.0438%
|
24.3025%
|
10.5378%
|
1.3995%
|
0.0160%
|
N[-0.75, -0.5)
|
13472
|
244397
|
1112990
|
2657168
|
2616850
|
1030260
|
129771
|
1918
|
|
0.1726%
|
3.1306%
|
14.2566%
|
34.0365%
|
33.5200%
|
13.1969%
|
1.6623%
|
0.0246%
|
N[-0.5, -0.25)
|
27130
|
795739
|
6105060
|
19946327
|
22732054
|
9355171
|
1238490
|
19905
|
|
0.0451%
|
1.3214%
|
10.1379%
|
33.1225%
|
37.7484%
|
15.5350%
|
2.0566%
|
0.0331%
|
N[-0.25, 0)
|
26559
|
1184801
|
10812917
|
41537423
|
57090359
|
27525978
|
4224644
|
81291
|
|
0.0186%
|
0.8315%
|
7.5889%
|
29.1523%
|
40.0679%
|
19.3186%
|
2.9650%
|
0.0571%
|
N[0, 0.25)
|
17384
|
1084594
|
11855469
|
52334484
|
8.43E + 07
|
47341213
|
8402419
|
196268
|
|
0.0085%
|
0.5276%
|
5.7668%
|
25.4570%
|
41.0294%
|
23.0281%
|
4.0872%
|
0.0955%
|
N[0.25, 0.5)
|
7709
|
586844
|
7584976
|
41334942
|
81234297
|
53617788
|
11292281
|
347076
|
|
0.0039%
|
0.2994%
|
3.8698%
|
21.0886%
|
41.4448%
|
27.3552%
|
5.7612%
|
0.1771%
|
N[0.5, 0.75)
|
1510
|
126824
|
1643208
|
9757545
|
22967367
|
20208793
|
6471237
|
327785
|
|
0.0025%
|
0.2062%
|
2.6717%
|
15.8648%
|
37.3427%
|
32.8575%
|
10.5216%
|
0.5329%
|
N[0.75, 1)
|
21
|
3841
|
48287
|
234001
|
590675
|
863640
|
561199
|
57167
|
|
0.0009%
|
0.1628%
|
2.0471%
|
9.9202%
|
25.0410%
|
36.6131%
|
23.7914%
|
2.4235%
|
Table 4’s top panel is for Alive samples and the bottom panel is for Dead samples of the overlap DM sites’ correlation or co-methylation with all 391,459 CG sites. For each correlation level or interval, there are two rows. One row is the count, and the next row is the corresponding percentage. The first column is for normal tissues’ correlation intervals (i.e., “N” means normal). The other 8 columns are tumor tissues’ correlation intervals (i.e., “T” means tumor).
In order to compare the 53-Alive and 32-Dead samples’ co-methylation changes between normal and tumor tissues, the authors calculated the percentage differences between corresponding Dead and Alive intervals, “% dead - % alive”, for each DM category: alive-only, dead-only, and overlap. They then plotted heatmaps for these three “% dead - % alive” matrices as shown in Fig. 6. For the three heatmaps, there are a few striking patterns. First, the left heatmap (for alive-only DM) had a lighter overall color. However, the middle and right heatmaps had a much darker red color. This means that the dead-only DM (middle) and overall DM (right) had a higher percentage of change than alive-only DM (left). Second, there were darker red and blue colors in the middle rows (i.e., around the [-0.5, 0.5) intervals) of the 3 heatmaps. These darker colors represented larger “% dead - % alive” differences. That is, the differences in percentages were greatest in the tumor [-0.5, 0.5) intervals. This means that many CG pairs had strong correlations in the normal tissues, but they lost those strong correlations in the tumor tissues. Third, this graph showed two dark red squares on the left side of the right heatmap (for overlap DM). These dark squares meant that the largest percentage of change between Alive and Dead data was from highly negative correlations [-1, -0.75) in normal data to “sleeping” correlations in tumor data. Note, “sleeping” meant no/low correlation in the range of -0.5 to 0.5. The above pattern-changes showed that when comparing Dead and Alive samples, there were more co-methylation changes between normal and tumor tissues in Dead samples than in Alive samples. This difference may indicate that co-methylation played a key role in cancer development. The change of the methylation levels (i.e., differential methylation) led to the change of co-methylation in various samples and at different cancer stages.
To see the patterns of normal to tumor co-methylation changes clearly, the 8x8 matrices were further condensed into 4x4 matrices that contained the following 4 intervals: [-1, 0.5), [-0.5, 0), [0, 0.5), [0.5, 1). Essentially, the authors combined the first 2 rows and columns, the middle 4 rows and columns, and the bottom 2 rows and columns of the 8x8 matrices. To modify the 8x8 matrices, they took the sum of the raw counts for each corresponding interval and then divided that sum by the total sum of the 4 rows/columns involved. This gave the percentage of CG pairs in each cell.
Bar plots were created to show the distribution of tumor correlation coefficients for each level of normal correlation, see Fig. 7. This figure showed the co-methylation change from normal to tumor based on the 4x4 matrix. These bar plots showed that for all DM groups and all normal correlation levels, tumor [-0.5, 0) and tumor [0, 0.5) intervals were the most frequent correlation levels (see the orange and light-green bars). Additionally, for normal [-1, -0.5) and normal [-0.5, 0) intervals (i.e., the top two plots), all DM groups had a higher count in Alive than Dead samples (that is, left bar-clusters were taller than the right bar-clusters). However, for normal [0, 0.5) interval (i.e., the bottom left plot), Alive and Dead data had very similar frequencies, while for normal [0.5, 1) interval (i.e., the bottom right plot), Dead was more frequent than Alive (that is, the bar-clusters of Dead samples were taller than the ones of Alive samples). In all intervals except for the normal [-1, -0.5) interval (i.e., in the top right plot and the bottom two plots), most CG sites tend to fall into the tumor [0, 0.5) interval (i.e., the light-green bars tended to be taller than other bars). As for the normal [-1, -0.5) interval (i.e., the top left plot), most CG sites tended to fall into the tumor [-0.5, 0) interval (orange bars). Within each of the 4 plots, the overall pattern/distribution of the CG pair intervals was similar among the 3 different DM types (alive-only DM, dead-only DM, and overlap DM). The above summary showed that the strong negative and positive correlations in the normal tissues tended to be weakened or removed in the tumor tissues (see the top left and bottom right plots). Normal tissues’ weak or low correlations tended to remain in the same or nearby intervals, except that some of these were changed to high correlations (see the purple bars on the bottom left plot).
Co-methylation analysis part 3: Identifying genes with complex co-methylation changes
To find important genes involved in complex co-methylation changes, the authors identified the genes associated with the CG sites that fell within certain correlation levels (or intervals) shown in the 8x8 matrix. This was done for the overlap DM sites in Alive and Dead data separately. A negative to positive CG pair (neg2pos) was defined as one that had a highly negative correlation coefficient in the normal data but a highly positive correlation coefficient in the tumor data. The inverse of that was referred to as a positive to negative (pos2neg) correlation change. A sleeping CG site pair (sleeping.neg or sleeping.pos) was defined as one that had a low or 0 correlation coefficient (i.e., “sleeping”) in the normal data and a highly positive or negative correlation coefficient in the tumor data. Similarly, a shutting-down CG site pair (neg2shutting or pos2shutting) was defined as one that had a highly negative or positive correlation coefficient in the normal data but a low correlation coefficient (i.e., “shutting”) in the tumor data. The specific interval criteria and shorthand notation for the 6 categories were outlined in Table 5. From this point forward, the categories will be referred to as “panels” and use the indicated shorthand.
Table 5
Definition of co-methylation pattern changes from normal to tumor tissues.
Panel
|
Panel Shorthand
|
Normal Interval
|
Tumor Interval
|
Normal Negative to Tumor Positive
|
neg2pos
|
[-1, -0.75)
|
[0.5, 1)
|
Normal Positive to Tumor Negative
|
pos2neg
|
[0.75, 1)
|
[-1, − 0.5)
|
Normal Sleeping to Tumor Negative
|
sleeping.neg
|
[-0.25, 0.25]
|
[-1, -0.75)
|
Normal Sleeping to Tumor Positive
|
sleeping.pos
|
[-0.25, 0.25]
|
[0.75, 1)
|
Normal Negative to Tumor Shutting
|
neg.shutting
|
[-1, -0.75)
|
[-0.1, 0.1]
|
Normal Positive to Tumor Sleeping
|
pos.shutting
|
[0.75, 1)
|
[-0.1, 0.1]
|
Gene analyses were further conducted by looking at the CG sites associated with TSGs and non-TSGs separately. All TSGs and non-TSGs associated with CG sites for each correlation (or co-methylation) change were obtained. The number of CG sites per gene in each of the 6 correlation changes was used to find genes of interest. For each of those 6 changes/panels listed in Table 5, Ng.Alive and Ng.Dead were used to denote the numbers of CG sites associated with a gene (g) in Alive and Dead data respectively. The 4 selection methods (A, B, C, and D) were listed below. Essentially, the authors separated the data into 4 categories: TSG overlap-DM Alive data, TSG overlap-DM Dead data, non-TSG overlap-DM Alive data, and non-TSG overlap-DM Dead data.
-
Only-alive for each change/panel: Ng.Alive ≥3 and Ng.Dead = 0.
-
Only-dead for each change/panel: Ng.Alive =0 and Ng.Dead ≥3.
-
Shared in ≥ 4 same panels: the gene has Ng.Alive ≥3 and Ng.Dead ≥ 3 in at least 4 of 6 common panels (neg2pos, pos2neg, sleeping.neg, sleeping.pos, neg.shutting, pos.shutting). That is, the authors extracted CG sites that were present in the same ≥ 4 panels between the Alive and Dead samples and were also associated with ≥ 3 CG sites.
-
Difference: |Ng.Alive - Ng.Dead | ≥ 3 and both Ng.Alive and Ng.Dead were not zero in at least 4 panels (neg2pos, pos2neg, sleeping.neg, sleeping.pos, neg.shutting, pos.shutting). That is, the authors obtained CG sites that were present in ≥ 4 same panels between Alive and Dead samples and had an absolute difference of ≥ 3 CG sites when comparing the respective genes/panels between Alive and Dead datasets.
Based on the selection methods A and B (i.e., only-alive or only-dead for each panel), genes were further filtered by only extracting those associated with ≥ 3 CG sites in each panel. The selected TSGs and non-TSGs were shown in Table 6. This table showed that Dead samples (2nd and 4th rows) had more genes associated with co-methylation changes than Alive samples (1st and 3rd rows). For the non-TSG panels (see Table 6 last two rows), there was a notable difference between the raw number of extracted genes in the Alive and Dead samples. In 5 out of 6 panels (neg2pos, po2neg, sleeping.neg, sleeping.pos, pos.shutting), the Dead samples contained a larger number of genes. Most notably, for the sleeping.pos and pos.shutting panels, Alive data contained 20 and 67 genes respectively, while Dead data contained 2,288 and 2,193 genes respectively. In the neg.shutting category, however, Alive samples contained 605 genes while Dead samples contained 100 genes. When looking at the TSGs (see Table 6 first two rows) the neg.shutting genes in the Alive data were similar to the Dead data (20 vs. 23). However, there were also more pos.shutting genes in Dead samples than in Alive samples (61 vs. 1). Furthermore, when examining genes found in Alive samples for each change, no genes were found in pos2neg and sleeping.pos panels. The neg.shutting panel had the largest number of TSGs (20 genes). Additionally, the sleeping.pos panel had the most genes associated with it (81 genes) in the Dead data.
Based on the selection method C (Alive and Dead shared in ≥ 4 same panels), 33 TSGs and 68 non-TSGs were identified. In the non-TSG data, only EIF2B5 had changes (or were present) in all 6 panels at high frequencies. That is, only 1.5% (1/68) of the multiple panel and high frequency genes were present in all 6 panels. In the TSG data, TP73, EPHB3, and MAD1L1 were present in all 6 panels with high frequencies. That is, 9% (3/33) of the multiple panel and high frequency genes were present in all 6 panels.
Based on the selection method D (Alive and Dead difference), 30 TSGs and 92 non-TSGs were identified. The only gene with changes in all 6 panels for the TSG dataset was TP73. In the non-TSG dataset, only DIIP2C and EIF2B5 were present in all 6 panels. In both datasets, the Dead-counts were typically higher than the Alive-counts. The only exceptions to this were pos.shutting counts for the non-TSG DIP2C (difference of -5) and the neg.shutting counts for the non-TSG EIF2B5 (difference of -9).
Table 6
Summary of number of genes with different co-methylation changes.
Category
|
neg2pos
|
pos2neg
|
sleeping.neg
|
sleeping.pos
|
neg.shutting
|
pos.shutting
|
TSG identified by Method A:
Only Alive for each change or panel, Ng.Alive ≥3 and Ng.Dead = 0
|
1
|
0
|
2
|
0
|
20
|
1
|
TSG identified by Method B:
Only Dead for each change or panel, Ng.Alive =0 and Ng.Dead ≥3
|
11
|
17
|
49
|
81
|
23
|
61
|
Non-TSG identified by Method A: Only Alive for each change or panel, Ng.Alive ≥ 3 and Ng.Dead = 0
|
4
|
5
|
5
|
20
|
605
|
67
|
Non-TSG identified by Method B: Only Dead for each change or panel, Ng.Alive = 0 and Ng.Dead ≥3
|
38
|
123
|
601
|
2288
|
100
|
2193
|
All genes identified by methods C and D were shown in the Supplemental File 2, which is an excel file with 4 sheets. These 4 sheets summarized the detailed co-methylation changes for 33 TSGs and 68 non-TSGs identified based on method C as well as 30 TSGs and 92 non-TSGs identified by method D. The above co-methylation analysis showed that multiple CG sites on the same gene can have different and complex co-methylation relationships with other CG sites in the genome. These co-methylation patterns changed between tumor and normal tissues and between Alive and Dead samples (i.e., at different cancer stages). These patterns and changes indicated complex interactions between genes, which might affect tumor growth and thus cancer development. The differences in certain co-methylation changes (e.g., sleeping.pos or pos.shutting) might play an important role in differentiating between Alive and Dead samples and between non-TSGs and TSGs.
Co-methylation analysis part 4: Network and pathway analysis
The ConsensusPath Database (CPDB) analysis [17–19] was conducted for the 30 TSG and 92 non-TSG lists identified based on the method D. These genes were selected based on |Dead – Alive| ≥ 3 and both were non-zero counts in at least 4 panels. This analysis was to further investigate how the selected significant genes interacted with others. The CPDB analysis showed different interactions and relationships among the genes from this study and other genes in the genome; see Fig. 8 (based on 30 TSGs) and Fig. 9 (based on 92 non-TSGs).
In the TSG CPDB figure (Fig. 8), TP73, PPP1CA, PAX6, POU6F2, ESR1, TFAP2A, and ZIC1 were notable hub genes (i.e., genes with many interactions). The first 5 genes mainly had key protein interactions with other genes (see the orange lines). TFAP2A and ZIC1 mainly had gene regulatory interactions with other genes (see the light blue lines). Additionally, most entities were proteins while a few were genes, and some intermediate nodes were protein complexes. TFAP2A and ZIC1 were extremely interconnected as they shared 6 gene intermediate nodes through gene regulatory interactions (see the right side of Fig. 8). As for the key hub gene ESR1, patients with ESR1 mutant-positive metastases exhibited poor overall survival [20]. PAX6 was also found to be highly expressed in breast cancer tumors and was involved in breast cancer cell proliferation and tumor progression [21]. ZIC1 was found to be negatively correlated with surviving. It was an inhibitor of apoptosis in breast cancer. Because it acted as a tumor suppressor, targeting ZIC1’s expression might be useful when developing a possible treatment method [22]. TP73-AS1 was upregulated in breast cancer tumors and associated with poorer prognosis [23]. Lastly, PPP1CA might serve as a possible breast cancer biomarker because it played a prominent role in the cancer progression by inhibiting the cell proliferation and migration [24].
For the non-TSG CPDB figure (Fig. 9), most entities were proteins and RNAs, while a few were genes. Most of the relationships among these entities were protein interactions (see orange lines). From the 92 gene list, HDAC4, MAGI1, and RPTOR, served as hub genes in this figure. Additionally, q69yn44, HNRNPL, HSPA5, DENND1A, SH3PXD2A, INPP5E, SALM2, and KIF13B were intermediate nodes that served as hub genes, which were not from the 92 non-TSG list. The last 5 of these genes were in the top left part of this figure. One interesting hub gene is q69yn44, which was not identified by this study, but it has many interactions with other genes. This gene is a protein coding gene. It is involved in mRNA methylation, mRNA alternative polyadenylation, and RNA binding activity [25, 26]. It acted as a key regulator of mammalian mRNA methylation N6-methyladenosine (m6A) methylation. It did so with METTL3 and METTL14, and m6A was linked to mRNA processing and regulates gene expression.
Among these non-TSG hub genes, HNRNPL has been found to affect gene expressions and was related to the occurrence of breast cancer. Silencing of HNRNPL impacted the expression of NSP 5a3a, B23, and reduced interactions with IncRNA [27]. It thus served as an important gene in suppressing tumor invasion. This suggested that HNRNPL could be targeted when looking for breast cancer treatment options. The hypomethylation of CG sites associated with RPTOR in blood was also found to be associated with breast cancer, presenting it as a promising biomarker for breast cancer detection [28]. HDAC4 was also found to play a role in breast cancer growth and invasion [29]. The targeting of HDAC4 could potentially decrease breast cancer cell growth and invasion. MAGI1 was reported as a new potential tumor suppressor gene in estrogen receptor positive breast cancer [30].