Hemimethylated CpG sites in both normal and tumor cells are identified using Wilcoxon tests. Table 1 reports the proportion of hemimethylation sites that fall in clusters, based on the p-value cutoff (p < 0.05) and three mean difference cutoff values. The numbers of hemimethylation sites are similar in tumor and normal samples, but the proportion in clusters is slightly higher in normal samples. Comparing these proportions between normal and tumor gives p-values of 0.00039, 0.00035, and 0.277 for the mean difference cutoff values 0.4, 0.6, and 0.8, respectively. For the rest of this paper, our analysis focuses on the hemimethylation sites identified with a p-value below 0.05 and an absolute mean difference greater than or equal to 0.4.
| Absolute mean difference | Normal: Total | Normal: Sites in clusters | Normal: Percentage | Tumor: Total | Tumor: Sites in clusters | Tumor: Percentage |
|---|---|---|---|---|---|---|
| ≥0.4 | 7351 | 1510 | 20.54% | 7330 | 1336 | 18.23% |
| ≥0.6 | 2588 | 348 | 13.45% | 2743 | 282 | 10.28% |
| ≥0.8 | 723 | 53 | 7.33% | 823 | 49 | 5.95% |
Table 1: Number of hemimethylated CpG sites and percentage of sites in clusters. Each row is for a mean difference level. The two panels (3 columns each) are for normal and tumor samples respectively.
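The three p-values quoted for Table 1 are consistent with a pooled two-proportion z-test on the cluster counts. The text does not name the test that was used, so the sketch below is an assumption about the method rather than a description of it, and the function name `two_proportion_z_test` is illustrative.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the equality of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# (sites in clusters, total sites) for normal and tumor, taken from Table 1.
table1 = {
    0.4: ((1510, 7351), (1336, 7330)),
    0.6: ((348, 2588), (282, 2743)),
    0.8: ((53, 723), (49, 823)),
}

results = {}
for cutoff, ((x_normal, n_normal), (x_tumor, n_tumor)) in table1.items():
    results[cutoff] = two_proportion_z_test(x_normal, n_normal, x_tumor, n_tumor)

for cutoff, (z, p_value) in results.items():
    print(f"cutoff >= {cutoff}: z = {z:.3f}, p = {p_value:.5f}")
```

Under this assumption the first two cutoffs give p-values well below 0.001 and the third gives a value near 0.28, in line with the figures reported in the text.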
Tumor and normal samples’ hemimethylation CpG sites are compared in Table 2. The first row of this table, i.e., the T.MU row, indicates the total number of MU hemimethylation CpG sites in tumor (T) cells. Among these sites, 1697 of them are also hemimethylated in normal cells (N.MU), 1688 of them are not significantly hemimethylated in normal (N.NS), and 217 of them have no data in normal cells (N.NA). The first column of Table 2, i.e., the N.MU column, shows the total number of MU hemimethylation CpG sites in normal (N) cells. Among these sites, 1697 of them are also hemimethylated in tumor cells (T.MU), 1728 of them are not significantly hemimethylated in tumor (T.NS), and 268 of them have no data in tumor cells (T.NA).
| | N.MU | N.UM | N.NS | N.NA |
|---|---|---|---|---|
| T.MU | 1697 | 0 | 1688 | 217 |
| T.UM | 0 | 1597 | 1892 | 239 |
| T.NS | 1728 | 1789 | 1895429 | 101322 |
| T.NA | 268 | 272 | 98209 | 27295013 |
Table 2: Comparison of normal and tumorous hemimethylation site patterns. Each row is for the tumor (T) sample and each column is for the normal (N) sample with various hemimethylation types. T.MU refers to CpG sites that are methylated (M) on the forward strand and unmethylated (U) on the reverse strand in tumor (T) samples. N.MU refers to CpG sites with the MU hemimethylation in normal (N) samples. T.NS and N.NS refer to CpG sites of a corresponding tissue type that are not significantly hemimethylated. Similarly, T.NA and N.NA refer to CpG sites that have no data for the given cell type.
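As a consistency check, the row and column margins of Table 2 can be summed and compared with the totals in the ≥0.4 row of Table 1. The short sketch below does this arithmetic; the variable names are illustrative and the counts are copied from Table 2.

```python
# Table 2 counts: rows are the tumor status, columns are the normal status.
statuses = ["MU", "UM", "NS", "NA"]
table2 = {
    "MU": [1697, 0, 1688, 217],
    "UM": [0, 1597, 1892, 239],
    "NS": [1728, 1789, 1895429, 101322],
    "NA": [268, 272, 98209, 27295013],
}

# Row margins: hemimethylated sites in tumor (MU plus UM rows).
tumor_hemi = sum(table2["MU"]) + sum(table2["UM"])

# Column margins: hemimethylated sites in normal (MU plus UM columns).
column_totals = {
    status: sum(table2[row][i] for row in statuses)
    for i, status in enumerate(statuses)
}
normal_hemi = column_totals["MU"] + column_totals["UM"]

# Sites hemimethylated with the same pattern in both tissues (diagonal cells),
# and sites whose pattern is reversed between tissues (off-diagonal cells).
same_pattern = table2["MU"][0] + table2["UM"][1]
reversed_pattern = table2["MU"][1] + table2["UM"][0]

print(tumor_hemi, normal_hemi, same_pattern, reversed_pattern)
```

The margins reproduce the 7330 tumor and 7351 normal sites listed in Table 1, and no site switches from MU in one tissue to UM in the other.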
Tumor and normal samples’ hemimethylation clusters are compared in Table 3. This table shows that most clusters only have two or three CpG sites and cluster frequency decreases with increased cluster length, meaning large congregations of hemimethylation are infrequent.
| Cluster pattern | Normal | Tumor |
|---|---|---|
| MMMMMMMMMMMM-UUUUUUUUUUUU | 1 | 1 |
| MMMMMMMMMM-UUUUUUUUUU | 1 | 1 |
| MMMMMMMM-UUUUUUUU | 2 | 2 |
| MMMMMMM-UUUUUUU | 2 | 2 |
| MMMMMM-UUUUUU | 5 | 3 |
| MMMMM-UUUUU | 6 | 7 |
| MMMM-UUUU | 18 | 13 |
| MMM-UUU | 55 | 32 |
| MM-UU | 168 | 153 |
| MMU-UUM | 0 | 1 |
| MU-UM | 28 | 32 |
| UMM-MUU | 1 | 0 |
| UM-MU | 7 | 4 |
| UUM-MMU | 1 | 0 |
| UU-MM | 195 | 172 |
| UUU-MMM | 52 | 44 |
| UUUU-MMMM | 22 | 22 |
| UUUUU-MMMMM | 9 | 14 |
| UUUUUU-MMMMMM | 3 | 4 |
| UUUUUUM-MMMMMMU | 0 | 1 |
| UUUUUUU-MMMMMMM | 4 | 3 |
| UUUUUUUM-MMMMMMMU | 1 | 0 |
| UUUUUUUU-MMMMMMMM | 2 | 2 |
| Total | 583 | 513 |
Table 3: Normal and tumor hemimethylation cluster patterns. The first column is the cluster pattern, separating forward and reverse strands by “-”. The second and third columns are the counts of such patterns in normal and tumor samples respectively.
The length of a cluster is defined as the total number of base pairs between the first and the last CpG sites in the cluster. Figure 2 shows four histograms of cluster lengths: polarity patterns in tumor, polarity patterns in normal, regular patterns in tumor, and regular patterns in normal samples. Regular and polarity patterns are analyzed separately because polarity clusters tend to be much shorter. In fact, many of the polarity clusters are less than 40 base pairs long and a majority of them are less than 10 base pairs long (see peaks in the top panels of Figure 2). Many of the regular clusters are relatively short, i.e., less than 60 base pairs long, but a small number of them are longer, with a maximum length of around 100 to 120 base pairs. A Wilcoxon rank-sum test is performed to compare the lengths of clusters in normal and tumor cells. The difference is not significant (p-value = 0.12).
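The length definition and the rank-sum comparison can be sketched as follows. This is a minimal illustration under stated assumptions: cluster length is taken as the last CpG position minus the first, the test uses the normal approximation with a tie correction and no continuity correction, and the positions and lengths are made-up values, not data from this study.

```python
import math

def cluster_length(positions):
    """Length in base pairs between the first and last CpG site (assumed: last - first)."""
    return max(positions) - min(positions)

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    # W is the sum of ranks of the first sample.
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p_value

# Hypothetical CpG positions for one cluster and hypothetical length samples.
example_length = cluster_length([10250, 10262, 10291])
normal_lengths = [8, 12, 35, 41, 60, 95]
tumor_lengths = [6, 9, 22, 30, 44, 58]
w, z, p_value = rank_sum_test(normal_lengths, tumor_lengths)
print(example_length, w, round(z, 3), round(p_value, 3))
```

For the sample sizes in this study (hundreds of clusters) the normal approximation is a reasonable choice; for very small samples an exact test would be preferable.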
| Regular clusters | Normal: Count | Normal: Percentage | Tumor: Count | Tumor: Percentage |
|---|---|---|---|---|
| MM-UU | 168 | 30.66% | 153 | 32.08% |
| UU-MM | 195 | 35.58% | 172 | 36.06% |
| Bigger cluster | 185 | 33.76% | 152 | 31.87% |
| Total | 548 | 100% | 477 | 100% |
Table 4: Regular clusters with corresponding percentages. Bigger clusters (see the fourth row) are the ones with 3 or more hemimethylated CpG sites.
| Polarity clusters | Normal: Count | Normal: Percentage | Tumor: Count | Tumor: Percentage |
|---|---|---|---|---|
| MU-UM | 28 | 80% | 32 | 88.89% |
| UM-MU | 7 | 20% | 4 | 11.11% |
| Total | 35 | 100% | 36 | 100% |
Table 5: Polarity clusters with corresponding percentages.
The two main hemimethylation cluster patterns, regular clusters and polarity clusters, are summarized in detail in Table 4 and Table 5. Table 4 gives the proportions of different regular clusters in normal and tumor DNA. Table 5 gives the proportions of different polarity clusters in normal and tumor DNA. Polarity clusters appear less frequently than regular clusters, as seen by the difference in the number of clusters between Tables 4 and 5. For example, tumor samples have a total of 477 regular clusters and only 36 polarity clusters.
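The counts in Tables 4 and 5 can be re-derived from Table 3. The sketch below assumes a classification rule inferred from the tables rather than stated in the text: a two-site cluster with pattern MU-UM or UM-MU is a polarity cluster, and every other cluster is regular, with any regular cluster of three or more sites counted as a "bigger cluster".

```python
# (pattern, normal count, tumor count) copied from Table 3.
table3 = [
    ("MMMMMMMMMMMM-UUUUUUUUUUUU", 1, 1), ("MMMMMMMMMM-UUUUUUUUUU", 1, 1),
    ("MMMMMMMM-UUUUUUUU", 2, 2), ("MMMMMMM-UUUUUUU", 2, 2),
    ("MMMMMM-UUUUUU", 5, 3), ("MMMMM-UUUUU", 6, 7), ("MMMM-UUUU", 18, 13),
    ("MMM-UUU", 55, 32), ("MM-UU", 168, 153), ("MMU-UUM", 0, 1),
    ("MU-UM", 28, 32), ("UMM-MUU", 1, 0), ("UM-MU", 7, 4),
    ("UUM-MMU", 1, 0), ("UU-MM", 195, 172), ("UUU-MMM", 52, 44),
    ("UUUU-MMMM", 22, 22), ("UUUUU-MMMMM", 9, 14), ("UUUUUU-MMMMMM", 3, 4),
    ("UUUUUUM-MMMMMMU", 0, 1), ("UUUUUUU-MMMMMMM", 4, 3),
    ("UUUUUUUM-MMMMMMMU", 1, 0), ("UUUUUUUU-MMMMMMMM", 2, 2),
]

def classify(pattern):
    """Assign a Table 3 pattern to a Table 4 or Table 5 category (assumed rule)."""
    forward = pattern.split("-")[0]
    if forward in ("MU", "UM"):
        return "polarity"
    if len(forward) == 2:
        return pattern  # MM-UU or UU-MM
    return "bigger"

summary = {}
for pattern, normal, tumor in table3:
    totals = summary.setdefault(classify(pattern), [0, 0])
    totals[0] += normal
    totals[1] += tumor

for category, (normal, tumor) in summary.items():
    print(category, normal, tumor)
```

With this rule the regular categories sum to 548 normal and 477 tumor clusters and the polarity category to 35 and 36, matching the totals of Tables 4 and 5.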
One way to detect which clusters may be related to cancer is to compare the cluster locations between tumor DNA and normal DNA. Some clusters may appear in the same sites in both tumor and normal samples, but others may be found only in tumor or only in normal. In Figure 3, we show two typical hemimethylation clusters: one that is only identified in tumor DNA and one that is only identified in normal DNA. The first two pairs of bars represent two CpG sites in normal DNA. The second two represent two CpG sites in tumor DNA. We see in the first (or left) plot that there is a large difference between the forward and reverse strands in the tumor CpG sites, whereas the normal CpG sites are quite similar. This tells us that there is a cluster containing two CpG sites that is found only in tumor DNA. Similarly, the second (or right) plot describes a cluster that appears only in normal DNA. In fact, there is almost no methylation in the tumor reverse strands, while the normal reverse strands are almost fully methylated. The forward strand methylation levels are similarly low in tumor and normal DNA, so we observe normal-only hemimethylation in the two sites.
To study hemimethylation patterns thoroughly, we compare the 513 tumor clusters with the 583 normal clusters and summarize the results in Table 6. This table shows that multiple kinds of overlap can be found between tumor and normal. Hemimethylation clusters that occur only in tumor or only in normal samples are shown in Column B. A total of 695 clusters (313 tumor only and 382 normal only) fall into these categories, and these are the clusters or regions that may be associated with cancer. Column C counts the clusters that are exactly the same in normal and tumor. Column D covers the situations in which a tumor cluster begins and ends within a normal cluster (i.e., the tumor cluster is contained within the bounds of a normal cluster), and Column E covers the reverse. For example, a tumor cluster with start and end positions of 150 and 170 base pairs on a chromosome is located within a normal cluster with start and end positions of 120 and 190 base pairs. Column D, which represents tumor clusters that are embedded in normal clusters, shows different counts for normal and tumor samples because there are three tumor clusters that are located in one normal cluster. Similarly, Column E, which represents normal clusters that are embedded in tumor clusters, shows different counts because there are two instances of multiple normal clusters located in one tumor cluster. Column F represents all other kinds of overlap. For example, there are two normal clusters that have some overlap with the same tumor cluster.
The second row of Table 6 shows that among the 513 tumor clusters, 313 of them belong to tumor only; 140 clusters also show up in normal samples; 25 tumor clusters are short ones and they are located within long normal clusters; 23 tumor clusters are long ones in which short normal clusters are located; and 12 tumor clusters are partially overlapped with normal clusters. The third row of Table 6 shows that among the 583 normal clusters, 382 of them belong to normal only; 140 clusters also show up in tumor samples; 23 normal clusters are long ones and they cover short tumor clusters; 25 normal clusters are short ones and they are located within long tumor clusters; and 13 normal clusters are partially overlapped with tumor clusters. A detailed version of Table 6 is shown in the Supplemental Table 1, in which the number of different clusters in each chromosome is listed for both tumor and normal samples.
| | A: Total | B: Tumor only / Normal only | C: Exact overlap | D: Tumor in normal | E: Normal in tumor | F: Other overlap |
|---|---|---|---|---|---|---|
| Tumor | 513 | 313 | 140 | 25 | 23 | 12 |
| Normal | 583 | 382 | 140 | 23 | 25 | 13 |
Table 6: Tumor and normal cluster comparison results. Columns are for different overlap (or non-overlap) patterns. The two rows are for tumor and normal, respectively.
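The overlap categories in Table 6 can be expressed as a comparison of two intervals on the same chromosome. The function below is a minimal sketch of that logic, not the authors' implementation; the name `classify_overlap` and the category strings are illustrative, and intervals are assumed to be inclusive (start, end) positions.

```python
def classify_overlap(tumor, normal):
    """Classify how a tumor cluster interval relates to a normal cluster interval."""
    t_start, t_end = tumor
    n_start, n_end = normal
    if t_end < n_start or n_end < t_start:
        return "no overlap"          # candidate for Column B
    if (t_start, t_end) == (n_start, n_end):
        return "exact overlap"       # Column C
    if n_start <= t_start and t_end <= n_end:
        return "tumor in normal"     # Column D
    if t_start <= n_start and n_end <= t_end:
        return "normal in tumor"     # Column E
    return "other overlap"           # Column F

# The example from the text: tumor cluster 150-170 inside normal cluster 120-190.
print(classify_overlap((150, 170), (120, 190)))
```

A cluster would be counted as tumor only (or normal only) when it has no overlap with any cluster of the other tissue, so a full comparison would apply this function to every candidate pair on each chromosome.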
After identifying hemimethylated CpG sites, we can also map them back to genes. That is, we annotate each hemimethylated CpG site with the name of the gene whose gene body or promoter region contains it. We call this analysis gene annotation; summarizing it gives the number of hemimethylated CpG sites per gene. This annotation analysis is important because highly hemimethylated genes may play an important role. Table 7 shows the frequency of hemimethylated CpG sites in gene bodies. Each column shows how many genes have n hemimethylated CpG sites in their gene bodies, where n is given in the first row. The second row gives the distribution for tumor samples and the third row gives the distribution for normal samples. Similarly, Table 8 gives the frequency of hemimethylated CpG sites in promoter regions. Table 7 shows that the large majority of genes have at most 3 hemimethylated CpG sites in their gene bodies in both tumor and normal samples, but a few have 10 or more. In promoter regions, Table 8 shows that no gene has 10 or more and the large majority of genes have 1 or 2 hemimethylated CpG sites.
| No. of hemimethylation sites per gene body | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Tumor | 1133 | 250 | 79 | 37 | 17 | 4 | 7 | 2 | 0 | 4 |
| Normal | 1118 | 229 | 73 | 32 | 11 | 4 | 3 | 1 | 1 | 5 |
Table 7: Hemimethylation frequency measured in gene bodies for both tumor and normal samples.
| No. of hemimethylation sites per promoter region | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Tumor | 223 | 23 | 5 | 6 | 0 | 2 | 0 | 1 |
| Normal | 256 | 36 | 13 | 3 | 2 | 1 | 1 | 0 |
Table 8: Hemimethylation frequency measured in promoter regions for both tumor and normal samples.
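The gene annotation step behind Tables 7 and 8 can be sketched as an interval lookup followed by two rounds of counting. The gene names, coordinates, and site positions below are hypothetical placeholders, and the linear scan over genes is chosen for clarity rather than speed.

```python
from collections import Counter

# Hypothetical gene-body intervals: name -> (chromosome, start, end).
genes = {
    "GENE_A": ("chr1", 1000, 5000),
    "GENE_B": ("chr1", 4500, 9000),
    "GENE_C": ("chr2", 200, 800),
}

# Hypothetical hemimethylated CpG sites: (chromosome, position).
sites = [
    ("chr1", 1200), ("chr1", 4700), ("chr1", 8000),
    ("chr2", 300), ("chr2", 950), ("chr1", 4800),
]

def annotate(sites, genes):
    """Count hemimethylated sites that fall inside each gene interval."""
    counts = Counter()
    for chrom, pos in sites:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1  # a site in overlapping genes counts for each
    return counts

sites_per_gene = annotate(sites, genes)

# Frequency table in the layout of Tables 7 and 8:
# number of sites per gene -> number of genes with that many sites.
frequency = Counter(sites_per_gene.values())
print(dict(sites_per_gene), dict(frequency))
```

The same routine would be run once with gene-body intervals and once with promoter intervals; for genome-scale data an interval tree or a sorted-coordinate search would replace the inner loop.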
With the gene annotation analysis, we can identify genes that have relatively more hemimethylation sites. In particular, we select the genes that have at least 5 hemimethylation sites in tumor only, in normal only, and in both normal and tumor samples. These genes are summarized in Tables 9, 10, and 11, respectively. Note that there are not many genes with a large number of hemimethylated sites. Therefore, we choose a relatively small threshold (i.e., five) so that a reasonable number of genes meet the criterion for further analysis. In addition, the datasets used in this project were generated using the reduced representation bisulfite sequencing method. With this method, only a small percentage of the CpG sites in a genome are sequenced [12, 14]. If the methylation sequencing datasets used in this study had been generated with the whole genome bisulfite sequencing method, more hemimethylated CpG sites could be found in different genes.
There are 41 genes with the most hemimethylation in tumor DNA (see Table 9). Among these genes, TP73 [15-17], GNAS [18-22], and NOTCH1 [23, 24] are notable ones with known relations to cancer. Table 9 shows that among these 41 genes, one is a tumor suppressor (WT1), three are oncogenes (GNAS, NOTCH1, and PRDM16), and of those three, two are translocated cancer genes (NOTCH1 and PRDM16). There are also eight transcription factors in this table (HDAC4, IRX2, NFATC1, PRDM16, RUNX3, SIX3, TP73, and WT1). Table 10 shows 35 genes with the most hemimethylation in normal DNA. Among these genes, four are oncogenes (CBFA2T3, GNAS, PDGFB, and PRDM16). Of the oncogenes, three are translocated cancer genes (CBFA2T3, PDGFB, and PRDM16). There are also seven transcription factors in this table (CBFA2T3, HOXA3, IRX2, MEIS1, NFIC, PRDM16, and ZFPM1). Note that no tumor suppressor genes are hemimethylated in the normal cells. For genes belonging to two key gene families (i.e., transcription factor and oncogene), we have compared their proportions in tumor and normal samples using statistical tests. The test p-values are 0.96 for the transcription factor family and 0.54 for the oncogene family, so there is no significant difference. Table 11 shows 36 genes with the most hemimethylation in both normal and tumor DNA. Among these genes, two are oncogenes and also translocated cancer genes (CBFA2T3 and PRDM16). There are also six transcription factors in this table (KLF5, HOXA2, CBFA2T3, HOXA3, ISL2, and PRDM16). All three gene tables contain some transcription factor genes, which may affect the expression of other cancer-related genes that are not found to be hemimethylated.
| Gene name | Count | Family | Gene description |
|---|---|---|---|
| RGS14 | 17 | - | regulator of G protein signaling 14 |
| MEX3A | 16 | - | mex-3 RNA binding family member A |
| WT1 | 11 | TF, TS | WT1 transcription factor |
| PRDM16 | 10 | OG, TF, TCG | PR/SET domain 16 |
| ZDHHC9 | 10 | - | zinc finger DHHC-type containing 9 |
| AGAP2 | 8 | - | ArfGAP with GTPase domain, ankyrin repeat and PH domain 2 |
| GNAS | 8 | OG | GNAS complex locus |
| EXOC3L2 | 8 | - | exocyst complex component 3 like 2 |
| PTPRN2 | 7 | - | protein tyrosine phosphatase receptor type N2 |
| FANK1 | 7 | - | fibronectin type III and ankyrin repeat domains 1 |
| UNC93B1 | 7 | - | unc-93 homolog B1, TLR signaling regulator |
| IGSF9B | 7 | - | immunoglobulin superfamily member 9B |
| GNAS-AS1 | 7 | - | GNAS antisense RNA 1 |
| MAD1L1 | 7 | - | mitotic arrest deficient 1 like 1 |
| TSPAN9 | 7 | - | tetraspanin 9 |
| PTPRM | 7 | - | protein tyrosine phosphatase receptor type M |
| TP73 | 6 | TF | tumor protein p73 |
| IFT140 | 6 | - | intraflagellar transport 140 |
| NFATC1 | 6 | TF | nuclear factor of activated T cells 1 |
| DGKA | 6 | - | diacylglycerol kinase alpha |
| FMNL1 | 6 | - | formin like 1 |
| CACNA1I | 6 | - | calcium voltage-gated channel subunit alpha1 I |
| LOC101927636 | 6 | - | RNA Gene affiliated with the lncRNA class |
| HDAC4 | 5 | TF | histone deacetylase 4 |
| IRX2 | 5 | TF, HP | iroquois homeobox 2 |
| ANKRD33B | 5 | - | ankyrin repeat domain 33B |
| LINC00537 | 5 | - | Long Intergenic Non-Protein Coding RNA 537 |
| NOTCH1 | 5 | OG, TCG | notch receptor 1 |
| ANO2 | 5 | - | anoctamin 2 |
| CACNA1H | 5 | - | calcium voltage-gated channel subunit alpha1 H |
| RUNX3 | 5 | TF | runt related transcription factor 3 |
| SIX3 | 5 | TF, HP | SIX homeobox 3 |
| FZD7 | 5 | - | frizzled class receptor 7 |
| ADGRA2 | 5 | - | adhesion G protein-coupled receptor A2 |
| IFFO1 | 5 | - | intermediate filament family orphan 1 |
| CHTF18 | 5 | - | chromosome transmission fidelity factor 18 |
| TMEM204 | 5 | - | transmembrane protein 204 |
| RECQL5 | 5 | - | RecQ like helicase 5 |
| SMIM5 | 5 | - | small integral membrane protein 5 |
| MAPK1 | 5 | PK | mitogen-activated protein kinase 1 |
| SYN1 | 5 | - | synapsin I |
Table 9: Genes with at least 5 hemimethylation sites in tumor samples. The columns give the gene name, the number of hemimethylated sites, the gene family, and a description of the gene. Descriptions are derived from the Molecular Signature Database [25] and the GeneCards database [26]. Gene families in the third column are abbreviated as follows: “TF” for transcription factor, “TS” for tumor suppressor, “OG” for oncogene, “HP” for homeodomain protein, “TCG” for translocated cancer gene, and “PK” for protein kinase.
| Gene name | Count | Family | Gene description |
|---|---|---|---|
| ZFPM1 | 14 | TF | zinc finger protein, FOG family member 1 |
| GNAS | 13 | OG | GNAS complex locus |
| RGPD2 | 12 | - | RANBP2 like and GRIP domain containing 2 |
| SHANK3 | 11 | - | SH3 and multiple ankyrin repeat domains 3 |
| IRX2 | 10 | TF, HP | iroquois homeobox 2 |
| LTB4R | 9 | - | leukotriene B4 receptor |
| CPEB3 | 8 | - | cytoplasmic polyadenylation element binding protein 3 |
| PTPRN2 | 7 | - | protein tyrosine phosphatase receptor type N2 |
| MIR1268A | 7 | - | microRNA 1268a |
| GNAS-AS1 | 7 | - | GNAS antisense RNA 1 |
| CYP26C1 | 7 | - | cytochrome P450 family 26 subfamily C member 1 |
| TBL1XR1 | 6 | - | transducin beta like 1 X-linked receptor 1 |
| HOXA3 | 6 | TF, HP | homeobox A3 |
| CACNA1H | 6 | - | calcium voltage-gated channel subunit alpha1 H |
| NPEPPS | 6 | - | aminopeptidase puromycin sensitive |
| SEMA6B | 6 | CGF | semaphorin 6B |
| HOMER3 | 6 | - | homer scaffold protein 3 |
| PINLYP | 6 | - | phospholipase A2 inhibitor and LY6/PLAUR domain containing |
| GDI1 | 6 | - | GDP dissociation inhibitor 1 |
| HS3ST2 | 6 | - | heparan sulfate-glucosamine 3-sulfotransferase 2 |
| PRDM16 | 5 | TF, OG, TCG | PR/SET domain 16 |
| PLK3 | 5 | PK | polo like kinase 3 |
| GREM2 | 5 | CGF | gremlin 2, DAN family BMP antagonist |
| MEIS1 | 5 | TF, HP | Meis homeobox 1 |
| MEIS1-AS2 | 5 | - | MEIS1 antisense RNA 2 |
| POLH | 5 | - | DNA polymerase eta |
| HOXA-AS2 | 5 | - | HOXA cluster antisense RNA 2 |
| EBF3 | 5 | - | EBF transcription factor 3 |
| CBFA2T3 | 5 | TF, OG, TCG | CBFA2/RUNX1 translocation partner 3 |
| RPL13 | 5 | - | ribosomal protein L13 |
| NFIC | 5 | TF | nuclear factor I C |
| CDH4 | 5 | - | cadherin 4 |
| PDGFB | 5 | OG, TCG | cytokine or growth factor, platelet derived growth factor subunit B |
| CCNT1 | 5 | - | cyclin T1 |
| SNORD68 | 5 | - | small nucleolar RNA, C/D box 68 |
Table 10: Genes with at least 5 hemimethylation sites in normal samples. The columns give the gene name, the number of hemimethylated sites, the gene family, and a description of the gene. Descriptions are derived from the Molecular Signature Database [25] and the GeneCards database [26]. Gene families in the third column are abbreviated as follows: “TF” for transcription factor, “TS” for tumor suppressor, “OG” for oncogene, “HP” for homeodomain protein, “TCG” for translocated cancer gene, “PK” for protein kinase, and “CGF” for cytokine or growth factor.
| Gene name | Count | Family | Gene description |
|---|---|---|---|
| RGPD5 | 16 | - | RANBP2 like and GRIP domain containing 5 |
| RGPD8 | 16 | - | RANBP2 like and GRIP domain containing 8 |
| ROCK1P1 | 13 | - | Rho associated coiled-coil containing protein kinase 1 pseudogene 1 |
| THAP4 | 8 | - | THAP domain containing 4 |
| SGTA | 8 | - | small glutamine rich tetratricopeptide repeat containing alpha |
| PTPRN2 | 7 | - | protein tyrosine phosphatase receptor type N2 |
| CNTNAP3 | 7 | - | contactin associated protein like 3 |
| NUTM2A-AS1 | 7 | - | NUTM2A antisense RNA 1 |
| RBFOX3 | 7 | - | RNA binding fox-1 homolog 3 |
| ESPNP | 6 | - | espin pseudogene |
| FOXK1 | 6 | - | forkhead box K1 |
| HOXA3 | 6 | HP, TF | homeobox A3 |
| LMF1 | 6 | - | lipase maturation factor 1 |
| USP45 | 6 | - | ubiquitin specific peptidase 45 |
| LOC101928782 | 6 | - | RNA Gene affiliated with the lncRNA class |
| PRDM16 | 5 | OG, TF, TCG | PR/SET domain 16 |
| RGPD4 | 5 | - | RANBP2 like and GRIP domain containing 4 |
| MERTK | 5 | PK | MER proto-oncogene, tyrosine kinase |
| FAM160A1 | 5 | - | family with sequence similarity 160 member A1 |
| PRKAR1B | 5 | - | protein kinase cAMP-dependent type I regulatory subunit beta |
| MAD1L1 | 5 | - | mitotic arrest deficient 1 like 1 |
| HOXA2 | 5 | HP, TF | homeobox A2 |
| DPP6 | 5 | - | dipeptidyl peptidase like 6 |
| DIP2C | 5 | - | disco interacting protein 2 homolog C |
| FANK1 | 5 | - | fibronectin type III and ankyrin repeat domains 1 |
| GAL3ST3 | 5 | - | galactose-3-O-sulfotransferase 3 |
| FLJ12825 | 5 | - | RNA Gene affiliated with the lncRNA class |
| KLF5 | 5 | TF | Kruppel like factor 5 |
| ISL2 | 5 | HP, TF | ISL LIM homeobox 2 |
| CBFA2T3 | 5 | OG, TF, TCG | CBFA2/RUNX1 translocation partner 3 |
| SBNO2 | 5 | - | strawberry notch homolog 2 |
| GIPR | 5 | - | gastric inhibitory polypeptide receptor |
| SCAF1 | 5 | - | SR-related CTD associated factor 1 |
| COL6A1 | 5 | - | collagen type VI alpha 1 chain |
| NEXMIF | 5 | - | neurite extension and migration factor |
| GK5 | 5 | - | glycerol kinase 5 |
Table 11: Genes with at least 5 hemimethylation sites in both tumor and normal samples. The columns give the gene name, the number of hemimethylated sites, the gene family, and a description of the gene. Descriptions are derived from the Molecular Signature Database [25] and the GeneCards database [26]. Gene families in the third column are abbreviated as follows: “TF” for transcription factor, “TS” for tumor suppressor, “OG” for oncogene, “HP” for homeodomain protein, “TCG” for translocated cancer gene, and “PK” for protein kinase.
To understand the functions and relationships of these genes, we further analyze their biological interactions using the ConsensusPath Database (CPDB) software package [27-29]; see Figures 4, 5, 6, and 7. Figure 4 is the legend for Figures 5, 6, and 7 and describes the types of biological relationships between genes reported by CPDB. A gene with a black label is hemimethylated (i.e., identified by our analysis), and a gene with a purple label is not in our hemimethylation gene list but interacts with one of the listed genes. Figures 5, 6, and 7 summarize the relationships for the gene lists in Tables 9, 10, and 11, respectively. These figures show the extent to which these highly hemimethylated genes interact with, and possibly affect the cell function of, related genes.
Figure 5 shows genetic interactions between genes with the most hemimethylation in tumor samples; these genes are listed in Table 9. The gene network in Figure 5 contains a number of hub genes with complex interactions. These hub genes include GNAS, NFATC1, NOTCH1, MAPK1, HDAC4, TP73, and EGR1. We can see that if a hub gene like MAPK1 is hemimethylated, it may interact with dozens of other genes. Some of these genes, e.g., EGR1 [30-33] and UNC5B [34-37], are known to be associated with different cancers, including lung cancer. EGR1 has a promoting effect on cancer metastasis in OCT4-overexpressing lung cancer [38]. The pseudogene DUXAP8 may act as an oncogene in non-small-cell lung cancer, and it may play this role by silencing EGR1 and RHOB transcription via binding with EZH2 and LSD1 [39]. The expression of UNC5A, UNC5B, or UNC5C is down-regulated in multiple cancers including lung cancer [40], and UNC5B has also been indicated as a putative tumor suppressor [41].
Figure 6 shows genetic interactions between genes with the most hemimethylation in normal DNA, and these genes are recorded in Table 10. In this figure, we can see that GNAS is a hub gene interacting with many other genes that may not be hemimethylated themselves. GNAS is observed in both tumor and normal samples, as well as in the hemimethylation study for breast cancer cell lines [9]. MEIS1 is also a hub gene that interacts with genes like KMT2A [42] and TK1 [43]. While these genes are not hemimethylated in our samples, they are known to be associated with cancer. KMT2A and hTERT are positively correlated in melanoma tumor tissues, and KMT2A promotes melanoma cell growth by targeting the hTERT signaling pathway [44]. KMT2A has an epigenetic regulation role on NOTCH1 and NOTCH3, and this mechanism is essential for inhibiting glioma proliferation [45]. TK1 plays a moderate role as a diagnostic tumor marker for cancer patients [46], and it is a potential clinical biomarker for the treatment of lung, breast, and colorectal cancer [47]. A systematic review shows that TK1 overexpression is associated with the poor outcomes of lung cancer patients [48]. MEIS1 inhibits non-small cell lung cancer cell proliferation [49]. MEIS1 plays a crucial role in normal development [26] and it is also reported as an important gene related to leukemia [50-52]. Therefore, it is possible that the hemimethylation of hub genes like MEIS1 affects protein, biochemical, or regulatory functions of genes that are associated with cancer.
Figure 7 shows genetic interactions between genes with the most hemimethylation at identical locations in tumor and normal samples. These genes are listed in Table 11. This means that the hemimethylation of CpG sites in this network is unchanged or unaffected by the formation of cancer. The HNRNPL gene is a major hub in this gene network. While we do not detect any hemimethylation in this gene, it directly interacts with 10 genes that we know to be hemimethylated. Some of these genes, like PTPRN2 and MAD1L1, can also be found in the tumor gene network (see Figure 5). There appear to be no common genes between Figure 6 (hemimethylated genes in normal samples) and Figure 7 (hemimethylated genes in both tumor and normal samples). Therefore, genes that have a large number of hemimethylated CpG sites found only in normal DNA seem to have very few CpG sites that remain the same when cancer forms.
In addition to the above analysis, we have conducted gene set enrichment analysis using the Molecular Signature Database and the related software package provided by the Broad Institute [25]. Of the most hemimethylated genes in tumor DNA (Table 9), six are also significantly represented in cancer module 163 (p-value < 0.05). This module is a collection of genes known to be overrepresented in cancer pathways and is reported by the Stanford research group [53]. The six genes are IFT140, IFFO1, SYN1, FMNL1, NOTCH1, and RGS14. No such overrepresented genes or cancer modules are found among the genes shown in Table 10 (for normal samples) and Table 11 (for both tumor and normal samples).
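One common way to score an overlap between a gene list and a gene set is an upper-tail hypergeometric test. The sketch below illustrates that calculation only; it is an assumption that a test of this kind underlies the reported significance, and the module size and gene universe size are hypothetical placeholders, not values from this study or from the database.

```python
from math import comb

def hypergeom_upper_tail(k, universe, set_size, list_size):
    """P(X >= k) when list_size genes are drawn from a universe containing set_size set members."""
    upper = min(set_size, list_size)
    favorable = sum(
        comb(set_size, i) * comb(universe - set_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(universe, list_size)

# From the text: 41 genes in Table 9, 6 of which fall in the module.
overlap = 6
list_size = 41

# Hypothetical placeholders (not reported in the text).
universe = 20000   # assumed number of genes in the background
module_size = 150  # assumed number of genes in the module

p_value = hypergeom_upper_tail(overlap, universe, module_size, list_size)
print(f"overlap p-value under these assumed sizes: {p_value:.3g}")
```

The result depends strongly on the assumed background and module sizes, so the printed value should not be read as a reproduction of the reported significance.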