0-D Vertex cube
After data preprocessing, the standardized set of all entities forms the 0-D vertex cube of the hypertension data cube (Disease corpus: 252 entities; Gene corpus: 159; Drug corpus: 141); it is the data foundation for the rest of this study.
1-D Disease, gene, and drug cube
Because some genes and drugs appear very rarely in the literature, we excluded genes and drugs with a frequency < 0.1%. We then filtered the 0-D cube against DiseaseDictionary, GeneDictionary and DrugDictionary. The resultant 1-D disease cube consisted of 252 diseases, the 1-D gene cube of 185 genes and the 1-D drug cube of 141 drugs (Table 1, Attachment 1-D.xlsx).
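The filtering step can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that "frequency" means an entity's share of all dictionary-matched mentions of its type, and the function name `filter_entities` and the toy mentions are hypothetical.

```python
from collections import Counter

def filter_entities(mentions, dictionary, min_rel_freq=0.001):
    """Keep dictionary entities whose share of matched mentions is >= min_rel_freq.

    Assumption: relative frequency is computed over the mentions that match
    the dictionary; the source does not state the denominator it used.
    """
    counts = Counter(m for m in mentions if m in dictionary)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {e: c for e, c in counts.items() if c / total >= min_rel_freq}

# Toy data: "RAREGENE" and "NOTAGENE" are made-up names for illustration.
mentions = ["ACE"] * 1500 + ["TNF"] * 499 + ["RAREGENE"] + ["NOTAGENE"] * 50
gene_dictionary = {"ACE", "TNF", "RAREGENE"}
kept = filter_entities(mentions, gene_dictionary)
print(kept)
```

In this toy run "NOTAGENE" is dropped because it is not in the dictionary, and "RAREGENE" is dropped because 1 of 2000 matched mentions is below the 0.1% threshold.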
Table 1
Partial list of 1-D disease, gene and drug cube of hypertension
No. | Disease | Freq. | Gene | Freq. | Drug | Freq.
1 | Atherosclerosis | 11645 | ACE | 8636 | Glucose | 16362
2 | Coronary Artery Disease | 8863 | TNF | 1428 | Insulin | 15259
3 | Cardiovascular Diseases | 6318 | PTH | 698 | Calcium carbonate | 14786
4 | Atrial Fibrillation | 4546 | AGT | 595 | Nitric Oxide | 10499
5 | Cardiomyopathy | 3289 | HBA1 | 574 | Oxygen | 9046
6 | Aneurysm | 3194 | EGFR | 521 | Triglycerides | 5590
7 | Apnea | 3189 | ACE2 | 349 | Norepinephrine | 5134
8 | Anemia | 2979 | RHOA | 338 | Captopril | 3888
9 | Arthritis | 2219 | BMPR2 | 303 | Hydrochlorothiazide | 3283
10 | Ascites | 2148 | MTHFR | 291 | Nifedipine | 3185
11 | Asthma | 2045 | MTOR | 184 | Uric Acid | 2865
12 | Bradycardia | 1917 | NOX4 | 182 | Propranolol | 2860
13 | Anxiety | 1818 | NOS3 | 181 | Losartan | 2788
14 | Arrhythmia | 1817 | IRS | 173 | Enalapril | 2593
15 | Angina Pectoris | 1714 | WNK4 | 164 | Amlodipine | 2361
16 | Coronary Disease | 1363 | WNK1 | 150 | Nitroprusside | 2321
17 | Arteriosclerosis | 1152 | CTGF | 148 | Atenolol | 2274
18 | Coma | 1100 | STAT3 | 139 | Phenylephrine | 2242
19 | Cough | 1087 | ADM | 127 | Dopamine | 1981
20 | Acidosis | 1028 | ALB | 127 | Epinephrine | 1742
The top three diseases in the 1-D cube are Atherosclerosis, Coronary Artery Disease and Cardiovascular Diseases, all of which are closely related to hypertension. For instance, atherosclerosis, an inevitable result of degenerative aging, is the main cause of heart disease and stroke [] and often coexists with hypertension [].
Few studies addressed the associations between hypertension and genes. Among the genes, ACE was the most important []: the enzyme it encodes converts angiotensin I into physiologically active angiotensin II, and it may be related to myocardial infarction, SARS resistance, renal diabetes, Alzheimer's disease and other diseases. The TNF gene is expressed in leukocytes and macrophages; it is involved in protein import into the nucleus, positive regulation of protein phosphorylation, negative regulation of L-glutamate transport, glucose metabolism, etc., and it has been implicated in dementia, migraine, asthma susceptibility, septicemia susceptibility and other diseases.
Glucose was the most frequently studied drug related to hypertension. Its DrugBank accession number is DB09341. It is stored mainly as starch in plants and as glycogen in animals. It supports various metabolic processes at the cellular level and is usually given by injection as a nutritional supplement for metabolic disorders or poorly regulated blood glucose. Glucose is on the World Health Organization (WHO) list of essential medicines.
2-D Disease-gene cube
Table 1 provides the support and lift score between disease and gene in the 2-D cube.
Table 1
Top 10 disease-gene associations, sorted by lift.
Rank | Disease | Gene | Co-occurrence | Support | Lift
1 | Hypertension | ACE | 71 | 0.302 | 114.8
2 | CADASIL | NOTCH3 | 28 | 0.119 | 82.1
3 | Angioedema | ACE | 26 | 0.111 | 78.5
4 | Cough | ACE | 21 | 0.089 | 73.0
5 | Hypertension | EGFR | 21 | 0.089 | 69.2
6 | Hypertension | AGT | 13 | 0.055 | 64.3
7 | Proteinuria | EGFR | 11 | 0.047 | 58.7
8 | Stroke | ACE | 10 | 0.043 | 50.1
9 | Proteinuria | ACE | 8 | 0.034 | 48.6
10 | Stroke | EGFR | 8 | 0.034 | 43.0
Using association rules, we extracted 235 significant associations between 79 classes of hypertension-related clinical manifestations / symptoms and 71 candidate genes from the 2-D cube (see Fig. 3 (a), attachment 2-D.xlsx). Hypertension, ACE and EGFR were the central nodes of the 2-D cube and were associated with 21 genes, 35 diseases and 19 diseases, respectively. Four groups of nodes followed a one-against-one strategy []: Amenorrhea-CYP17A1, Goiter-LMNA, Melanoma-BRAF and (Pre-Eclampsia)-FLT1-Eclampsia.
Using the Support method, we grouped and sorted the diseases and genes in the 2-D cube separately. The top three disease nodes are Hypertension (Support = 0.213), Atherosclerosis (Support = 0.051) and Inflammation (Support = 0.047). During algorithm development, we found that Atherosclerosis, Inflammation, Obesity and Smoking were associated with 12 genes (such as EGFR, TNF and CORIN), 10 genes (such as IL6, CCL2 and TNF), 10 genes (such as TNF, IRS and ACE) and 10 genes (such as ApoA1, ACE and ARMS2), respectively. Similarly, ACE (Support = 0.157), EGFR (Support = 0.081) and TNF (Support = 0.072) are the top three gene nodes. Among them, EGFR and TNF are associated with 19 diseases (such as Anemia, Atherosclerosis and Depression) and 15 diseases (such as Apnea, Atherosclerosis and Castration), respectively.
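For readers unfamiliar with the two scores, the sketch below computes co-occurrence, support and lift with the textbook association-rule definitions: support(A, B) = n(A, B) / N and lift = support(A, B) / (support(A) × support(B)). It is an illustration only. The exact normalization used in this study is not stated in this section, and the function name and toy documents are hypothetical.

```python
def support_and_lift(documents, a, b):
    """Return (co-occurrence count, support, lift) for entities a and b.

    documents: list of sets, each holding the entities found in one abstract.
    Textbook definitions are assumed, not taken from the study.
    """
    n = len(documents)
    if n == 0:
        raise ValueError("documents must not be empty")
    n_a = sum(1 for d in documents if a in d)
    n_b = sum(1 for d in documents if b in d)
    n_ab = sum(1 for d in documents if a in d and b in d)
    support = n_ab / n
    # Lift > 1 means a and b co-occur more often than expected by chance.
    lift = support / ((n_a / n) * (n_b / n)) if n_a and n_b else 0.0
    return n_ab, support, lift

# Toy corpus of eight abstracts (made up for illustration).
docs = [
    {"Hypertension", "ACE"}, {"Hypertension", "ACE"}, {"Hypertension"},
    {"Cough"}, {"ACE", "Cough"}, {"Stroke"}, {"Stroke"}, {"Stroke"},
]
co, sup, lift = support_and_lift(docs, "Hypertension", "ACE")
print(co, round(sup, 3), round(lift, 3))
```

Sorting all candidate pairs by the third value in descending order and keeping the first ten would give a ranking of the kind shown in the tables.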
2-D Disease-drug cube
Table 2 provides the support and lift score between disease and drug in the 2-D cube.
Table 2
Top 10 disease-drug associations, sorted by lift.
Rank | Disease | Drug | Co-occurrence | Support | Lift
1 | Hypertension | Glucose | 130 | 0.032 | 109.7
2 | Obesity | Glucose | 78 | 0.019 | 79.4
3 | Hypertension | Losartan | 71 | 0.017 | 73.0
4 | Hypertension | Water | 55 | 0.014 | 63.5
5 | Hypertension | Nitric Oxide | 52 | 0.013 | 59.8
6 | Hypertension | Amlodipine | 51 | 0.013 | 57.1
7 | Hypertension | Potassium | 49 | 0.012 | 55.5
8 | Hypertension | Nifedipine | 40 | 0.010 | 49.3
9 | Stroke | Perindopril | 39 | 0.010 | 47.1
10 | Stroke | Warfarin | 38 | 0.009 | 45.4
There are 196 significant associations between 43 diseases and 101 drugs in the 2-D disease-drug cube (see Fig. 3 (b), attachment 2-D.xlsx). The central nodes in this cube were the disease nodes Hypertension (Support = 0.383) and Stroke (Support = 0.122) and the drug node Glucose (Support = 0.066), which were associated with 75, 24 and 14 different entities, respectively.
Three groups of disease and drug nodes follow the one-against-one strategy: Pain NOS-Morphine, Celecoxib-AIDS-Diclofenac and Dorzolamide-Glaucoma-Timolol.
2-D Gene-drug cube
Table 3 provides the support and lift score between gene and drug in the 2-D cube.
Table 3
Top 10 gene-drug associations, sorted by lift.
Rank | Gene | Drug | Co-occurrence | Support | Lift
1 | ACE | Captopril | 715 | 0.054 | 61.0
2 | ACE | Enalapril | 708 | 0.053 | 49.2
3 | ACE | Perindopril | 381 | 0.029 | 47.8
4 | ACE | Ramipril | 381 | 0.029 | 42.2
5 | ACE | Lisinopril | 339 | 0.026 | 39.1
6 | ACE | Losartan | 305 | 0.023 | 37.2
7 | ACE | Glucose | 222 | 0.017 | 25.7
8 | ADM | Nitric Oxide | 216 | 0.016 | 25.5
9 | ACE | Quinapril | 201 | 0.015 | 23.7
10 | ACE | Amlodipine | 185 | 0.014 | 20.4
There are 160 significant associations between 31 genes and 104 drugs in the 2-D gene-drug cube (see Fig. 3 (c), attachment 2-D.xlsx). The ACE gene (Support = 0.519) is the central node and is associated with 83 drugs. TNF (Support = 0.087) and Glucose (Support = 0.081) follow, associated with 14 drugs and 13 genes, respectively. Fifteen gene nodes have a unique association, such as AGT, FGF23 and GRK4. In addition, 83 drug nodes have a unique association, 75 of which are related to the ACE gene.
In this cube, five gene-drug groups follow the one-against-one strategy: MMP2-Doxycycline, GRK4-Dopamine, HFE-Iron-FGF23, DICER1-Progesterone and MYD88-Colchicine. Nitric Oxide is associated with the S100B and NOS2 genes; NGF is associated with 9 drugs; CORT is associated with 7 drugs; and the APP and CYP2C8 genes are associated with 4 drugs. Procaine can inhibit the expression of STAT3 at the mRNA and protein levels, which makes it a potential therapeutic drug for neuropathic pain [].
3-D Disease-gene-drug cube
Based on the 1-D and 2-D cubes and their association strengths and networks, we constructed the disease-gene-drug network of hypertension. We used association rules and the Lift threshold to estimate whether there was a significant association between diseases, genes and drugs, and we calculated the association strength. After removing duplicate records, we found 591 associations between 90 diseases, 82 genes and 145 drugs. The 3-D disease-gene-drug network of hypertension is shown in Fig. 4.
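The construction described above can be sketched as merging the three 2-D association lists, keeping pairs above a lift threshold, removing duplicates, and counting how many other nodes each node touches. This is a hedged illustration: the threshold of 1.0, the function name `build_network` and the input layout are assumptions, and the low-lift Coma-ADM value is invented for the example.

```python
from collections import defaultdict

def build_network(cubes, min_lift=1.0):
    """Merge 2-D association lists into one undirected, de-duplicated network.

    cubes: iterable of lists of (entity1, entity2, lift) tuples.
    min_lift: assumed significance threshold (lift > 1 suggests association).
    Returns (set of edges, dict of node -> number of neighbours).
    """
    edges = set()
    for pairs in cubes:
        for a, b, lift in pairs:
            if lift > min_lift:
                # frozenset makes (a, b) and (b, a) the same edge.
                edges.add(frozenset((a, b)))
    degree = defaultdict(int)
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return edges, dict(degree)

disease_gene = [("Hypertension", "ACE", 114.8), ("Cough", "ACE", 73.0),
                ("Coma", "ADM", 0.5)]           # 0.5 is a made-up low lift
disease_drug = [("Hypertension", "Losartan", 73.0)]
gene_drug = [("ACE", "Losartan", 37.2), ("Losartan", "ACE", 37.2)]  # duplicate

edges, degree = build_network([disease_gene, disease_drug, gene_drug])
central = sorted(degree, key=degree.get, reverse=True)
print(len(edges), central[0], degree[central[0]])
```

Nodes with the highest neighbour counts play the role of the central nodes in the network. A real implementation would also need to keep entity types apart if a gene and a drug can share a name.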
We found that Hypertension, ACE and Stroke are the central nodes in the 3-D cube; they are associated with 123, 118 and 32 other nodes, respectively (attachment 3-D.xlsx).
Using the Support method, we grouped and sorted the diseases, genes and drugs in the 3-D cube separately. The top three disease nodes are Hypertension (Support = 0.208), Stroke (Support = 0.054) and Atherosclerosis (Support = 0.037), which are associated with 123, 32 and 22 nodes, respectively; 41 nodes are each associated with only one other node, including Bradycardia, Coronary Disease, Dehydration, Diabetic Retinopathy and Glucose Intolerance.
The top-ranked drug nodes are Glucose (Support = 0.044), Oxygen (Support = 0.022), and Nitric Oxide and Losartan (both Support = 0.015), which are associated with 26, 13, 9 and 9 nodes, respectively; 72 nodes are each associated with only one other node, including L-Arginine, Lacidipine, Camphor and Bezafibrate.
The top three gene nodes are ACE (Support = 0.200), TNF (Support = 0.049) and EGFR (Support = 0.041), which are associated with 118, 29 and 24 nodes, respectively. Thirty-eight nodes are each associated with only one other node, including CYP4A11, HMOX1, BRAF, CCR2 and RGS2.
Outside the main network, five groups of nodes follow the one-against-one strategy: Goiter-LMNA, Amenorrhea-CYP17A1, Melanoma-BRAF, (Pre-Eclampsia)-FLT1-Eclampsia and Dorzolamide-Glaucoma-Timolol.
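One way to locate such groups automatically is to look for small connected components that are detached from the main network. The sketch below rests on that reading of "one-against-one", which is an interpretation rather than a definition given in this section; the function name, the size limit of three nodes and the reduced edge list are hypothetical.

```python
from collections import defaultdict, deque

def small_components(edges, max_size=3):
    """Return connected components with at most max_size nodes.

    edges: iterable of (node1, node2) pairs of an undirected network.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(component) <= max_size:
            groups.append(sorted(component))
    return sorted(groups)

# Reduced example network; the first four edges form the "main" component.
example_edges = [
    ("Hypertension", "ACE"), ("Stroke", "ACE"),
    ("Hypertension", "Glucose"), ("Cough", "ACE"),
    ("Goiter", "LMNA"), ("Melanoma", "BRAF"),
    ("Pre-Eclampsia", "FLT1"), ("Eclampsia", "FLT1"),
]
isolated = small_components(example_edges)
print(isolated)
```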
Evaluation of association using ROC curve
We used ROC curve analysis to evaluate how well the hypertension-related disease-gene-drug associations discriminate true positive (TP) from false positive (FP) hidden associations, using literature partitioning (see Fig. 5). The area under the curve (AUC) was used as the accuracy indicator. Associations were validated using the following criteria: 1) TP = a direct association with at least 3 co-publications, e.g., Hypertension and ACE; and 2) FP = no direct association or fewer than 3 co-publications, e.g., Coma and ADM. The AUCs (mean ± standard error) were 0.842 ± 0.032, 0.858 ± 0.044 and 0.836 ± 0.045, and the asymptotic 95% confidence intervals were (0.778, 0.903), (0.773, 0.944) and (0.748, 0.924) for disease-gene, disease-drug and gene-drug, respectively.
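The labelling rule and the AUC can be illustrated with a short sketch. It assumes that each candidate pair carries a numeric association score (for example its lift), which this section does not specify, and it computes the AUC through its rank interpretation: the probability that a randomly chosen TP pair scores higher than a randomly chosen FP pair. All names and values below are hypothetical.

```python
def label_pair(direct_association, co_publications, min_copubs=3):
    """Return 1 (TP) for a direct association with >= min_copubs papers, else 0 (FP)."""
    return 1 if direct_association and co_publications >= min_copubs else 0

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    # A positive scoring above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# (score, direct association?, co-publications): made-up evaluation records.
records = [(0.9, True, 71), (0.5, True, 5), (0.4, True, 2),
           (0.6, False, 0), (0.2, False, 1)]
scores = [r[0] for r in records]
labels = [label_pair(r[1], r[2]) for r in records]
print(labels, round(auc(scores, labels), 3))
```

The standard errors and asymptotic confidence intervals reported above would come from a statistics package and are not reproduced by this sketch.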
Taken together, these results show that our methods are robust and can be applied to quickly detect a variety of biologically relevant hidden associations. As with other studies of association extraction algorithms [], we also obtained some predictions as results. This is also a goal of biomedical entity association extraction: to propose hypotheses and assist researchers in designing the direction of related experiments [].
Prediction of disease-gene-drug associations
In this study, we used the ABC discovery method [] to predict candidate diseases, genes and drugs for hypertension and to mine new associations among hypertension-related diseases, genes and drugs. As in other studies, some unproven bio-entity association pairs were obtained, and predictions that include false positives are acceptable [], because one objective of association mining is to put forward predictive research hypotheses that help biological researchers develop novel ideas and design innovative experiments [].
We verified all disease-gene, disease-drug and gene-drug associations. After verification, 262 predictive associations (i.e. false-positive associations) were obtained: 57 disease-drug, 84 disease-gene and 121 gene-drug associations. Table 4 provides a partial list of disease-gene-drug associations that are linked to hypertension implicitly, but not explicitly, through the literature.
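The idea behind ABC discovery can be sketched in a few lines: if A and B are linked in the literature and B and C are linked, but A and C are not, then A-C becomes a candidate hidden association with B as the intermediate. The code below is a generic illustration of that principle, not the implementation used in this study; the function name and the four known links are chosen only for the example.

```python
from collections import defaultdict

def abc_candidates(known_pairs):
    """Propose A-C pairs that share an intermediate B but are not directly linked.

    known_pairs: iterable of (entity1, entity2) associations found in the literature.
    Returns a dict mapping (A, C) to the sorted list of intermediates B.
    """
    neighbours = defaultdict(set)
    for a, b in known_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    candidates = defaultdict(set)
    for b, linked in neighbours.items():
        for a in linked:
            for c in linked:
                # a < c reports each unordered pair once; skip known direct links.
                if a < c and c not in neighbours[a]:
                    candidates[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in candidates.items()}

known = [
    ("SLC2A9", "Uric Acid"), ("Depression", "Uric Acid"),
    ("AGT", "Hypertension"), ("Coma", "Hypertension"),
]
predictions = abc_candidates(known)
print(predictions)
```

In practice the candidates would be restricted to the entity types of interest and then ranked, for instance by the number of shared intermediates.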
Table 4
Partial list of predictive associations between bio-entities.
Association | Entity1 | Entity2
Disease-gene | Depression | SLC2A9
Disease-gene | Coma | SLC2A9
Disease-gene | Coma | AGT
Disease-gene | Pheochromocytoma | GRK4
Disease-drug | Anemia | Chlorthalidone
Disease-drug | Dementia | Benazepril
Disease-drug | Depression | Eplerenone
Disease-drug | Glomerulonephritis | Doxazosin
Disease-drug | Obesity | Spirapril
Gene-drug | ABCB1 | Candesartan
Gene-drug | ABCB1 | Clonidine
Gene-drug | APOA1 | Amlodipine
Gene-drug | EGFR | Telmisartan
Gene-drug | MTHFR | Eprosartan
Among the predicted novel disease-gene associations, no study has yet reported whether an association exists between Depression and SLC2A9. Allelic variation at rs6855911 in the SLC2A9 gene is strongly associated with serum uric acid concentration [], and the serum uric acid level is positively correlated with depression and anxiety; therefore, there may be a potential association between the two bio-entities. The AGT gene is expressed in adipose tissue, the adrenal gland, the brain, blood vessels and the nervous system, and it is involved in cell growth, positive regulation of cytokine synthesis, apoptosis and cardiac hypertrophy. AGT is a susceptibility gene for hypertension, and hypertensive cerebral hemorrhage can cause coma; AGT is also related to insulin resistance [], and diabetes may likewise cause coma. This suggests that AGT may be related to Coma.
Among the predicted novel disease-drug associations, a clinical case report [] described a 58-year-old woman hospitalized with intermittent fever accompanied by anemia and inflammation. She had been taking atenolol and chlorothiazide to treat hypertension, and her diuretic dose had been increased six weeks before hospitalization because the hypertension was uncontrolled. After the antihypertensive drugs were discontinued, the fever resolved rapidly, suggesting a possible allergy to chlorothiazide. Therefore, Chlorothiazide may be associated with Anemia and may cause fever. For the bio-entity pair Glomerulonephritis-Doxazosin, a report of three cases [] described primary hypertension in children caused by renal diseases, including renal atrophy, hydronephrosis secondary to reflux nephropathy, nephrotic syndrome and acute streptococcal glomerulonephritis. After taking doxazosin and other drugs to control hypertension, the patients recovered. Therefore, there may be an association between Glomerulonephritis and Doxazosin.
Among the predicted novel gene-drug associations, ABCB1 encodes ATP-binding cassette subfamily B member 1. At present, the mechanism of action between ABCB1 and Candesartan / Clonidine is not clear. ATP-binding cassette transporters, such as P-glycoprotein (P-gp / ABCB1) and the multidrug resistance-associated proteins (MRP / ABCCs), are involved in the regulation of drug absorption, distribution and excretion. Candesartan is used to treat essential hypertension; clonidine is a psychoactive drug that can be used to treat severe hypertension; both can be combined with other drugs. However, the interaction between them and their effect on ABC transporters remain unclear. Some studies have shown that candesartan cilexetil can significantly inhibit P-glycoprotein activity []. Therefore, ABCB1 may be a candidate gene for Candesartan / Clonidine.