4.1 Quantitative evaluation and analysis of the DREG model
To assess the validity of the model, we quantify it with an internal coverage method that evaluates how the model ranks drug-disease relationships. For each sampled drug, a single related entity is covered (held out), the model learns from the remaining entities, and the covered entity is then predicted; aggregating these predictions yields an overall quantitative evaluation over all predicted entities. The results of the quantitative evaluation experiments are presented in Table 2.
Table 2 reports coverage verification on samples of drugs randomly drawn from the database. Randomly drawn percentage (RDP) is the percentage of drugs drawn; Effective ratio (ER) is the proportion of drawn drugs successfully verified at or before a given rank position; Sum of boost scores (SBS) is the number of successful verifications multiplied by the reciprocal of the rank threshold, i.e. SBS = number of successes * (1/rank), where the reciprocal is 1/10 before rank 10, 1/20 before rank 20, and so on. Draw proportional probability (DPP) is the proportion of drawn drugs among all drugs in the database, and Model effect multiplier (MEM) = SBS/DPP expresses how many times more effective the DREG model is than random selection over untrained entity data. For example, if 50 drugs are randomly drawn (RDP = 0.88%) and two of them are verified before rank 10, then ER = 2/50 = 4%, SBS = 2*(1/10) = 0.2, DPP = 0.0088, and MEM = 22.56; that is, the trained model performs 22.56 times better than random selection. These quantified results make it possible to estimate the likelihood of repurposing any drug in the database.
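The worked example above can be reproduced in a few lines. Note that the total database size used below is an assumption inferred from the MEM values in Table 2 (it is not stated in the text), so it is illustrative only:

```python
# Coverage metrics for the worked example: 50 drugs drawn, 2 verified before rank 10.
n_database = 5640   # ASSUMED total number of drugs, inferred from MEM = 22.56
n_drawn = 50
n_success = 2
rank = 10

er = n_success / n_drawn        # Effective ratio
sbs = n_success * (1 / rank)    # Sum of boost scores
dpp = n_drawn / n_database      # Draw proportional probability
mem = sbs / dpp                 # Model effect multiplier

print(er, sbs, round(mem, 2))   # 0.04 0.2 22.56
```

With these definitions, MEM scales inversely with the fraction of the database that was sampled, which is why the smallest samples in Table 2 produce the largest multipliers.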
Table.2 Model Validation Quantitative Metrics

| DREG | RDP | ER | SBS | DPP | MEM |
| --- | --- | --- | --- | --- | --- |
| Position 10 | 0.08% | 40% | 0.2 | 0.0008 | 225.6 |
| | 0.17% | 20% | 0.2 | 0.0017 | 112.8 |
| | 0.88% | 4% | 0.2 | 0.0088 | 22.56 |
| | 1.61% | 5.49% | 0.2 | 0.01613 | 12.3956 |
| Position 20 | 0.08% | 40% | 0.1 | 0.0008 | 112.8 |
| | 0.17% | 20% | 0.1 | 0.0017 | 56.4 |
| | 0.88% | 8% | 0.2 | 0.0088 | 22.56 |
| | 1.61% | 8.79% | 0.4 | 0.01613 | 24.7912 |
| Position 50 | 0.08% | 80% | 0.08 | 0.0008 | 90.24 |
| | 0.17% | 40% | 0.08 | 0.0017 | 45.12 |
| | 0.88% | 18% | 0.18 | 0.0088 | 20.304 |
| | 1.61% | 19.78% | 0.36 | 0.01613 | 22.312 |
| Position 100 | 0.08% | 80% | 0.04 | 0.0008 | 45.12 |
| | 0.17% | 60% | 0.06 | 0.0017 | 33.84 |
| | 0.88% | 28% | 0.14 | 0.0088 | 15.792 |
| | 1.61% | 32.97% | 0.3 | 0.01613 | 18.5934 |
| Position 150 | 0.08% | 80% | 0.026666 | 0.0008 | 30.079248 |
| | 0.17% | 60% | 0.04 | 0.0017 | 22.56 |
| | 0.88% | 30% | 0.1 | 0.0088 | 11.28 |
| | 1.61% | 21.97% | 0.13333 | 0.01613 | 8.2635 |
The quantitative evaluation and analysis of the DREG model show that the top 100 drug candidates perform well in coverage verification. To examine their reusability, a case study was conducted on advanced breast cancer, investigating the top 100 drugs recommended by the DREG model. Among them, 20% belonged to the "other antineoplastic drugs" category, while the remaining 80% fell into categories such as "pain relief" and "antiviral drugs". In addition, a search of ClinicalTrials.gov showed that the top 50 recommended drugs had a higher proportion of trial results than the following 50. The top 50 recommendations were therefore selected as reusable drug candidates.
4.2 Coverage method for model validation results
The validity of the DREG model's recommendations is verified through the coverage method for a single drug. A certain percentage of drugs is randomly selected for coverage experiments: for each drug, one disease relationship among all of its related diseases is covered (held out), and the model is trained on the uncovered relationships. After training, the recommended disease entities are obtained and the covered disease is looked up in the original dataset. If the covered disease entity appears among the recommended disease entities, the effectiveness of the drug repurposing model's recommendations is confirmed.
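The final step of this check, locating the held-out disease in the model's ranked output, can be sketched as follows; the helper name is ours, and the returned 1-based position corresponds to the VRT column of Table 3:

```python
def covered_rank(ranked_diseases, held_out):
    """Return the 1-based rank of the held-out (covered) disease in the
    model's recommendation list, or None if it is not recommended at all."""
    for position, disease in enumerate(ranked_diseases, start=1):
        if disease == held_out:
            return position
    return None

# Toy ranked recommendation list: the covered disease appears at position 3,
# so the coverage check succeeds with VRT = 3.
recommendations = ["disease_a", "disease_b", "hypertensive disease", "disease_c"]
print(covered_rank(recommendations, "hypertensive disease"))  # 3
```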
Table 3 shows examples in which drugs undergo coverage verification on diseases with known relationships, checking whether the covered disease appears in the recommended results. Recommended entities are ranked by relevance score: the higher the drug-disease relevance, the higher the ranking. Verify the coverage Relationship best Top (VRT) is the rank at which the covered disease appears. For example, when hypertensive disease is covered and the treatable diseases of fish oils are recommended through training, hypertensive disease is ranked 24th by relevance score. The number of existing diseases related to the drug in the database (DRD) counts all diseases related to each drug in the Drug-Disease database; for example, fish oils have 31 known relationships with different diseases. The Relevance Score is the drug-disease correlation score computed by the DREG model; for example, the score between fish oils and hypertensive disease is 0.9995.
Table.3 Coverage verification example

| Drug name | Covered disease name | VRT | DRD | Relevance Score |
| --- | --- | --- | --- | --- |
| fish oils | hypertensive disease | 24 | 31 | 0.9995 |
| alteplase | myocardial infarction | 13 | 20 | 0.9843 |
| zotepine | hyperactive behavior | 27 | 20 | 0.9891 |
| reteplase | cardiovascular diseases | 59 | 26 | 0.8897 |
| levorphanol | substance withdrawal syndrome | 21 | 18 | 0.9900 |
4.3 Knowledge Graph Visualization
This study uses a knowledge graph to illustrate the connections between entities, providing a practical and intuitive approach to drug repurposing. From an R&D perspective, the knowledge graph supports a semantic understanding of entity information and captures the relationships between entities. To surface more valuable drug recommendation information, the study constructs a disease-centered knowledge graph and recommends reusable drug candidates through the DREG model, yielding an effective and intuitive display of reference information. For instance, the related drug entities of advanced breast cancer, related targets, information pathways, and other data are integrated into a knowledge graph using the Neo4j visualization software, as depicted in Fig. 5[20].
4.4 Literature Knowledge and Results of Clinical Trials
To demonstrate the rationality of the drug recommendations produced by the proposed DREG model, published literature and clinical trial results were queried. First, the PubMed literature website is checked for publications relating each reusable drug candidate recommended by the DREG model to its disease. If such literature exists, it supports the plausibility of the recommended candidates; the more publications linking a candidate drug to its disease, the more meaningful the recommendation.
Table 4 presents the suggested drug candidates, relevance scores, and PubMed publication counts for selected diseases, as determined by the model. For instance, the model recommends cetrimonium as a drug candidate for advanced cancer with a relevance score of 0.9152, despite no association between the two in the original database; PubMed lists five relevant publications for this pair. The Fitted value is derived by a sigmoid fit of the Relevance Score and the number of related publications.
Table.4 Literature verification example

| Drug name | Disease name | Relevance Score | Is it known | Number of related literature | Fitted value |
| --- | --- | --- | --- | --- | --- |
| diphenhydramine | advanced cancer | 0.9140 | recommend | 74 | 1.0 |
| diazepam | advanced cancer | 0.9186 | recommend | 25 | 1.0 |
| cetrimonium | advanced cancer | 0.9152 | recommend | 5 | 0.997 |
| benfotiamine | advanced cancer | 0.9161 | recommend | 2 | 0.949 |
| ajmaline | advanced cancer | 0.9260 | recommend | 2 | 0.949 |
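The paper does not report the parameters of the sigmoid fit behind the Fitted value, but a simple logistic function of the literature count comes close to the tabulated values. The coefficients below are inferred by us for illustration and are not the authors' parameters:

```python
import math

def fitted_value(n_literature, a=0.98, b=0.95):
    """Logistic mapping of a publication count to a confidence-like value.
    The coefficients a and b are ASSUMPTIONS chosen to approximate the
    Fitted value column of Table 4; they are not from the paper."""
    return 1 / (1 + math.exp(-(a * n_literature + b)))

for n in (0, 2, 5, 25):
    print(n, round(fitted_value(n), 3))
```

Under this assumed fit, 0 publications maps to roughly 0.72, 2 to roughly 0.95, and counts of 25 or more saturate near 1.0, matching the pattern in the table.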
To reduce the research effort required when studying drug-disease pairs, we implement a rarity-based recommendation function built on the literature-knowledge validation method. The process first retrieves the publication counts of each recommended reusable drug candidate and of its disease separately, then the number of publications in which the recommended drug and disease co-occur. Finally, a rarity value is calculated, and the values are sorted in ascending order, with a smaller value indicating a rarer pairing.
The rarity value is determined by the following formula: Score = (number of publications common to the drug and the disease / number of publications about the drug) + (number of common publications / number of publications about the disease). For example, the drug cisplatin has 84,337 publications on the PubMed website and advanced cancer has 365,556, with 18,848 publications mentioning both, so the rarity value is 18,848/84,337 + 18,848/365,556 ≈ 0.2751 (reported as 0.2763 in Table 5). If the number of common publications is 0, the rarity value is recorded as "UnKnown". Table 5 presents examples of rarity-based Drug-Disease validation.
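The formula above translates directly into code; this minimal sketch (the function name is ours) reproduces the cisplatin example:

```python
def rarity_score(n_common, n_drug, n_disease):
    """Rarity value from the two-term formula: smaller = rarer pairing.
    Returns None when there are no common publications ("UnKnown")."""
    if n_common == 0:
        return None
    return n_common / n_drug + n_common / n_disease

# Cisplatin / advanced cancer publication counts from the PubMed example.
score = rarity_score(18848, 84337, 365556)
print(round(score, 4))
```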
Table.5 Examples of rare recommendations

| Drug name | Disease name | Relevance Score | Is it known | Number of common literature | Fitted value | Rare value |
| --- | --- | --- | --- | --- | --- | --- |
| temozolomide | advanced cancer | 0.9874 | known | 899 | 1.0 | 0.0970 |
| cisplatin | advanced cancer | 0.9855 | known | 18848 | 1.0 | 0.2763 |
| sunitinib | advanced cancer | 0.9852 | known | 1803 | 1.0 | 0.2611 |
| cabergoline | advanced cancer | 0.9276 | recommend | 21 | 1.0 | 0.1079 |
| diazepam | advanced cancer | 0.9186 | recommend | 25 | 1.0 | 0.0010 |
| cetrimonium | advanced cancer | 0.9152 | recommend | 5 | 0.997 | 0.0018 |
| amoxapine | advanced cancer | 0.9548 | recommend | 0 | 0.722 | UnKnown |
To determine the validity of the DREG model's experimental results in clinical trials, clinical trial progress was verified on the ClinicalTrials.gov website. ClinicalTrials.gov[21] is a clinical trial database developed by the US National Library of Medicine (NLM) and the US Food and Drug Administration (FDA) beginning in 1997 and officially launched in February 2000. To confirm the validity of a recommendation, the ClinicalTrials.gov website is checked for "drug-disease" clinical trials corresponding to it; if such trials exist, they support the effectiveness of the recommended drug and the accuracy of the model's experimental results in clinical trials. Table 6 provides examples of clinical trials on ClinicalTrials.gov for drug-disease entities recommended by the model. For instance, a Phase 1 clinical trial of paclitaxel for treating advanced breast cancer started in 2022.
Table.6 Clinical Trials Clinical Validation Examples

| Disease name | Drug name | Mode | Start Time | Phase | Relevance Score |
| --- | --- | --- | --- | --- | --- |
| advanced breast cancer | Paclitaxel | Active | 2022 | 1 | 0.8378 |
| Biliary Tract Cancer | leucovorin | Active | 2023 | 1 | 0.8799 |
| Prostate Cancer | Niraparib | Active | 2020 | 1 | 0.8981 |
| Prostate Cancer | Cyclophosphamide | Active | 2023 | 1 | 0.8722 |
| non-small cell lung cancer metastatic | Cyclophosphamide | Completed | 2018 | 1 | 0.8721 |
| non-small cell lung cancer stage iii | Carboplatin | Recruiting | 2022 | 1 | 0.9305 |
| non-small cell lung cancer | Cyclophosphamide | Recruiting | 2024 | 1 | 0.8769 |
| Colorectal Cancer Metastatic | Tipiracil | Recruiting | 2024 | 1 | 0.8113 |
| Colorectal Cancer Metastatic | celecoxib | Recruiting | 2023 | 1 | 0.7924 |
| Colorectal Cancer | Gevokizumab | Recruiting | 2019 | 1 | 0.8676 |
| Colorectal Cancer | Oxaliplatin | Recruiting | 2023 | 1 | 0.8435 |
| hormone refractory prostate cancer | Nivolumab | Recruiting | 2021 | 1 | 0.8930 |
| Prostate Cancer Metastatic | Olaparib | Recruiting | 2022 | 1 | 0.8645 |
| Prostate Cancer Metastatic | Enzalutamide | Recruiting | 2023 | 1 | 0.8604 |
| ovarian epithelial cancer recurrent | Paclitaxel | Recruiting | 2021 | 1 | 0.8521 |
| nervous system disorder | cyclophosphamide | Recruiting | 2022 | 1 | 0.8882 |
| hypertensive disease | Nitroglycerin | Recruiting | 2020 | 1 | 0.8273 |
| hypertensive disease | Trimethaphan | Recruiting | 2022 | 1 | 0.8212 |
| Seizures | levetiracetam | Recruiting | 2021 | 1 | 0.7924 |
To further strengthen the credibility of the DREG model's experimental findings in clinical trials, clinical trial cases were queried on the Chinese Clinical Trial Registry website. The Chinese Clinical Trial Registry is a primary registry of the World Health Organization's International Clinical Trials Registry Platform, ensuring the reliability and validity of clinical trial data[22]. If relevant clinical trial cases exist for a drug recommended by the model, they demonstrate the drug's potential repurposing value and validate the experimental results in clinical trials. Table 7 presents the clinical validation cases of the candidate drugs and their related diseases identified in the registry. For instance, in 2020 the Cancer Hospital of the Chinese Academy of Medical Sciences conducted Phase I and Phase II clinical trials of fluorouracil for the treatment of metastatic colorectal cancer.
Table.7 Examples of Clinical Validation by the Chinese Clinical Trial Registry

| Drug name | Disease name | Time | Study Phase | Location | Relevance Score |
| --- | --- | --- | --- | --- | --- |
| fluorouracil | Colorectal cancer metastatic | 2020 | I + II | Chinese Academy of Medical Sciences Cancer Hospital | 0.8780 |
| Paclitaxel | Colorectal Cancer | 2020 | I | Ruijin Hospital Affiliated to Shanghai Jiaotong University School of Medicine | 0.8283 |
| Oxaliplatin | Colorectal Cancer | 2021 | II | Department of Medical Oncology, Union Hospital Affiliated to Fujian Medical University | 0.8435 |
| Paclitaxel | advanced breast cancer | 2021 | II | Liaoning Cancer Hospital | 0.8378 |
| capecitabine | advanced breast cancer | 2020 | Exploratory studies/pre-trials | Fudan University Cancer Hospital | 0.8146 |
| doxorubicin | advanced breast cancer | 2019 | Post-marketing drugs | The Fourth Hospital of Hebei Medical University | 0.7716 |
| Eribulin | advanced breast cancer | 2021 | Post-marketing drugs | Shandong Cancer Hospital | 0.8409 |
| Olaparib | Prostate Cancer Metastatic | 2021 | retrospective study | Renji Hospital, Shanghai Jiaotong University School of Medicine | 0.8645 |
| Enzalutamide | Prostate Cancer Metastatic | 2022 | Clinical trials of new therapeutic technologies | Beijing Hospital | 0.8604 |
| Carboplatin | non-small cell lung cancer iii | 2021 | Exploratory studies/pre-trials | The Fifth Affiliated Hospital of Sun Yat-Sen University | 0.9305 |
| levetiracetam | Seizures | 2022 | Exploratory studies/pre-trials | The Seventh Affiliated Hospital of Sun Yat-sen University | 0.7924 |
4.5 Comparison with other literature methods
To situate the proposed DREG model within existing drug-repurposing research, its experimental results are evaluated with the MRR and Hits@N metrics, and the DREG model is compared with several current state-of-the-art models on these metrics.
The evaluation calculation of MRR is shown in Eq. 5, where S is the triplet set, |S| is the number of triplets, and \({rank}_{i}\) is the link-prediction rank (by distance score) of the i-th triplet. A larger MRR is better, indicating that the ranking conforms to the facts and that the embedding is effective[23].
$$MRR=\frac{1}{\left|S\right|}\sum _{i=1}^{\left|S\right|}\frac{1}{{rank}_{i}} \tag{5}$$
The evaluation calculation of Hits@N is shown in Eq. 6, with the same symbols as above; \(\mathbb{I}(\cdot)\) is the indicator function (1 if the condition holds, 0 otherwise). Typically n is taken as 1, 3, or 10, and a larger Hits@N is better[24].
$$Hits@N=\frac{1}{\left|S\right|}\sum _{i=1}^{\left|S\right|}\mathbb{I}({rank}_{i}\le n) \tag{6}$$
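Both metrics are straightforward to compute from the list of predicted ranks over a test triplet set; a minimal sketch:

```python
def mrr(ranks):
    """Mean reciprocal rank over the link-prediction ranks of a triplet set."""
    return sum(1 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    """Fraction of triplets whose predicted rank is at most n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 2, 10]               # toy example: ranks of three test triplets
print(round(mrr(ranks), 4))      # (1 + 1/2 + 1/10) / 3 -> 0.5333
print(hits_at_n(ranks, 1))       # one triplet ranked first -> 1/3
print(hits_at_n(ranks, 10))      # all ranks <= 10 -> 1.0
```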
Table.8 Overall predictive performance of drug-disease associations

| Dataset | Model | MRR | Hits@1 | Hits@3 | Hits@10 |
| --- | --- | --- | --- | --- | --- |
| GP-KG[29] | DistMult[25] | 0.191 | 0.103 | 0.207 | 0.379 |
| | TransE[26] | 0.209 | 0.116 | 0.226 | 0.399 |
| | ConvE[27] | 0.216 | 0.126 | 0.232 | 0.399 |
| | RotatE[28] | 0.212 | 0.119 | 0.231 | 0.403 |
| | KG-Predict[29] | 0.261 | 0.174 | 0.266 | 0.447 |
| | DREG | 0.308 | 0.171 | 0.314 | 0.628 |
The MRR and Hits@N scores of DREG and the other models are presented in Table 8. Compared with the five baseline knowledge graph embedding models, DREG achieves the best MRR. DREG outperforms the strongest baseline, KG-Predict, by 4.7 percentage points in MRR, 4.8 in Hits@3, and 18.1 in Hits@10, and outperforms ConvE by 4.5 points in Hits@1 (KG-Predict retains a slightly higher Hits@1, 0.174 versus 0.171). As illustrated in Fig. 6, the experimental results demonstrate that the DREG model outperforms the other literature methods in MRR and Hits@10, confirming its superior prediction performance.