Machine Learning for Drug-Virus Prediction



The coronavirus disease 2019 (COVID-19) epidemic has hit most countries hard, and researchers around the world are looking for ways to control the virus. Because designing and discovering a new drug is very time-consuming, drug repositioning, that is, examining existing medications for use against the virus, can be an effective alternative. In this study, we used binary classification to predict drug-virus associations. The feature vector for each drug-virus pair is based on the similarity between drugs and the similarity between viruses. We calculated drug similarities from their structural properties (fingerprints) and their phenotype, and virus similarities from their genome sequences and from the vectors encoded by the BioBERT model. Using the HDVD dataset, we formed the similarity vector of each drug-virus pair and used it as input to neural network and random forest models. For testing, we randomly selected 20% of the positive data and the same amount of negative data. On this test data, the proposed approach achieved AUC = 0.97 and AUPR = 0.96 after five tests. We also used a Compressed Sensing (CS) matrix factorization model to predict drug-virus associations. Finally, we investigated the importance of drug features in predicting drug-virus associations by using an autoencoder to reduce the dimension of the drug properties.
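The pair representation and the balanced hold-out described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two similarities are combined, so the sketch assumes the pair vector is the concatenation of the drug's similarity row and the virus's similarity row. The function names `pair_features` and `balanced_test_split` and the toy matrices are hypothetical.

```python
import random


def pair_features(drug_sim, virus_sim, d, v):
    """Assumed encoding: drug d's similarity row followed by virus v's row."""
    return list(drug_sim[d]) + list(virus_sim[v])


def balanced_test_split(assoc, test_frac=0.2, seed=0):
    """Hold out test_frac of the positive pairs plus an equal number of negatives."""
    rng = random.Random(seed)
    pairs = [(d, v) for d, row in enumerate(assoc) for v in range(len(row))]
    pos = [(d, v) for d, v in pairs if assoc[d][v] == 1]
    neg = [(d, v) for d, v in pairs if assoc[d][v] == 0]
    n_test = max(1, round(test_frac * len(pos)))
    test_pos = rng.sample(pos, n_test)
    test_neg = rng.sample(neg, n_test)  # same amount of negative data
    held_out = set(test_pos) | set(test_neg)
    train = [p for p in pairs if p not in held_out]
    return train, test_pos, test_neg


# Toy data (made up for illustration): 3 drugs, 2 viruses.
drug_sim = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.5], [0.1, 0.5, 1.0]]
virus_sim = [[1.0, 0.2], [0.2, 1.0]]
assoc = [[1, 0], [0, 1], [1, 0]]  # 1 = known drug-virus association

train, test_pos, test_neg = balanced_test_split(assoc)
x = pair_features(drug_sim, virus_sim, 0, 1)
```

The resulting vectors could then be passed to any binary classifier, such as the neural network and random forest models named in the abstract. Repeating the split with different seeds would give the repeated tests over which AUC and AUPR are reported.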
