Conus (cone snails) is a genus of venomous, carnivorous marine mollusks found in tropical seas[1]. More than 500 species of Conus are known worldwide, and their venom is estimated to contain at least 50,000 active peptides. The secreted toxins, called conotoxins, are used mainly for predation and defense[2]. Conotoxins are extremely toxic and can cause tremors, convulsions, paralysis, and even death in animals. Some estimates put the number of Conus species above 700 and the number of toxins they secrete above 100,000. However, only a small fraction of these (about 3,000 peptides) has been experimentally confirmed and recorded[3]. Conotoxins have strong biological activity and novel chemical structures, and they are highly selective for ligand-gated and voltage-gated ion channels[4]. Because they can distinguish between closely related ion channel types, they are widely used as pharmacological reagents in ion channel research. Because insecticidal conotoxins can kill many kinds of pests[5], they could be used to breed new insect-resistant crop varieties or be developed into peptide insecticides. Conotoxins have therefore become a new source for drug development and a powerful tool in pharmacology and neuroscience[6], and they rank first among animal neurotoxins in research attention. Known as "the treasure house of marine drugs", they have attracted wide interest and have broad development prospects.
According to their target sites[7], conotoxins can be divided into three categories: (1) conotoxins that act on ligand-gated ion channels; (2) conotoxins that act on voltage-gated ion channels, also called voltage-sensitive channels; and (3) conotoxins that act on other receptors[8]. Living cells contain more than 300 types of ion channels. Many vital functions, such as heartbeat, sensory conduction, and central nervous system response, are controlled by cell signaling through these channels. Ion channel dysfunction can cause a variety of diseases, such as epilepsy, arrhythmia, and type II diabetes, which are treated mainly with drugs that regulate the relevant ion channels[9]. Ion channels are also an important target in the treatment of viral diseases. Because of their physiological importance, ion channels have become the second most common class of drug development targets. Three ion channels are the usual targets of toxins: potassium (K) channels, sodium (Na) channels, and calcium (Ca) channels. Based on function and target, conotoxins can accordingly be divided into three types: (i) K-channel-targeting; (ii) Na-channel-targeting; and (iii) Ca-channel-targeting[10].
Because protein sequence data are growing explosively[11], traditional wet-lab methods can no longer keep pace with the need to identify protein sequences rapidly. Yuan et al. developed a feature selection technique based on the binomial distribution and used radial basis function networks to predict the types of ion channels targeted by conotoxins. Subsequently[12], they developed a predictor (iCTX-Type) to improve prediction accuracy. Zhang et al. applied hybrid features to the prediction problem. Wang et al. combined analysis of variance and correlation (AVC) with support vector machines to reduce attribute redundancy and improve prediction accuracy and computation speed. However, none of these methods can be used to predict the type of a conotoxin as defined by its target ion channel. For example, δ-conotoxin-like Ac6.1 and ω-conotoxin-like Ai6.2 both belong to the O1 superfamily, yet the former targets voltage-gated sodium channels while the latter targets voltage-gated calcium channels[13].
To solve this problem, this article proposes a method that identifies the three types of conotoxins from their sequence information alone. We propose a deep learning model based on a long short-term memory (LSTM) neural network to classify conotoxins[14], and we use word embedding to represent each conotoxin sequence as a vector, because a protein sequence can be viewed as a kind of natural language. Effective features are thereby extracted from the conotoxin sequence. To further evaluate the performance of the model, we compare it with an existing machine learning model, the support vector machine (SVM)[15]. The experimental results show that the method achieves good prediction performance and is suitable for the classification and prediction of conotoxins. The workflow is shown in Fig. 1.
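The idea of treating a peptide sequence as a sentence can be illustrated with a minimal sketch. It assumes that "words" are overlapping k-mers of residues and that each word is looked up in an embedding table; the function names, the choice of k, the embedding dimension, and the toy sequences are all illustrative assumptions and are not taken from the model described here. In a trained model the embedding table would be learned rather than randomly initialized.

```python
import random


def tokenize(sequence, k=1):
    """Split a peptide sequence into overlapping k-mer 'words'."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences, k=1):
    """Assign an integer index to every k-mer seen; 0 is reserved for padding/unknown."""
    vocab = {}
    for seq in sequences:
        for word in tokenize(seq, k):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(sequence, vocab, max_len, k=1):
    """Convert a sequence to a fixed-length list of word indices (truncate or zero-pad)."""
    ids = [vocab.get(word, 0) for word in tokenize(sequence, k)][:max_len]
    return ids + [0] * (max_len - len(ids))


def init_embeddings(vocab_size, dim, seed=0):
    """Create a randomly initialized embedding table; row 0 (padding) is all zeros."""
    rng = random.Random(seed)
    table = [[0.0] * dim]
    for _ in range(vocab_size):
        table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return table


def embed(ids, table):
    """Look up one embedding vector per word index."""
    return [table[i] for i in ids]


# Toy strings for illustration only; they are NOT real conotoxin sequences.
toy_sequences = ["CCKGAC", "GCCSDPRC"]
vocab = build_vocab(toy_sequences)
table = init_embeddings(len(vocab), dim=4)
vectors = embed(encode("CCKGAC", vocab, max_len=8), table)
print(len(vectors), len(vectors[0]))
```

The resulting list of vectors, one per position, is the kind of input a sequence model such as an LSTM consumes.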
In this paper, word embedding and LSTM are combined to construct a model for conotoxin prediction, taking advantage of the strengths of LSTM in sequence modeling and long-term memory and of word embedding in sequence representation.
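The gating mechanism that gives an LSTM its long-term memory can be sketched as a single cell update applied position by position. The following is a plain-Python illustration of the standard LSTM equations, not the implementation used in this work; the parameter layout, dimensions, initialization range, and toy inputs are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_lstm(input_dim, hidden_dim, seed=0):
    """Randomly initialize weights for the input (i), forget (f), output (o) gates and candidate (g)."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "g"):
        params["W_" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
            for _ in range(hidden_dim)
        ]
        params["b_" + gate] = [0.0] * hidden_dim
    return params


def lstm_step(params, x, h, c):
    """One LSTM update: gates decide what to forget, what to write, and what to expose."""
    z = x + h  # concatenate current input and previous hidden state
    i = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]
    f = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]
    o = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]
    g = [math.tanh(v) for v in affine(params["W_g"], params["b_g"], z)]
    # The cell state carries information across many positions.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, inputs, hidden_dim):
    """Run the cell over a sequence of vectors and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
    return h


# Toy embedded sequence (two positions, four dimensions each); values are arbitrary.
params = init_lstm(input_dim=4, hidden_dim=3)
toy_inputs = [[0.1, -0.2, 0.0, 0.3], [0.0, 0.1, 0.2, -0.1]]
final_h = run_lstm(params, toy_inputs, hidden_dim=3)
print(final_h)
```

In a classifier, the final hidden state would typically be passed to a small output layer that scores the three channel-targeting types; that layer and the training procedure are omitted here.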