SARS-CoV-2 has severely affected economic activity, daily life and human health, and has become a major challenge confronting the whole world. Vaccine development is one of the effective means of long-term prevention of the virus. Epitope vaccines are a growing trend in vaccine development because of their high specificity, low toxicity and few side effects, and ease of transport and storage[25].
A group founded in March 2020 by Preston Estep, the Rapid Deployment Vaccine Collaborative (RaDVaC), has developed a very simple vaccine. In early July, RaDVaC published a white paper detailing the vaccine they developed (https://radvac.org/). The RaDVaC vaccine is a "subunit" vaccine because it is composed of fragments of a pathogen, in this case peptides, which are short protein fragments that match part of a SARS-CoV-2 protein but do not cause disease. Subunit vaccines are already used against diseases such as hepatitis B and human papillomavirus, and a number of companies, including Novavax, are developing subunit vaccines for COVID-19. Reliable epitopes are particularly important for the development of subunit vaccines.
(https://www.technologyreview.com/2020/07/29/1005720/george-church-diy-coronavirus-vaccine/)
Determining epitopes is the basis for vaccine development and application and for clinical diagnosis. Herrera et al.[6] reported an antigenic analysis of the S protein by ELISA but did not study its epitopes. The conserved epitopes that we predicted computationally provide additional reference points for immunological studies of the S protein. Vashi et al. predicted some epitopes based on the structure of the S protein[26]. Although their study predicted both B-cell and T-cell epitopes of the S protein, it did not discuss the conservation of the epitopes. We supplement that work with a conservation analysis based on a large amount of data, which helps to ensure the long-term efficacy and stability of the epitopes in application. In addition, their study is limited to the S protein, whereas our analysis of the E and M proteins provides more options.
Moreover, Walls et al.[27] reported that conserved glycosylated sequences in the S protein of SARS-CoV can elicit neutralizing antibodies against SARS-CoV-2, and Yuan et al.[28] investigated how epitopes are recognized by antibodies by resolving the structure of antibody CR3022, which was obtained from a convalescent SARS patient. Wang et al.[29] reported a human monoclonal antibody, obtained from cell culture, that neutralizes SARS-CoV-2. What these studies have in common is that they are based on immune responses that have already occurred. In contrast, our computational approach is faster, but its accuracy still needs to be verified experimentally. The two approaches complement each other.
Currently, the main methods for determining epitopes are X-ray scattering, immunological experiments and bioinformatics. The first two are time-consuming and laborious, while bioinformatics methods are gaining more and more credibility among researchers[3, 25, 30]. Many factors must be considered when predicting epitopes with bioinformatics methods, such as the surface probability and flexibility of the epitopes. At the same time, it is necessary to exclude structurally stable, non-deformable α-helices and β-sheets, as well as glycosylation sites, which may obscure the epitopes or alter their antigenicity[31]. Even so, the predicted epitopes can still be inaccurate[4]. Our work takes the intersection of the results of several prediction methods, which greatly improves the stability of the prediction. Compared with existing studies on SARS-CoV-2, this work adopted a range of prediction methods and 3D structure databases developed in recent years, such as ABCpred, BepiPred2.0, SEPPA 3.0 and IEDB, which are based on artificial neural networks, Hidden Markov Models (HMM), Support Vector Machines (SVM) and related techniques. Compared with prediction by a single method[32], on the basis of a single protein[33] or on the basis of SARS epitopes[28], these methods and databases greatly improve the accuracy of prediction and are more meaningful from a bioinformatics standpoint. We comprehensively analyzed the prediction results from widely used tools and set screening criteria based on primary, secondary and tertiary structure, so that the prediction results would be more accurate and reliable.
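The intersection strategy described above can be illustrated with a minimal sketch. It assumes that each tool's output has already been reduced to a set of predicted residue positions; the function names, the tool labels and the example positions are hypothetical and do not correspond to the actual output formats of ABCpred, BepiPred2.0, SEPPA 3.0 or IEDB, nor to the exact screening criteria used in this work.

```python
from typing import Dict, List, Set, Tuple


def consensus_positions(predictions: Dict[str, Set[int]]) -> Set[int]:
    """Return the residue positions predicted by every tool (set intersection)."""
    sets = list(predictions.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def contiguous_segments(positions: Set[int], min_length: int = 1) -> List[Tuple[int, int]]:
    """Group positions into runs of consecutive residues.

    Returns (start, end) pairs, inclusive, keeping only runs with at
    least min_length residues.
    """
    segments: List[Tuple[int, int]] = []
    run_start = prev = None
    for pos in sorted(positions):
        if prev is not None and pos == prev + 1:
            prev = pos  # extend the current run
            continue
        # close the previous run, if any, and start a new one
        if run_start is not None and prev - run_start + 1 >= min_length:
            segments.append((run_start, prev))
        run_start = prev = pos
    if run_start is not None and prev - run_start + 1 >= min_length:
        segments.append((run_start, prev))
    return segments


# Hypothetical per-tool predictions (residue numbers are illustrative only).
example = {
    "tool_a": set(range(10, 21)),
    "tool_b": set(range(12, 26)),
    "tool_c": set(range(8, 19)) | set(range(30, 36)),
}
candidates = contiguous_segments(consensus_positions(example), min_length=5)
print(candidates)
```

Requiring agreement among all tools trades sensitivity for specificity: a region is kept only if every method supports it, so fewer but more consistent candidates remain.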
The S, E and M proteins are surface proteins of SARS-CoV-2 that form the outer layer of the coronavirus and protect the internal RNA, and they therefore have potential as antigenic molecules. However, in current studies on epitope prediction for SARS-CoV-2[34], and because the S protein has been reported to bind directly to ACE2[35], prediction has focused mainly on the S protein, with few studies on the E and M proteins. In this work, we analyzed the S, E and M proteins and predicted their epitopes. In total, 7 B-cell epitopes were predicted, comprising 1 conformational and 6 linear B-cell epitopes; the conformational epitope coincides with one of the linear epitopes. Epitopes A, B, C and D are all located on the surface of the tail of the S protein, where binding is relatively easy. Epitopes E and F, located at the end of the head of the E protein, coincide, which may be explained by the fact that both consist of consecutive residues and that their secondary structure avoids α-helices and β-sheets. Epitope G is derived from the M protein, and its structure and conservation could not be determined because no reliable structure could be predicted. However, the surface probability scores suggest that epitope G is likely to be located on the surface of the M protein.
The higher the conservation score calculated by the ConSurf Server, the more likely the site is to mutate during evolution. When the score is below 1, the site is likely to be conserved; when the score is between 1 and 2, the site is likely to mutate relatively easily; when the score is above 2, the site is likely to mutate easily[36]. Among the 7 epitopes obtained, all epitopes of the S, E and M proteins were absolutely conserved among all SARS-CoV-2 sequences. The ConSurf conservation score of epitope G could not be calculated because no reliable PDB file was available for the M protein. Our work provides identified and conserved sites for further study. Mutations that occur during the spread of the virus can significantly hinder vaccine development. For example, the recently reported mutation at amino acid 614 of the S protein[37] not only affects the transmissibility of the virus but may also affect the efficacy of vaccines involving this site. Our work provides reliable candidates for the development of epitope vaccines, but the practical value of the epitopes needs further experimental verification; for example, the antigenicity of each epitope could be tested. Although the epitopes as a whole can be considered conserved, individual residues within them may still mutate easily. Epitopes D and E had two residues and one residue, respectively, with conservation scores greater than 1, meaning that they are at risk of single point mutations. More attention should be paid to these two epitopes in application.
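The score thresholds above can be expressed as a simple per-residue classification. The sketch below is illustrative only: the function names and the example scores are hypothetical, and treating scores of exactly 1 and exactly 2 as belonging to the middle category is an assumption, since the text does not specify how the boundaries are handled.

```python
from typing import List


def classify_site(score: float) -> str:
    """Classify a residue by its conservation score (higher = more variable)."""
    if score < 1:
        return "conserved"
    if score <= 2:  # assumption: boundary values 1 and 2 fall in this band
        return "relatively variable"
    return "variable"


def residues_at_risk(scores: List[float], threshold: float = 1.0) -> int:
    """Count residues in an epitope whose score exceeds the threshold."""
    return sum(1 for s in scores if s > threshold)


# Hypothetical per-residue scores for one epitope (not data from this study).
epitope_scores = [0.2, 1.3, 0.9, 2.4, 0.5]
print([classify_site(s) for s in epitope_scores])
print(residues_at_risk(epitope_scores))
```

A count greater than zero marks an epitope that is conserved overall but contains individual residues at risk of point mutation, as noted for epitopes D and E.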
Epitope detection in glycoproteins is important for studying the immune response to SARS-CoV-2, but it is more challenging and less reliable because of the presence of glycans[33]. In addition, SARS-CoV-2 mutates frequently, and the predicted epitopes may mutate too, so the conserved epitopes analyzed in the present study may be more reliable. According to data from NGDC, the variation frequencies of the S, E and M proteins were 0.83, 1.02 and 0.73, respectively. Given this relatively high variation frequency, the conservation of the proteins was analyzed to identify epitopes with a low mutation risk, which are important for the development of long-lasting and stable vaccines. However, this work has limitations. Because no molecular dynamics analysis was performed, the binding between epitopes and antibodies was not simulated to further assess the usability of the epitopes. Nevertheless, research from different perspectives can provide more epitope choices for subsequent studies.