In the two decades before the current SARS-CoV-2 outbreak, two coronaviruses gained prominence because of their novelty, infectivity, and virulence: the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) in 2002–2003 and the Middle East Respiratory Syndrome coronavirus (MERS-CoV) in 2012 (Son et al., 2017). Scientists around the world are applying the lessons learned from both epidemics to the current SARS-CoV-2 outbreak, as evidenced by increased data transparency and broader information sharing among stakeholders [20].
Current technologies have also paved the way for a quicker response to human diseases. Molecular biology-based technologies, including advances in sequencing methods, have helped in the characterization of pathogens. Whole-genome sequencing can now be done with remarkable speed, accuracy, and depth of information [21]. In addition, bioinformatics tools and global genomic and proteomic databases have aided scientists worldwide in understanding molecular structures and characteristics, and hence in developing strategies to control human diseases [22].
The application of computational methods in immunology, such as in silico epitope prediction, has enabled researchers to prioritize immune targets for experimental epitope mapping, saving time and resources that are crucial for expedient epidemic containment and response [23–25]. In silico epitope mapping has helped researchers to rapidly identify epitopes essential to rational vaccine design and to the development of epitope-based serological diagnostic devices [26–27]. In this paper, we present putative epitopes of SARS-CoV-2 proteins, together with their sequence similarities to other viral proteins, which may be used in the development of an epitope-based vaccine against this recently emerged infection. One finding that may affect disease control strategies is the high homology between the immune epitopes of SARS-CoV-2 and those of the 2003 SARS-CoV, which also originated in China. We identified high sequence homology between NSP1, NSP3, NSP7, NSP8, NSP9, NSP10, NSP12, NSP13, NSP14, NSP15, and the surface glycoprotein of SARS-CoV-2 and the corresponding proteins of SARS-CoV. This is consistent with previous reports on the phylogenetic relatedness of SARS-CoV-2 to SARS-CoV, although the highest genetic sequence similarity was observed with a bat-derived SARS-like virus (~88% genetic identity), which supports a zoonotic origin [2,28,29].
These observations may have implications for therapeutic and surveillance strategies. First, protein similarities in the NSPs and the surface glycoprotein of these two betacoronaviruses may yield cross-protection between SARS-CoV-2 and SARS-CoV, as previously observed in other human coronavirus infections. Second, they may explain possible similarities in the mechanism of infection and, hence, in treatment. Third, they can help avoid the error of using epitopes homologous between SARS-CoV-2 and SARS-CoV in antibody-based detection; in serological assays, such epitopes are known to cause a failure to discriminate between closely related pathogens and thus decrease the specificity of the test [30–32].
The protein with the greatest number of epitopes homologous to SARS-CoV, based on our blastp analysis, is the surface glycoprotein. Seventeen of the 21 in silico-predicted epitopes of the SARS-CoV-2 surface glycoprotein are at least 64% homologous to epitopes of the SARS-CoV spike glycoprotein. This observation is important because the surface glycoprotein is pathogenically and serologically significant: it mediates fusion of the viral and host cell membranes and can elicit virus-neutralizing antibodies, which makes it a good prospect for an epitope-based vaccine [33–34].
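To make the 64% cutoff concrete, the following minimal Python sketch counts how many predicted epitopes meet an identity threshold against their best-matching counterpart. It is not the blastp procedure used in this study: it assumes each peptide pair is already aligned to equal length without gaps, and the function names (`percent_identity`, `count_homologous`) and the peptide strings are hypothetical placeholders, not actual SARS-CoV-2 or SARS-CoV epitopes.

```python
from typing import Iterable, Tuple


def percent_identity(query: str, subject: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length peptides.

    Simplifying assumption: no gaps and no local alignment, unlike blastp.
    """
    if not query or len(query) != len(subject):
        raise ValueError("peptides must be non-empty and pre-aligned to equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


def count_homologous(pairs: Iterable[Tuple[str, str]], threshold: float = 64.0) -> int:
    """Count peptide pairs whose identity is at or above the threshold (in percent)."""
    return sum(percent_identity(q, s) >= threshold for q, s in pairs)


# Hypothetical placeholder peptides, for illustration only.
example_pairs = [
    ("ACDEFGHIKL", "ACDEFGHIKL"),  # identical at all 10 positions
    ("ACDEFGHIKL", "ACDEFGHAAA"),  # identical at 7 of 10 positions
    ("ACDEFGHIKL", "AAAAAGHIKL"),  # identical at 6 of 10 positions
]

n_homologous = count_homologous(example_pairs)
print(f"{n_homologous} of {len(example_pairs)} pairs are at least 64% identical")
```

With the counts reported above, the same arithmetic gives 17 of 21 surface glycoprotein epitopes, or about 81%, at or above the cutoff.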
In the polyprotein, the portion with the highest number of predicted epitopes is the putative NSP3 region, located between amino acid positions 920 and 2665. Based on our blastp analysis, this portion also contains the largest number (8 of 11) of epitopes unique to SARS-CoV-2, not only within the polyprotein but across all the proteins analyzed. The finding is not surprising given that NSP3 is the largest nonstructural protein of CoVs and has been reported to be heavily involved in proteolytic processing and polyprotein maturation. Furthermore, NSP3 has been reported to take part in multiple interactions with other NSPs, providing cooperative enzymatic functions. Surprisingly, NSP3 is highly divergent among CoVs, with mutations leading to evolutionary adaptations specific to certain coronaviruses [35–37]. The NSP3/4 macrodomain and transmembrane units are also critical for the ability of coronaviruses to evade the immune system. Experimental studies in both SARS-CoV and MERS-CoV revealed that subunits of NSP3/4 induce the formation of double-membrane vesicles (DMVs), specialized replicative organelles (ROs) that enhance viral RNA synthesis while hiding double-stranded RNA from detection by the innate immune system [6,38,39]. One study reported the detection of the proteinases NSP3 and NSP5 in the mature virion along with the structural proteins [40]. This phenomenon should be elucidated further, as data on NSP3 are relatively scarce compared with those on its structural counterparts.
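Assigning predicted epitopes to the NSP3 region amounts to a coordinate check against the polyprotein positions quoted above. The Python sketch below illustrates this under stated assumptions: positions are 1-based, the 920 to 2665 range is treated as inclusive, and an epitope is counted only if it lies entirely inside the region. The `Epitope` type, the `within_nsp3` helper, and the example coordinates are hypothetical and are not taken from our results.

```python
from typing import List, NamedTuple


class Epitope(NamedTuple):
    name: str
    start: int  # 1-based position of the first residue in the polyprotein
    end: int    # 1-based position of the last residue (inclusive)


# Putative NSP3 range as quoted in the text; inclusive bounds are an assumption.
NSP3_START, NSP3_END = 920, 2665


def within_nsp3(epitope: Epitope) -> bool:
    """True if the epitope lies entirely inside the putative NSP3 range."""
    return NSP3_START <= epitope.start and epitope.end <= NSP3_END


# Hypothetical coordinates, for illustration only.
predicted: List[Epitope] = [
    Epitope("ep_a", 100, 115),    # upstream of NSP3
    Epitope("ep_b", 950, 965),    # inside NSP3
    Epitope("ep_c", 2650, 2670),  # straddles the NSP3 end, so not counted
    Epitope("ep_d", 2000, 2012),  # inside NSP3
]

nsp3_epitopes = [ep.name for ep in predicted if within_nsp3(ep)]
print(f"{len(nsp3_epitopes)} of {len(predicted)} epitopes fall within NSP3: {nsp3_epitopes}")
```

Requiring full containment is a conservative choice; an overlap-based rule would also count epitopes that span a cleavage site, such as `ep_c` here.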
On the other hand, the identified unique residues, especially those of relevant proteins such as the surface glycoprotein, can be explored experimentally to confirm their feasibility and uniqueness relative to other viruses, particularly other coronaviruses. During the SARS outbreak, it was difficult to distinguish actual SARS cases from common cold infections using serological tests, because the seroprevalence of antibodies against common cold viruses was high in the population, a problem aggravated by cross-reactive antibodies against conserved coronavirus epitopes. Nevertheless, serological testing has the advantages of detecting asymptomatic infections, monitoring disease progression, and supporting the study of post-infection transmission dynamics [25,41].