Immune Epitope Map of the Reported Protein Sequences of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)

Identifying immunogenic sequences of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins is important in developing epitope-based vaccines and diagnostics. This step is critical in designing potent vaccines and highly specific diagnostic tools that can help prevent the spread of the disease. In this study, we used in silico analysis tools to identify immunogenic epitopes in the reported sequences of SARS-CoV-2 proteins and to determine their similarity to known viral proteins. The amino acid sequences of the SARS-CoV-2 proteins were acquired from the National Center for Biotechnology Information (NCBI) database. B-cell epitope prediction was done using in silico analysis tools available at the Immune Epitope Database and Analysis Resources (IEDB). Blastp was performed on the identified immunogenic sequences to determine similarities with known viral proteins and to deduce their possible locations in the coronavirus. We identified B-cell epitopes of the SARS-CoV-2 polyprotein, surface glycoprotein, envelope protein, membrane glycoprotein, nucleocapsid phosphoprotein, orf3, orf7a, and orf8. No epitope was identified in orf6 or orf10. The predicted immunogenic epitopes of SARS-CoV-2 showed high similarity to those of the 2003 SARS-CoV. However, unique epitopes were identified in non-structural proteins (NSP) 1 and 3 and in the surface glycoprotein of SARS-CoV-2.


Introduction
Coronaviruses are single-stranded, positive-sense RNA viruses classified into four genera: alpha-, beta-, delta-, and gammacoronaviruses. The former two genera primarily infect mammals, whereas the latter two primarily infect birds [1,2]. The coronavirus genome is the largest among the RNA viruses and includes a variable number (around 6 to 11) of open reading frames (orf).
Coronavirus replication is somewhat unique in that it involves ribosomal frameshifting, or slippage, and a large replicase gene with an open reading frame (orf1ab). The replicase gene occupies around two-thirds of the genome and encodes the 16 non-structural proteins (NSPs). The remaining one-third of the genome (~10 kb) encodes the structural and accessory proteins [1,3]. The main structural proteins include the viral envelope-bound membrane protein (M), envelope protein (E), and spike protein (S), as well as the RNA-bound nucleocapsid (N) [3,4]. A fifth structural protein, the hemagglutinin esterase (HE), may be present, but only among betacoronaviruses [5]. The 16 NSPs are responsible for viral gene replication, protein scaffold formation, proteolytic maturation of proteins, and protection from the host's immune response [6].
Until recently, six coronaviruses (CoVs) were known to infect humans: HCoV-229E, HCoV-OC43, HCoV-NL63, HCoV-HKU1, SARS-CoV, and MERS-CoV, which evolved between 1960 and 2015 [7]. By the end of 2019, however, a new coronavirus was detected in China among individuals suffering from acute respiratory distress [8]. From the initial cases, which were linked to the Huanan seafood and wildlife market in Wuhan City, Hubei Province, Central China, this emerging zoonotic infection has now reached 25 countries in Asia, North America, Europe, and Australia [9,10]. The exact source of exposure leading to this event is still under investigation.
Researchers worldwide rushed to sequence the viral genome to aid state authorities in building their diagnostic and rapid containment capabilities. This emerging threat has caused unprecedented alarm among states and was immediately recognized by the World Health Organization (WHO) as a Public Health Emergency of International Concern [9,11]. As of 15 March 2020, the global number of confirmed cases of coronavirus disease 2019 (COVID-19) had reached more than 153 thousand, and the disease had claimed 5,375 lives [13].
Coronaviruses have been implicated in recent high-profile, cross-border outbreaks affecting human populations. Phylogenetic studies of this viral family suggest a high capacity for transmission across species barriers, as its members have been found in bats, pigs, camels, and humans. The increasing frequency of genetic recombination, coupled with extensive human-animal interface activities, leads to a higher probability of zoonotic spillover events [13][14][15]. The emergence of novel pathogens such as SARS-CoV-2 poses a serious threat to human health, potentially of global proportions, because of the knowledge gaps on the pathogen causing the disease and the lack of preformed immunity among individuals [16]. This knowledge gap, particularly on the molecular characteristics of SARS-CoV-2, is a barrier to creating strategies for controlling the spread of the infection, including the development of rapid diagnostic devices and the design of vaccines [17].
Fortunately, bioinformatics tools such as epitope analysis resources and sequence identity analysis tools can be exploited in identifying and mapping immunogenic sequences and their possible locations in viral polyproteins [18,19].
In an effort to help close the existing knowledge gap on the identity and genomic characteristics of SARS-CoV-2, we aimed to identify, using in silico prediction tools, B-cell epitopes of SARS-CoV-2 that can serve as a basis for future recombinant engineering work and vaccine design studies. We also aimed to determine similarities between the in silico-predicted epitopes and other viral proteins found in public databases, especially those closely related to SARS-CoV-2. We focused on SARS-related coronaviruses (SARS-CoV) and other significant members of the betacoronavirus genus, as these are the apparent nearest relatives of SARS-CoV-2 based on current phylogenetic data.

Results
Using in silico epitope prediction tools available in the Immune Epitope Database and Analysis Resources (IEDB), we identified potentially immunogenic epitopes in the reported amino acid sequences of the SARS-CoV-2 polyprotein, surface glycoprotein, orf3, envelope protein, membrane glycoprotein, orf7a, orf8, and nucleocapsid phosphoprotein. For the polypeptide sequences of orf6 and orf10, no region was found to be potentially immunogenic; all values were lower than the cut-off.
Supplementary Table 1 and Supplementary Table 2 present the position, sequence, antigenicity, surface accessibility, and hydrophilicity scores of the predicted epitopes. Combining contiguous adjacent sequences of the predicted 10-mer epitopes generated 111 epitopes for the polyprotein, 22 for the surface glycoprotein, three for orf3, a single 11-mer epitope for the envelope protein, five for the membrane glycoprotein, four for orf7a, five for orf8, and six for the nucleocapsid phosphoprotein. These sequences are presented in Table 1 and Table 2 [20].
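The step of combining adjacent 10-mer predictions into longer epitopes can be illustrated with a minimal sketch. It assumes that each predicted window is identified by its 1-based start position and that windows are merged when they overlap or directly abut; the function name `merge_windows` and the toy sequence are hypothetical and are not taken from the IEDB tools or from the SARS-CoV-2 data.

```python
from typing import List, Tuple


def merge_windows(protein: str, starts: List[int], width: int = 10) -> List[Tuple[int, int, str]]:
    """Merge overlapping or directly abutting fixed-width windows.

    `starts` holds 1-based start positions of predicted windows.
    Returns (start, end, sequence) tuples with 1-based inclusive coordinates.
    """
    merged: List[List[int]] = []
    for start in sorted(set(starts)):
        end = start + width - 1
        if start < 1 or end > len(protein):
            raise ValueError(f"window {start}-{end} falls outside the protein")
        if merged and start <= merged[-1][1] + 1:
            # Window overlaps or touches the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap before this window: open a new region.
            merged.append([start, end])
    return [(s, e, protein[s - 1:e]) for s, e in merged]


# Toy 40-residue sequence (hypothetical, for illustration only).
toy = "ACDEFGHIKLMNPQRSTVWY" * 2
for region in merge_windows(toy, [1, 2, 25]):
    print(region)
```

In this toy case, the two windows starting at positions 1 and 2 collapse into a single 11-mer, which is how an 11-mer epitope can arise from 10-mer predictions, while the window starting at position 25 remains a separate 10-mer.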
The availability of current technologies has also paved the way for a quicker response to human diseases. Molecular biology-based technologies, including advances in sequencing methods, have helped in the characterization of pathogens. Whole-genome sequencing can be done with remarkable speed, accuracy, and depth of information [21]. In addition, bioinformatics tools and global genomic and proteomic databases have aided scientists worldwide in understanding molecular structures and characteristics and, hence, in developing strategies to control human diseases [22].
The application of computational methods in immunology, such as in silico epitope prediction, has enabled researchers to focus on and prioritize immune targets for experimental epitope mapping, saving time and resources that are crucial for expedient epidemic containment and response [23][24][25]. In silico epitope mapping has helped researchers to expeditiously identify epitopes essential in rational vaccine design and in the development of epitope-based serological diagnostic devices [26][27]. In this paper, we present putative epitopes of SARS-CoV-2 proteins, including their sequence similarities with other viral proteins, which may potentially be used in the development of an epitope-based vaccine against this recent emerging infection. One finding that may have an impact on disease control strategies is the high homology between the immune epitopes of SARS-CoV-2 and those of the 2003 SARS-CoV, which also originated in China. We identified high sequence homology between the NSP1, NSP3, NSP7, NSP8, NSP9, NSP10, NSP12, NSP13, NSP14, NSP15, and surface glycoprotein of SARS-CoV-2 and the corresponding proteins of SARS-CoV. This is consistent with previous reports on the phylogenetic relatedness of SARS-CoV-2 and SARS-CoV, although the highest genetic sequence similarity was observed with a bat-derived SARS-like virus (~88% genetic identity), which supports its zoonotic origin [2,28,29].
These observations may have implications for therapeutic and surveillance strategies. First, protein similarities in the NSPs and surface glycoprotein of these two betacoronaviruses may yield cross-protection between SARS-CoV-2 and SARS-CoV, as previously observed in other human coronavirus infections. Second, they may explain possible similarities in the mechanism of infection and, hence, in treatment. Third, they can help prevent the error of using epitopes that are homologous between SARS-CoV-2 and SARS-CoV in antibody-based detection; in serological assays, such epitopes are known to cause failure to discriminate closely related pathogens and thus decrease the specificity of the test [30][31][32].
Based on the blastp performed, the protein with the greatest number of epitopes homologous with SARS-CoV is the surface glycoprotein. Seventeen of the 21 in silico-predicted epitopes of the SARS-CoV-2 surface glycoprotein are at least 64% homologous with the epitopes of the SARS-CoV spike glycoprotein. This observation is important because the surface glycoprotein is pathogenically and serologically significant owing to its role in viral and host cell membrane fusion; it is therefore a good prospect for an epitope-based vaccine because of its ability to elicit virus-neutralizing antibodies [33,34].
In the polyprotein, the portion with the highest number of predicted epitopes is the putative position of the NSP3 protein, located between amino acid positions 920 and 2665. Based on the blastp analysis we performed, this portion also contains the largest number (8 of 11) of SARS-CoV-2 unique epitopes, not only for the polyprotein but for all the reported proteins analyzed. The finding is not surprising, since NSP3 is the largest non-structural protein of CoVs and has been reported to be heavily involved in proteolytic processing and polyprotein maturation. The identified unique residues, especially those in relevant proteins such as the surface glycoprotein, can be further explored experimentally to confirm their feasibility and uniqueness against other viruses, particularly coronaviruses.
During the SARS outbreak, it was difficult to distinguish actual SARS cases from common cold virus infections using serological tests, because of the high seroprevalence of antibodies against the common cold in the population, aggravated by the presence of cross-reactive antibodies against conserved coronavirus epitopes. Nevertheless, serological testing has the advantages of detecting asymptomatic infections, monitoring disease progression, and enabling the study of post-infection transmission dynamics [25,41].
After epitope prediction, sequence homology analysis of the predicted immunogenic epitopes was done to identify related viral proteins. The proteins reported to have similarity with each predicted immunogenic epitope, their origin, and their percent identity with the query sequence were noted and reported. Putative amino acid positions in SARS-CoV-2 were compared with a reference alignment of a bat coronavirus sequence (data not shown) and with positions reported in recent literature [18,19,29].
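As an illustration of how a percent identity value between a predicted epitope and a matched viral protein segment can be computed, the following minimal sketch counts identical residues over the length of a pairwise alignment, which is the convention BLAST uses when reporting percent identity. It assumes the two sequences are already aligned, of equal length, and use "-" for gaps; the function name `percent_identity` and the example peptides are hypothetical and do not come from the study's data.

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be pre-aligned strings of equal length; '-' marks a gap.
    Gap columns count toward the alignment length but never as matches.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    if not query_aln:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-"
    )
    return 100.0 * matches / len(query_aln)


# Hypothetical 10-residue peptides differing at one position.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIKL"))
# Same idea with a single gap in the query.
print(percent_identity("ACD-FGHIKL", "ACDEFGHIKL"))
```

A value computed this way is only as meaningful as the alignment behind it; in practice the alignment and the reported identity come from blastp itself, and the sketch merely shows the arithmetic behind the reported figure.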

Authors' Contributions
LAG: performed in silico epitope mapping, blastp analysis, and preparation of the manuscript.
GLU: performed identification of the possible locations of the identified epitopes in coronavirus proteins and preparation of the manuscript.

Competing Interests
The authors declare no competing interests.