Structure/epitope-based immunoinformatics analysis of structural proteins of 2019 novel coronavirus

The newly identified 2019 novel coronavirus (2019-nCoV) has caused more than 81,400 laboratory-confirmed human infections, including 3261 deaths, posing a serious threat to human health. Currently, however, there is no specific antiviral treatment or vaccine. To identify immunodominant peptides for designing a global peptide vaccine against 2019-nCoV infection, we analyzed the structure and immunogenicity of the 2019-nCoV structural proteins with bioinformatics tools. Using several immunoinformatic tools, we identified 33 B-cell epitopes and 39 T-cell epitopes in the four structural proteins: spike protein (22 B-cell epitopes, 25 T-cell epitopes), nucleocapsid protein (7 B-cell epitopes, 6 T-cell epitopes), membrane protein (2 B-cell epitopes, 7 T-cell epitopes), and envelope protein (2 B-cell epitopes, 1 T-cell epitope). The proportion of epitope residues in the primary sequence was used to assess the antigenicity and immunogenicity of each protein. The envelope protein has the highest antigenicity, with a B-cell epitope residue coverage of 24%. The membrane protein has the highest immunogenicity, with a T-cell epitope residue coverage of 55.86%. One possible reason for the immune storm caused by 2019-nCoV is that the membrane and envelope proteins are abundantly expressed in infected cells. Further studies involving experimental validation of these predicted epitopes are warranted to confirm that they can stimulate B cells and T cells and can be used effectively as vaccine candidates. These findings provide a basis for further studies on pathogenesis and for optimizing the design of diagnostic, antiviral and vaccination strategies for this emerging infection.


Introduction
A novel coronavirus (2019-nCoV) associated with human-to-human transmission and severe human infection has recently been reported from the city of Wuhan in Hubei province, China [1]. The ongoing outbreak of 2019-nCoV has caused great global concern. Based on the advice of the International Health Regulations Emergency Committee and the fact that, by that date, 24 other countries had also reported cases, the WHO Director-General declared on 30 January 2020 that the outbreak of 2019-nCoV constitutes a Public Health Emergency of International Concern [2]. A total of 1,320 confirmed and 1,965 suspected cases had been reported as of 25 January 2020; of the confirmed cases, 237 were severely ill and 41 had died [3]. Currently, however, there is no specific antiviral treatment or vaccine [4].
Vaccines play a crucial role in protecting the host organism against a particular disease and thereby help save millions of lives annually across the globe [5]. Vaccine development has traditionally depended entirely on experimental techniques, which are generally very laborious and time-consuming [6]. Advances in computational biology now support many types of research and help reduce the time required [7]. Furthermore, combining various bioinformatics prediction methods could significantly increase prediction accuracy [8]. Bioinformatics studies could therefore provide reliable guidance in selecting specific immunogenic epitopes, which is important for vaccine design, epitope mapping and antibody studies.
Coronaviruses are a class of enveloped RNA viruses with a 27-31 kb single-stranded, positive-sense genome [9]. The region downstream of ORF1 contains at least 10 small ORFs, encoding the spike protein (S protein), small envelope protein (E protein), membrane protein (M protein), nucleocapsid protein (N protein) and putative nonstructural proteins [10]. The genome is packed inside a helical capsid formed by the nucleocapsid protein.

2.4. In silico prediction of T-cell epitopes
T-cell epitopes are principally predicted by identifying amino acid fragments that bind to MHC molecules and can thereby activate T cells. The binding strength of each peptide to a given MHC allele was estimated with NetMHCII 2.2 at a set threshold level. Because there are tens of thousands of MHC alleles, we selected the most common MHC class II alleles in the population (HLA-DR, HLA-DQ and HLA-DP) to predict peptide binding, so that the predictions would be representative and reliable. HLA-DR 101, HLA-DR 301, HLA-DR 401 and HLA-DR 501 were used for HLA-DR-based T-cell epitope prediction. HLA-DQA10102-DQB10502, HLA-DQA10201-DQB10301, HLA-DQA10501-DQB10302 and HLA-DQA10601-DQB10402 were used for HLA-DQ-based T-cell epitope prediction. HLA-DPA10103-DPB10601, HLA-DPA10201-DPB10101, HLA-DPA10201-DPB10501 and HLA-DPA10301-DPB10402 were used for HLA-DP-based T-cell epitope prediction. If all four alleles predicted a peptide to be a non-epitope, the consensus result was 0% T-cell epitope; if only one or none of the four alleles predicted it to be a non-epitope, the consensus result was 75% or 100% T-cell epitope, respectively [21]. Peptides with a consensus result of 75% or 100% were taken as the final epitopes [22]. The final consensus T-cell epitopes were then obtained by combining the results for the HLA-DR, HLA-DQ and HLA-DP alleles.
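The four-allele consensus rule above can be expressed as a short computation. The sketch below is a minimal illustration under stated assumptions: the per-allele binder calls are taken as already given (for example, thresholded NetMHCII output), the allele names are copied from the text, and "combining" the three HLA classes is read as a union, which the text does not state explicitly. The function names, peptide names and calls are hypothetical placeholders, not results of this study.

```python
from typing import Dict, List, Set

# Alleles per HLA class II locus, as listed in the text.
LOCI: Dict[str, List[str]] = {
    "HLA-DR": ["HLA-DR 101", "HLA-DR 301", "HLA-DR 401", "HLA-DR 501"],
    "HLA-DQ": ["HLA-DQA10102-DQB10502", "HLA-DQA10201-DQB10301",
               "HLA-DQA10501-DQB10302", "HLA-DQA10601-DQB10402"],
    "HLA-DP": ["HLA-DPA10103-DPB10601", "HLA-DPA10201-DPB10101",
               "HLA-DPA10201-DPB10501", "HLA-DPA10301-DPB10402"],
}


def locus_consensus(calls: Dict[str, bool], alleles: List[str]) -> float:
    """Percentage of a locus's alleles that call the peptide an epitope (0, 25, 50, 75 or 100)."""
    hits = sum(1 for allele in alleles if calls.get(allele, False))
    return 100.0 * hits / len(alleles)


def consensus_epitopes(predictions: Dict[str, Dict[str, bool]],
                       threshold: float = 75.0) -> Set[str]:
    """Keep a peptide if it reaches the threshold at any locus (assumed union over loci)."""
    return {
        peptide
        for peptide, calls in predictions.items()
        if any(locus_consensus(calls, alleles) >= threshold for alleles in LOCI.values())
    }


# Hypothetical allele calls for two placeholder peptides (not results from this study).
example = {
    "pep1": {allele: True for allele in LOCI["HLA-DR"][:3]},  # 3 of 4 DR alleles -> 75%
    "pep2": {allele: True for allele in LOCI["HLA-DQ"][:2]},  # 2 of 4 DQ alleles -> 50%
}
print(sorted(consensus_epitopes(example)))  # ['pep1']
```

Reading the rule per locus keeps each HLA class's four-allele vote intact; if the study instead pooled all twelve alleles, only `consensus_epitopes` would need to change.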

Physicochemical analysis of structural proteins of 2019-nCoV
The complete genome sequence of 2019-nCoV is available under GenBank accession No. MN975262. 2019-nCoV has four structural proteins that are required to drive cytoplasmic viral assembly: the S, M, N and E proteins. Characterizing the physicochemical attributes of antigenic proteins is a major step in discerning the biological activity of viral protein sequences [23].
Physicochemical properties of the structural proteins, including the instability index, extinction coefficient, GRAVY, aliphatic index and theoretical pI, were computed from the 2019-nCoV protein sequences with ProtParam (Table 1).
The results show that the S protein is a large protein of 1273 amino acids, with a molecular weight of 141.2 kDa and a theoretical pI of 6.24. The M and N proteins have 222 and 419 amino acids, molecular weights of 25.2 and 45.6 kDa, and theoretical pI values of 9.5 and 10.1, respectively. The CoV E protein is a short protein of 75 amino acids with a molecular weight of 8.4 kDa. The instability indices of the four structural proteins range from 30 to 60, suggesting that all structural proteins of 2019-nCoV are stable. The aliphatic index and grand average of hydropathicity (GRAVY) range from 50 to 120 and from -1 to 2, respectively. The GRAVY values of the S and N proteins are negative, indicating hydrophilic character, whereas the M and E proteins of 2019-nCoV have positive GRAVY values, indicating hydrophobic character. These physicochemical characteristics may provide a selective advantage in the infected host [24].
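Several of the ProtParam quantities reported above follow simple published formulas. The sketch below computes GRAVY (the mean Kyte-Doolittle hydropathy over all residues), Ikai's aliphatic index (coefficients 2.9 for Val and 3.9 for Ile plus Leu, on mole percentages) and the molar extinction coefficient at 280 nm from Trp, Tyr and, optionally, cystine counts. It is an illustrative re-implementation, not the ProtParam code; the function names are hypothetical and the peptide is a toy sequence rather than a 2019-nCoV protein. A positive GRAVY indicates hydrophobic and a negative GRAVY hydrophilic character.

```python
# Kyte-Doolittle hydropathy values, the scale ProtParam uses for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def mole_percent(seq: str, aa: str) -> float:
    """Percentage of residues in seq that are amino acid aa."""
    return 100.0 * seq.upper().count(aa) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai's formula: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    return (mole_percent(seq, "A") + 2.9 * mole_percent(seq, "V")
            + 3.9 * (mole_percent(seq, "I") + mole_percent(seq, "L")))


def extinction_coefficient(seq: str, all_cys_paired: bool = False) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from Trp, Tyr and optional cystines."""
    seq = seq.upper()
    ext = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if all_cys_paired:
        ext += 125 * (seq.count("C") // 2)  # each cystine (Cys-Cys pair) adds 125
    return ext


toy = "AVILWY"  # toy peptide, not a 2019-nCoV sequence
print(round(gravy(toy), 3), round(aliphatic_index(toy), 1), extinction_coefficient(toy))
# 2.017 195.0 6990
```

For real sequences, Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` class also provides `gravy()`, `instability_index()`, `isoelectric_point()` and `molecular_weight()` methods that can be applied to the protein sequences translated from MN975262.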

Structure and antigenicity of structural proteins
S protein
The spike forms large protrusions from the virus surface, giving coronaviruses the appearance of having crowns (hence their name; corona in Latin means crown). In addition to mediating virus entry, the spike is a critical determinant of viral host range and tissue tropism and a major inducer of host immune responses [25]. The coronavirus S protein is a multifunctional molecular machine that mediates coronavirus entry into host cells [25]. Homology modeling can construct a target structure on the basis of suitable templates extracted from homologous sequences [26]. Because the structure of the 2019-nCoV S protein was not available in the PDB, we used the S protein of SARS-CoV as a template to predict its 3D structure with the SWISS-MODEL server and MODELLER software. The results revealed that the S protein of 2019-nCoV is a clove-shaped trimer with three S1 heads and a trimeric S2 stalk (Figure 1A). Each S protein monomer comprises S1 and S2 subunits (Figure 1B). The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and a C-terminal domain (S1-CTD) (Figure 1C). The S1-CTD contains a receptor-binding motif, which presents a gently concave outer surface to bind ACE2 (Figure 2A). The S2 subunit contains a conserved fusion peptide, heptad repeats, a transmembrane domain and a cytoplasmic domain. During virus entry, S1 binds to a receptor on the host cell surface for viral attachment, and S2 fuses the host and viral membranes, allowing viral genomes to enter host cells.
Previous studies suggest that protein antigenicity is generally determined by specific epitopes rather than by the full-length sequence [27]. Bioinformatics methods can therefore be used to predict the sequences of antigenic epitopes, which can then be synthesized in vitro and validated experimentally. DNAStar, the BepiPred 1.0 server and COBEP were used to identify the potential antigenic epitope regions of the S protein (Table 2).
The final B-cell epitopes of the S protein comprise 22 sequences, obtained by combining the results of the three tools. The S1 and S2 subunits of the spike protein have 16 and 6 B-cell epitopes, respectively. These peptides are also shown in the primary and tertiary structures of S1 and S2 (Figure 2). The B-cell epitope residues are summarized in Table 7.
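The text does not specify how the predictions of DNAStar, BepiPred 1.0 and COBEP were combined. One plausible scheme, shown here purely as an assumption, is residue-level voting: a residue is kept if at least `min_votes` tools flag it, and contiguous runs of kept residues at least `min_length` long become consensus epitopes. With two tools (as for the E and M proteins), `min_votes=2` reduces to the intersection of the two predictions. The function name and the toy per-residue tracks are hypothetical.

```python
from typing import List, Sequence, Tuple


def consensus_segments(tool_calls: Sequence[Sequence[bool]], min_votes: int,
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where at least min_votes tools flag a residue."""
    if min_votes < 1:
        raise ValueError("min_votes must be at least 1")
    length = len(tool_calls[0])
    if any(len(calls) != length for calls in tool_calls):
        raise ValueError("all tools must score the same sequence")
    # Number of tools calling each residue part of an epitope.
    votes = [sum(bool(calls[i]) for calls in tool_calls) for i in range(length)]
    segments: List[Tuple[int, int]] = []
    start = None  # 0-based index where the current run began
    for i, v in enumerate(votes + [0]):  # trailing 0 closes a run reaching the C-terminus
        if v >= min_votes and start is None:
            start = i
        elif v < min_votes and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    return segments


# Hypothetical per-residue calls (1 = epitope) from three tools over a 10-residue toy sequence.
tracks = ["0111001111", "0110000111", "1111000011"]
tools = [[c == "1" for c in t] for t in tracks]
print(consensus_segments(tools, min_votes=2, min_length=3))  # [(2, 4), (8, 10)]
```

Raising `min_votes` to 3 keeps only residues all tools agree on, which trades sensitivity for specificity.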

E protein
The E protein is the smallest of the major structural proteins of CoV and was identified as a structural component of the virus. The E protein is an abundantly expressed viral envelope protein that plays an important role in viral membrane packaging [28]. The tertiary structure reveals that the E protein of 2019-nCoV is a pentamer containing multiple short α-helices (Figure 3B). Amphipathic α-helices oligomerize to form an ion-conductive pore in membranes [29]. The BepiPred 1.0 server and COBEP were used to identify the potential antigenic epitope regions of the E protein (Table 4). By combining the results of the two bioinformatic tools, two B-cell epitopes were identified in each E protein monomer. The epitopes are also shown in the primary and tertiary structures of the E protein (Figure 3). The two epitopes are located at the head and tail of each E protein monomer and are regularly arranged on the lower and outer sides of the pentamer (Figure 3C and 3D). The B-cell epitope residues are summarized in Table 7.

M protein
The M protein is the central organizer of CoV assembly and the most abundant structural protein, interacting with all other major coronaviral structural proteins [30, 31]. Homotypic interactions between M proteins are the major driving force behind virion envelope formation and define the shape of the viral envelope [32]. The M protein is composed of long α-helices and three β-sheets (Figure 4). The BepiPred 1.0 server and COBEP were used to identify the potential antigenic epitope regions of the M protein (Table 5). By integrating the results of the bioinformatic tools, two final potential B-cell epitopes were determined. The epitopes are also shown in the primary and tertiary structures of the M protein (Figure 4). Both epitopes are located at the C-terminus of the M protein (Figure 4). The B-cell epitope residues are summarized in Table 7.

N protein
Among the structural proteins, the N protein is the only one that functions primarily to bind the CoV RNA genome. The nucleocapsid is formed by the association of the nucleocapsid (N) protein with single-stranded viral RNA. The N protein, which binds to the genomic RNA via a leader sequence, recognizes a stretch of RNA that serves as a packaging signal, leading to the formation of the helical ribonucleoprotein (RNP) complex that makes up the nucleocapsid during assembly [33]. The N protein has two domains, the N-terminal domain (NTD) and the C-terminal domain (CTD) (Figure 5B). The structures of both domains were constructed by homology modeling (Figure 5C and 5D). The CTD forms a tightly intertwined dimer, indicating that the basic building block for coronavirus nucleocapsid formation is a dimeric assembly of the N protein [34]. The N protein is a highly immunogenic phosphoprotein that is also implicated in viral genome replication [34]. DNAStar, the BepiPred 1.0 server and COBEP were used to identify the potential antigenic epitope regions of the N protein (Table 6).
By integrating the results of the bioinformatic tools, seven final potential B-cell epitopes were determined. The epitopes are also shown in the primary and tertiary structures of the N protein (Figure 5). The NTD and CTD contain 3 and 4 B-cell epitopes, respectively (Figure 5C and 5D). The B-cell epitope residues are summarized in Table 7.
In short, the structure and immunogenicity of the 2019-nCoV structural proteins were analyzed with bioinformatics tools. Thirty-three B-cell epitopes were determined in the four structural proteins: S protein (22), N protein (7), M protein (2) and E protein (2) (Figure 6A). The B-cell epitope residues of the four structural proteins are summarized in Table 7. The proportion of epitope residues in each primary sequence was calculated and used to assess the antigenicity of the proteins. The E protein has the highest antigenicity, with a B-cell epitope residue coverage of 24% (Figure 6B). The E protein might therefore be a potential target for designing effective 2019-nCoV vaccines and for developing rapid diagnostic methods in the future.
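Residue coverage, the proportion of a protein's residues that fall inside at least one predicted epitope, can be computed as below. This is a sketch assuming epitopes are given as 1-based inclusive (start, end) positions and that overlapping epitopes are counted only once; the function name and the toy intervals are hypothetical. As an arithmetic check, and assuming coverage is taken over the full-length sequences, 24% of the 75-residue E protein corresponds to 18 residues, and 55.86% of the 222-residue M protein to about 124 residues.

```python
from typing import Iterable, Tuple


def residue_coverage(length: int, epitopes: Iterable[Tuple[int, int]]) -> float:
    """Percentage of residues 1..length lying in at least one epitope; overlaps counted once."""
    covered = set()
    for start, end in epitopes:
        if not 1 <= start <= end <= length:
            raise ValueError(f"epitope ({start}, {end}) lies outside 1..{length}")
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / length


# Toy 100-residue protein with two overlapping epitopes and one separate epitope.
print(residue_coverage(100, [(1, 10), (5, 14), (50, 55)]))  # 20.0
```

Taking the union of positions matters when epitopes from different tools or alleles overlap; simply summing epitope lengths would overstate coverage.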

T-cell epitope prediction for structural proteins
T-cell immune responses play a very important role in persistent viral infection [35]. Cellular immune responses influence disease progression and are essential for viral clearance in patients. T-cell immune responses are usually observed within 4-5 days after a viral infection [36]. Antigenic peptides of coronavirus proteins can be recognized by T cells on the surface of infected cells [37]. The structure of the MHC-I molecule HLA-A*1101 in complex with such a peptide derived from a SARS-CoV protein has recently been determined [38]. T-cell epitope prediction aims to identify the shortest antigen peptides that bind to MHC molecules. T-cell epitopes are principally predicted by identifying amino acid fragments that bind to MHC molecules and can thereby activate T cells.
The binding strength of each peptide to a given MHC allele was estimated with NetMHCII 2.2 at a set threshold level.
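The text does not state which threshold was used with NetMHCII 2.2. NetMHC-family predictors report a predicted binding affinity as an IC50 in nM, and 50 nM and 500 nM are commonly used cut-offs for strong and weak binders; the sketch below uses those values as assumed defaults. The function names and the IC50 values are hypothetical and are not taken from this study.

```python
from typing import Dict, List


def classify_binder(ic50_nm: float, strong_nm: float = 50.0, weak_nm: float = 500.0) -> str:
    """Label a predicted IC50 (nM) as 'strong', 'weak' or 'non-binder' (assumed cut-offs)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_nm <= strong_nm:
        return "strong"
    if ic50_nm <= weak_nm:
        return "weak"
    return "non-binder"


def binders(predictions: Dict[str, float], **cutoffs: float) -> List[str]:
    """Peptides predicted to bind one allele (strong or weak), sorted by predicted IC50."""
    hits = [pep for pep, ic50 in predictions.items()
            if classify_binder(ic50, **cutoffs) != "non-binder"]
    return sorted(hits, key=predictions.__getitem__)


# Hypothetical IC50 predictions (nM) for one allele.
example = {"pep1": 35.0, "pep2": 420.0, "pep3": 2600.0}
print([classify_binder(v) for v in example.values()])  # ['strong', 'weak', 'non-binder']
print(binders(example))  # ['pep1', 'pep2']
```

Passing `strong_nm` or `weak_nm` as keyword arguments lets the cut-offs be matched to whatever threshold the prediction run actually used.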
Because there are tens of thousands of MHC alleles, we selected the most common MHC class II alleles in the population (HLA-DR, HLA-DQ and HLA-DP) to predict peptide binding, so that the predictions would be representative and reliable. The predicted results for the four structural proteins are listed in Tables 3-6. The number of MHC-binding peptides of the four structural proteins predicted by each allele is given in Table 8. The more alleles a peptide binds, the stronger its expected immunogenicity.
The number of MHC-binding peptides at each level of allele coverage is given in Table 9. The final T-cell epitopes were defined as peptides with an allele coverage of more than 70%. Combining the results for the three HLA classes (HLA-DR, HLA-DQ and HLA-DP) yielded 39 final T-cell epitopes: spike protein (25), nucleocapsid protein (6), membrane protein (7) and envelope protein (1) (Figure 6A). The T-cell epitope residues are summarized in Table 10. The proportion of epitope residues in each primary sequence was calculated (Figure 6B) and used to assess the immunogenicity of the proteins. The membrane protein has the highest immunogenicity, with a T-cell epitope residue coverage of 55.86% (Figure 6B). One possible reason for the immune storm caused by 2019-nCoV is that the membrane and envelope proteins are abundantly expressed in infected cells. Knowledge of the antigenic or epitopic sites of viral proteins is important for developing effective antiviral inhibitors or vaccines. For an effective vaccine, identifying both T-cell and B-cell epitopes is very important for the prevention or clearance of infection [39].
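The counts behind Tables 8 and 9 and the >70% allele-coverage filter can be illustrated with a small tally. In this sketch each peptide maps to the set of alleles predicted to bind it, and allele coverage is the percentage of a locus's four alleles that it binds; how the study aggregated coverage across loci is not specified here, and the function name, allele labels and peptides are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def tally_binders(bound: Dict[str, Set[str]], n_alleles: int,
                  min_coverage: float = 70.0) -> Tuple[Counter, Counter, Set[str]]:
    """Return (peptides per allele, peptides per coverage level, peptides above the cut-off)."""
    # Table 8-style count: how many peptides each allele binds.
    per_allele = Counter(allele for alleles in bound.values() for allele in alleles)
    # Allele coverage of each peptide, as a percentage of the alleles tested.
    coverage = {pep: 100.0 * len(alleles) / n_alleles for pep, alleles in bound.items()}
    # Table 9-style count: how many peptides reach each coverage level.
    per_level = Counter(coverage.values())
    selected = {pep for pep, cov in coverage.items() if cov > min_coverage}
    return per_allele, per_level, selected


# Hypothetical binding sets for three placeholder peptides over four alleles.
bound = {
    "pepA": {"allele1", "allele2", "allele3"},
    "pepB": {"allele1"},
    "pepC": {"allele1", "allele2", "allele3", "allele4"},
}
per_allele, per_level, selected = tally_binders(bound, n_alleles=4)
print(per_allele["allele1"], per_level[75.0], sorted(selected))  # 3 1 ['pepA', 'pepC']
```

With four alleles per locus, the only coverage levels above 70% are 75% and 100%.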
These results indicate the importance of these regions for developing an effective vaccine. The identified B-cell epitopes can be used in developing an effective 2019-nCoV vaccine, because epitope identification is a vital step in the development of epitope-based vaccines and diagnostic tools. Understanding the structure and function of the structural proteins could help identify therapeutic targets to prevent and control coronavirus-related diseases.
Therefore, the M and E proteins were chosen as potential antigenic targets for humoral immune responses, which might be significant for developing better diagnostic and research reagents in the future. Our study thus suggests that the E protein could be a good source of B-cell epitopes for preparing monoclonal antibodies, vaccines and antiviral inhibitors against CoV infection in the future.

Conclusion
In the present study, we used bioinformatics methods to analyze the four structural proteins of 2019-nCoV and identified potential T-cell and B-cell epitopes of the structural proteins, which might significantly improve current 2019-nCoV vaccine development strategies. These predicted epitopes can be used as