The first objective was to identify the amino acid sequence of the E8L protein of ON563414, which could differ from that of other monkeypox strains. We aimed to identify a highly conserved amino acid motif across monkeypox E8L proteins that could be used to locate E8L in ON563414. We started with an E8L amino acid sequence (Q8V4Y0) from strain Zaire-96-I-16, which is well annotated and documented in the UniProt database.
>sp|Q8V4Y0|CAHH_MONPZ Cell surface-binding protein OS=Monkeypox virus (strain Zaire-96-I-16)
OX=619591 GN=E8L PE=2 SV=1
MPQQLSPINIETKKAISDTRLKTLDIHYNESKPTTIQNTGKLVRINFKGGYISGGFLPNEYVLSTIHI
YWGKEDDYGSNHLIDVYKYSGEINLVHWNKKKYSSYEEAKKHDDGIIIIAIFLQVSDHKNVYFQ
KIVNQLDSIRSANMSAPFDSVFYLDNLLPSTLDYFTYLGTTINHSADAAWIIFPTPINIHSDQLSKF
RTLLSSSNHEGKPHYITENYRNPYKLNDDTQVYYSGEIIRAATTSPVRENYFMKWLSDLREACFS
YYQKYIEGNKTFAIIAIVFVFILTAILFLMSQRYSREKQN
Table 1: UniProt entry sequence for Q8V4Y0 cell surface-binding protein [4]
There were 250 BLAST hits for Q8V4Y0, 19 reviewed and 231 unreviewed. The worst hit had 38.4% identity, while the best (excluding Q8V4Y0 itself) had 95.4% identity (rabbitpox virus, Utrecht strain).
Most of the top hits originated from vaccinia virus, but rabbitpox, taterapox, camelpox, and other poxviruses scored highly as well (Fig. 1). We noticed that many BLAST hits were labeled as “carbonic anhydrase homolog”. Additionally, the Homo sapiens (human) enzymes carbonic anhydrase 3 and 13 (CA3, CA13) were hits, with percent identities of 38.2% and 37.7%, respectively, suggesting that CA3 and CA13 are homologous with E8L. This homology has significant implications for vaccine and drug development: researchers should take care to avoid off-target effects and cross-reactivity with these important human enzymes.
Sequence alignment of CA3 and E8L showed significant homology, with 38.2% identity, a score of 388, 56.6% positives, and a low E-value of 6.1e-42 (Fig. 2).
The Clustal Omega multiple sequence alignment of the top twelve cell surface-binding protein sequences identified several perfectly conserved sequences, the longest being DDYGSNHL (Fig. 3). The amino acids around DDYGSNHL are also very similar among viruses. It appears that the cell surface-binding protein is well-conserved in poxviruses and is perhaps slower to mutate.
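The idea of a "perfectly conserved sequence" can be illustrated with a minimal sketch that scans the columns of an alignment and reports the longest run in which every sequence carries the same residue. This is not the Clustal Omega analysis of Fig. 3: the function name `longest_conserved_run` is hypothetical, and the three 20-residue fragments are simply copied from Table 1, Table 2, and the vaccinia epitope 102972 in Table 5, so the run found here is specific to these fragments and is not expected to equal DDYGSNHL.

```python
def longest_conserved_run(aligned):
    """Return the longest stretch of alignment columns identical in all sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    best, current = "", ""
    for column in zip(*aligned):
        # A column is conserved when all residues agree and it is not a gap.
        if len(set(column)) == 1 and column[0] != "-":
            current += column[0]
            if len(current) > len(best):
                best = current
        else:
            current = ""
    return best


# Illustrative fragments only (assumption: already aligned, no gaps needed).
fragments = [
    "YVLSTIHIYWGKEDDYGSNH",  # Zaire-96-I-16 E8L, from Table 1
    "YVLSTIHIYWGKEDDYGSNH",  # Massachusetts E8L, from Table 2
    "YVLSSLHIYWGKEDDYGSNH",  # vaccinia epitope 102972, from Table 5
]
conserved = longest_conserved_run(fragments)
print(conserved)
```

With a full multiple sequence alignment, the same function could be applied to the gapped rows exported from the alignment tool.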
Given the perfect conservation of DDYGSNHL within twelve poxviruses, it was hypothesized that the motif would also be identical in the Massachusetts virus. To locate the position and sequence of E8L in the Massachusetts monkeypox virus genome, the genome was first translated into amino acid sequences with Expasy Translate, and the translated output was then searched for the text string DDYGSNHL. The motif was successfully located within an open reading frame.
The open reading frame (Fig. 4, highlighted in red), beginning with MPQQ and ending with EKQN, was compared to the Zaire strain cell surface-binding protein sequence. The sequences were identical except for a single residue: the nineteenth amino acid is threonine in the Zaire strain but alanine in the Massachusetts sequence.
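The motif search can be sketched offline as a six-frame translation followed by a substring search. The sketch below is an approximation and not Expasy Translate itself: it assumes the standard genetic code (NCBI table 1), only loosely imitates Expasy's frame labels, and uses a short made-up DNA string (`toy_genome`) rather than the ON563414 genome. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; ambiguous codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    dna = dna.upper()
    frames = {}
    for strand, seq in (("5'3'", dna), ("3'5'", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand} Frame {offset + 1}"] = translate(seq[offset:])
    return frames


def find_motif(dna, motif):
    """Return (frame label, residue index) for each frame containing the motif."""
    return [
        (name, protein.find(motif))
        for name, protein in six_frame_translations(dna).items()
        if motif in protein
    ]


# Made-up example: a tiny coding sequence for M-DDYGSNHL-stop, placed on the
# reverse strand so that the hit appears in a 3'5' frame.
coding = "ATG" + "GATGACTATGGTTCTAATCATCTG" + "TAA"
toy_genome = reverse_complement(coding)
hits = find_motif(toy_genome, "DDYGSNHL")
print(hits)
```

Once a hit is found, the surrounding open reading frame can be read off the same translated frame from the preceding start codon to the next stop.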
> VIRT-11693:3'5' Frame 1, start_pos=31791
MPQQLSPINIETKKAISDARLKTLDIHYNESKPTTIQNTGKLVRINFKGGYISGGFLPNEYVLSTI
HIYWGKEDDYGSNHLIDVYKYSGEINLVHWNKKKYSSYEEAKKHDDGIIIIAIFLQVSDHKNV
YFQKIVNQLDSIRSANMSAPFDSVFYLDNLLPSTLDYFTYLGTTINHSADAAWIIFPTPINIHSD
QLSKFRTLLSSSNHEGKPHYITENYRNPYKLNDDTQVYYSGEIIRAATTSPVRENYFMKWLSD
LREACFSYYQKYIEGNKTFAIIAIVFVFILTAILFLMSQRYSREKQN
Table 2: E8L amino acid sequence for Massachusetts virus
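The single-residue difference between the two sequences can be checked with a position-by-position comparison. The following is a minimal sketch, assuming the two proteins have equal length so that no alignment step is needed; the function name `substitutions` is hypothetical, and only the first 30 residues from Table 1 and Table 2 are used to keep the example short.

```python
def substitutions(reference, query):
    """List (1-based position, reference residue, query residue) for mismatches."""
    if len(reference) != len(query):
        raise ValueError("sequences must have equal length for a direct comparison")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, query), start=1)
        if ref != alt
    ]


# First 30 residues of each E8L sequence (from Table 1 and Table 2).
zaire = "MPQQLSPINIETKKAISDTRLKTLDIHYNE"
massachusetts = "MPQQLSPINIETKKAISDARLKTLDIHYNE"

diffs = substitutions(zaire, massachusetts)
print(diffs)  # [(19, 'T', 'A')]
```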
VaxiJen 2.0 predicted Massachusetts E8L to be antigenic with a score of 0.5316, above the standard threshold of 0.4. Next, the protein sequence was divided into all possible overlapping pentapeptides, or 5-mers, using Python. Because the protein is 304 residues long, this yielded three hundred pentapeptides (304 − 5 + 1 = 300).
The PeptideMatch software was used to search for these three hundred pentapeptides in the human proteome in the UniProt and SwissProt databases. Matches were found for 272 pentapeptides; the remaining 28 pentapeptides had no match in the human proteome. The most promiscuous pentapeptide was AATTS, for which 63 matching proteins were identified. Some of the unmatched pentapeptides overlapped and could be combined to form a longer oligopeptide. For instance, IHIYW, HIYWG, and IYWGK are part of a longer oligopeptide, IHIYWGK, that is also absent from the human proteome.
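The pentapeptide split can be reproduced with a sliding window of width five. The original script is not shown in the source, so this is only a sketch of one way to do it; the function name `kmers` is hypothetical, and the sequence is copied from Table 2.

```python
# Massachusetts E8L sequence, copied line by line from Table 2.
E8L_MASSACHUSETTS = (
    "MPQQLSPINIETKKAISDARLKTLDIHYNESKPTTIQNTGKLVRINFKGGYISGGFLPNEYVLSTI"
    "HIYWGKEDDYGSNHLIDVYKYSGEINLVHWNKKKYSSYEEAKKHDDGIIIIAIFLQVSDHKNV"
    "YFQKIVNQLDSIRSANMSAPFDSVFYLDNLLPSTLDYFTYLGTTINHSADAAWIIFPTPINIHSD"
    "QLSKFRTLLSSSNHEGKPHYITENYRNPYKLNDDTQVYYSGEIIRAATTSPVRENYFMKWLSD"
    "LREACFSYYQKYIEGNKTFAIIAIVFVFILTAILFLMSQRYSREKQN"
)


def kmers(sequence, k=5):
    """Return every overlapping k-mer, in order of start position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


pentapeptides = kmers(E8L_MASSACHUSETTS)
print(len(E8L_MASSACHUSETTS), len(pentapeptides))  # 304 300
print(pentapeptides[:3])  # ['MPQQL', 'PQQLS', 'QQLSP']
```

Keeping the list in start-position order makes it straightforward to recombine neighbouring pentapeptides into longer peptides later.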
Table 3: 28 unmatched pentapeptides not found in any human proteins
Unmatched Pentapeptides |
PINIE | IYWGK | VYFQK | NYRNP |
DIHYN | VYKYS | NMSAP | RENYF |
IHYNE | INLVH | TLDYF | NYFMK |
GGYIS | KKHDD | DYFTY | YFMKW |
PNEYV | HDDGI | WIIFP | FMKWL |
IHIYW | SDHKN | PTPIN | MKWLS |
HIYWG | HKNVY | PINIH | YIEGN |
Table 4: 4 longer oligopeptides formed by linking consecutive unmatched pentapeptides
Longer Unmatched Oligopeptides | Length |
IHIYWGK | 7 |
DIHYNE | 6 |
SDHKNVY | 7 |
NYFMKWLS | 8 |
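The linking of overlapping pentapeptides into longer oligopeptides can be sketched as follows. The source does not spell out the exact linking rule, so this sketch assumes the strictest reading: pentapeptides are joined only when their start positions in the protein are directly adjacent. Under that assumption it may not reproduce every row of Table 4. The function name `merge_adjacent` is hypothetical, and the fragment is residues 21 to 75 of the Table 2 sequence.

```python
def merge_adjacent(protein, unmatched, k=5):
    """Merge unmatched k-mers whose start positions are directly adjacent."""
    starts = [
        i for i in range(len(protein) - k + 1) if protein[i:i + k] in unmatched
    ]
    merged = []
    run_start = prev = None
    for i in starts:
        if prev is not None and i == prev + 1:
            prev = i  # extend the current run by one residue
            continue
        if prev is not None:
            merged.append(protein[run_start:prev + k])
        run_start = prev = i
    if prev is not None:
        merged.append(protein[run_start:prev + k])
    return merged


# Fragment of the Massachusetts E8L sequence (Table 2) and the unmatched
# pentapeptides from Table 3 that fall inside it.
fragment = "LKTLDIHYNESKPTTIQNTGKLVRINFKGGYISGGFLPNEYVLSTIHIYWGKEDDY"
unmatched = {"DIHYN", "IHYNE", "GGYIS", "PNEYV", "IHIYW", "HIYWG", "IYWGK"}

oligopeptides = merge_adjacent(fragment, unmatched)
print(oligopeptides)  # ['DIHYNE', 'GGYIS', 'PNEYV', 'IHIYWGK']
```

Isolated pentapeptides are returned unchanged, so filtering the result by length greater than five leaves only the linked oligopeptides.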
A search of the IEDB found 21 epitopes containing at least one of the 28 pentapeptides. They came from ectromelia virus (mousepox), vaccinia virus, Clostridium perfringens, and other organisms. These epitopes have experimentally demonstrated immunogenic potential. Interestingly, two human epitopes appeared in the results. In principle, any pentapeptide found in human proteins should have been filtered out by the previous steps. It is possible that these human epitopes are not recorded in the UniProt and SwissProt databases.
Table 5: IEDB IDs, epitope sequences, and source organisms for the epitopes containing at least one of the 28 unique pentapeptides that are foreign to humans. The pentapeptides within the epitopes are bolded and highlighted for visibility.
IEDB ID | Pentapeptide (or longer) | Epitope | Organism |
874264 | IHYNE | IHYNESKPTTIQNTG | Ectromelia virus (Ectromelia mousepox virus) |
1400881 | GGYIS | KGGYISLNYL | Mus musculus (mouse) |
62863 | PNEYV | TAGPNEYVYYKVYATYRKYQ | Clostridium perfringens |
102955 | PNEYV | YISGGFLPNEYVLSSLHIYW | Vaccinia virus (vaccinia virus VV) |
102972 | HIYWGK | YVLSSLHIYWGKEDDYGSNH | Vaccinia virus (vaccinia virus VV) |
145659 | HIYWGK | YVLSSLHIYWGKE | Vaccinia virus (vaccinia virus VV) |
102535 | INLVH | INLVHWNKKKYSSYEEAKKH | Vaccinia virus (vaccinia virus VV) |
80628 | KKHDD | CDLFKKHDDAIVRLR | Argentinian mammarenavirus (Junin arenavirus) |
85490 | KKHDD | LLNLLCDLFKKHDDA | Argentinian mammarenavirus (Junin arenavirus) |
102627 | SDHKNVY | LQVSDHKNVYFQKIVNQLDS | Vaccinia virus (vaccinia virus VV) |
112306 | SDHKN | SDHKNYL | Homo sapiens (human) |
112421 | SDHKN | YSSDHKN | Homo sapiens (human) |
915942 | VYFQK | IRVYFQKL | Mus musculus (mouse) |
102627 | VYFQK | LQVSDHKNVYFQKIVNQLDS | Vaccinia virus (vaccinia virus VV) |
102371 | TLDYF | DSVFYLDNLLPSTLDYFTYL | Vaccinia virus (vaccinia virus VV) |
102371 | DYFTY | DSVFYLDNLLPSTLDYFTYL | Vaccinia virus (vaccinia virus VV) |
49730 | PTPIN | PTPINNEKDI | Plasmodium knowlesi |
49731 | PTPIN | PTPINNEKDII | Plasmodium knowlesi |
80445 | RENYF | ARENYFMRW | Vaccinia virus (vaccinia virus VV) |
102323 | RENYF | ATTSPARENYFMRWLSDLRE | Vaccinia virus (vaccinia virus VV) |
146581 | RENYF | TSPARENYF | Vaccinia virus (vaccinia virus VV) |
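The relationship between the pentapeptides and the epitopes in Table 5 amounts to a substring check. The sketch below works on a local list of epitope sequences and does not query the IEDB; the function name `pentapeptides_in_epitopes` is hypothetical, and the three epitopes and five pentapeptides are copied from Tables 3 and 5.

```python
def pentapeptides_in_epitopes(epitopes, pentapeptides):
    """Map each epitope ID to the pentapeptides found inside its sequence."""
    hits = {}
    for epitope_id, sequence in epitopes.items():
        found = [p for p in pentapeptides if p in sequence]
        if found:
            hits[epitope_id] = found
    return hits


# A few rows from Table 5 (IEDB ID -> epitope sequence).
epitopes = {
    "874264": "IHYNESKPTTIQNTG",
    "102972": "YVLSSLHIYWGKEDDYGSNH",
    "112306": "SDHKNYL",
}
# A few unmatched pentapeptides from Table 3.
query = ["IHYNE", "IHIYW", "HIYWG", "IYWGK", "SDHKN"]

matches = pentapeptides_in_epitopes(epitopes, query)
print(matches)
```

In this example the vaccinia epitope 102972 contains HIYWG and IYWGK but not IHIYW, which is consistent with the HIYWGK entry listed for it in Table 5.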