Culture of contemporary EHV2 isolates from peripheral blood mononuclear cells (PBMCs)
Contemporary EHV2 isolates were collected by co-culture of peripheral blood mononuclear cells (PBMCs) originating from 5 different horses, with equine fetal kidney (EFK) cell monolayers after 5 to 21 days of incubation. Of 149 plaques, most were identified as either EHV2 or equid gammaherpesvirus 5 (EHV5) by polymerase chain reaction (PCR) (Table 1). In total, more plaques tested positive to only EHV2 (n = 58/149, 39%) compared to plaques positive for only EHV5 (n = 26/149, 17%) (Table 1). To ensure distinct EHV2 isolates were selected for genome analysis, plaque purified isolates from the PBMC co-cultures were initially characterised by qPCR-high resolution melt (qPCR-HRM) curve analysis of the gB gene and then a subset of the 58 EHV2-positive isolates (n = 8) with distinct melt profiles were selected for sequencing (Table 1) along with a further 10 EHV2 historical isolates (Table 2).
Table 1 Equid gammaherpesvirus detection in PBMC-EFK co-cultures by qPCR-HRM targeting the glycoprotein B ORF.
Horse no.
|
Host
|
Single virus positive
|
EHV2 & 5 positiveb
|
EHV2 & 5 negativec
|
Isolates sequencedd
|
HRM Tm °Ce
|
EHV2b
|
EHV5b
|
1
|
4 yr old TBa
|
8/13 (61%)
|
1/13 (8%)
|
4/13
(31%)
|
0/31
|
H1-3-2018
|
86.8
|
2
|
17 yr old TBa
|
14/67 (21%)
|
17/67 (25%)
|
34/67
(51%)
|
2/67
(3%)
|
H2-80-2018
|
89.0
|
3
|
23 yr old pony
|
8/16 (50%)
|
3/16 (19%)
|
2/16
(12%)
|
3/16
(19%)
|
H3-25-2018
|
89.9
|
4
|
34 yr old pony
|
8/22 (36%)
|
5/22 (23%)
|
2/22
(9%)
|
7/22
(32%)
|
H4-147-2018
|
87.0
|
H4-18-2018
|
88.5
|
5
|
Retired TBa
|
20/31 (65%)
|
0/31 (0%)
|
10/31
(32%)
|
1/31
(3%)
|
H5-91-2018
|
88.3
|
H5-57-2018
|
85.0
|
H5-60-2018
|
89.2
|
Total
|
58/149 (39%)
|
26/149 (17%)
|
52/149 (35%)
|
13/149
(9%)
|
|
a TB = Thoroughbred
b Number (percentage) of viral plaques positive for EHV2 only, EHV5 only, or both viruses (dual positive)
c Number (percentage) of viral plaques where neither EHV2 nor EHV5 were detected
dH(1-5) denotes the horse number from which isolates were recovered
eHRM Tm = high resolution melting temperature
Table 2 Details of archived and contemporary Australian EHV2 isolates used in this study
EHV2 Virus
isolate ID
|
Year
|
Typea/
Similarity
|
Locationb
|
Source
|
References
|
Virus passagec
|
GenBank accession number
|
SRR accession number
|
1-141-67
|
1967
|
1-141-67
|
Parkville
|
Nasal
|
Turner et al.,1970
|
23
|
MW322569
|
SRR17631889
|
2-16-67
|
1967
|
86-67
|
Parkville
|
Nasal
|
Turner et al.,1970
|
21
|
MW322570
|
SRR17631888
|
ER34-67
|
1967
|
1-141-67
|
Bundoora
|
Nasal
|
Turner & Studdert, 1970
|
14
|
MW322583
|
SRR17631879
|
1039-94
|
1994
|
1-141-67
|
Euroa
|
Nasal
|
Holloway et al.,2000
|
8
|
MW322581
|
SRR17631878
|
691-82
|
1982
|
86-67
|
Bacchus Marsh
|
Pharyngeal
|
Browning & Studdert, 1974
|
9
|
MW322580
|
SRR17631877
|
Fin60-72
|
1972
|
86-67
|
Laverton
|
Nasal
|
Wilks & Studdert, 1974
|
9
|
MW322584
|
SRR17631876
|
Fin150-72
|
1972
|
86-67
|
Laverton
|
Nasal
|
Wilks & Studdert, 1974
|
9
|
MW322585
|
SRR17631875
|
Fin180-72
|
1972
|
86-67
|
Laverton
|
Nasal
|
Wilks & Studdert, 1974
|
9
|
MW322586
|
SRR17631874
|
9951E-69
|
1969
|
1-141-67
|
Frankston
|
Eye
|
Browning et al.,1988b
|
10
|
MW322582
|
SRR17631873
|
157IFeye-69
|
1969
|
86-67
|
Frankston
|
Eye
|
Studdert,1971
|
11
|
MW322579
|
SRR17631872
|
H1-3-2018
|
2018
|
86-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322571
|
SRR17631887
|
H2-25-2018
|
2018
|
86-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322573
|
SRR17631886
|
H3-80-2018
|
2018
|
1-141-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322576
|
SRR17631885
|
H4-18-2018
|
2018
|
86-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322572
|
SRR17631884
|
H5-91-2018
|
2018
|
86-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322577
|
SRR17631883
|
H5-57-2018
|
2018
|
86-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322574
|
SRR17631882
|
H5-60-2018
|
2018
|
86-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322575
|
SRR17631881
|
H4-147-2018
|
2018
|
86-67
|
Wallington
|
PBMC
|
Present study
|
6
|
MW322578
|
SRR17631880
|
a Type based on gB (site 1) sequence (Holloway et al., 2000)
b All locations are suburbs or towns in Victoria, Australia
c Virus passage number (in EFKs) at which the viruses were used in this present study
Complete genome sequencing of 18 Australian EHV2 field isolates
Assembly of the genome was done with only one terminal repeat since they are direct repeats leaving the reference genome size at 166,886 bp instead of 184,439 bp for the full genome. Genome sequences of the 18 EHV2 isolates were assembled by mapping to the reference genome EHV2 86-67 (all isolates) and by de novo assembly (two isolates: Fin60-72 and 157IFEye-69). Comparison of the two genome sequences assembled by both reference mapping and de novo assembly methods showed that most disagreements between the two methods occurred in highly variable regions containing insertions/deletions (INDELs) and in repeat rich regions, particularly at the terminal repeats (TR). The genomes produced by the two methods of assembly (de novo and reference assisted) shared 89.7% pairwise identity in the TR region of (Fin60-72) and 91.3% for (157IFEye-69). by comparison, their unique regions showed 97.9% identity for Fin60-72 identify for de novo assembly and 97.3% identity for 157IFEye-69.for reference assisted assembly. The internal repeat regions IR1L, IR1R, IR2L, and IR2R assembled by de novo and reference-assisted methods had 99%, 100%, 88.4% (due to 96 bp gap in de novo assembly) and 98.9% identities respectively for 157IFEye-69, and 100%, 100%, 98.1% and 98.5% for Fin60-72, respectively. Alignments of the genomes produced by the two methods of assembly (see Supplementary Figure 1 and 2, Additional file 1). The genome length (including only 1 TR) produced using each method was comparable for both isolates, where the respective map to reference and de novo coverage was 166,763 bases compared to 166,709 bases, respectively, for Fin60-72, and 156,230 bases compared to 156,141 bases, respectively, for 157IFEye-69. Regions of difference between the two methods containing INDELs in some genes (ORFs 29, 34 and 48) were amplified by PCR in order to ascertain their sizes. The size of the PCR amplicons reflected the expected sizes of 400, 942 and 671 bp respectively) consistent with the map to reference method of genome assembly, rather than the de novo assembly method, which predicted sizes of 613, 465, and 361 bp respectively. The map to reference method was therefore used for determining the full genome sequences of all 18 Australian isolates of EHV2 in this study. Annotation of ORFs was compared to EHV2 strains 86-67 and G9/92 as references to identify discrepancies and validate the annotation method. The alignment of the 20 whole genome sequences (18 sequences from this study, along with the sequences of strains 86-67 and G9/92) is shown in Figure 1.
Alignment of the complete genome sequences was performed using MAFFT. The prototype strain EHV2 86-67 with the first terminal repeat (TR) removed was used as the reference sequence. Vertical black lines indicate SNPs compared to the reference and dashes indicate sequence gaps.
Nucleotide diversity of EHV2 whole genome sequences
The average size of the complete genomes containing both TRs is 183,470 bp and ranged from 173,753 bp (157-IFEye-69) to 184,828 bp (1039-94), where the genome of isolate 157-IFEye-69 had a large deletion resulting in the absence of ORFs 75, E9 and E10. A summary of sequence and assembly metrics is provided (see Supplementary Table 1, Additional file 2). The previously published sequence of strain 86-67 encodes a truncated ORF51 (homologue of EBV gp350) while the strain G9/92 encodes the full-length protein. Three other isolates in this study also contained mutations in ORF51. These, along with other genes containing mutations that disrupt ORFs (INDELs indivisible by three or deletions resulting in a frameshift) were predicted to result in a truncated protein being expressed as summarised in Table 3.
The percentage nucleotide identity between the 20 genome sequences revealed a high level of
genetic diversity amongst EHV2 strains, where nucleotide identities ranged from 86.2 to 99.7%. A nucleotide identity matrix is provided (see Supplementary Table 2, Additional file 2).
Table 3 Genes containing ORF-disrupting mutations
Gene
|
Type of mutation
|
Isolate
|
ORF length
(bp)b
|
Predicted protein (aa)
|
E6A
|
Deletiona
|
86-67
|
348
|
116 (FLd)
|
691-82
|
195c
|
64
|
Fin180-72
|
231c
|
76
|
ORF7
|
Deletiona
|
86-67
|
2187
|
729 (FLd)
|
1-141-67
|
2185
|
230
|
ORF51
|
Deletiona
|
G9-92
|
837
|
279 (FLd)
|
147-2018
|
826
|
33
|
86-67
|
831
|
97
|
1-141-67
|
288
|
95
|
2-16-67
|
723
|
62
|
ORF64
|
Deletiona
|
86-67
|
11,268
|
3755 (FLd)
|
80-2018
|
11,281
|
899
|
1039-94
|
11,254
|
899
|
ER34-67
|
11,248
|
1498
|
1-141-67
|
11,197
|
948
|
ORF74
|
Deletiona
|
86-67
|
993
|
330 (FLd)
|
3-2018
|
1000
|
83
|
ORF75
|
Deletiona
|
86-67
|
4038
|
1345 (FLd)
|
157IFEye-69
|
840
|
178
|
E9
|
Complete gene deletion
|
157IFEye-69
|
|
|
E10
|
Complete gene deletion
|
157IFEye-69
|
|
|
a – deletion (with some indivisible by 3) resulting in a premature stop codon
b – open reading frame
c – frame shift resulting in premature stop codon and shortened ORF
d – expected number of amino acids in full length (FL) product as predicted in already published sequences, reference strain 86-67 or strain G9/92
The phylogenetic analysis revealed EHV2 isolates grouped into 2 distinct clusters, with most of the isolates clustering with strain 86-67 (including G9/92 isolated in the UK), whereas only 1 contemporary and 3 archived isolates clustered with 1-141/67 (Fig. 2).
Genome sequence polymorphism was interrogated by DnaSP analysis (Rozas et al., 2017; Sijmons et al., 2015). Alignment of the 20 whole genomes contained 175,141 nucleotide sites. Of these, 149,886 sites contained no gaps and 9.1% (15,879) of these sites were polymorphic (S). The average number of nucleotide differences (k) between any two genomes was 4323. Estimated inter-strain nucleotide diversity (π) was 0.029 which represents 2.9% of the analysed sequence sites across the full genome. We analysed the π values of other selected viruses using publicly available complete genome sequences. For equine alphaherpesvirus 4 (EHV4, n = 14) and equine alphaherpesvirus 1 (EHV1, n = 22) strains published by Vaz et al (2016), π was much lower, at 0.0014 and 0.0011, respectively. The inter-strain diversity of EHV2 is more comparable to that of the highly variable betaherpesvirus human cytomegalovirus (HCMV, n = 124, π = 0.021) (Sijmons et al., 2015) than that of the gammaherpesvirus EBV (n = 60, π = 0.0079) (Palser et al., 2015).
Analysis of the diversity and divergence of EHV2 genes
The percentage nucleotide identity amongst the 20 EHV2 isolates for each gene was evaluated from the alignment of each gene sequence (Fig. 3a). The average nucleotide diversity (π) for individual gene sequences was determined for all pairwise comparisons of each gene from the 20 isolates (Fig 3b). Genes with low nucleotide diversity values (Ka < 0.002) in EHV2 were mostly involved in DNA-processing (ORFs 6, 9, 18, 25, 26, 44 and 54), while genes with high diversity values (Ka > 0.025) in EHV2 encoded structural proteins such as glycoproteins (gB, gH, gL, gp48, gM) and tegument (ORFs 52 and 64). Other genes showing high diversity includes ORFs 73, 74 and 51. Nonsynonymous substitutions per nonsynonymous site (Ka) and synonymous substitutions per synonymous site (Ks) were determined for each of the 78 EHV2 genes (Fig. 3c). Nine EHV2 genes (12%) had Ka values < 0.002 (conserved) (Table 4), while 19 EHV2 genes (24%) had a value > 0.025 (divergent) (Fig. 3c, Table 5) (Sijmons et al., 2015).
The EHV2 core genome contains 13 genes that are unique to the EGHVs (EHV2 and EHV5). These genes, annotated with the ‘E’ prefix are located towards both genomic termini, or close to the internal repeat regions and mostly (9 of 13) displayed high diversity (Ka values > 0.025 and π values 0.035 – 0.381) (Fig. 3, Table 5, and Supplementary Table 3, Additional file 2).
The Ka/Ks ratio is used as a measure of the selection pressure acting on a gene (Wang et al., 2009). A ratio close to zero indicates strong negative/purifying selection, a ratio close to 1 indicates neutral selection or genetic drift, while a ratio higher than 1 indicates positive/diversifying selection. Almost half of the EHV2 genes (47%) had a Ka/Ks ratio < 0.5, 17% had a ratio between 0.5 and 1, and 36% had a ratio >1 (see Supplementary Table 3, Additional file 2). Only a weak correlation (Spearman’s rank correlation analysis, ρ = 0.212, P = 0.06) was observed between selection pressure (Ka/Ks ratio) and nucleotide diversity (π) of EHV2 genes. High Ka/Ks ratios do not directly translate to a high diversity and vice versa.
Table 4 Highly conserved EHV2 genes (Ka values ≤ 0.002)
ORF
|
Kaa
|
Gene Familyb
|
Gene
Conservationc
|
Gene product
|
Functions
|
6
|
0.001
|
DNA binding protein
|
Core
|
ssDNA binding protein
|
DNA replication and possibly
involved in gene regulation
|
9
|
0.001
|
DNA polymerase type B
|
Core
|
DNA polymerase
|
DNA replication
|
10
|
0.002
|
Deoxyuridine triphosphatase-related protein (DURP)
|
Beta/Gamma
|
Protein G10
|
Unknown
|
18
|
0.001
|
UL79
|
Beta/Gamma
|
Protein UL79
|
Promote accumulation of late transcript
|
25
|
0.000
|
Major capsid protein
|
Core
|
Major capsid protein
|
Capsid morphogenesis
|
26
|
0.001
|
TRX2 protein
|
Core
|
Capsid triplex subunit 2
|
Capsid morphogenesis
|
33
|
0.001
|
UL16
|
Core
|
Tegument protein UL16
|
Possibly involved in virion morphogenesis
|
35
|
0.001
|
UL14
|
Core
|
Tegument protein UL14
|
Virion morphogenesis
|
36
|
0.002
|
Conserved herpesvirus protein kinase
(CHPK)
|
Core
|
Tegument serine/threonine
protein kinase
|
Protein phosphorylation
|
44
|
0.001
|
Helicase
|
Core
|
Helicase-primase helicase subunit
|
DNA replication
|
54
|
0.001
|
DURP
|
Core
|
Deoxyuridine triphosphatase
|
Nucleotide metabolism
|
aKa = nonsynonymous substitutions per nonsynonymous site.
bNA = not assigned to a gene family.
cGene conservation over different herpesvirus subfamilies; Core = conserved in all herpesviruses, Beta/Gamma = conserved in betaherpesvirinae and gammaherpesvirinae subfamilies
Table 5 Highly divergent EHV2 genes (Ka values > 0.025)
ORF
|
Kaa
|
Gene familyb
|
Gene conservationc
|
Gene product
|
Functions
|
E1
|
0.139
|
GPCR 1
|
EHV2
|
E1
|
Intracellular signalling (7TMR domains)
|
E2
|
0.089
|
NA
|
EHV2
|
E2
|
Signalling peptide (Ig domain)
|
E3
|
0.130
|
Complement Control Protein (CCP)
|
EHV2
|
E3
|
Membrane protein
|
8
|
0.042
|
Glycoprotein B
|
Core
|
gB
|
Cell entry and cell to cell spread
|
E5A
|
0.090
|
NA
|
EHV2
|
E5A
|
Unknown
|
E6A
|
0.285
|
E6A
|
EHV2
|
E6A
|
Contain signal peptide
|
22
|
0.067
|
Glycoprotein H
|
Core
|
gH
|
Cell entry and cell to cell spread
|
46
|
0.027
|
Uracil DNA Glycolase (UNG)
|
Core
|
uracil-DNA glycosylase
|
DNA repair
|
47
|
0.138
|
Phage_glycop_gL
|
Core
|
gL
|
Tegument
|
E7A
|
0.033
|
NA
|
Gamma
|
g42
|
Cell entry, binds MHC-II
(C-type lectin-like domain)
|
51
|
0.208
|
Epstein-Barr GP350
|
Gamma
|
gp350
|
Cell attachment
|
52
|
0.093
|
BLRF2
|
Core
|
virion protein G52
|
|
53
|
0.052
|
Glycoprotein N
|
Core
|
gN
|
Virion morphogenesis
and membrane fusion
|
64
|
0.036
|
large tegument protein
|
Core
|
large tegument protein
|
Capsid transport
|
E7
|
0.041
|
IL-10
|
EHV2
|
IL-10
|
Immune regulation
|
73
|
0.086
|
NA
|
Gamma
|
nuclear antigen LANA-1
|
Involved in latency
|
74
|
0.084
|
GPCR 1
|
Beta/Gamma
|
Membrane protein G74
|
Intracellular signalling (7TMR domains)
|
E9
|
0.109
|
NA
|
EHV2
|
Membrane protein E9
|
|
E10
|
0.143
|
CARD
|
EHV2
|
apoptosis regulator E10
|
Involved in apoptosis
|
aKa = nonsynonymous substitutions per nonsynonymous site.
bNA = not assigned to a gene family.
cGene conservation over different herpesvirus subfamilies: Core = conserved in all herpesviruses, Gamma = conserved in all members of the subfamily gammaherpesvirinae, Beta/Gamma = conserved in all members of the betaherpesvirinae and gammaherpesvirinae subfamilies, EHV2 = EHV-2-specific gene.
The gB, gH and DNA polymerase genes are commonly investigated in EHV2 studies and thus additional analyses were performed for gB and gH genes that incorporated other publicly available sequences from other EHV2 isolates (see Supplementary Table 4, Additional file 2 for details of other sequences). Of these 3 genes, the highest nucleotide identities were observed for the DNA polymerase gene, which also had the lowest Ka value (0.001) (Table 4). All three genes had Ka/Ks ratios <0.5 suggesting negative selection (see Supplementary Table 3, Additional file 2).
Amino acid sequence analysis of EHV2 gB sites
The amino acid sequence of the gB from 30 EHV2 strains (including 18 determined in this study and 12 sourced from GenBank, see Supplementary Table 4, Additional file 2) showed conservation of both the proposed endoproteolytic furin cleavage site R-X-K/R-R at residues 437-440 and the GQLG sequence at residues 580-591 reported to be conserved in all herpesviruses examined. The 13 cysteine residues are also conserved in the 30 gB sequences including synonymous substitution observed in two foreign isolates (Goltz et al., 1994; Holloway et al., 2000; Sharp et al., 2007; Sorem & Longnecker, 2009). Antigenic sites of EHV2 gB had previously been characterised based on variability of 4 strains (86-67, 2-141-67, 5FN and T-2) and their reactivity to a panel of monoclonal antibodies (Holloway et al., 2000). Variability amongst the 30 gB aa sequences within the antigenic site I (aa 27-51), show 18 isolates were 86-67-like and shared between 89.47% to 100% aa identity at this site. The antigenic sites designated II and III (Fig. 4) are further confirmed as regions of high variability amongst EHV2 strains and the 30 strains shared 55-100% and 31-100% aa identities at these sites, respectively. Figure 4 also shows regions of amino acid variation beyond those previously identified antigenic sites.
Phylogenetic analysis of EHV2 GPCR gene family
EHV2 encodes 3 GPCR genes (ORF74, E1, and E6) which share some similarities to cellular chemokine receptors (Telford et al., 1995). The nucleotide sequences of E6, ORF74 and E1 from this study were aligned with sequences (see Supplementary Table 4, Additional file 2) representing previously identified genogroups (Sharp et al., 2007). The genogroups identified by phylogenetic analysis in the current study were consistent with previous findings (Sharp et al., 2007) except for ORF74. E6 was the least variable gene and divided the isolates into 2 clear genogroups (Fig. 5A), while E1 generally grouped in the 6 existing genogroups (Fig. 5B). ORF74 grouped similarly to the Sharp study (Sharp et al., 2007) although several additional sequences identified in the current study might form a new genogroup (Fig. 5C). Consistent with this, both ORF74 and E1 are amongst the most divergent genes in EHV2, Ka > 0.025 (Table 5) with Ka/Ks ratios of 0.25 and 0.4, respectively, indicating they are under a negative/purifying selection (see Supplementary Table 3, Additional file 2). It was observed that the EHV2 viruses recovered from the same horse did not share similar genogroups in all the GPCR genes (E6, E1, and ORF74) (Fig. 5).
Genome diversity of EHV2 recovered from individual horses
Consistent with their PCR-HRM results, the EHV2 isolates recovered from the same horses at the same time were not identical. Isolates recovered from Horse 4 (147/2018 and 18/2018) shared 97% identity, while isolates from Horse 5 (91/2018, 57/2018 and 60/2018) shared an average of 95% identity across their genomes, including the genome termini. The similarity plot between isolates from the two horses show a similar trend of variability along the genome, highest at the termini (see Supplementary Figure 3 and 4, Additional file 1).
Recombination analysis of EHV2 genome
The 20 aligned EHV2 genome sequences were examined for evidence of recombination across the complete (excluding the left TR) and unique genome regions, as well as repeat regions, IR (1 and 2), and TR using SplitsTree4 (Huson, 1998). The reticulate networks and pair-wise homoplasy index (Phi) test detected significant recombination between the 20 EHV2 strains in all the genomic regions analysed (Fig. 6). Reticulate phylogenetic recombination networks were also generated for some genes (gB, gH and the 3 GPCRs) (see Supplementary Figure 5, Additional file 1). The RDP4 program was used to detect recombination events and recombination breakpoints. Evidence of multiple recombination events was detected (Fig. 6 and Supplementary Table 5, Additional file 2) and recombination breakpoints were widespread through the length of the genome (Fig. 7). Overall, 155 recombination events were reported, representing those detected by 3 programs or more. Of these 105 were detected by 5 or more programs (see Supplementary Table 5, Additional file 2). While a large number of breakpoints were shown across the length of the EHV2 genome, no significant recombination hot spots were detected.