HMPV prevalence in paediatric hospital admissions. Between January 2007 and December 2016, 9079 individuals below 60 months of age were admitted to the paediatric wards at KCH with severe or very severe pneumonia. Samples were collected from 6756 (74%) individuals and 274 (4.1%) were determined HMPV positive (Table 1). Decreased HMPV prevalence was recorded in more recent years i.e. from 2014 to 2016 (Table 1). Among the 274 HMPV positive cases, 43.1 % (n = 118) were in children <6 months of age and 71.5% (n = 196) in children <2 year of age. Children 3 years accounted for only 3.6% of the cases (n = 11) (Table 2).
HMPV subgroup assignment. Phylogenetic analysis (F and G gene) of Kilifi, global and the reference sequences showed clustering of HMPV viruses into subgroups A2, B1 and B2. Subgroup A1 was not observed. Subgroup A2 sequences further clustered into 3 clades, the two known clades A2a and A2b, and into the unique clade A2c supported by strong bootstrap values (Figure 1 and 2). The recently identified unique HMPV A2 strain, which possess 180 duplication in G gene clustered within A2c clade (Figure1) and was not observed in the Kilifi sequences. Based on F gene phylogeny constructed using full length coding regions, similar clades and sub-groups were also observed i.e. A2a, A2b, A2c, B1 and B2 (Additional file 3).
The prevalence of group A was 51.7% (106/205) and group B 48.3% (99/205). All the group A Kilifi sequences clustered within clade A2b (84/106, 79.2%) and within the unique clade A2c (22/106, 20.8%). No A1 or A2a sequences were observed. Within group B, 48.5% (48/99) of the sequences clustered within subgroup B1 and 51.5% (51/99) of the sequences clustered within subgroup B2.
Temporal occurrence of HMPV in Kilifi. During the 10-year period of surveillance described here, HMPV infections occurred in a seasonal pattern, mainly between the month of October of one year and April of the following year, Figure 3, which was used to define an epidemic year, i.e. starting September of one year through August of the following year. Thus, subsequent analyses were done by epidemic year.Multiple subgroups co-circulated within a single epidemic year from 2007–2016, Figure 4b.There was a shift in dominance from A2b (2007 to 2011) to B1 (2012 to 2014) with subsequent rise of the A2c in recent years. Notably, B2 subgroups remained persistent but non-dominant. A2b viruses circulated only until November 2012.
HMPV subgroup prevalence and dominance in Kilifi was compared to other countries around the world. This was based on analysis of 714 F gene sequences from 18 different countries and 500 G gene sequences from 15 countries retrieved from GenBank (Additional file 2). General agreement in subgroup prevalence patterns was observed for Kilifi and global data sets, the predominance of A2b and B2 in 2007–2010 and subsequent rise of B1 and A2c in years that followed, Figure 4. Notably, at finer levels there were differences in subgroup temporal occurrence. Globally, clades A2b, A2c and subgroup B1 persisted over the 9 years. In contrast, in Kilifi only subgroup B2 persisted over the 9 years. The dominance of B1 strains in 2011 was also evident globally. A1 and A2a were the least commonly detected subgroups globally and not observed in Kilifi.
Phylogenetic analysis of Kilifi HMPV viruses.Based onmaximum likelihood phylogenies of F and G genes, sequences from consecutive epidemics largely clustered together. However, across multiple epidemics sequences from the earlier years clustered together away from those in recent years, Figure 5 and 6. Further analysis of G gene sequences aligned temporally, revealed circulation of multiple variants of a single subgroup in a single epidemic year (see Additional files 4 and 5). Each variant differed by at least five nucleotides (Additional files 4 and 5) and clustered into separate clades, Figure 6. In comparison to contemporaneous viruses (714 F gene sequences and 500 G gene sequences) (Additional file 2), viruses from Kilifi and those from other locations were highly similar, with an average nucleotide identity of 97.1% and 95.6% (in F gene), and 90.4% and 85.9% (in G gene) for groups A and B, respectively. In phylogenies of F and G, combining contemporaneous and Kilifi sequence data, the Kilifi sequences did not cluster separately from others but interspersed with sequences from Canada, Spain, Malaysia, Japan, USA, India and Croatia (Figure1 and 2), and similarly, those viruses collected from earlier years in Kilifi clustered away from those in recent years.
F gene sequence analysis. Due to partial gene sequencing (345bp) of the previously reported F gene Kilifi sequences retrieved from GenBank (28), combined Kilifi sequences were trimmed to 345bp. The F gene sequences were highly conserved both at the nucleotide (nt) and amino acid (aa) level with overall mean sequence identity of 89% (nt) and 97% (aa). Within the individual subgroups, the nucleotide and amino acid sequences were more conserved: there was 100% aa sequence similarity within clades A2b, A2c and subgroup B1, and 99.9% within B2 subgroup, and the mean nucleotide identities were 98.2% within B1, 99.2% within A2c, 98.3% within both A2b and B2. The evolutionary rates for F gene were estimated at 1.220× 10–3 (95% highest posterior density (HPD) = 1.317× 10–3 to 5.518× 10–3), 1.857× 10–3 (95% HPD = 1.111× 10–3 to 2.289× 10–3), 1.242× 10–3 (95%HPD = 8.303× 10–4 to 1.689× 10–3) and 1.050× 10–3 (95%HPD = 5.937× 10–4 to 1.600× 10–3), nucleotide substitutions/site/year for A2c, A2b, B1 and B2, respectively. Selection pressure analysis of the F gene showed overall mean dN/dS ratio of ω = 0.10, ω = 0.15, ω = 0.01 and ω = 0.12 for A2c, A2b, B1 and B2, respectively (see Additional file 6). Using M2a and M3 models, no codon sites were identified as positively selected among the four types (sub-lineage A2b, sub-lineage A2c, B1 and B2 subgroups). (Additional file 6).
G gene sequence analysis (Kilifi). G gene sequences were less conserved compared to F gene sequences with overall mean sequence identity of 73% (nt) and 56% (aa). However, within subgroups the sequences were highly conserved: the mean identities were estimated at 97% (nt) and 96% (aa) for clade A2c, 93% (nt) and 89% (aa) for clade A2b, 94% (nt) and 90% (aa) for B1, 92%(nt) and 87% for B2 subgroups. The deduced G gene amino acid sequences differed in lengths among subgroups: clade A2c (220aa), clade A2b (218 and 229aa), B2 (238aa) and B1 (232 and 242aa) long.The varying protein lengths for clade A2b and B1 subgroups were due tochanges in stop codon positions. The evolutionary rates for the G gene were estimated at 3.420× 10–3 (95% HPD interval = 1.317× 10–3 to 5.518× 10–3), 8.022× 10–3 (95% HPD = 5.893× 10–3 to 1.060× 10–2) and 9.570× 10–3 (95% HPD = 5.911× 10–3 to 1.33× 10–2) nucleotide substitutions/site/year for A2c, A2b and B1, respectively. Evolutionary rate estimates for subgroup B2 viruses were not determined, effective sample size (ESS) for all parameters failed to reach the cut-off (200) required for confirmation of convergence in BEAST analysis. Among all the subgroups, A2c exhibited significantly lower substitution rates within the G gene (95% HPD = 1.317× 10–3 to 5.518× 10–3). The estimated times of the most common ancestor of the individual subgroups dated to recent origin in both F and G genes, i.e., 2009 (95% HPD = 2005 –2011) for clade A2c sequences, 2008 (95% HPD = 2006 –2009) for clade A2b, 2005 (95% HPD = 2002 –2007) for B1 and 2001(95% HPD = 1997–2005) for B2 sequences. The estimated mean dN/dS ratio (ω) was <1 for A2b (ω = 0.57), A2c (ω = 0.38) and B1(ω = 0.78). For the subgroup B2, a higher average dN/dS ratio (ω = 1.21) was determined for G gene, indicating a signal of positive selection (Additional file 6). Using M2a and M3 models, 3 positively selected sites were identified within A2c (17, 24 and 28) and A2b (87, 123 and 135), 1 site (96) in B1 and 2 sites (12,127) within B2 sequences (Additional file 6).
HMPV subgroup and Disease severity.We did not observe a statistically significant association between HMPV subgroup and pneumonia severity ( p-value = 0.264) (Table 3).