Patients and selection criteria
This study provides a retrospective analysis of 152 cases of hospitalization of children in the 3rd Children's Infectious Diseases Hospital in Astana from May 14 to October 12, 2022 with a preliminary diagnosis of enterovirus infection. Children's Infectious Diseases Hospital Astana is the only specialized institution providing inpatient treatment for childhood infections in Astana (population 1.35 million people).
Enterovirus infections were classified by clinical phenotypes, using the following definitions: herpangina (HA) and aseptic meningitis (AM) caused by enterovirus.
Herpangina and hand-foot-mouth diseases (HFMD) cause similar symptoms with ulcers in or around mouth, with HFMD causing vesicular lesions on the arms, legs, knees, or buttocks. A nasopharyngeal swabs (NSW) was taken from patients using a sterile cotton swab and placed in a transport medium (DMEM with 50 µg/ml of gentamicin) and stored at minus 70ºС until RNA extraction.
Aseptic meningitis was accompanied by an increase in body temperature up to 38–40ºС, the development of neck rigidity, headaches, photophobia was observed, and some patients experienced vomiting, loss of appetite, diarrhea, rash, pharyngitis, and myalgia. For patients with signs of meningitis, according to the standard protocol, a cerebrospinal fluid (CSF) puncture was prescribed, which was divided into 3 equal aliquots by 200 μl. According to a standard diagnostic protocol two CSF sample aliquots were used for enterovirus diagnosis. One sample was analyzed using bacteriology and cytology methods at the hospital laboratory. And the second aliquot was sent for laboratory analysis using RT-PCR in the RSE "Center for Sanitary and Epidemiological Expertise" Medical Centre Hospital of President's affairs Administration of the Republic of Kazakhstan. The third part of the CSF was stored at minus 80ºС for additional studies and genotyping. In case of presence of vesicular lesions of the oral cavity, a nasopharyngeal swab was also collected from patients.
In total, for the specified period, 139 children were hospitalized with aseptic meningitis diagnosis confirmed and the presence of enterovirus RNA in cerebrospinal fluid samples detected. Herpangina was diagnosed in 13 children patients, and enterovirus RNA detected from NSW samples.
RNA extraction
Viral RNA was extracted from 200 μl CSF or NSW samples in transport medium using GeneJET kit Viral DNA and RNA Purification Kit (ThermoScientific, Lithuania) according to the manufacturer's instructions.
Whole genome sequencing and assembly of genomes
Almost full-length genomes of enteroviruses were amplified as described previously by Isaacs et al. (2018), except that the readily-made reaction mix from BioMaster LR HS-PCR 2x (Biolabmix, Russia) was used for amplification step. Second-stage PCR products were purified with AMPure XP magnetic beads (Beckman Coulter, USA) in a 1:1 ratio, with elution in 20 µl of DNase/RNase-Free Distilled Water. Libraries were prepared using Illumina® DNA Prep kit, (M) Tagmentation (96 Samples) (Illumina, Catalog #20018705) with double barcoding. Libraries were sequenced on Illumina MiSeq using MiSeq Reagent Kit v3 (600-cycle) (Catalog #LMS -102-3003). Reagents were used according to the manufacturers' instructions.
The quality control of the obtained reads was carried out using the FastQC program (Andrews, 2017). Before denovo assembly, reads were trimmed using SeqTK (with options: - b 20 -e3) and Sickle (with options: - t sanger - q 30 - l 200 - g) followed by deduplication in the FastP program (with options: -- dedup -- dup _ calc _ accuracy 6 -- disable _ quality _ filtering). To assemble the genomes, Megahit v1.2.9 was used with varying the length of κ- mers individually for each sample, selecting the assembly with the longest lengths for subsequent contigs analysis. The resulting contigs were then used as corresponding references to obtain consensus sequences using BWA. Variants were determined using FreeBayes, consensuses were generated by BCFtools consensus. The ends of assemblies with less than 30x coverage were removed.
Determination of serotypes
Serotypes were determined by constructing a phylogenetic tree from the database generated with VP1 sequences of enteroviruses from GenBank (www.ncbi.nlm.nih.gov/genbank/). The database was selected using keywords such as "enterovirus", "VP1", "human enterovirus B". In total, approximately 26,000 genome sequence data with VP1 regions were downloaded. Using the generated database and our sequences, a phylogenetic tree was reconstructed, which made it possible to filter the 81 isolates most closely related to our sequences from the database. Further analysis was performed with 146 enterovirus isolates, including 65 obtained within the current study and Enterovirus 69 selected as outgroup (GenBank: AY302560.1) (Oberste et al., 2004). Statistical selection of nucleotide substitution model for phylogenetic analysis was performed using ModelTest -NG (v0.1.7) (Darriba et al., 2020) and the best fitting model was selected for the reconstruction of the phylogenies. Phylogenetic tree was built using the maximum likelihood (ML) method with the bootstrap random sampling method, n=1000. Estimated bootstrap (bs) values greater than 50% were shown for the tree nodes. Interactive Tree Of Life (iTOL) (Letunic & Bork, 2007) online tool was used for interpretation, visualization and annotation of the phylogenetic tree.
Genotyping by VP1 region
Representative datasets were downloaded separately for each serotype group to describe the molecular epidemiology of HEV-B serotypes in Kazakhstan. Phylogenetic relatedness of enterovirus isolates collected within current study with the most recent genotyping and sub-genotyping data was compared. Phylogenetic dendrograms were reconstructed based on entire VP1 sequences of the international strains downloaded from GenBank.
Genotyping of Coxsackievirus A9 (CVA9) was based on Zhao et al. (2022) algorithm, that included 110 international sequences and 25 obtained within current study. Coxsackievirus B5 (CVB5) genotyping of 134 isolates, including 14 Kazakhstani was performed using the same algorithm as described in He et al. (2022). Coxsackievirus B3 (CVB3) strains were analyzed based on Yang et al. (2022), including 250 worldwide strains and 1 isolate from Kazakhstan. Genetic diversity of Echoviruses group worldwide datasets and the relatedness of Kazakhstani isolates were reconstructed according to Cheng et al. (2021) for Echovirus 6 (E6)(n=73); Zhang et al. (2022) for Echovirus 9 (E9)(n=57); Li et al. (2019) and Grapin et al. (2023) for Echovirus 11 (E11) (n=140). However, there are no currently available genotyping schemes for Echovirus 21 (E21) and 25 (E25) groups. Therefore, entire VP1 coding sequences of available E21 and E25 serotypes were downloaded from GenBank (for 15th of August 2023) and phylogenetic dendrograms reconstructed, including sequences obtained within current study. Genotypes were distinguished based on estimated divergence between/within groups, calculating pairwise genetic distances in MEGA X (Kumar et al., 2018). Genotypes determined with >15% difference between, and <15% within groups.
The pairwise genetic distances for E21 and E25 serotypes were estimated within each serotype group using the Kimura 2-parameter model, with gamma distribution model (shape parameter = 5) in MEGA X (Kumar et al., 2018). Enterovirus sequences were classified into so-called genotypes within each serotype (shown in Figures S7-8) if they shared >85% sequence identity based within the VP1 capsid gene.
Sequences were aligned in Mafft (v7.520) (Katoh & Standley, 2013) and best-fitting substitution model was selected (for each serotype group) using ModelTest-NG (v0.1.7) (Darriba et al., 2020) program. Maximum likelihood tree reconstructed for VP1 sequences with bootstrapping of replicates 1000 times using RAxML-NG (v1.2.0). The ML tree was reconstructed to demonstrate the genetic diversity of the worldwide strains and the allocation of sequenced Kazakhstani enterovirus strains to particular genotype and sub-genotype group. Interactive Tree Of Life (iTOL) (v1.0) (Letunic & Bork, 2007) was used for illustration and annotation of the ML tree.
Analysis of whole genome sequencing data
Phylogenetic analysis of 65 whole genome sequences from 58 infected patients in Kazakhstan was performed. Nucleotide substitution model for evolutionary analysis was performed in ModelTest-NG (v0.1.7) (Darriba et al., 2020) and the best fitting model was selected as “GTR+I+G”. Phylogenetic tree was reconstructed using the maximum likelihood (ML) method with the bootstrap random sampling method in RAxML-NG (v1.2.0), with n=1000 bootstraps. Statistically significant bootstrap (bs) values greater than 50% were shown for the tree nodes. To detect if sufficient “temporal signal” was present in the data TempEst (v1.5.3) (Rambaut et al., 2016) software was used. Phylogenetic tree was visualized and annotated in Interactive Tree Of Life (iTOL) (Letunic & Bork, 2007) online tool. Pairwise genetic distances were estimated using ape package in R (R Core Team, 2022).