Determination of Genetic Characterization and Circulation Pattern of Respiratory Syncytial Virus (Rsv) in Children With a Respiratory Infection, Tehran, Iran, During 2018-2019

The RSV-associated disease accounts for a signicant health burden particularly in infants and young children who need to be hospitalized. Since continuous surveillance of circulating RSV genotypes is critical worldwide, this study aimed to investigate the genetic diversity of RSV circulating strains causing upper or lower acute respiratory infection. Our attention was geared towards studying the cases in hospitalized or outpatient children younger than 2 years of age in Iran during 2018/2019. In this study, nasopharyngeal aspirates collected from 206 children who presented with respiratory infection symptoms, were admitted to the referral pediatric ward of Bahrami children’s hospital in Tehran, Iran. RSV-positive samples were detected via Nested RT-PCR. The glycoprotein gene was sequenced, and virus genotypes were conrmed through phylogenetic analysis by the MEGA X program. A total of 74(35.92%) samples tested positive for RSV. Among them, sequencing was done in 10 specimens from 2018 (RSV-A: RSV-B =4:6) and 19 specimens from 2019 (RSV-A: RSV-B =16:3). According to Phylogenetic analysis, all RSV-A strains were assigned as ON1 genotype and RSV-B strains were assigned as BA9 genotype. A new N-glycosylation site in Iranian BA9 and positive selection in ON1 genotype was observed. Phylogenetic characterization of strains in the current study revealed co-circulation of ON1 and BA9 as the only prevalent genotypes of both RSV-A and –B groups.


Introduction
The overwhelming majority of acute lower respiratory infections (ALRI) and hospitalization in early life are caused by the Respiratory syncytial virus (RSV) (Shi et al. 2020). Newborn infants and young children and people aged > 65 years are more susceptible to develop severe RSV infection (Smithgall et al. 2020).
RSV infection usually shows mild, cold-like symptoms that recover in a short period, but young children may develop pneumonia or bronchiolitis and it may sometimes cause death (Garcia-Mauriño et al. 2019;Fergie et al. 2020). Mortality rates in infants hospitalized with RSV are approximately 10 times greater than in infants infected with the in uenza virus (Miyairi and DeVincenzo 2008).
The biological mechanisms that cause disease progression remain to be fully understood (Janssen et al. 2007). However, multiple contributing factors including environment, pathogen, and the host's heredity determine susceptibility to pathogens and also the course of the disease (Miyairi and DeVincenzo 2008).
RSV is a species of Orthopneumovirus genus within the Pneumoviridae family (Cui et al. 2013). The genome of RSV is a negative-sense ssRNA of about 15.2 kb, that contains 10 genes encoding 11 proteins (Malasao et al. 2015). Three surface proteins of virion are coded by attachment (G), fusion (F), and small hydrophobic (SH) genes (Gimferrer et al. 2015). The G glycoprotein is a highly glycosylated type II surface protein (Tabor et al. 2020). It contains two hypervariable (HVR) regions on the extracellular domain (HVR1 and HVR2), which are separated by a 13-amino acid highly conserved region (aa 163-189) (Zhang et al. 2018). The region used for Genotyping and evolution analysis is the HVR2 in the Cterminal (Tan et al. 2012).
ON1, the novel genotype that rst emerged in 2010 in Ontario Canada, rapidly became the dominant genotype of the RSV A group worldwide (Malasao et al. 2015;Kang et al. 2020). ON1 has 72 nucleotide duplications in the HVR-2 of the G gene (Trento et al. 2003) and resembles the BA genotype that contains 60-nucleotide duplications in the HVR-2 of the G gene (Dapat et al. 2010).
Our knowledge about the molecular phylogeny of RSV in Iran is insu cient. GA-1, GA-2, GA-5, and BA were the genotypes co-circulating in Iran from 2007 to 2013 (Salimi et al. 2016). In 2015-2016 a shift to ON-1 as the main circulating genotype occurred (Malekshahi et al. 2019). So, continuing molecular and epidemiological surveillance of the HRSV strains is needed to provide a better understanding of the virus pathogenesis and vaccine designs. Here, ndings of genetic characterization of the HRSV strains circulating in Iran during 2018/2019 are reported.

Patients and samples
In total, 206 nasopharyngeal swabs were taken from children under two years of age from January to March 2018 and December 2018 to May 2019 in the referral pediatric ward of Bahrami children's hospital in Tehran, Iran. The patients had symptoms ranging from mild to severe. Our cases were grouped as inpatient and outpatient. Demographic information of the studied cases is shown in table1.
Informed consent was obtained from all parents of infants following a description of the aim of the study. All samples were taken by Copan swabs containing transport media and were rapidly transported to the Virology Department, Tehran University of Medical Sciences.

Viral RNA extraction, Reverse transcription & HRSV detection
Brie y, 200μl of nasopharyngeal samples were subjected to Viral RNA extraction using the High Pure Viral Nucleic Acid Kit (Roche, Mannheim, Germany), according to the manufacturer's instruction and stored at -80°C for the next steps. Reverse transcription was performed on extracted RNA using First-strand cDNA synthesis kit (Roche, Mannheim, Germany) according to the manufacturer's instruction. For the HRSV detection, a Hemi-nested polymerase chain reaction of the N gene was done as previously described (Malekshahi et al. 2010 HVR2 region of the G gene was ampli ed using a Hemi-nested PCR as previously reported (Malekshahi et al. 2020) employing the primer pairs in Sato et al.'s method(Sato et al. 2005) which detected both the RSV A and B types (table2). To determine if the genotypes of RSV strains tested positive, the PCR products were sequenced using the ABI BigDye® Terminator Cycle Sequencing Kit v3.1, on the 3130 Genetic Analyzer Automated Sequencer (Applied Biosystems Foster City, California, USA).

Phylogenetic analysis of the G gene sequences
To determine the genotype of the detected viruses, sequences were separately aligned to reference sequences of the genotypes in RSV-A, and -B groups. Next, a phylogenetic tree of RSV-A strains was constructed using the maximum-likelihood method and the Tamura-Nei (TN93) model. The best substitution model was obtained by nding the best DNA/Protein models option implemented in MEGA X software. Subsequently, the reliability of the phylogenetic tree was evaluated by the 1000 bootstrap replicate with a cut-off of 70%. The Phylogenetic tree of RSV-B strains was constructed using the neighbor-joining method and the Tamura-Nei model. To determine the evolutionary distances between Iranian strains and strains circulating in other regions, another phylogenetic tree was drawn by including strains of related genotypes from other countries in the GenBank. Using the maximum-likelihood method and Tamura Nei-model were utilized for the ON1 strains, while the maximum-likelihood method and HKY model were employed for BA9 strains. The nomenclature of detected viruses was based on the method described by Salimi V et al. (Salimi et al. 2021) as follows: HRSV/subgroup/geographic identi er/unique identi er/year of collection (e.g., HRSVs/A/Tehran. IRN/ 8/2018). The sequences are available from GenBank under accession numbers: MW675694-MW675722.

Amino acid analysis:
To assay the possible synonymous mutations, amino acid sequences of the G gene of Iranian strains were subjected to multiple sequence alignments with corresponding reference sequences utilizing the Bio Edit v.7.2.5 software (Fig3). Moreover, the proportion of differences (p-distance) of both nucleotide and amino acid sequences were evaluated.

N-/O-glycosylation sites analysis:
The acquisition or loss of putative N-and O-glycosylation sites in the HVR2 region of the G protein were predicted by using NetN-Glyc 1.0 (www.cbs.dtu.dk/services/NetNGlyc) and NetOGlyc 4.0 Servers (www.cbs.dtu.dk/services/NetOGlyc) respectively, with a threshold value of 0.5.

Selection pressure analyses:
The synonymous (dS) and nonsynonymous (dN) mutation rates determine whether any sites in the G gene are subjected to Selection pressure. For interpretation, the dN>dS represent the index of positive selection, while dN<dS represent the negative pressure. Analyses were carried out on the Datamonkey website interface (http://www.datamonkey.org) (Pond and Frost, 2005) using the following methods: SLAC (single likelihood ancestor counting), FEL ( xed effects likelihood), MEME (mixed effects model for evolution) with a p-value threshold of 0.1 and FUBAR (Fast, Unconstrained Bayesian AppRoximationa) with a p-value threshold of 0. 9.

Patient characteristics
From January to March 2018, and December 2018 to May 2019, 206 nasopharyngeal swabs were collected from children (33.98% female, 66.01% male, 0 month-2 years) with upper or lower RTI. The age of the cases ranged from 1 month to 24 months. The mean (±S.D.) age of the studied cases was 6.5±5.9 (months) and a median age of 5.00 months. Among the study patients, 110 (53.39%) were less than 6 months old; 62 (30.09%) were between 6 and 11 months old, and 34(16.50%) were between 12 and 24 months old. The study comprised 148 inpatients (71. 84%) and 58 outpatients (28.15%).

Phylogenetic analysis of RSV genotypes
According to Phylogenetic analysis, all RSV-A strains were classi ed as the ON1 genotype (Fig2). The phylogenetic tree of ON-1 strains in various parts of the world is clustered in two different lineages ( Figure   3). Even though the Iranian strains are clustered in both lineages, most of them, however, are grouped in Lineage I. Strains in lineage I, are clustered with closely related strains from Lebanon, Egypt, Saudi Arabia, Thailand, India, Italy, Germany, USA, Australia, and Greece. Lineage II consists only of our isolates which are clustered separately.
Based on Phylogenetic analysis, all RSV-B strains were classi ed as the BA9 genotype (Fig2). The phylogenetic analysis of Iranian sequences and BA9 strains of various parts of the world is shown in Figure 3. The phylogenetic tree of BA9 strains in various parts of the world is clustered in tow lineages. All BA9 Iranian strains fall into lineage I, clustered with closely related strains from Bulgaria, Korea, India, China, Taiwan, and Russia. At the nucleotide level, the average pairwise distances within the ON1 genotype in our strains was 0.04. In addition, the sequenced p-distance of our ON1 strains compared to the ON1 prototype strain (JN257693) was in the range of 0.01 to 0.05. The average pairwise distances within the BA9 genotype in our strains was 0.01. The nucleotide distances between the study strains and prototype BA-9 strain (DQ227395.1) were in the range of 0.04 to 0.06.

Amino acid analysis
Compared to the ON1 prototype (JN257693.1), 17 amino acid substitutions were detected in more than 2 Tehran strains, of which L274P, L298P, Y304H, and L310P were the most common substitutions. The position of the rst three of these substitutions is in the insertion site or its duplicated site. Six unique substitutions were detected including E224G (n=2), L226P (n=2), T231S (n=2), N242S (n=2), S277P (n=4) and Y309H (n=4). An association was observed between substitutions of E224G and T231S, as well as S2770P and Y309H. 2 amino acid sites, L226P and G284S, subjected to substitutions in the Iranian strains (RSVs/A/Tehran.IRN/53/2019, RSVs/A/Tehran.IRN/61/2019, and RSVs/A/Tehran.IRN/11/2018) were detected as an epitope in escape-mutant strains. These two sites plus 225A are predicted to be putatively positively selected. Compared to the BA reference strain (AY333364.1), ve substitutions were found in BA9 studied strains as characteristics of the BA9 genotype: L223P, S247P, T270I, V271A, and H287Y. Among them, V271A is speci c to the BA9 genotype consistent with a previous report (Dapat et al. 2010). Compared to the BA-9 reference strain (DQ227395.1), four substitutions occurred in all nine studied strains including T254I, I281T, T290I, and T312I which seemed to be characteristic of a separate sub-lineage. Seven unique substitutions were detected in individual strains including T228A (n=1), I229L (n=1), T250A (n=1), P295L (n=1), S307P (n=1), T308S (n=1) and S311C (n=2). Interestingly, the mutation in the stop codon (Stop 313 Q) was found in Iranian strains circulating in different cities from 2009 to 2016 as well as in Iraq, Turkey, the Philippines, and Belgium. This substitution transformed the stop codon into a coding site in position 313. As codon 313 in the BA prototype (AY333364.1) encodes an amino acid, this constitutes a reverse substitution called a ip-op pattern.

N-/O-glycosylation sites analysis:
The pattern of N-glycosylation of the HVR2 region in all RSV-A studied strains was the same as the ON1 prototype and included a potential N-glycosylation site at positions 237. The potential N-glycosylation site in the BA9 reference sequence is in position 296. In all our strains this N-glycosylation motif was conserved. Besides, P231L substitution in four of the studied strains resulted in a potentially new Nglycosylation site. Some Ser/Thr amino acids in the HVR2 region of both ON1 and BA9 strains were substituted with other amino acids that resulted in the loss of the O-glycosylation sites. That notwithstanding, some other substitutions resulted in the acquisition of O-glycosylation sites.

Selection pressure analyses
Using MEME, FUBAR, and FEL Methods, evidence of positive or diversifying selection of ON1 strains at site 262 was observed. Moreover, strains were subjected to pervasive negative or purifying selection in 230 and 239 codons using FUBAR, FEL, and SLAC. Selection analyses of BA9 strains exhibited sites 311 as positive or diversifying (p=0.92), and 296 and 304 as negative or purifying selections (p=0.92) using the FUBAR method. Other methods did not nd this positive and negative selection. Since a consensus of at least two models is essential to con rm the selection pressure sites (Haider et al. 2018) we do not consider them as selection evidence.

Discussion
In this study, we determined the molecular epidemiology and genetic diversity of RSV circulating strains among children aged less than 2 years with acute RTI in Iran over two consecutive seasons between 2018-2019. The prevalence rate of RSV infection (36%), was lower than those for the 2015-2016 study by 46.2%, but higher than those for other developing countries ( , resulted in selective advantage and predominance. Binding enhancement may also contribute towards the survival of ON1 strains although this needs further analysis. Based on the phylogenetic tree of RSV-A ON1 (Figure 2), there were 2 lineages in Iran. Most of the studied sequences were in lineage I along with sequences from across the world. For example, Lebanon, Egypt, Saudi Arabia, Thailand, India, Italy, Germany, and Greece. Lineage II included 2 Iranian strains which were not clustered with any of the ON1 strains reported from other countries. Previously ON1 circulating strains in Iran in 2015-2016 were closely related to our studied strains. One of our strains clustered to the Lebanese strains of 2016-17 which shared seven substitutions compared to other studied strains. These were G284S, E295K, P320K, Y304H, S314P, T319I, and P320K with bootstrap support of 98 and a p-distance <0.001, considered by Abou-EI et al. as a new genotype [32]. Contrarily, this did not appear as a separate genotype in our phylogenetic tree. Some amino acid changes of the Iranian strains which were found nearly in all samples detected in the initial emergence of the ON1 genotype, comprised of the L274P, L298P, L310P, and Y304H substitutions.
In addition to the mentioned changes, unique substitutions were detected in our isolates; E224G, L226P, T231S, N242S, S277P, and Y309H that may be speci c to the Iranian strains. This may be monitored in the future as it occurred in a previous unique change in Iran; the K321E, which was also identi ed in the current study. Another previous unique change was the I243S which was not only detected in the current study but also seen in Italian isolates in 2020. Given that multiple epitopes-neutralizing antibodies interactions in the hypervariable C terminal region of the G gene occurs (Cane 1997), substitutions in this region indicate the antigenicity variation due to selection pressure (Kenmoe et al. 2018). According to Figure 3, all BA9 Iranian isolates were assigned to lineage I and were closely related to strains detected since 2017 in various parts of the world. This included Bulgaria, Korea, India, China, Taiwan, and Russia showing a broad geographic spread. These strains are in a separate sub-lineage with speci c substitutions; T254I, I281T, T290I, T312I. Besides, BA9 strains circulating in different cities in Iran from 2009 to 2016 clustered in separate lineages compared to our strains. The mutation in the stop codon (Stop 313 Q) that was detected in the strains circulating in different cities of Iran from 2009 to 2016 as well as in Iraq, Turkey, the Philippines, and Belgium, altered the length of the G protein from 313 amino acids to 320 amino acids. No study has yet been found that examines the effect of this change on the pathogenicity of the virus. The N-Glycosylation pattern of ON1 Iranian strains was the same as the prototype ON1 strain in codon 237. Also, in BA9 strains the N-Glycosylation site was the same as the BA9 prototype in codon 296. However, in four of the BA9 strains, a P231L substitution resulted in the new Nglycosylation site. This substitution is also present in other countries such as India, Thailand, China, Korea, Taiwan, Myanmar, Cameroon, and Bulgaria. Given that Glycosylation accounts as one of the important viral strategies for evasion of pre-existing antibodies (Cui et al. 2013), this substitution probably changes the antigenicity of strains and will be selected through the evolution. 231L N-Glycosylation site has also been detected in the BA-10 genotype strains (Kenmoe et al. 2018). The G protein of the ON1 isolates is under positive selective pressure at site 262, as described earlier (Yoshihara et al. 2016;Zheng et al. 2017). This site is subjected to E262K substitution in some of our isolates as detected in other regions such as Spain, Vietnam, China, Italy, and Portugal. The evolution analysis also revealed 2 negatively selected sites. The G protein of the BA-9 isolates is under positive selective pressure at site 312 using the FUBAR method, and under negative or purifying selection pressure at the 225, 297, and 305 sites. These selected sites in Iranian isolates are not supported by other methods. Therefore, these results are not strongly con rmed and were not found by other studies. A possible explanation for this may be the lack of any evolutionary study in the strains carrying this T312I, which was observed in strains detected around 2017 onwards This may be categorized as a separate sub-lineage containing selected sites showing the selection pressure from the host. The current study only examined a phylogenetic analysis of isolates successfully sequenced. Some samples which tested positive for the N gene were negative for the G gene. It may be due to low RNA concentration. As genotyping is based on the G gene, samples that tested negative or weakly positive for the G gene were not sequenced. The low RNA concentration may have resulted from either RNA degradation or improper sampling.

Conclusion
In summary, this study demonstrated that the RSV prevalence among children under the age of 2 was 36% with co-circulation of ON1 and BA9 genotypes in Tehran, the capital of Iran. Continuing Phylogenetic and molecular surveys is critical for tracking epidemic-circulating strains, and the correlation between distinct genotypes and pathogenesis to develop effective prophylactic and therapeutic interventions. Further studies in all regions of Iran with more focus on the F gene or whole genome sequences along with the G gene via Next Generation Sequencing are strongly recommended to give a more comprehensive insight on the circulating RSV strains.    Phylogenetic tree of HVR2 G gene sequences of RSV-ON1 (A) and BA9 strains (B) from Iran (solid triangles and circles) compared with the sequences with high similarity (97%-100%) were constructed with maximum-likelihood and Tamura-Nei model for ON1 and HKY for BA9 strains using MEGA X software . bootstrap values >70% are displayed at the branch nodes. The lineages (I, II) are indicated by Roman numerals. Our RSV strains from Tehran/Iran are indicated by "solid triangle" and isolates in the previous study in Tehran are indicated by "solid circles" for (A ) and by solid circles and un lled circles, respectively, for (B).