Genomic and in silico Structural Characterization of Dobrava-Belgrade orthohantavirus Isolate from European Side of Turkey

Orthohantaviruses are transmitted to humans mostly through small mammals that are the reservoirs of these viruses. Because orthohantaviruses show high genetic variability across geographic regions, genetic characterization of these viruses by whole genome sequencing is of great importance for clarifying their molecular epidemiology and tracking genetic changes in the reservoir hosts. We have previously reported the presence of Dobrava-Belgrade orthohantavirus (DOBV) in the Igneada region, Kirklareli province, by showing antibodies against the virus in rodents and by sequencing partial genomes of the virus. Here we report the whole genome sequencing of the DOBV Igneada strain directly from Apodemus flavicollis lung tissue by next-generation sequencing, followed by phylogenetic analyses. In addition, viral protein structures of the DOBV Igneada strain were modelled, and in silico prediction analyses of the effects of amino acid changes on viral protein function and stability were performed. The phylogenetic analysis showed a close relation between the DOBV Igneada strain from Turkey and the DOBV Ano-Poroia strain from Greece. Similarity plot analysis also revealed similarities between the DOBV Igneada strain and other DOBV strains from Balkan countries such as Greece, Croatia, and Slovenia. Additionally, in silico prediction suggested that the G318E, Y322H, and S324P mutations in the Gn glycoprotein are deleterious, and that all amino acid changes decrease the stability of both the Gn and Gc glycoproteins. In conclusion, full orthohantaviral genomes can be obtained directly from rodent lung tissues, allowing detailed genetic and structural analyses of orthohantaviruses. The DOBV Igneada strain shows great similarity to the prototype Ano-Poroia strain, yet it was predicted that the DOBV Igneada strain may have some changes in its pathogenicity and structure, warranting further research.


Orthohantaviruses are enveloped, negative-sense, single-stranded, tri-segmented RNA viruses and members of the Hantaviridae family. While rodents and small mammals are the main reservoirs for these viruses, humans are accidental hosts. Humans can be infected via inhalation of viral particles from the secretions of reservoir animals or via direct contact with the reservoirs [1].

The distribution of the orthohantaviruses depends on the geographic distribution of their reservoir hosts, and orthohantaviruses are separated into two main groups: New World and Old World Orthohantaviruses. The New World Orthohantaviruses are mainly distributed through North and South America, and they cause "Hantavirus Pulmonary Syndrome" (HPS) in humans [2]. The Old World Orthohantaviruses are mostly distributed in Europe, Asia, and Africa, and they are the causative agents of "Hemorrhagic Fever with Renal Syndrome" (HFRS) in humans. Additionally, a member of the Old World Orthohantaviruses, Puumala orthohantavirus (PUUV), causes a mild form of infection in humans called "Nephropathia Epidemica" (NE) [2].

The Kirklareli province became an important region for orthohantavirus research after the detection of positive rodents in our previous study [11]. Orthohantavirus positivity was initially reported serologically and then confirmed as DOBV by sequencing partial S-, M-, and L-segments, followed by phylogenetic analysis that showed the genetic relation between DOBV Igneada strains and other DOBV strains [11]. The study suggested that the DOBV Igneada strains were similar to strains from the Balkans. This particular region can be seen as a barrier between Europe and Asia due to its various geographical properties and its location. In addition, the reservoir host Apodemus flavicollis is distributed on both the European side and Anatolia (the Asian side of Turkey), and it shows different genetic properties on these sides [26][27]. Therefore, orthohantaviruses that exist in this region might have genetic properties of viruses from both of these continents [11,25]. Thus, it is of great importance to obtain whole genome sequences of DOBV from this region, as these might help us to understand the genetic relation between orthohantaviruses from Asia and Europe. Furthermore, very limited data are available on DOBV whole genomes, and additional sequences could provide detailed information about genetic similarities and differences between DOBV strains as well as about the structural and functional properties of different strains.

In the past, it was challenging to obtain whole genomes of viral zoonotic agents directly from rodents or small mammals, which are carriers of many different and important zoonotic agents, yet over the past decade it has become easy to obtain whole genome sequences with second-generation sequencing systems such as Illumina systems [13]. Today, the biggest challenge for the sequencing of such viruses is the low abundance of viral particles in their host tissues [13]. Additionally, RNA viruses have non-coding regions (NCRs) at the 5' and 3' ends of their genomes, and these NCRs cannot be covered if cDNA synthesis is performed before the library preparation for next-generation sequencing [13][14]. Thus, RNA library preparation kits with specific modifications ought to be used in order to obtain whole genome sequences of RNA viruses. Here, we report, for the first time from Turkey, the whole genome sequence of the DOBV Igneada strain obtained from lung tissue as a natural representative of the virus that was not passaged in cell culture to increase the viral load. We also report the detailed genetic characterization and in silico prediction of structural properties of the DOBV Igneada strain.

In order to remove host sequences from the raw data, a dataset was created from NCBI GenBank with the sequences of the host rodent Apodemus flavicollis. The reads that belonged to the host were then removed from the raw data using the mirabait tool of the MIRA 5.0 program [24]. After this cleaning step, the remaining sequence reads were de novo assembled with the MIRA 5.0 program. The de novo assembled contig was then used as a reference for remapping in the UGENE tool to confirm the accuracy of the de novo assembly.
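
The host-removal step can be illustrated with a minimal k-mer "baiting" sketch. This is not the mirabait implementation and does not reproduce its options; it only shows the general idea of discarding reads that share k-mers with a host reference set. The function names, the value of `k`, the `min_hits` threshold, and the toy sequences are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def build_host_index(host_sequences, k):
    """Collect k-mers from both strands of every host sequence."""
    index = set()
    for seq in host_sequences:
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def remove_host_reads(reads, host_index, k, min_hits=1):
    """Keep only reads that share fewer than min_hits k-mers with the host."""
    kept = []
    for read in reads:
        hits = len(kmers(read, k) & host_index)
        if hits < min_hits:
            kept.append(read)
    return kept


# Toy data (hypothetical sequences, not real Apodemus or DOBV reads).
host = ["ACGTACGTTAGCCGATAGGCTA"]
reads = [
    "CGTTAGCCGATA",      # substring of the host sequence, so it is removed
    "TTTTGGGGCCCCAAAA",  # shares no 8-mer with the host, so it is kept
]
index = build_host_index(host, k=8)
non_host = remove_host_reads(reads, index, k=8)
print(non_host)
```

In practice the choice of `k` and of the hit threshold trades sensitivity against the risk of discarding viral reads that happen to resemble host sequence, which is why a dedicated tool is used for real data.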

The assembled sequences were checked with the BLAST tool from NCBI and compared against both the nucleotide and protein databases. In this way, the data were screened for all clinically important viruses known to occur in reservoir rodents, and it was confirmed that DOBV sequences were the most abundant and had been obtained accurately.

Finally, a dataset containing all available complete DOBV sequences was created from the NCBI GenBank database. The obtained sequence of the DOBV Igneada strain was added to that dataset, which was then aligned with ClustalW in the MEGA X tool to identify gaps in the nucleotide sequence of the DOBV Igneada strain.
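
As a rough illustration of how gaps can be located once one aligned sequence has been exported from the alignment, the sketch below scans a single aligned row for runs of gap characters. It is independent of MEGA X and ClustalW; the function name and the toy sequence are hypothetical. Note that a gap run in an alignment may reflect a true insertion or deletion rather than missing sequence data, so each reported interval still needs manual inspection.

```python
def find_gaps(aligned_seq, gap_chars="-"):
    """Return (start, end) alignment columns, 1-based and inclusive, of each gap run."""
    gaps = []
    start = None
    for i, base in enumerate(aligned_seq, start=1):
        if base in gap_chars:
            if start is None:
                start = i  # a new gap run begins at this column
        elif start is not None:
            gaps.append((start, i - 1))  # the run ended at the previous column
            start = None
    if start is not None:
        gaps.append((start, len(aligned_seq)))  # run extends to the last column
    return gaps


# Toy aligned row (hypothetical, not the DOBV Igneada sequence).
aligned = "--ATGGCA---TTGA-CC--"
gaps = find_gaps(aligned)
print(gaps)
```

The resulting column intervals are expressed in alignment coordinates, so they would have to be mapped back to a reference genome position before being used to place primers.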

New primers were designed according to the gaps (Supplementary Table 1

In addition, in silico prediction analysis for function of glycoproteins was conducted using PROVEAN

The complete coding sequence of DOBV Igneada strain was obtained with next generation sequencing, yet there are some missing sequences at the ends of the segments. The details about the segments were shown at

After ORFs were determined and phylogenetic trees were constructed, percentage identities were calculated for both nucleotide and protein sequences with MAFFT. These percentage identities for the DOBV Igneada strain are shown in Table 3. In addition, genetic and geographical distances were analyzed to search for a correlation between them.
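
For readers unfamiliar with how a percentage identity is derived from a pairwise alignment, a minimal sketch follows. The exact definition used in the study is not stated in the text; here it is assumed that columns with a gap in either sequence are excluded from the denominator, which is only one of several common conventions. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical): 7 ungapped columns, 6 of them identical.
identity = percent_identity("ATGCATGC", "ATGTAT-C")
print(round(identity, 2))
```

The same function works for aligned protein sequences, since it only compares characters column by column.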

The DOBV-Igneada strain and Kirklareli province were set as reference points for genetic distance and geographical distance, respectively. A slight positive correlation between the genetic and geographical distance was observed (Figure 9).
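
One way such a comparison could be set up is sketched below: great-circle distances from a reference location are computed with the haversine formula and then correlated with genetic distances using Pearson's r. The text does not state which distance measure or correlation statistic was used, so both choices are assumptions, and the coordinates and genetic distances in the example are invented placeholders rather than study data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Placeholder values only: a hypothetical reference site and three other sites,
# each with a made-up genetic distance to the reference strain.
reference = (41.0, 28.0)
sites = [((40.0, 23.0), 0.02), ((45.0, 16.0), 0.05), ((46.0, 14.0), 0.04)]
geo = [haversine_km(*reference, lat, lon) for (lat, lon), _ in sites]
gen = [d for _, d in sites]
r = pearson_r(geo, gen)
print(round(r, 3))
```

With few sampling sites a correlation coefficient of this kind is only descriptive, and a formal test of isolation by distance would need a method designed for distance matrices.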

An HFRS outbreak occurred in the Bartın and Zonguldak provinces of Turkey in 2009, and a field study carried out to determine the causative virus detected DOBV in Turkey for the first time [15].

After these outbreaks and diagnoses, our team performed a field study by collecting Apodemus flavicollis mice from the Kirklareli province, and the circulation of DOBV in this species was confirmed both serologically and molecularly [11]. Studies to date have shown that DOBV causes a severe form of HFRS in humans, and the mortality rate is higher in DOBV infections than in infections with other HFRS-causing orthohantaviruses. The mortality rate varies between 5% and 15%, establishing DOBV as an important virus to study in Turkey [15][16][17][18].

This variation in mortality rate is caused by the pathogenicity of different DOBV genotypes that varies greatly

The prediction of viral structural protein models was performed in order to identify any increased emergence potential of the DOBV Igneada strain and to provide preliminary data for further study such as determining the [...] Figure 6). We also acknowledge that viral protein prediction using only bioinformatics tools is challenging, yet these are the first models that belong to DOBV, and they might serve as guiding models for further and better studies such as de novo viral protein modelling.

PROVEAN is a software tool that predicts the effect of amino acid substitutions [30].

The substitution might be deleterious meaning that such amino acid substitution may affect the function of a

The X axis represents the nucleotide position in the sequences, while the Y axis shows percentage similarity.