Animal sampling.
A total of 3,284 lung samples of small mammal individuals were collected from different sites in the three countries between April 2006 and November 2018. These comprised 3,191 lung samples from rodents, 66 lung samples from shrews and two lung samples from short-tailed gymnure under the order Eulipotyphla, and 25 lung samples from tree shrews under the order Scandentia. Samples were obtained from a total of 18 provinces, including the Bangkok urban area, Kanchanaburi, Chiang Rai, Tak, Prachuap Khiri Khan, Loei, Nan, Songkla, Udon Thani, Kalasin, Phrae, and Nakhon Ratchasima in Thailand, Vientiane, Champasak, and Luang Prabang in Lao PDR, and Preah Sihanouk, Mondulkiri, and Pursat in Cambodia (Figure 1A). The samples included 25 species of rodents from the families Muridae, Spalacidae, and Sciuridae, three shrew species from the family Soricidae, and two species of tree shrew from the family Tupaiidae that reside in urban, rural, and wild areas throughout the Indochina Peninsula. The most commonly sampled species were Rattus exulans, R. tanezumi, Bandicota savilei, B. indica, Maxomys surifer, and Mus cervicolor (Table S1). Rattus was the dominant genus, accounting for 49.66% of the total samples. The genus Bandicota accounted for 18.30% and Mus accounted for 14.92% of the total samples. These rodent species are ecologically closely associated with humans, arthropod vectors, and other animals in these areas.
Meta-transcriptomic analysis and overview of the virome
Because some species were sampled repeatedly at the same location, total RNA isolated from the 3,284 lung samples was combined in equal quantities into 233 pools. To determine whether viral RNA was present in each pool, an rRNA-depleted RNA library was constructed and processed for NGS-based meta-transcriptomic analysis. A total of 262.38 GB of nucleotide data was obtained. Reads that were classified as cellular organisms (including bacteria, archaea, and eukaryotes) and reads with no significant homology to any amino acid sequence in the NCBI NR database were removed. The remaining 495,579 reads were matched to their best hits among the viral proteins available in the NCBI NR database (Table S2). Because of the large number of transcripts from the hosts and other cellular organisms, most pools had low levels of viral RNA.
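The read-filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the record layout (`BestHit`), the superkingdom labels, and the use of `None` to encode "no significant NR hit" are all hypothetical choices made only for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class BestHit(NamedTuple):
    """Hypothetical record: one read and the superkingdom of its best NR hit."""
    read_id: str
    superkingdom: Optional[str]  # None = no significant hit (assumed encoding)


# Labels assumed for cellular organisms; real taxonomy output may differ.
CELLULAR = {"Bacteria", "Archaea", "Eukaryota"}


def keep_viral_candidates(hits: Iterable[BestHit]) -> List[str]:
    """Drop reads with no significant hit and reads assigned to cellular organisms."""
    kept = []
    for hit in hits:
        if hit.superkingdom is None:
            continue  # no significant homology to any NR protein
        if hit.superkingdom in CELLULAR:
            continue  # host, bacterial, or archaeal transcript
        kept.append(hit.read_id)
    return kept
```

Applied to the best hits of each pool, a filter of this kind would leave only the reads that go on to best-hit matching against viral proteins.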
To analyze the virome, virus-associated reads were classified into a group of unclassified RNA viruses and 98 known families of double-stranded (ds) RNA viruses, reverse-transcribing viruses, single-stranded (ss) RNA viruses, dsDNA viruses, and ssDNA viruses. After further consideration of dietary habits and other host traits, reads from non-vertebrate-associated viruses and retrotransposon-related sequences that have been described previously [23, 29] were removed. The remaining 406,869 viral reads (approximately 82.1% of the total viral hits) were then assigned to 24 mammal-related viral families and a group of unclassified RNA viruses. The reads for each viral family in each pool were normalized by viral genome size and by the proportion of total viral reads, and the prevalence of each viral family in each province and animal species is shown in Figure 1B and Table S3. Viral reads from the families Arteriviridae, Arenaviridae, Flaviviridae, Hantaviridae, Herpesviridae, and Phenuiviridae were widely distributed, in differing abundances, in a variety of rodent and insectivore species from the different regions. Virus families Adenoviridae, Astroviridae, Anelloviridae, Coronaviridae, Caliciviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Peribunyaviridae, Rhabdoviridae, and Reoviridae were found in fewer species in the different regions. Although sequence reads related to the families Hepadnaviridae and Orthomyxoviridae were occasionally present in some of the viromes, RT-PCR failed to amplify genomic sequences of these viruses. This might suggest that the Hepadnaviridae and Orthomyxoviridae viruses were present at low viral loads or that the reads reflected spurious sequence similarities. In addition to the family-assigned reads, a substantial number of viral reads belonged to unclassified RNA viruses in the realm Riboviria, including diverse Chuviridae-, Nodaviridae-, or Totiviridae-related viruses.
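The text states that reads were normalized by viral genome size and by the proportion of total viral reads but does not give the formula. The sketch below shows one plausible form, assumed here purely for illustration: reads per kilobase of genome, divided by the total viral reads in the pool. The function name and parameters are hypothetical.

```python
def normalized_abundance(family_reads: int,
                         genome_size_nt: int,
                         total_viral_reads: int) -> float:
    """Reads per kb of viral genome, as a fraction of the pool's viral reads.

    Assumed formula (not taken from the source):
        (family_reads / (genome_size_nt / 1000)) / total_viral_reads
    """
    if genome_size_nt <= 0 or total_viral_reads <= 0:
        raise ValueError("genome size and total viral reads must be positive")
    reads_per_kb = family_reads / (genome_size_nt / 1000.0)
    return reads_per_kb / total_viral_reads
```

Under this assumption, a family with a longer genome needs proportionally more reads to reach the same normalized value, which keeps families with very different genome sizes comparable within a pool.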
Although most viral hits (98.64%) were assigned to the RNA virus group, a small number of DNA viral reads were found in the sequence data because their corresponding RNA transcripts were detected. Because of their low number, we did not perform any further analyses of these DNA viruses.
Based on the virome data provided by these meta-transcriptomes, the prevalence and diversity of viruses in families that include known human and animal pathogens, or that are novel in rodents, were then confirmed by PCR screening of individual lung samples. In total, 216 representative virus strains were selected for genomic or partial genomic sequencing (Figure 2 and Table S4). Below we outline the characteristics of these different types of viruses.
Characteristics of negative-stranded RNA viruses
HanVs. Rodent HanVs are segmented negative-stranded RNA viruses with three genome segments (L, M, and S). These viruses of the genus Orthohantavirus, family Hantaviridae (under the order Bunyavirales), can be divided into two groups: the Murinae-related phylogroup III HanVs and the Sigmodontinae-, Neotominae-, and Arvicolinae-related phylogroup IV HanVs [10, 13, 42]. Here, we identified eleven HanV strains in three species of Murinae (B. indica, R. exulans, and R. tanezumi) from five provinces of Thailand (Bangkok, Chiang Rai, Kalasin, Loei, and Nan) in multiple years (Table S4). B. indica is the main host for these HanVs. Complete genome sequences were determined for eight strains, and three strains were selected for sequencing of partial L, M, and S. The complete genome sequences (including non-coding regions (NCRs)) of these viruses showed more than 93.5% nucleotide (nt) identity with each other, suggesting that they all belong to the same viral species. The open reading frame (ORF) for L of this species showed 79%-81.4% nt identity with those of Anjozorobe HanVs detected in R. rattus and Eliurus majori from Madagascar in 2014 [43], whereas the M and S ORFs of this species showed 95%-96% and 96.1%-99.4% nt identities, respectively, with those of Thailand virus strains found in B. indica in Thailand in 1994 and 2004 [44, 45] (Table S5). Phylogenetic trees based on the complete M segment-encoded glycoprotein precursor (GPC), L segment-encoded RNA-dependent RNA polymerase (RdRp), and S segment-encoded nucleocapsid protein (N) amino acid sequences were constructed (Figure 3). All HanVs identified here were assigned to the Murinae-related phylogroup III, clustered together, and formed a separate clade associated with the clade of Anjozorobe HanVs that are closely related to the lineage of the Seoul virus strains.
Although the exact relationship between these HanVs and Thailand virus cannot be resolved because complete L sequences of the Thailand viruses are absent from GenBank, we propose, based on the M, S, and partial L alignments and our phylogenetic analysis, that a single lineage of the species Thailand orthohantavirus [46] includes all of the HanV strains detected in Thailand. This suggests that these HanVs have circulated in diverse Thai provinces for several decades.
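Pairwise nt identities such as the more than 93.5% identity among the HanV genomes are computed from aligned sequences. The sketch below shows one common convention, assumed here for illustration only: identity is counted over alignment columns in which neither sequence has a gap. The alignment tool and gap convention used in this study are not stated, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Columns containing a gap in either sequence are skipped
    (an assumed convention; other definitions count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Because the result depends on how gaps and partial sequences are handled, identities computed under a different convention may differ slightly from the values in Tables S5-S16.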
PhleVs. Similar to HanVs, viruses of the genus Phlebovirus are linear segmented negative-stranded RNA viruses; they belong to the family Phenuiviridae of the order Bunyavirales [47]. This genus contains many highly virulent viruses such as Rift Valley fever virus (RVFV), Toscana virus, and severe fever with thrombocytopenia syndrome virus. These viruses are arthropod-borne: they are naturally harbored by ruminant or camel reservoirs, are transmitted by mosquitoes, sandflies, or ticks, and cause severe diseases in humans and animals [48-50]. After mapping the sequencing reads, the genome sequences of a total of 21 rodent PhleVs were completely or partially confirmed in the lung samples from B. savilei, M. surifer, Niviventer fulvescens, and several diverse Rattus species from eight Thai provinces and two Laotian provinces (Table S4). Rattus species were the main hosts of the PhleVs. Pairwise alignment revealed that the partially sequenced L segments of these viruses showed less than 71.2% nt identity with all other known PhleVs, which suggested that these rodent PhleVs represent novel species (Table S6). Phylogenetic analysis of the partial L nucleotide sequences revealed two distinct lineages of rodent PhleVs in the genus Phlebovirus: lineage 1, related to Uukuniemi PhleVs, and lineage 2, related to RVFV and Salehabad PhleVs (Figure 4). The different PhleVs identified in species of the genus Rattus from different locations showed very close genetic relationships and can be further divided into two clades within lineage 1. However, some of the PhleVs from diverse rodent species also shared high sequence identities and close genetic relationships. For example, RtRl-PhenV/Tt2018, RtMs-PhenV/Ts2013, and RtBs-PhenV/Tl2009, identified in three different rodent genera, are closely related to each other within lineage 1, and a close relationship was also seen between RtBs-PhenV/Tp2006 and RtRt-PhenV/Lv2015 within lineage 2.
AreVs. Rodent AreVs of the genus Mammarenavirus, family Arenaviridae, can be divided into the Old-World and New-World complexes [16]. These viruses are a group of linear segmented negative-stranded RNA viruses with two genome segments (L and S). A total of nineteen Old-World AreV strains were identified in lung samples from B. indica, M. cookii, M. surifer, Menetes berdmorei, and Rattus species from five Thai provinces and the Cambodian Sihanouk province (Table S4). Species of Rattus are the main hosts for these AreVs. Complete genome sequences were determined for fourteen strains, and five strains were selected for sequencing of partial L and S. Pairwise alignment of the complete genome sequences (including NCRs) revealed that seven strains shared high sequence similarity and were closely related to Cardamones virus found in Veal Renh, Cambodia, in 2009, with 96.1%-100% nt identities, and that 12 strains shared high sequence similarity and were closely related to the Loei River virus found in Loei province, Thailand, in 2008, with 88.2%-95.1% nt identities [51] (Table S7). In accordance with the sequence alignment results, phylogenetic analyses based on the RdRp (L), glycoprotein (G), and nucleocapsid (N) proteins suggested that these AreVs could be assigned to two different lineages related to AreVs reported in China (Figure 5). We designated them the Thai-AreV lineage and the Cambodian-AreV lineage. These results revealed that the Thai-AreVs have circulated in five Thai provinces for at least nine years (2010-2018), and that the Cambodian-AreV circulated in the Sihanouk region of Cambodia between 2008 and 2009. The only exception was RtMb-AreV/Tu2016, which is a member of the Cambodian-AreV lineage but was found in Udon Thani province of Thailand in 2016.
Rhabdoviruses (RhaVs). RhaVs are a large group of linear negative-stranded RNA viruses of the family Rhabdoviridae, which currently has 20 genera [52, 53]. These viruses infect diverse vertebrates, invertebrates, and plants, and some of them, such as vesicular stomatitis virus and rabies virus, can cause mild-to-severe diseases [54, 55]. Here, we identified five RhaVs in N. fulvescens and Rattus species from four Thai provinces, and the complete genome sequences of two viruses were confirmed (Table S4). Unlike the previously reported rodent lyssavirus, Mokola virus, and murine feces-associated rhabdovirus (MuFARV) [56, 57], these RhaVs showed less than 66.5% nt identity with known Rhabdoviridae members (Table S8). The most closely related virus was Xingshan nematode virus 4, a newly identified RhaV in Spirurian nematodes [23]. Phylogenetic analysis based on the complete L nucleotide sequences suggested that these novel rodent RhaVs clustered with Xingshan nematode virus 4 of the genus Alphanemrhavirus (Figure 6A).
Paramyxoviruses (ParaVs). The family Paramyxoviridae is a group of enveloped viruses with negative-stranded RNA genomes that are responsible for many mild-to-severe human and animal diseases [33, 58-62]. Sequences of twelve rodent ParaVs were identified in the lung samples from B. indica, Berylmys bowersi, Leopoldamys neilli, M. surifer, and diverse species of Rattus from five Thai provinces and two Cambodian provinces (Table S4). We obtained full-length sequences of ten virus strains. Of the twelve ParaVs, nine were closely related to members of the genus Jeilongvirus (75.3%-77% nt identities for the L ORF), two were related to Mossman virus (75.8%-76.5% nt identities for the L ORF), and one was related to Sendai virus (91.9% nt identity for the partially sequenced L) (Table S9). Phylogenetic analysis based on the partial L nucleotide sequences assigned these rodent ParaVs to the genera Narmovirus, Jeilongvirus, and Respirovirus (Figure 6B).
Characteristics of positive-stranded RNA viruses
Hepaciviruses, pegiviruses, and pestivirus. The genera Hepacivirus, Pegivirus, and Pestivirus belong to the family Flaviviridae and comprise positive single-stranded RNA viruses. These viruses infect a variety of mammalian hosts, including primates, bats, horses, and rodents [63-65]. The hepatitis C virus of the genus Hepacivirus is an important causative agent of hepatitis and hepatocellular carcinoma in humans [66], and the two classic types of pestiviruses, bovine viral diarrhea virus and classical swine fever virus, are important causative agents of mild-to-severe disease in cattle and pigs [67, 68]. Here, we found a total of 51 viral members of the family Flaviviridae in lung samples of diverse rodent and shrew species from almost all sampling sites across Thailand, Lao PDR, and Cambodia (Table S4). Thirty-four strains underwent genome sequencing and seventeen strains were selected for sequencing of partial polyproteins. Pairwise alignment of the complete or partial genome sequences suggested that twenty-eight of them were hepaciviruses, with 41.4%-100% nt identities with each other, twenty-two were pegiviruses, with 42.1%-96.1% nt identities with each other, and one was a pestivirus, with 75.2% nt identity to a known rodent pestivirus (Table S10). Phylogenetic analysis based on the partial polyproteins revealed that the 51 novel viruses could be assigned to distinct novel lineages within the genera Hepacivirus, Pegivirus, and Pestivirus (Figure 7). Several host-specific lineages were detected, including a Rattus exulans-related lineage for Hepacivirus and N. fulvescens-related and Rattus-related lineages for Pegivirus, suggesting that the phylogenies of most of these viruses were strictly congruent with the relationships of their rodent or insectivore hosts. For the first time, we report a hepacivirus and pegiviruses (SoSm-HepaV/Cs2009, ScTb-PegV/Tb2018, and ScTb-PegV/Tn2013) in insectivores (shrew and tree shrew).
The virus SoSm-HepaV/Cs2009 represents a separate hepacivirus clade with less than 44.4% nt identity with any known virus. ScTb-PegV/Tb2018 and ScTb-PegV/Tn2013 represent a separate pegivirus clade with less than 57.6% nt identity with known viruses. These viruses, together with previously reported bat viruses, formed the main evolutionary frames for these two genera.
ArteVs. ArteVs, of the family Arteriviridae, are a group of enveloped viruses with positive single-stranded RNA genomes. Members such as equine arteritis virus (EAV), simian hemorrhagic fever virus (SHFV), and porcine reproductive and respiratory syndrome virus (PRRSV) are responsible for a variety of mild to severe diseases in horses, simians, and swine [69-72]. Genomic or partial genomic sequences of a total of 49 ArteVs were identified in diverse rodent species from almost all sampling sites throughout Thailand, Lao PDR, and Cambodia (Table S4). Pairwise alignment of ORF1b revealed that these viruses shared 55.7%-98.6% nt identities with each other, and less than 66% nt identity with known ArteVs (Table S11). These sequences are most similar to previously reported members of the genera Betaarterivirus and Gammaarterivirus, PRRSVs, lactate dehydrogenase-elevating virus (LDV), and unclassified rodent ArteVs. However, our previous study of rodent pharyngeal and anal samples from China showed a diverse phylogenetic scattering of ArteVs throughout the family Arteriviridae, whereas here phylogenetic analysis based on ORF1b and ORF5 revealed that all ArteVs found in lung tissues clustered with each other as a separate clade within the subfamily Variarterivirinae, with different host-specific lineages (such as Rattus-related, Maxomys-related, and Bandicota-related lineages) (Figure 8A). These lineages represent distinct viral classifications that differ from the previously identified genera Betaarterivirus and Gammaarterivirus of the subfamily Variarterivirinae.
CoVs. CoVs are a group of enveloped viruses with a large positive single-stranded RNA genome within the subfamily Coronavirinae, and include viruses that cause human diseases such as colds, severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and COVID-19 [1, 3, 73]. The subfamily Coronavirinae is divided into four recognized genera: Alphacoronavirus, Betacoronavirus, Deltacoronavirus, and Gammacoronavirus [74-77]. Previously, we found a large number of rodent CoVs in diverse rodent species in China that were assigned to two separate lineages within Alphacoronavirus and Betacoronavirus [27]. Here, however, we found only nine CoVs in lung samples from species of Bandicota and Rattus from five Thai provinces and the Laotian Champasak province, and two of them were selected for genome sequencing (Table S4). Sequence similarity and phylogenetic analysis of RdRp revealed that all of these CoVs could be classified within Embecovirus under the genus Betacoronavirus, with nt sequence identities between 93.7% and 100% (Figure 8B and Table S12). Despite our large sample size, no alpha-CoVs were found in our samples.
Hepatitis E viruses (HEVs). HEVs of the family Hepeviridae are a group of small, non-enveloped, positive single-stranded RNA viruses. Members of the species Orthohepevirus A are among the most common causative agents of hepatitis in humans, and rodent-borne Orthohepevirus C was recently reported to be zoonotic and to cause persistent hepatitis in humans [21, 40, 78, 79]. The partial genome sequences of four HEVs were identified in M. surifer from the Thai Loei province, R. losea from the Laotian Vientiane province, and R. exulans from the Cambodian Sihanouk province (Table S4). All of these viruses were closely related to the previously reported rodent HEV strain Vietnam-105 and the human patient HEV strain LCK-3110, with 77.7%-80.7% nt identities in ORF1, but showed less than 58.9% nt identity in ORF1 with HEVs from other hosts (Table S13). Phylogenetic analysis based on ORF1 assigned these HEVs to the species Orthohepevirus C, and they were closely related to the Vietnam-105 and LCK-3110 lineage, which was suspected to be a causative agent of persistent hepatitis in humans (Figure 9A).
Picornaviruses (PicoVs). Members of the family Picornaviridae are small, non-enveloped, positive single-stranded RNA viruses. Diverse PicoVs cause mucocutaneous, encephalic, cardiac, hepatic, neurological and respiratory diseases in a wide variety of vertebrate hosts [80]. In our samples, three PicoVs were identified in B. indica of Thai Chiang Rai province, R. tanezumi of Laotian Luang Prabang province, and R. tanezumi of Cambodian Sihanouk province (Table S4). Sequence similarity and phylogenetic analysis of complete RdRp indicated that all of these PicoVs were closely related to known rodent PicoVs with nt sequence identities between 42% and 88.4% (Table S14 and Figure S10).
Astrovirus (AstroV). AstroVs are positive single-stranded RNA viruses. Members of the genus Mamastrovirus within the family Astroviridae infect many mammals and cause gastroenteritis [81]. We identified only one AstroV, in B. savilei from the Thai Loei province. This virus shared 82.05% nt identity in the complete RdRp with an AstroV previously reported in China (Table S16 and Figure S11).
Characteristics of unclassified RNA viruses
Recent viral surveillance studies in invertebrates, amphibians, reptiles, and fishes have revealed a new view of the RNA virosphere that is more diverse than the current taxonomy [23, 26]. Here, we identified a total of 28 unclassified RNA viruses in the lungs of diverse rodent species from different Thai and Laotian provinces (Table S4). After ORF annotation of these completely or partially sequenced viruses, pairwise alignment revealed that the coding regions of the partial RdRp of these viruses showed 22.7%-99.9% nt identities with each other, and less than 66.7% nt identity with other known RNA viruses (Table S16), suggesting that these newly identified viruses are highly diverse and distinct from known and undefined viral families. To further examine the evolutionary relationships between these viruses, phylogenetic trees were constructed based on the partial RdRp proteins of viral genomes from all related known families, genera, and unclassified taxa of invertebrates, amphibians, reptiles, and fish. The unclassified RNA viruses found here formed at least 10 distinct lineages (Figure 9B). Most of the viruses tended to form different lineages, such as the partiti-like-related viruses found in the M. surifer and B. savilei lineages, the Rhabdoviridae-related viruses found in the Rattus lineage, and the Totiviridae-related viruses found in the Rattus lineage. These data suggest that RNA viruses in rodents occupy a broader range of phylogenetic diversity than previously recognized, one that is similar to the RNA viral spectrum observed in invertebrates.