Animal sampling.
Lung samples of 3,193 rodents were collected at different sites in three countries between April 2006 and November 2018. In addition, 69 lung samples of soricomorphs (order Soricomorpha), 22 lung samples of tree shrews (order Scandentia), and two lung samples of Hylomys suillus were collected, which together accounted for 2.83% of the total samples. A total of 19 provinces were chosen, including Bangkok urban area, Kanchanaburi, Chiang Rai, Tak, Prachuap Khiri Khan, Loei, Nan, Songkla, Udon Thani, Kalasin, Phrae, and Nakhon Ratchasima in Thailand, Vientiane, Champasak, and Luang Prabang in Lao PDR, and Preah Sihanouk, Mondulkiri, and Pursat in Cambodia (Fig. 1A). Samples included 25 rodent species of the families Muridae, Spalacidae, and Sciuridae, three soricomorph species of the family Soricidae, and two tree shrew species of the family Tupaiidae that reside in urban, rural, and wild areas throughout the Indochina Peninsula. The most common species sampled were Rattus exulans, Rattus tanezumi, Bandicota savilei, Bandicota indica, Maxomys surifer, and Mus cervicolor (Table S1). Rattus was the dominant genus, accounting for 49.66% of the total samples, followed by Bandicota (18.30%) and Mus (14.92%). These rodent species live in close ecological association with humans, arthropod vectors, and other animals in these areas.
Meta-transcriptomic analysis and virome overview
Because some species were sampled repeatedly at the same location, the 3,284 total RNA extracts from all lung samples were combined in equal quantities into 233 pools. To evaluate the presence of viral RNA in each pool, an rRNA-depleted RNA library was constructed and processed for NGS-based meta-transcriptomic analysis. A total of 262.38 GB of nucleotide data was obtained. Reads that were classified as cellular organisms (including bacteria, archaea, and eukaryotes) and reads with no significant homology to any amino acid sequence in the NR database were removed from the data. The remaining 495,579 reads were best matched with viral proteins available in the NR database (Table S2). Because of the numerous transcripts from hosts and other cellular organisms, most pools contained only a low proportion of viral RNA.
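The read-filtering step described above can be sketched as follows. This is a minimal illustration only, not the pipeline used in the study: the function names, the `best_hits` mapping, and the taxon labels are hypothetical, and the sketch assumes that each read has already been assigned the top-level taxon of its best NR protein hit (or `None` when no significant hit was found).

```python
from collections import Counter

# Hypothetical input format: read ID -> top-level taxon of the best NR
# protein hit, or None when the read had no significant homology.

def keep_viral_reads(best_hits):
    """Return the IDs of reads whose best NR hit is viral.

    Reads assigned to cellular organisms (bacteria, archaea, eukaryotes)
    and reads without any significant hit are discarded.
    """
    return [read_id for read_id, taxon in best_hits.items() if taxon == "Viruses"]


def tally_hits(best_hits):
    """Count reads per top-level taxon; unmatched reads are counted as 'no_hit'."""
    return Counter("no_hit" if taxon is None else taxon for taxon in best_hits.values())


# Toy example with made-up read IDs.
example_hits = {
    "read1": "Viruses",
    "read2": "Bacteria",
    "read3": None,
    "read4": "Eukaryota",
    "read5": "Viruses",
}
viral_ids = keep_viral_reads(example_hits)
counts = tally_hits(example_hits)
```

In this toy input, two of the five reads are retained as viral, which mirrors on a small scale why most pools contain only a low proportion of viral reads.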
For virome analysis, the virus-associated reads were annotated into a group of unclassified RNA viruses and 98 families spanning the double-stranded (ds) RNA viruses, retro-transcribing viruses, single-stranded (ss) RNA viruses, dsDNA viruses, and ssDNA viruses under the virus root. After further screening out non-vertebrate viral reads related to dietary habits and other host traits, as well as retrotransposon-related sequence reads, as described previously[19, 27], 406,869 viral reads (approximately 82.1% of the total viral hits) were assigned to 28 mammal-related viral families and a group of unclassified RNA viruses. The reads of each viral family in each pool were normalized by the viral genome size and the proportion of the total viral reads, and the prevalence of each viral family by province and animal species is shown in Fig. 1B and Table S3. Viral reads from the families Arteriviridae, Arenaviridae, Flaviviridae, Hantaviridae, Herpesviridae, and Phenuiviridae were widely distributed in a variety of rodent and insectivore species from different regions with high or low richness. Viral reads from the families Adenoviridae, Astroviridae, Anelloviridae, Coronaviridae, Caliciviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Peribunyaviridae, Rhabdoviridae, and Reoviridae were found in fewer species from different regions. Although sequence reads related to the families Hepadnaviridae and Orthomyxoviridae were occasionally present in some samples, we failed to amplify any sequences of viruses in these families, which may have been a result of low viral loads or biased comparison information. In addition to the family-assigned reads, substantial numbers of viral reads were assigned to unclassified RNA viruses under Riboviria, such as diverse Chuviridae-, Nodaviridae-, or Totiviridae-related viruses.
Although a small number of DNA virus reads were present in the sequence data because the corresponding RNA transcripts were detected, most viral hits (98.64%) were assigned to the RNA virus group. These DNA viruses are described only in this virome overview, and no further verification was performed for them.
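The normalization of family-level read counts mentioned in this overview can be illustrated with a short sketch. The exact formula is not given in the text, so the version below is an assumption: each family's read count is divided by a representative genome size and then by the total viral reads of the pool. The function name and the genome sizes in the toy example are hypothetical placeholders.

```python
def normalize_family_reads(family_reads, genome_size_kb):
    """Normalize per-family read counts within one pool.

    family_reads: family name -> raw read count in the pool.
    genome_size_kb: family name -> representative genome size in kb
        (illustrative values supplied by the caller).
    Returns family name -> reads per kb of genome, expressed as a
    fraction of the pool's total viral reads.
    """
    total_viral_reads = sum(family_reads.values())
    if total_viral_reads == 0:
        # A pool without viral reads gets zero abundance for every family.
        return {family: 0.0 for family in family_reads}
    return {
        family: (count / genome_size_kb[family]) / total_viral_reads
        for family, count in family_reads.items()
    }


# Toy pool with made-up counts and approximate genome sizes.
pool_reads = {"Hantaviridae": 120, "Arteriviridae": 300}
genome_sizes = {"Hantaviridae": 12.0, "Arteriviridae": 15.0}
normalized = normalize_family_reads(pool_reads, genome_sizes)
```

Dividing by genome size keeps families with large genomes from appearing more abundant simply because they yield more reads per genome copy; dividing by the pool total makes pools of different sequencing depth comparable.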
Based on the virome data provided by these meta-transcriptomes, the prevalence and diversity of viruses in families that include known human or animal pathogens, or that are novel in rodents, were further confirmed by PCR screening of individual lung samples. In total, 211 representative virus strains were selected for genomic or partial genomic sequencing (Fig. 2).
Characteristics of negative-stranded RNA viruses
HanVs. As causative agents of HFRS and HPS, traditional HanVs are a group of segmented negative-stranded RNA viruses with three genome segments (L, M, and S). The approved genus Orthohantavirus of the family Hantaviridae (under the order Bunyavirales) was divided into different groups that are associated with the taxonomy of their hosts[9]. Two of these groups are related to rodents: Murinae-related phylogroup III HanVs and Sigmodontinae- and Arvicolinae-related phylogroup IV HanVs[6, 40]. Here, eleven HanV strains were identified in three Murinae species (Bandicota indica, Rattus exulans, and Rattus tanezumi) from five provinces (Bangkok, Chiang Rai, Kalasin, Loei, and Nan) of Thailand in multiple years (Table S4). Bandicota indica was the main host of these HanVs. The complete genome sequences of these viruses showed more than 93.5% nt identity with each other, suggesting that they belong to the same viral species. The L ORF of this species showed 79%-81.4% nt identity with those of Anjozorobe HanVs detected in Rattus rattus and Eliurus majori from Madagascar in 2014[41], whereas the M and S ORFs showed 95%-96% and 96.1%-99.4% nt identities, respectively, with those of Thailand viruses found in Bandicota indica in Thailand in 1994 and 2004[42, 43] (Table S5). Phylogenetic trees based on the complete L, M, and S proteins were constructed (Fig. 3). All HanVs identified here were assigned to the Murinae-related phylogroup III, where they clustered together and formed a separate clade associated with the clade of Anjozorobe HanVs, which is closely related to the lineage of Seoul viruses.
Although the exact relationship between these HanVs and Thailand virus cannot be confirmed because the complete L sequence of Thailand virus is not available in GenBank, alignments and phylogenetic analyses based on M, S, and partial L sequences still support a single species that covers all HanV strains detected in Thailand and indicate that this HanV species has circulated in diverse Thai provinces in recent decades.
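The pairwise nt identities reported for these strains can be understood through a minimal sketch of a percent-identity calculation on two pre-aligned sequences. This is an illustration under stated assumptions, not the tool used in the study: the function name is hypothetical, and skipping alignment columns that contain a gap is one common convention that the text does not specify.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Alignment columns in which either sequence has a gap are skipped
    (an assumed convention); comparison is case-insensitive.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == gap or base_b == gap:
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


# Toy aligned fragments (made-up sequences): 7 of 8 positions match.
identity = percent_identity("ACGTACGT", "ACGTACGA")
```

Applied to every pair of strains, such a function yields the kind of identity ranges quoted above; a threshold on those values (for example, the greater than 93.5% identity among the eleven strains) is then used as evidence that the strains belong to one species.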
PhleVs. Similar to HanVs, the genus Phlebovirus is another group of linear segmented negative-stranded RNA viruses of the family Phenuiviridae under the order Bunyavirales[44]. Many highly virulent PhleVs, such as Rift Valley fever virus, Toscana virus, and severe fever with thrombocytopenia syndrome virus, are arthropod-borne viruses: they are naturally harbored by ruminant or camel reservoirs, are transmitted by mosquitoes, sandflies, or ticks, and cause severe diseases in humans and animals[45–47]. After sequencing reads of the family Phenuiviridae were mapped to their rodent hosts, 22 rodent PhleV sequences were confirmed in samples of Bandicota savilei, Maxomys surifer, Niviventer fulvescens, and diverse Rattus species from eight Thai provinces and two Laotian provinces (Table S4). Rattus species were the main hosts of PhleVs. The sequenced L segments of these viruses showed less than 71.2% nt identity with all other known PhleVs, which suggested that rodent PhleVs may represent novel species (Table S6). L-based phylogenetic analysis revealed two distinct novel lineages of rodent PhleVs in the genus Phlebovirus: lineage 1, located next to Uukuniemi PhleVs, and lineage 2, located next to Rift Valley fever and Salehabad PhleVs (Fig. 4). Different PhleVs identified from the genus Rattus in different locations showed very close genetic relationships and could be further divided into two clades under lineage 1. However, some PhleVs from diverse rodent species also shared high sequence identities and close genetic relationships. For example, RtRsp-PhenV/Tt2018, RtMs-PhenV/Ts2013, and RtBs-PhenV/Tl2009, identified in three different rodent genera, were closely related to each other in lineage 1, as were RtBs-PhenV/Tp2006 and RtRt-PhenV/Lv2015 in lineage 2.
AreVs. Rodent AreVs of the genus Mammarenavirus under the family Arenaviridae can be divided into the Old-World complex and the New-World complex[12]. They are a group of linear segmented negative-stranded RNA viruses with two genome segments (L and S). Many rodent AreVs are confirmed or suspected to be zoonotic and can cause severe human hemorrhagic fever and related diseases[5, 13, 14, 48]. A total of nineteen Old-World AreV strains were identified from Bandicota indica, Mus cookii, Maxomys surifer, and Menetes and Rattus species of five Thai provinces and Cambodian Sihanouk province (Table S4). Rattus species were the main hosts of AreVs. Seven strains shared high sequence similarity and were closely related to Cardamones virus found in Veal Renh of Cambodia in 2009, with 96.1–100% nt identities, and 12 strains shared high sequence similarity and were closely related to Loei River virus found in Loei province in Thailand in 2008, with 88.2%-95.1% nt identities[49] (Table S7). Consistent with the sequence alignment results, phylogenetic analysis based on the L, G, and N proteins assigned these AreVs to two different lineages related to AreVs reported in China (Fig. 5). We designated them as the Thai-AreV lineage and the Cambodian-AreV lineage. These results revealed that Thai-AreV has circulated in five Thai provinces for at least nine years (2010–2018), and that Cambodian-AreV circulated in the Sihanouk region of Cambodia between 2008 and 2009. The only exception was RtMsp-AreV/Tu2016, which was a member of the Cambodian-AreV lineage but was found in Udon Thani province in Thailand in 2016.
Rhabdoviruses (RhaVs). RhaVs are a large group of linear negative-stranded RNA viruses under the family Rhabdoviridae, which currently contains 20 genera[50, 51]. These viruses infect diverse vertebrates, invertebrates, and plants, and some of them, such as vesicular stomatitis virus and rabies virus, can cause mild-to-severe diseases[52, 53]. Here, four RhaVs were identified separately in Bandicota indica, Niviventer fulvescens, and Rattus species from four Thai provinces (Table S4). Unlike previously reported rodent lyssavirus, mokola virus and murine feces-associated rhabdovirus (MuFARV)[54, 55], these four RhaVs showed less than 66.5% nt identity with known Rhabdoviridae members (Table S8). The most closely related virus was Xingshan nematode virus 4, a newly identified RhaV in Spirurian nematodes. Phylogenetic analysis based on the deduced L proteins suggested that these novel rodent RhaVs clustered with Xingshan nematode virus 4 under the genus Alphanemrhavirus (Fig. 6A).
Paramyxoviruses (ParaVs). The family Paramyxoviridae is a group of enveloped viruses with negative-stranded RNA genomes that are responsible for many mild-to-severe human or animal diseases[31, 56–60]. Twelve rodent ParaV sequences were identified in samples of Bandicota indica, Berylmys bowersi, Leopoldamys neilli, Maxomys surifer, and diverse Rattus species from five Thai provinces and two Cambodian provinces (Table S4). Nine ParaVs were closely related to members of the genus Jeilongvirus with high sequence similarity (75.3–77% nt identities for L), two ParaVs were Mossman virus related (75.8–76.5% nt identities for L), and one ParaV was Sendai virus related (91.9% nt identity for L) (Table S9). L-based phylogenetic analysis assigned these rodent ParaVs to the genera Narmovirus, Jeilongvirus, and Respirovirus (Fig. 6B).
Characteristics of positive-stranded RNA viruses
Hepaciviruses, pegiviruses, and pestivirus. The genera Hepacivirus, Pegivirus, and Pestivirus belong to the family Flaviviridae, which comprises positive single-stranded RNA viruses. These viruses can infect a variety of mammalian hosts, including primates, bats, horses, and rodents[61–63]. Hepatitis C virus of the genus Hepacivirus is an important causative agent of human hepatitis and hepatocellular carcinoma[64], and two classic types of pestiviruses, bovine viral diarrhea virus and classical swine fever virus, are important causative agents of mild-to-severe disease in bovine and swine hosts[65, 66]. Here, a total of 51 members of the family Flaviviridae were detected in diverse rodent and soricomorph species from almost all sampling sites across Thailand, Lao PDR, and Cambodia (Table S4). Twenty-eight of them were hepaciviruses with 41.4%-100% nt identities to each other, twenty-two were pegiviruses with 42.1%-96.1% nt identities to each other, and one was a pestivirus with 75.2% nt identity to a known rodent pestivirus (Table S10). Based on polyprotein-based phylogenetic analysis, the 51 novel viruses were assigned to various distinct novel lineages under the genera Hepacivirus, Pegivirus, and Pestivirus (Fig. 7). Several host-specific lineages were observed, including a Rattus exulans-related lineage of Hepacivirus and Niviventer fulvescens-related and Rattus-related lineages of Pegivirus, suggesting that the phylogenies of most of these viruses were strictly congruent with the classification of their rodent or insectivore hosts. This is the first time that a hepacivirus and pegiviruses (SoSm-HepaV/Cs2009, RtTb-PegV/Tb2018, and SoSp-PegV/Tn2013) have been reported in insectivores (soricomorphs and tree shrews). SoSm-HepaV/Cs2009 represented a separate hepacivirus clade with less than 44.4% nt identity with known viruses.
RtTb-PegV/Tb2018 and SoSp-PegV/Tn2013 represent a separate pegivirus clade with less than 57.6% nt identity with known viruses. These viruses, together with previously reported bat viruses, formed the main evolutionary frames of these two genera.
ArteVs. ArteVs of the family Arteriviridae are a group of enveloped viruses with positive single-stranded RNA genomes. Members such as equine arteritis virus (EAV), simian hemorrhagic fever virus (SHFV), and porcine reproductive and respiratory syndrome virus (PRRSV) are responsible for a variety of mild to severe diseases in horses, simians, and swine[67–70]. A total of 48 ArteV sequences were confirmed in diverse rodent species from almost all sampling sites throughout Thailand, Lao PDR, and Cambodia (Table S4). These viruses shared 55.7%-98.6% nt identities with each other, and less than 66% nt identity with known ArteVs (Table S11). The most similar viruses were Betaarterivirus and Gammaarterivirus members, namely PRRSVs and lactate dehydrogenase-elevating virus (LDV), and unclassified rodent ArteVs found previously. However, unlike the diverse ArteVs identified from rodent pharyngeal and anal samples in our previous study, which were phylogenetically scattered throughout the family Arteriviridae, phylogenetic analysis based on ORF1b and ORF5 revealed that all lung ArteVs found here clustered with each other in a separate clade of the subfamily Variarterivirinae and formed different host-specific lineages, such as Rattus-related lineages, a Maxomys-related lineage, and a Bandicota-related lineage (Fig. 8A). These lineages represented distinct viral classifications that differ from the previously identified genera Betaarterivirus and Gammaarterivirus under the subfamily Variarterivirinae.
CoVs. CoVs are a group of enveloped viruses with a large positive single-stranded RNA genome of the subfamily Coronavirinae. The subfamily Coronavirinae is divided into four approved genera, Alphacoronavirus, Betacoronavirus, Deltacoronavirus, and Gammacoronavirus[71–74]. A large number of rodent CoVs were found in diverse rodent species and were assigned into two separate lineages under Alphacoronavirus and Betacoronavirus previously[25]. However, only nine rodent CoVs were confirmed in lung samples of Bandicota and Rattus species from five Thai provinces and Laotian Champasak province (Table S4). Sequence similarity and phylogenetic analysis revealed that all these CoVs were classified in Embecovirus under the genus Betacoronavirus, with sequence identities between 93.7% and 100% (RdRp) (Fig. 8B and Table S12). Despite our large sample size, alpha-CoV was not found in our samples.
Hepatitis E viruses (HEVs). HEVs of the family Hepeviridae are a group of small, nonenveloped, positive single-stranded RNA viruses. Members of the species Orthohepevirus A are among the most common causative agents of human hepatitis, and rodent-borne Orthohepevirus C was recently reported to be zoonotic and to cause persistent hepatitis in humans[17, 38, 75, 76]. The genome sequences of four HEVs were confirmed in Maxomys surifer of Thai Loei province, Rattus losea of Laotian Vientiane province, and Rattus exulans of Cambodian Sihanouk province (Table S4). All were closely related to the previously reported rodent HEV strain Vietnam-105 and the patient HEV strain LCK-3110, with 77.7%-80.7% nt identities in ORF1, and had less than 58.9% identity in ORF1 compared with HEVs from other hosts (Table S13). ORF1-based phylogenetic analysis assigned these HEVs to the species Orthohepevirus C and showed that they were closely related to the lineage of Vietnam-105 and LCK-3110, which were suspected to be the causative agent of human persistent hepatitis (Fig. 9A).
Picornaviruses (PicoVs). Members of the family Picornaviridae are small, non-enveloped, positive single-stranded RNA viruses. Diverse PicoVs can cause mucocutaneous, encephalic, cardiac, hepatic, neurological, and respiratory diseases in a wide variety of vertebrate hosts[77]. Only three PicoVs were confirmed in Bandicota indica of Thai Chiang Rai province, Rattus tanezumi of Laotian Luang Prabang province, and Rattus of Cambodian Sihanouk province (Table S4). Sequence similarity and phylogenetic analysis revealed that all these PicoVs were closely related to known rodent PicoVs with sequence identities between 42% and 88.4% (Table S14 and Fig. S10).
Astrovirus (AstroV). AstroVs are positive single-stranded RNA viruses; members of the genus Mamastrovirus in the family Astroviridae infect many mammals and cause gastroenteritis[78]. One AstroV was detected in Bandicota savilei of Thai Loei province; this virus shared 82.05% nt identity with AstroVs previously reported in China (Table S16 and Fig. S11).
Characteristics of unclassified RNA viruses
Recent viral surveillance studies in invertebrates, amphibians, reptiles, and fishes have revealed a new view of the RNA virosphere that is more diverse than the current taxonomy reflects[19, 22]. Here, a total of 24 unclassified RNA viruses were found in the lungs of diverse rodent species from different Thai and Laotian provinces (Table S4). The deduced L proteins of these viruses showed 22.7%-99.9% nt identities with each other, and less than 66.7% nt identity with those of other known RNA viruses (Table S16), suggesting that these newly discovered viruses are highly diverse and may be related, to varying degrees, to known or undefined viral families. To further determine the evolutionary relationships between these viruses, phylogenetic trees were constructed based on the L proteins of viral genomes from all related known families, genera, and unclassified taxa of invertebrates, amphibians, reptiles, and fishes. All unclassified RNA viruses found here formed at least 10 distinct lineages (Fig. 9B). Most viruses tended to form different lineages that were phylogenetically consistent with differences among their host species, such as the partiti-like-related Maxomys surifer and Bandicota savilei lineages, the Rhabdoviridae-related Rattus lineage, and the Totiviridae-related Rattus lineage. These data revealed that RNA viruses in rodent species could occupy a broader range of phylogenetic diversity, similar to the RNA viral spectrum observed in invertebrates.