Coronaviruses with a SARS-CoV-2-like receptor-binding domain allowing ACE2-mediated entry into human cells isolated from bats of the Indochinese peninsula

The animal reservoir of SARS-CoV-2 is unknown despite reports of various SARS-CoV-2-related viruses in Asian Rhinolophus bats, including the closest virus from R. affinis, RaTG13. Several studies have suggested the involvement of pangolin coronaviruses in SARS-CoV-2 emergence. SARS-CoV-2 presents a mosaic genome, to which different progenitors contribute. The spike sequence determines the binding affinity and accessibility of its receptor-binding domain (RBD) to the cellular angiotensin-converting enzyme 2 (ACE2) receptor and is responsible for host range. SARS-CoV-2 progenitor bat viruses genetically close to SARS-CoV-2 and able to enter human cells through a human ACE2 pathway have not yet been identified, though they would be key in understanding the origin of the epidemics. Here we show that such viruses indeed circulate in cave bats living in the limestone karstic terrain in North Laos, within the Indochinese peninsula. We found that the RBDs of these viruses differ from that of SARS-CoV-2 by only one or two residues, bind as efficiently to the hACE2 protein as the SARS-CoV-2 Wuhan strain isolated in early human cases, and mediate hACE2-dependent entry into human cells, which is inhibited by antibodies neutralizing SARS-CoV-2. None of these bat viruses harbors a furin cleavage site in the spike. Our findings therefore indicate that bat-borne SARS-CoV-2-like viruses potentially infectious for humans circulate in Rhinolophus spp. in the Indochinese peninsula.


Introduction
The origin of SARS-CoV-2, like its mode of introduction into the human population, is currently unknown.
Since its emergence, numerous animal species have been studied to identify possible reservoirs and/or intermediate hosts of the virus, including a large diversity of insectivorous bats of the genus Rhinolophus. Despite the recent report of various SARS-CoV-2-related viruses in Rhinolophus shameli (isolated in Cambodia in 2010 1), R. pusillus and R. malayanus (China, 2020 and 2019 respectively 2), R. acuminatus (Thailand, 2020 3) and R. cornutus (Japan, 2013 4), the bat-borne genome closest to SARS-CoV-2 remains the one from R. affinis, RaTG13 (China, 2013) 5,6. Several studies also suggested a role of pangolin-related coronaviruses in the emergence of SARS-CoV-2 [7][8][9]. Since its appearance in humans, SARS-CoV-2 has evolved only through sporadic mutations, some of which correspond to gains in fitness allowing the virus to spread more widely, or to escape neutralizing antibodies 13. To decipher the origin of SARS-CoV-2, it is therefore essential to ascertain the diversity of animal coronaviruses and more specifically of bat coronaviruses. Although the identification of SARS-CoV-2 in bats is a major goal, it may be unattainable.
A more realistic objective is to identify the sequences that contribute to its mosaicism. Among these, the spike sequence appears essential, as it determines the binding affinity and accessibility of the receptor-binding domain (RBD) to the cellular angiotensin-converting enzyme 2 (ACE2) receptor, and is therefore responsible for host range [10][11][12]. The closest related bat strain identified to date (RaTG13) has a low RBD sequence similarity to SARS-CoV-2 and, with only 11/17 human ACE2 (hACE2) contact amino-acid residues conserved with SARS-CoV-2, its affinity to hACE2 is very limited 14. Moreover, SARS-CoV-2 poorly infects bats and bat cells tested so far 15. In addition, no SARS-CoV-2-like virus has been shown to use hACE2 to enter human cells, and none presents the furin cleavage site that is associated with an increased pathogenicity in humans 16. SARS-CoV-2 RBD binds to Rhinolophus macrotis ACE2 with a lower affinity than to hACE2 17. An essential piece of information, i.e. finding bat viruses with an RBD motif close to that of SARS-CoV-2 and capable of binding to hACE2 with high affinity, is therefore missing.
We hypothesized that this type of virus could be identified in bats living in the limestone karstic terrain common to China, Laos, and Vietnam in the Indochinese peninsula. We report here the presence of sarbecoviruses close to SARS-CoV-2 whose RBDs differ from that of SARS-CoV-2 by only one or two residues, that strongly bind to the hACE2 protein, and that mediate hACE2-dependent entry into human cells. Despite the absence of the furin cleavage site, these viruses may have contributed to SARS-CoV-2's origin and may intrinsically pose a future risk of direct transmission to humans.

Diversity of bats and coronaviruses in caves
A total of 645 bats belonging to six families and 46 species were captured (Table S1). Two hundred and forty-seven blood samples, 608 saliva, 539 anal/feces, and 157 urine swabs were collected from the northern part of Laos (Table S2). We first attempted to amplify part of the RNA-dependent RNA polymerase gene from all feces samples (n = 539) through a pan-coronavirus RT nested PCR approach 18.
Amplification products were obtained from 24 individuals of 10 species, and one individual (BANAL-27) was concomitantly infected by an alphacoronavirus and a betacoronavirus (Table S3). BLAST analysis of obtained sequences identified alphacoronavirus sequences of the Decacovirus, Pedacovirus, and Rhinacovirus subgenera and betacoronavirus sequences of the Nobecovirus and Sarbecovirus subgenera. Sequences of the Sarbecovirus subgenus were all identified from Rhinolophus individuals belonging to three different species, i.e., R. malayanus, R. marshalli, and R. pusillus. Positive individuals were trapped in three different districts, and those infected with a sarbecovirus were all from the Fueng district in Vientiane Province (Fig. 1A, site 1).
The complete genome sequence of five of the seven sarbecoviruses identified was then obtained by next-generation sequencing (Fig. 1). The coverage of the genome of the remaining two sarbecoviruses, i.e. BANAL-27 and BANAL-242 sampled from R. pusillus and R. malayanus bats respectively, was 90% and therefore they were not included in the final analyses. Phylogenetic analyses performed on the complete genome sequence of lineages A and B human SARS-CoV-2 19, and on representative bat and pangolin sarbecoviruses, placed the Laotian R. malayanus BANAL-52, R. pusillus BANAL-103, and R. marshalli BANAL-236 coronaviruses close to human SARS-CoV-2 and R. affinis RaTG13 coronaviruses, while R. malayanus BANAL-116 and BANAL-247 coronaviruses belonged to a sister clade with other bat coronaviruses (RmYN02, RacCS203, RpYN06, and PrC31) from different Rhinolophus species. Pangolin coronaviruses occupied a basal position relative to these strains (Fig. 1B). Notably, very similar SARS-CoV-2-like viruses are shared by different bat species, suggesting a possible circulation of viruses between different species living sympatrically in the same caves. These results are consistent with the similarity plot analysis showing that RaTG13 and BANAL-52 bat coronaviruses exhibit a high nucleotide identity with human SARS-CoV-2 throughout the length of the genome. Interestingly, BANAL-52 presents a higher level of nucleotide conservation than RaTG13 in the S1 domain of the spike, and especially in the spike's N-terminal domain (NTD) and RBD (Fig. 1C). These observations are congruent with amino-acid identities between human SARS-CoV-2 and representative bat and pangolin coronaviruses, which present a high level of conservation, except for ORF8 of bat BANAL-116, BANAL-247, Rc-o319 and RmYN02.
Interestingly, the S1 domain of the spike (and especially the N-terminal domain) presents a lower degree of conservation in several bat coronaviruses, suggesting that this domain may reflect a relative degree of adaptation of the virus to its mammalian host (Fig. 1D, Supp. Figure 1).

Recombination events and phylogenetic analyses of sarbecoviruses
Following GARD analysis, we identified 14 recombinant breakpoints during the evolutionary history of sarbecoviruses, which were further confirmed by phylogenetic analyses performed on the 15 consecutive fragments of sequences defined by the breakpoints (Fig. 2, Supp. Figure 2). SARS-CoV-2 presents a mosaic genome, to which more than five sequences close to sequences published or determined during this study contributed: R. malayanus RmYN02 and R. pusillus RpYN06 viruses found in China in 2019, R. affinis RaTG13 coronavirus found in China in 2013, and R. malayanus BANAL-52 and R. pusillus BANAL-103 found in North Laos in 2020 (this study). No pangolin sequence was immediately associated with a recombination event at the origin of SARS-CoV-2, despite previous hypotheses proposed before the availability of the sequences described in this paper. Laotian Rhinolophus bat coronaviruses presented a lower degree of recombination compared to SARS-CoV-2 and when present, recombination events occurred between other BANAL viruses, in line with the fact that all viruses have been isolated from bats living sympatrically in caves in the same area.
Interestingly, the origin of several fragments of SARS-CoV-2 genomes could be assigned to several donor strains and not a unique donor sequence. For example, a breakpoint was identified at the beginning of the RBD region of S1 which revealed that the downstream fragment of SARS-CoV-2, which comprises the RBD and the beginning of S2 domains, could involve BANAL-52 R. malayanus, BANAL-103 R. pusillus, and BANAL-236 R. marshalli viruses, which formed a highly supported sister clade of SARS-CoV-2 (fragment 11, Fig. 2). In a more basal position, R. shameli bat coronaviruses and pangolin-2019 coronaviruses are found. These results are consistent with the conservation of RBD amino-acid sequences among SARS-CoV-2 and representative bat and pangolin coronaviruses (Supp. Figure 3).
Among the 17 residues that interact with human ACE2, 16 are conserved between SARS-CoV-2 and BANAL-52 or -103 (one mismatch, H498Q), and 15/17 are conserved for BANAL-236 (two mismatches, K493Q and H498Q) while only 13/17 residues are conserved for the Cambodian bat R. shameli virus and 11/17 for the Chinese bat R. affinis RaTG13 virus. At the full spike protein level, bat R. affinis RaTG13 and pangolin-2017 P4L viruses looked closer to SARS-CoV-2 than bat R. malayanus BANAL-52 but this was shaped by the higher degree of conservation of the S2 domain of the spike. All these viruses shared the same features, such as the absence of a furin cleavage site or the conservation of the internal fusion peptide (Supp. Figure 4).
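The contact-residue comparison above amounts to counting identical residues at a fixed set of positions in aligned RBD sequences. The following Python sketch illustrates that bookkeeping under stated assumptions: the sequences are already aligned to equal length, and the function name, the toy sequences, and the toy contact positions are hypothetical placeholders rather than real spike data. Mismatches are written with the query residue first and the reference residue last, which is a choice made for this sketch.

```python
def conserved_contacts(ref, query, contact_positions):
    """Count contact positions at which the query matches the reference.

    ref, query: aligned amino-acid strings of equal length.
    contact_positions: 0-based alignment columns of the contact residues.
    Returns (n_conserved, mismatches), where each mismatch is a string
    '<query residue><1-based position><reference residue>'.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = []
    for pos in contact_positions:
        if ref[pos] != query[pos]:
            mismatches.append(f"{query[pos]}{pos + 1}{ref[pos]}")
    return len(contact_positions) - len(mismatches), mismatches


# Toy example with made-up sequences (NOT real RBD data).
ref_toy = "ACDEFGHIKL"
query_toy = "ACDEYGHIKV"
contacts_toy = [0, 4, 6, 9]
n_conserved, mismatches = conserved_contacts(ref_toy, query_toy, contacts_toy)
print(f"{n_conserved}/{len(contacts_toy)} conserved; mismatches: {mismatches}")
```

With real data, the reference would be the SARS-CoV-2 RBD and the contact positions would be the 17 hACE2-interacting residues mapped onto alignment columns.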

Interaction of RBDs with ACE2 and functional dynamics
To characterize the interaction of the RBDs of the BANAL-52/103, which are identical for the receptor-binding motif residues (Supp. Figure 3), and BANAL-236 spikes (residues 233-524) with hACE2, we performed biolayer interferometry assays and found that both RBDs display a binding affinity for hACE2 in the low nanomolar range (Fig. 3A), with values comparable to those previously reported for SARS-CoV-2.
To study the effect of the mutations at the interface between these RBDs and hACE2 on their binding affinity, we constructed homology models of the BANAL-236 and BANAL-52/103 RBD/hACE2 complexes using the crystal structure of the SARS-CoV-2 RBD/hACE2 complex as template (PDB id 6M0J). Of note, the amino-acid sequence in the RBD region covered by these models is identical in BANAL-103 and BANAL-52 (Supp. Figure 3). We performed all-atom explicit solvent Molecular Dynamics (MD) simulations of these complexes for a total aggregated time of 9 μs (Table S4). A cluster analysis of the MD trajectories revealed that, at the RBD-hACE2 interface, both BANAL-236 and BANAL-52/103 complexes were identical to the SARS-CoV-2 RBD/hACE2 complex within 2 Å backbone RMSD (Supp. Figure 5). ROSETTA and FoldX empirical scoring functions predicted a similar RBD-hACE2 binding energy in all three complexes (Supp. Figure 6). Figure 7). The H498Q mismatch present in both BANAL-52/103 and BANAL-236 RBDs negatively affected the formation of hydrogen bonds between RBD Q498 and both hACE2 K353 and Q42. However, in the SARS-CoV-2 complex, these hydrogen bonds were only transiently formed on a microsecond timescale. More persistent hydrogen bonds in this region (RBD T500 - hACE2 D355, RBD G502 - hACE2 K353, and RBD Y505 - hACE2 E37) were not affected. The K493Q mismatch present only in BANAL-236 RBD enabled the formation of two salt bridges between the RBD and hACE2 that were not present in the SARS-CoV-2 complex (RBD K493 - hACE2 E35 and RBD K493 - hACE2 D38, the former being the most persistent).
For further insight into the molecular details of these interactions, we determined the crystal structure of the complex BANAL-236 RBD with the hACE2 peptidase domain to 2.9 Å resolution. The overall structure of the BANAL-236 RBD is identical to that of the SARS-CoV-2 RBD within the accuracy expected at this resolution (RMSD 0.360 Å, 150 Cα). The only significant difference is the unwinding of helix H4 that participates in lateral contacts between RBDs in the SARS-CoV-2 spike (Fig. 3C, arrow), which might influence the dynamics of the opening and closing of the RBDs in the spike.
As expected, most of the interactions observed in the SARS-CoV-2 RBD/hACE2 complex 24 are also present in the structure of the BANAL-236 RBD/hACE2 complex. The interface comprises three main clusters of interactions; the salt bridge RBD K493 - hACE2 E35 and the hydrogen bond between the side chains of residues RBD H498 and D38 are located in clusters 1 and 2, respectively, as predicted by the MD simulations (Fig. 3B and C, insets). Although these interactions contribute to stabilizing the complex, they do not seem to drastically affect binding to hACE2, because both RBDs have similar affinities for hACE2, in the same range as that of the SARS-CoV-2 RBD.

Virus isolation and entry into human cells expressing ACE2
To assess whether the BANAL-236 spike protein could mediate entry into cells expressing human ACE2, we generated lentiviral particles pseudotyped with the Wuhan or the BANAL-236 spikes. We detected spike-mediated entry of the BANAL-236 spike-pseudotyped lentivirus in 293T-ACE2 cells, in contrast to control cells not expressing hACE2 (Fig. 4A). Entry was blocked by human sera neutralizing SARS-CoV-2, but not by control non-neutralizing sera, demonstrating that neutralization of BANAL-236 was specific for epitopes shared with the spike of SARS-CoV-2 (Fig. 4B).
In order to isolate infectious virus, rectal swabs were inoculated on VeroE6 cells. Cytopathic effect (CPE) and presence of viral RNA were monitored daily. No CPE was observed 3 and 4 days after infection, but viral RNAs were detected for one of the 2 wells inoculated with the BANAL-236 sample (Ct = 25.1 at D3, Ct = 21.7 at D4). The culture supernatant (C1) formed plaques on VeroE6 and the titer was 3800 pfu/mL.
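As a rough reading of the Ct values above, a drop in Ct between two time points can be converted into an approximate fold increase in template. The short Python sketch below assumes ideal amplification, i.e. a doubling of product per PCR cycle; this efficiency is an assumption of the sketch and is not reported in the text, and the function name is hypothetical.

```python
def fold_change_from_ct(ct_early, ct_late, efficiency=2.0):
    """Approximate fold increase in template between two time points.

    efficiency is the per-cycle amplification factor; 2.0 (perfect
    doubling) is an assumption, not a value reported in the study.
    """
    return efficiency ** (ct_early - ct_late)


# Ct values reported for the BANAL-236 well at day 3 and day 4.
fold = fold_change_from_ct(25.1, 21.7)
print(f"Delta Ct = {25.1 - 21.7:.1f}, about {fold:.1f}-fold more viral RNA")
```

Under this assumption, the 3.4-cycle drop corresponds to roughly a tenfold increase in viral RNA between day 3 and day 4, consistent with replication in the absence of visible CPE.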
A C2 viral stock was prepared by amplification on VeroE6 at an MOI of 10^-4. The culture supernatant was harvested at day 4 when CPE was observed and titrated on VeroE6 (Supp. Figure 8). The plaques' phenotype was small, but the titer reached 2.6 × 10^6 pfu/mL. The random NGS performed on the RNA extracted from this stock confirmed that the culture was pure and corresponded to the BANAL-236 virus.
No non-synonymous mutations were observed between the original BANAL-236 genome and the C2 viral stock.
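The titers and MOI quoted above can be related by simple arithmetic: the MOI is the number of plaque-forming units per cell, so the inoculum volume follows from the stock titer and the number of cells. The Python sketch below works this through; the cell number is a hypothetical placeholder (it is not given in the text), and the function name is likewise illustrative.

```python
def inoculum_volume_ml(moi, n_cells, titer_pfu_per_ml):
    """Volume of stock (mL) needed to infect n_cells at the given MOI."""
    return moi * n_cells / titer_pfu_per_ml


C1_TITER = 3800.0   # pfu/mL, C1 supernatant (from the text)
C2_TITER = 2.6e6    # pfu/mL, C2 stock (from the text)
MOI = 1e-4          # from the text
N_CELLS = 1e6       # hypothetical cell number, NOT from the text

volume_ml = inoculum_volume_ml(MOI, N_CELLS, C1_TITER)
print(f"C1 inoculum for {N_CELLS:.0e} cells: {volume_ml * 1000:.1f} microliters")
print(f"Titer increase from C1 to C2: about {C2_TITER / C1_TITER:.0f}-fold")
```

The second line of output only compares the two reported titers and does not depend on the assumed cell number.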

Discussion
Many sarbecoviruses circulate in Rhinolophus colonies living in caves in China and probably also in neighboring countries further south (Laos, Myanmar, Thailand, and Vietnam) [25][26][27]. During the course of a prospective study in Northern Laos, we have identified in 10 bat species 25 different coronaviruses belonging to the Alphacoronavirus and Betacoronavirus genera. We then focused our study on the five sarbecoviruses for which we obtained full-length sequences. Among these, three (BANAL-52, -103, and -236) were considered to be close to SARS-CoV-2 because of the similarity in amino acids of one of the S domains (S1-NTD, S1-RBD, or S2) with the homologous domain of SARS-CoV-2. The similarity plot analysis revealed that the evolution history of SARS-CoV-2 is more complex than expected and that R. affinis RaTG13, isolated in Yunnan in 2013, is no longer considered the proximal ancestor of SARS-CoV-2: strains close to R. pusillus RpYN06, R. malayanus RmYN02, and Rhinolophus sp. PrC31 isolated in China in 2018-2019, along with R. malayanus BANAL-52, R. pusillus BANAL-103, and R. marshalli BANAL-236 isolated in Laos in 2020, contributed to the appearance of SARS-CoV-2 in different regions of the genome. No closer viral genome has yet been identified as a possible contributor, and pangolin coronaviruses appear as more distantly related than bat coronaviruses.
Because genomic regions subject to recombination are likely contributing to host-virus interactions and adaptation following spillover events, we compared SARS-CoV-2 strains from the two lineages identified at the very onset of the COVID-19 outbreak 19 to these novel bat sarbecoviruses and to pangolin strains within the SARS-CoV-2 clade. We thus identified potential recombination sites, allowing for the reconstruction of the phylogenetic history of early isolated SARS-CoV-2 strains between homologous regions defined by recombination points. The interaction of the SARS-CoV-2 spike with hACE2 is a key event in cell infection. The spike is divided into two subunits, S1 and S2: S1 contains an RBD that specifically binds to hACE2, whereas the S2 subunit contains the fusion peptide. Regarding the spike, we identified a breakpoint at the beginning of the SARS-CoV-2 RBD, resulting in a downstream fragment composed of the RBD, the furin cleavage site, and ending in the N-terminal region of S2. Despite the absence of the furin site in these novel bat sarbecoviruses, phylogenetic reconstruction of this fragment, key for the virus tropism and host spectrum, revealed that Laotian R. malayanus BANAL-52, R. pusillus BANAL-103, and R. marshalli BANAL-236 coronaviruses are the closest ancestors of SARS-CoV-2 known to date. Identification of strains of animal origin with a furin cleavage site may require additional sampling. As seen by others, ORF8 was highly divergent between SARS-CoV-2 related genomes. ORF8 from strains BANAL-52, -103, -236, like that of RaTG13, was closer to SARS-CoV-2 than to pangolin strains. ORF8 encodes a protein that has been proposed to participate in immune evasion 28. It is noteworthy that ORF8 was deleted in many human SARS-CoV-2 strains that appeared after March 2020 29, which is reminiscent of the deletions identified during the 2003 SARS epidemic 30.
Therefore, ORF8 can be a marker of SARS-CoV-2 adaptation to humans and its presence in bat strains is consistent with bats acting as a natural reservoir of early strains of SARS-CoV-2.
Structural and functional biology studies have identified the RBD domain that mediates the interaction with hACE2, as well as the major amino acids that are involved 24. The host range is dependent on this interaction 31,32. We show that the spike of SARS-CoV-2 is a mosaic of sequences close to the following bat viruses: S1-NTD (BANAL-52 and RaTG13), S1-RBD (BANAL-52, -103, and -236), and S2 (BANAL-52, -236, -103, and RaTG13). Notably, the RBDs (BANAL-52, -103, and -236) are closer to SARS-CoV-2 than that of any other bat strain described so far. Overall, one (H498Q (strains BANAL-103 and -52)) or two (K493Q and H498Q (strain BANAL-236)) amino acids interacting with hACE2 are substituted in these strains in comparison to SARS-CoV-2. These mutations did not destabilize the BANAL-236 / hACE2 interface, as shown by the BLI experiments (Fig. 3A).
Our results contribute to understanding the origin of SARS-CoV-2: they show that sequences very close to those of the early strains of SARS-CoV-2 responsible for the pandemic exist in nature and are found in several Rhinolophus bat species. The RBDs of the viruses found in our study are closer to that of SARS-CoV-2 than to the RaTG13 RBD, the virus identified in R. affinis from the Mojiang mineshaft where pneumonia cases with clinical characteristics strikingly similar to COVID-19 6 were recorded in 2012 33,34.
We found here sarbecoviruses with RBDs closest to that of SARS-CoV-2 in three different bat species, R. marshalli, R. malayanus, and R. pusillus. Our results therefore support the hypothesis that SARS-CoV-2 could originally result from a recombination of sequences pre-existing in Rhinolophus bats living in the extensive limestone cave systems of South-East Asia and South China 35 , which provides ideal conditions for interspecies interactions among Rhinolophus bats. They are restricted to limestone caves for their roosting sites and forage in the vicinity of these caves, and many species have been found foraging in the same cave areas, including R. malayanus and R. pusillus 36 . In addition, the distribution of R. marshalli, R. malayanus, and R. pusillus overlaps in the Indo-Chinese subregion, which means they may share caves as roost sites and foraging habitats 37 .
The pangolin has been suspected as being an intermediate host of SARS-CoV-2. The pangolin CoV-2017 and -2019 genomes have a high overall protein identity with SARS-CoV-2 and RaTG13 (up to 100% for certain proteins). In particular, RBD was highly conserved between pangolin-CoV-2019 and SARS-CoV-2. On the other hand, the amino-acid sequence identity of S1-NTD is only 63.1% between pangolin-CoV-2017 and SARS-CoV-2 7. With the novel viruses here described, the pool of sequences found in Rhinolophus spp. allows the reconstitution of a genome sufficiently close to that of SARS-CoV-2 without the need to hypothesize recombination or natural selection for increased RBD affinity for hACE2 in an intermediate host before spillover 38, nor natural selection in humans following spillover 39. However, we found no furin site in any of these viruses on sequences determined from original fecal swab samples, devoid of any bias associated with counterselection of the furin site by amplification in Vero cells 16. Lack of furin cleavage may be explained by insufficient sampling in bats, or by acquisition of the furin cleavage site through passages of the virus in an alternate host or during an early poorly symptomatic unreported circulation in humans. Finally, where these intergenomic recombinations arose and the epidemiological link with the first human cases remain to be established.
As expected from the high affinity for ACE2 of the S ectodomain of BANAL-236, pseudoviruses expressing it were able to efficiently enter human cells expressing hACE2 using an ACE2-dependent pathway. Entry was blocked by a serum neutralizing SARS-CoV-2. The RaTG13 strain, the closest to SARS-CoV-2 known before, had never been isolated. In contrast, preliminary studies show that BANAL-236 replicated in primate VeroE6 cells with a small plaque phenotype compared to SARS-CoV-2. Further analysis may indicate more clearly whether post-entry steps also shape infectivity.
To conclude, our results pinpoint the presence of new bat sarbecoviruses that seem to have the same potential for infecting humans as early strains of SARS-CoV-2. People working in caves, such as guano collectors, or certain ascetic religious communities who spend time in or very close to caves, as well as tourists who visit the caves, are particularly at risk of being exposed. Further investigations are needed to assess if such exposed populations have been infected by one of these viruses, if these infections are associated with symptoms, and whether they could confer protection against subsequent SARS-CoV-2 infections. In this context, it is noteworthy that SARS-CoV-2 with the furin site deleted replicates in hamsters and in transgenic mice expressing hACE2, but leads to ablated disease while protecting from rechallenge with wild-type SARS-CoV-2 16 .

Ethical and legal statements
The bat study was approved by the wildlife authorities of the Department of Forest Resource Management (DFRM), and the Ministry of Agriculture and Forestry, Lao PDR, No. 2493/DFRM, issued on May 21, 2020; No. 0755/MAF issued on June 2, 2020. All animals were captured, handled, and sampled following previously published protocols and ASM guidelines 40,41 .
Bat sampling areas and sample collection
Trapping sessions were conducted on four sites, in Fueng and Meth Districts, Vientiane Province, and in Namor and Xay Districts, Oudomxay Province, between July 2020 and January 2021 (Fig. 1A, Tables S1 and S2). Bats were captured using four-bank harp traps 42 and mist nets set in forest patches between rice fields/orange/banana plantations and karst limestone formations, for 5-8 nights depending on accessibility. Harp traps were set across natural trails in patches of forest understory. Mist nets were set across natural trails, at the edges of forests, at entrances of caves, and in areas near cave entrances, as well as in open areas or those with high forest canopy. Bats were identified following morphological criteria [42][43][44]. Other data such as forearm length (FA), sex, developmental stage (adult or juvenile) and reproductive condition (pregnant or lactating) were also recorded. Bats were sampled for saliva, feces and/or urine, and blood before release at the capture site. Species identification of PCR-positive individuals was confirmed by sequencing the mitochondrial Cytochrome oxidase 1 26.

Initial Coronavirus screening
Total RNA was extracted from feces samples and the presence of coronaviruses was tested by PCR targeting the RNA-dependent RNA polymerase gene using combinations of degenerate and nondegenerate consensus primers followed by amplicon sequencing (see Supplementary Methods).

Next-Generation Sequencing
Betacoronavirus enrichment was performed before sequencing at the reverse transcription step by adapting a previously described protocol (Supplementary Methods) 45. For samples with a low nucleic acid content, a random amplification step was then performed using the MALBAC Single Cell WGA kit (Yikon Genomics, Promega). Libraries were generated using the NEBNext Ultra II DNA Library Prep kit (New England Biolabs) and run on the Illumina NextSeq500 platform with High Output Kit v2.5 (150 Cycles). In addition to enrichment-based sequencing, cDNA was amplified using the AmpliSeq for Illumina SARS-CoV-2 Research Panel (cat# 20020496), applying twenty-six amplification cycles in PCR1 and nine cycles in PCR2. Raw reads from the enrichment-based sequencing were processed with an in-house bioinformatics pipeline (Microseek; Bigot et al., submitted). Sequences identified as Sarbecovirus were then mapped onto appropriate reference sequences using CLC Genomics Workbench 20.0 (Qiagen). Trimmed reads from the amplicon sequencing were mapped to the SARS-CoV-2 genome first, then mapped again (refined mapping) to the closest genome relative. When needed, complete genomes were obtained by conventional PCR and Sanger sequencing (see Supplementary Methods).

Recombination and phylogenetic analyses
Identification of recombination events occurring during the evolutionary history of bat sarbecoviruses was performed by the IDPlot package 46, a web-based workflow that includes multiple sequence alignment and phylogeny-based breakpoint prediction using the GARD algorithm from the HyPhy genetic analysis suite 47. Coordinates for the breakpoints were confirmed by performing phylogenetic analyses on the corresponding fragments using PhyML implemented through the NGPhylogeny portal 48. Branch support was evaluated with the aBayes parameter (see Supplementary Methods).
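Once breakpoint coordinates are available, the alignment has to be cut into consecutive fragments before per-fragment phylogenies can be built (14 breakpoints give 15 fragments). The Python sketch below shows only this coordinate bookkeeping; it does not call GARD, IDPlot, or PhyML. The function name, the convention that a breakpoint is the last column of the fragment it closes, and the toy coordinates are all assumptions made for illustration.

```python
def fragments_from_breakpoints(breakpoints, alignment_length):
    """Return 1-based inclusive (start, end) ranges between breakpoints.

    A breakpoint b is treated as the last column of the fragment it
    closes; this convention is an assumption of the sketch.
    """
    bps = sorted(breakpoints)
    if len(set(bps)) != len(bps):
        raise ValueError("duplicate breakpoints")
    if any(b < 1 or b >= alignment_length for b in bps):
        raise ValueError("breakpoints must lie inside the alignment")
    starts = [1] + [b + 1 for b in bps]
    ends = bps + [alignment_length]
    return list(zip(starts, ends))


# Toy example with made-up coordinates (NOT the breakpoints of this study).
toy_breakpoints = [300, 650, 900]
for index, (start, end) in enumerate(
    fragments_from_breakpoints(toy_breakpoints, 1200), start=1
):
    print(f"fragment {index}: columns {start}-{end}")
```

Each resulting range can then be used to slice every sequence of the alignment before tree building.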

Virus isolation
Rectal swabs were inoculated in duplicate in 24-well plates containing VeroE6 cells. Three and four days after infection, cytopathic effect (CPE) was monitored and 100 µL of supernatant was collected for RNA extraction. RT-qPCR targeting the E gene was performed as described 51. Culture supernatant (C1) was harvested at day 4 and titrated by plaque assay on VeroE6. A viral stock was prepared by amplification on VeroE6 cells. Culture supernatant (C2) was harvested at day 4 when massive CPE was observed and titrated. RNA was extracted from the viral stock and submitted to random NGS analysis. Raw reads were processed with the Microseek pipeline, as described above.
For crystallization experiments, the same constructs were expressed in Expi293F GnTI cells. The protein tags were cleaved overnight with thrombin and deglycosylated with EndoH. The RBD was mixed with a 1.3 molar excess of hACE2 and the complex was purified by SEC.

Biolayer Interferometry
Purified Avi-tagged RBD was biotinylated using a BirA biotin-protein ligase kit according to manufacturer's instructions (Avidity). The biotinylated RBDs at 100 nM were immobilized to SA sensors.

isolates were collected from the same site (site 1). (B) Phylogenetic analysis of the complete genome of Laotian and representative human, bat, and pangolin sarbecoviruses. The complete sequences (UTR excluded) were aligned with MAFFT 56 in "auto" mode, and maximum likelihood phylogenetic reconstruction was performed with PhyML implemented through the NGPhylogeny portal 48. Branch support was evaluated with the aBayes parameter. Accession numbers and bat species are specified in the name of the sequences. Sequences are colored according to Fig. 1C. (C) Similarity plot analysis of Laotian and representative bat and pangolin sarbecoviruses based on the full-length genome sequence of SARS-CoV-2 human prototype strain (NC_045512, Wuhan-Hu-1) used as reference. The analysis was performed with the Kimura-2 parameter model, a window size of 1,000 base pairs, and a step size of 100 base pairs with SimPlot program, version 3.5.1 57. (D) Heatmap of identities at the protein level of representative human, bat, and pangolin sarbecoviruses compared to human SARS-CoV-2 lineage B (NC_045512, Wuhan-Hu-1). Spike protein has been divided into functional domains, and the sequences are ordered according to percentage of identity of the RBD domain. "*": absence of a functional ORF10 in Thai bat RacCS203 (accession number MW251308). Heatmap was created using the gplots package in R (version 3.6.3).

possible, the closest viral sequence is indicated for each fragment. In other cases, "MULT" (group of multiple sequences) is mentioned. "$": unresolved fragment phylogeny (fragment 13, from positions 27344 to 27800 in the alignment). Sequences are colored as in Fig. 1. The complete phylogenetic analyses are presented in Supp. Figure 2.
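The similarity plot settings given in the Fig. 1C legend (Kimura 2-parameter model, 1,000-bp window, 100-bp step) describe a sliding-window comparison of each query genome against the reference. The Python sketch below illustrates the idea for one pair of aligned sequences. It is not the SimPlot implementation: the function names are hypothetical, gap and ambiguous sites are simply skipped, and similarity is taken as 1 minus the Kimura 2-parameter distance, which is an assumption of this sketch.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_similarity(seq_a, seq_b):
    """Return 1 - Kimura 2-parameter distance, or None if undefined.

    Only columns where both sequences carry A, C, G or T are compared.
    """
    n = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if x == y:
            continue
        if (x, y) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        return None
    p, q = transitions / n, transversions / n
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return None  # sequences too divergent for the model
    distance = -0.5 * math.log(term1 * math.sqrt(term2))
    return 1.0 - distance


def sliding_similarity(ref, query, window=1000, step=100):
    """Return (window midpoint, similarity) pairs along an alignment."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(ref) - window + 1, step):
        sim = k2p_similarity(ref[start:start + window],
                             query[start:start + window])
        points.append((start + window // 2, sim))
    return points


# Toy example with made-up sequences and a small window.
toy_ref = "ACGT" * 5
toy_query = "GCGT" + "ACGT" * 4   # one transition at the first site
for midpoint, sim in sliding_similarity(toy_ref, toy_query, window=10, step=5):
    print(midpoint, round(sim, 4))
```

Plotting the returned pairs for several query genomes against the same reference gives curves of the kind shown in a similarity plot.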