Microbiome single cell atlases generated with a commercial instrument

Single cell sequencing is useful for resolving complex systems into their composite cell types and computationally mining them for unique features that are masked in pooled sequencing. However, while commercial instruments have made single cell analysis widespread for mammalian cells, analogous tools for microbes are limited. Here, we present EASi-seq (Easily Accessible Single microbe sequencing). By adapting the single cell workflow of the commercial Mission Bio Tapestri instrument, this method allows for efficient sequencing of individual microbes’ genomes. EASi-seq allows thousands of microbes to be sequenced per run and, as we show, can generate detailed atlases of human and environmental microbiomes. The ability to capture large shotgun genome datasets from thousands of single microbes provides new opportunities in discovering and analyzing species subpopulations. To facilitate this, we develop a companion bioinformatic pipeline that clusters microbes by similarity, improving whole genome assembly, strain identification, taxonomic classification, and gene annotation. In addition, we demonstrate integration of metagenomic contigs with the EASi-seq datasets to reduce capture bias and increase coverage. Overall, EASi-seq enables high quality single cell genomic data for microbiome samples using an accessible workflow that can be run on a commercially available platform.


Introduction
A microbiome comprises the collection of distinct microorganisms and their genomic elements within a particular environment.These microecosystems play fundamental roles in the biosphere, have major impacts on human health, and are important resources for scienti c and economic progress 1,2 .Thus, the study of microbiomes -in terms of the species present, the genes employed by different members to thrive, and the molecules consumed or produced -is of high scienti c value.Historical methods of research rely on passive observation with microscopy 3 , which predominantly yield information about phenotypes and behavior.To reveal functional properties, assays can be conducted on microbes isolated from the environment 4,5 ; however, the need to culture microbes isolated from environment form many of these studies imposes a signi cant bias or precludes any detection 6 .Thus, culture-independent methods to pro le microbes as a function of species, function, and gene markers are immensely valuable.Amplicon sequencing of 16S ribosomal RNA (rRNA) genes or other diagnostic marker genes has been widely used for classifying microbial community composition.Despite its convenience, amplicon sequencing suffers from PCR bias and can have limited resolution in discriminating closely related species or strains of the same species 7 .Metagenomic sequencing, which directly sequences all genomic DNA within an environment, enables both the pro ling of phylogenetic diversity and the comprehensive accounting of all the genes present within a microbiome 8 .However, because the data is acquired as a pool of mixed sequencing reads originating from all organisms, the bioinformatic reassembly requires sophisticated computational algorithms for assembly and sometimes yields disconnected genomic fragments 9 .Although long read sequencing 10 and/or read cloud alogrithems 11 can generate relatively long genomic assemblies, associating separate chromosomes or chromosomes and extrachromosomal elements (e.g., plasmids, non-integrating phages, Borgs 12 ) with a single cell type can still be challenging with current methods.Characterizing these associations can be critical to understanding the behavior of a microbiome and genetic ow.While approaches like genome-resolved metagenomics 13 and chromosome conformation capture 14 can obtain them in some circumstances, these methods are biased towards the more abundant species/extrachromosomal elements or when the elements are not in proximity with the genome molecule, a prerequisite for resolving them with short-read assembly.
A microbiome consists of heterogeneous single cell populations 15 .Thus, just as single cell sequencing has transformed mammalian cell biology by resolving heterogeneous systems and tissues into their composite cell types 16,17 , similar impacts are possible in microbiology.Of the possible methods, single cell genomics is perhaps most important for microbiology because of the signi cant genetic heterogeneity and frequent transfer of genetic material 18 .Genetic mobile elements can be a source of important phenotypes, including virulence factors or resistance genes that can transform a normally harmless commensal into a multidrug resistant pathogen 19,20 .Methods to analyze all genomic information of a cell, including DNA not physically connected to the chromosome, would allow characterization of these vital mobile elements.Towards this objective, there has been signi cant effort to develop single microbe sequencing.Previous approaches have been based on isolating microbes for single cell library preparation using FACS [21][22][23][24][25][26][27][28] , optical tweezers 29 , hydrogel matrix embedding 30 , and micro uidics 31 .These methods have limited throughput, allowing just hundreds of genomes to be sequenced.More recently, barcoding reminiscent to scalable mammalian cell methods have been applied to microbes and achieved the sequencing of similar numbers of cells (>10,000 cells/run) 32,33 .These multi-step droplet micro uidic approaches utilize robust molecular biology, yielding superb data for most cell types in the sample 32,33 ; unfortunately, the number of steps and custom-built instrumentation poses a signi cant barrier to non-micro uidic engineers for its application.Another single step droplet micro uidic approach enables sequencing multiple gene loci of thousands of single bacteria cells were reported 34,35 , however, its application is limited to only a few amplicons.Meanwhile, high-throughput single bacteria RNA sequencing has been demonstrated using combinatorial indexing 36,37 and commercially available single cell platforms 38 .However, these methods have only been used for model organisms and have never been applied to a complex microbiome, in which the diverse physical properties of microbes make optimization of the requisite xation, permeabilization, and in situ ligation di cult.Thus, currently, there is no tool available to the microbiological community for e cient single cell genome sequencing of microbiomes.If such a method could be developed, it would be superior to metagenomic sequencing in most instances and provide access to capabilities currently missed, including generation of complete single-microbe resolution cell atlases and gene annotation at the strain or single cell level.
In this paper, we describe EASi-seq (Easily Accessible Single microbe sequencing), a method to e ciently sequence tens of thousands of microbes.Rather than relying on custom micro uidic instrumentation as in previous methods 32,33 , we start from a commercially available work ow with the inherent capabilities for single cell sequencing: Mission Bio's Tapestri 39 .This instrument is widespread in clinical and academic centers, easy to use, and reliable.A major impediment is that the instrument is designed for targeted DNA sequencing of mammalian cells and is not directly applicable to microbial cell whole genome sequencing, which requires different lysis strategies and nucleic acid adaptors.To address this, we introduce two key modi cations into the commercial work ow: a bulk single cell nucleic acid puri cation step that addresses cell lysis and adapter tagmentation 40 protocol that enables whole genome sequencing.With these modi cations, the Tapestri generates barcoded sequence data for several thousand microbial cells which, as we show, can comprise bacteria, archaea, and fungal linages.
Our sequencing is untargeted, capturing sequence data across each cell's genome and any other DNA present in or on the cell.Captured data includes sequences from mobile elements and viruses.To facilitate analysis of the single cell sequencing data, we develop a companion bioinformatic pipeline that clusters cells into similarity groups, annotates their genes and species, and pools sequences within a cluster to increase improve genome assembly and coverage.Using EASi-seq, we generate detailed atlases of a control synthetic community and real-world fecal and environmental microbiomes.We show that EASi-seq's single cell resolution allows differentiation of microbial strains with 99% genomic similarity.EASi-seq provides a universal approach for deconvoluting microbiomes into the cells of which they are composed and to characterize their gene and pathway functions.

Results
EASi-seq work ow for whole genome microbial sequencing A platform to reliably sequence large numbers of environmental microbes must overcome technical and practical challenges.Different microbes have different cell wall and membrane properties and, thus, can require a mixture of different lysis reagents or procedures 32,41,42 .Additionally, genomic and plasmid DNA must be fragmented and have the correct adaptors added prior sequenced.Lastly, some microbiomes are highly heterogeneous, having hundreds to thousands of distinct species that potentially include many strains.Generating a complete single cell atlas in this scenario requires sequencing signi cant numbers of single cells.Prior methods to overcome these challenges used custom work ows with 3 to 5 micro uidic processing steps 32,33 .Each device had to be custom fabricated and operated by micro uidic experts.While the works demonstrate the power of high throughput single microbe sequencing, the inaccessibility of these work ows precludes their use by microbiologists lacking micro uidic expertise.
Recently, several commercial single cell instruments have become available that support processes like the ones required for microbial whole genome sequencing (Table S1).Of these, Mission Bio's Tapestri is unique in the ability to conduct two subsequent droplet steps as a result of being designed for targeted DNA sequencing of mammalian cells 39 .Nevertheless, even with automation of two common micro uidic steps, directly replicating prior microbe sequencing work ows on Tapestri is not possible.Thus, a major innovation of this work is to develop a microbe sequencing work ow that maps onto Tapestri's two-step process.
To enable single microbe sequencing, the cell must be lysed, the DNA fragmented into readable lengths, and the fragments labeled with single cell barcodes.With Tapestri's two step work ow, we can use the rst droplet manipulation stage to perform DNA tagmentation, and the second for barcoding.The challenge is lysing the cells to prepare the genomes for tagmentation, while keeping the genomes and extrachromosomal DNA together.A proven approach to accomplish this is to use a micro uidic device to encapsulate single cells in hydrogel spheres before lysing the cells with bulk washes.Because genomic DNA is a high molecular weight polymer, it remains ensnared within the hydrogel matrix and is protected from the shear forces generated by washing, allowing intact genome puri cation 23,[26][27][28]32,43 . The rquirement of micro uidics for cell encapsulation, however, would negate the primary advantage of EASi-seq's accessibility.Thus, another core element of this work has been to develop a micro uidic-free process for genome puri cation in hydrogels.In the approach, we encapsulate the cells in hydrogel droplets by emulsi cation through vigorously shaking or shearing through a syringe needle (Fig. 1a).In either case, the process generates an emulsion in which the cells are randomly loaded.The droplets of polyacrylamide monomer are gelled by radical polymerization to ensnare the cells.The resulting hydrogel beads are then transferred into an aqueous carrier for lysis and washing.This process takes 2 hours and uses no micro uidics.The resultant suspension is polydisperse, containing many hydrogel beads too large for the Tapestri (which only accepts cells or beads having a diameter less than 30 µm) or too small to trap a cell.Thus, we use differential centrifugation (Fig. 1b and Fig. S1) to select hydrogel beads sizes within the optimal 5-30 µm diameter range.To ensure a high probability of single cell genomes, the initial cell concentration is set such that hydrogels of this size are loaded at a rate of 2%.To purify the genomes, we perfuse the hydrogel beads with cocktails comprising polysaccharide digesting hydrolases and proteolytic enzymes 32 .The result is a suspension of hydrogel beads with intact single cell genomes that have similar physical and hydrodynamic properties to mammalian cells.These beads can be readily processed with the Tapestri (Figs. 1c-d).To ensure single cell sequence data, most of the hydrogels are left empty, such that about 10% contain single cells, in accordance with Poisson statistics 44 .Thus, when loading the gels into Tapestri, we set the concentration to about 5 gels per droplet, which yields 10% containing one genome and 90% containing no cells, thereby yielding single cell data, and making e cient usage of the barcoding droplets.
Normally, the Tapestri's rst step is used to encapsulate and lyse the cells.Since our cells are already lysed in the gels, we can use this module for tagmentation instead.To maximize tagmentation e ciency, the genomes must be released from the hydrogel beads.Controlled release is accomplish by utilizing N,N'-bis(acryloyl)cystamine (BAC) as the hydrogel crosslinker, which can be reversed on-demand with dithiothreitol (DTT) addition 45 .The Tapestri's dual-inlet design for the rst step allows DTT addition with Tn5 transposase, such that the hydrogels liquify upon droplet encapsulation (Fig. 1d, top module and Fig. S2).The Tn5 transposase used for tagmentation is loaded with forward adaptors matching Tapestri's V2 barcoding primer 3' constant region (Table S2), allowing the tagmented fragments to be barcoded in the subsequent droplet PCR (Fig. S3).At this point, genomic DNA is released and tagmented in each newly formed droplet.Barcoding is accomplished by droplet reinjection and merging with the needed barcoding PCR reagents in the Tapestri's second step (Fig. 1d, bottom module).After the barcoding PCR, the nal sequencing adaptors are added by pooling the amplicons of all droplets and using a bulk PCR (Fig. S3).
The resultant material is sequenced and computationally deconvoluted into single cells by barcode (Fig. 1e).The datasets contain tens of thousands of single cell genomes with coverages ranging 0.01-10% depending on genome size and sequencing depth.The genomes can be clustered into a single cell atlas (Fig. 1f, i.-ii.).The data for all single cells in each cluster can be pooled to create a consensus genome.Metagenomic sequencing data can be integrated to increase coverage.The nal genus clusters can be annotated and evaluated for features of interest, including species or strain abundance and gene or pathway distributions (Fig. 1f, iii.-iv.).

Validation of single cell resolution
For EASi-seq to be useful, it must generate barcoded single cell sequence reads.To validate this capability, we used EASi-seq to analyze the synthetic ZymoBiomics microbial community, consisting of eight bacteria and two yeasts (Fig. 2a, Table S3).We process the community using EASi-seq, generating 238,362,515 paired-end reads after ltering by barcode read count and alignment rate (Fig. 2b-c).This yields 1835 barcode groups with an average of 71,684 reads (ranging from 1000 to 1,931,407 bp worth of data).To assess single cell resolution, we map the reads in each group to the ten reference genomes and plot the fraction mapping to the dominant species.We nd that 86.16% of barcodes have a purity of >90% (Fig. 2d) and dominantly align to one species (Fig. S4, Table S4).These results demonstrates that EASi-seq achieves single cell resolution.For the shallow sequencing applied, the average coverage is 0.44% for bacterial genomes and 0.031% for the larger yeast genomes (Fig. 2e).The coverage of most single cell barcode groups remains unsaturated at 10,000 reads (Fig. S5-6).The comparison of EASi-seq with metagenomic sequencing indicates gram-negative bacteria are poorly represented within EASi-seq barcode groups (Fig. 2f).For example, we identify only four P. aeruginosa cells with a total read count of 34,021.This result is consistent with a previous report 46 and is caused by the ZymoBIOMICS synthetic community inactivating buffer (DNA/RNA Shield TM ) pre-lysing gram-negative bacteria.

Reference-independent clustering of unknown cell types
When applying EASi-seq to a novel microbiome, reference genomes are usually not available for mapping and species assignment of the single cell datasets.Thus, to build a genome atlas that displays all cells in a sample, we require a clustering algorithm not reliant on prior knowledge of the species present.In addition, many single cell genomes are covered below 1% (Fig. 2e) and comprise short reads that do not overlap with other single cells of the same type in different barcode groups.To enable clustering from such data, we propose the Taxonomic Discovery Algorithm (TDA).In TDA, each barcode group is treated as a metagenomic sample, and its taxonomic abundance is estimated with available taxonomic classi ers.The taxonomic estimations of all barcode groups are then combined into a vector suitable for similarity clustering.We hypothesize that the different barcode groups that belong to the same cell should be classi ed to the same taxa by taxonomic classi ers even if they possess completely different sets of reads.In this approach, reads of each barcode groups are rst classi ed based on a taxonomic database to estimate the barcode group's associated taxonomy abundance.The taxonomic abundances of all barcode groups are binned into a vector consisting of all genera, wherein the bin value is proportional to the number of reads in the barcode group mapping to it based on the available taxonomic database.For taxa accurately represented in the database, most reads will be assigned to one genus bin, while cells from poorly represented taxa may be assigned to several.The core concept is that related cells generate similar genus vectors, even if their reads cover different portions of the genome, and even if the genus to which individual reads are assigned to does not perfectly match the true genus (Methods).With the genus vectors in hand, related cells can be clustered using Uniform Manifold Approximation and Projection (UMAP) 47 for visualization.After clustering, reads from all cells in a cluster are pooled to generate a consensus genome.
The e cacy of the TDA for clustering cells depends on the classi cation method and database used to map reads into genus bins.If a species is totally novel, such that few of its reads can be annotated to any genera or other taxa level, the vector will contain minimal useful information for clustering.To identify the best database for the TDA, we therefore evaluated the most popular software for taxonomic classi cation and quanti cation.These included tools based on K-mer (Kraken2/Bracken 48,49 ), marker gene (MetaPhlAn3 50 ), and protein similarity (Kaiju 51 ).For this evaluation, we simulated a microbiome using downloaded available genomes, processed into single cell barcode groups with read structures resembling the output of EASi-seq (Methods, Fig. S7a-b).To assess the e cacy of a classi cation method, we calculated the accuracy of barcode purity prediction and taxonomic annotation, and barcode recovery rate with ltering.The k-mer based Kraken2/Bracken with PlusPF database (v.2021/01/27) 52,53 showed the best performance in genus identi cation accuracy, purity prediction accuracy, and accurate barcode retention rate (Discussion S1, Fig S7c-h) and was our choice going forward.
With the TDA validated on a simulated dataset, we next experimentally veri ed it using the known composition of the ZymoBIOMICS synthetic community.After ltering based on the percentage of mapped reads and genus level purity (Fig. S8a-b), the TDA correctly clustered and identi ed all ten populations in this synthetic community (Fig. 2g, Table S5).In addition, 97.34% of barcode groups were correctly annotated, re ecting the good representation of these community members in that database Integrating metagenomic contigs to reduce cell-type bias and increase overall coverage A unique and powerful feature of EASi-seq when combined with unbiased clustering is the ability to pool single cell data to increase genome coverage.Compared to EASi-seq, metagenomic sequencing does not rely on intact cells, and uses all extracted nucleic acid that may better capture all microbial taxa.Thus, to enhance the coverage of EASi-seq, we developed an approach to integrate metagenomic data using a similar strategy to the TDA, in which we calculate a genera abundance vector for each contig assembled from metagenomics data, then co-cluster the metagenomic contigs with the single cell barcode groups (Fig. S9, Table S6, Methods).These vectors are ltered by purity (Fig. S10) before clustering.From a metagenomic assembly of the ZymoBiomics community, we identi ed 1427 of 4844 contigs that had >90% association with one genus.Most contigs clustered in a fashion that overlapped with the single cell data points (Fig. 2h, Fig. S11).With reads added by metagenomic contig integration, we achieve an average cluster coverage of 94.31±4.92%for bacteria and 2.74±3.24%for fungi, and the relative contig lengths approach 100% of the genome (Fig. 2i-j).Additionally, the assembled contigs have a GC content consistent with the reference genomes (Fig. S12a) and an average N50 of 49 Kbp (Fig. S12b).These results demonstrate that integration of metagenomic contigs with the EASi-seq atlas enhances capture of diverse microbial taxa and increases genome coverage.

Strain-resolved differentiation
Differentiating between strains within a species is important for analysis of natural and engineered microbiomes 18 .Because EASi-seq can obtain thousands of reads on each cell, it affords novel opportunities for strain differentiation.To evaluate the ability of EASi-seq to accomplish this, we used it to analyze a synthetic community consisting of twenty-two equally mixed strains of Eggerthella lenta 54 (Fig. 3a, Table S7).We sequenced the library at 105,896,184 paired-end reads after quality ltering.We grouped the reads by barcode and aligned them against the reference genomes.To ensure read quality, we ltered barcode groups based on read counts and alignment rate (Fig. S13), recovering 5345 barcodes containing 101,760,151 reads.Because the strains have highly overlapping genomes, most reads align to multiple strains; thus, only reads speci c to a single genome are useful for strain identi cation.Based on this, we developed a strain resolution approach reminiscent of transcript isoform expression estimation (BitSeq) 55 .We treat each genome as an isoform of one gene and estimate their "expression" level in each barcode group using BitSeq (parseAlignment and estimateVBExpression functions).All reads in a barcode group are mapped to the isoforms/strains and the probabilities of reads originating from a given isoform/strain are calculated for each alignment using a sequence-speci c bias correction method (parseAlignment).Alignment probabilities are then used to calculate the posterior distributions of each isoform/strain via variational Bayes inference (estimateVBExpression), which is used to determine which strain a given cell most closely resembles (Methods).We aligned the reads in each barcode group to the reference genomes and recorded the overlap, using a Log-Normal read distribution to calculate the probability of originating from each reference genome, accounting for quality scores and mismatches.The barcode group is then assigned to a strain with more than 15% abundance and the highest abundance.(Fig. S14, Table S8).To visualize the resultant annotations, we plot the data as a UMAP and pair plot of the abundance estimation (Fig. 3b and Fig. S15).The separation between clusters on the UMAP plot con rms EASi-seq's strain-level resolution.

Single cell atlas of a human gut microbiome
The human gut microbiome comprises vast numbers of microbes from hundreds to thousands of species 56 .Additionally, it can vary between individuals as a result of time, diet, geographical location, and health 57 .Thus, characterizing microbiomes, the microbial taxa present, their genetic properties, and the bioactive molecules they synthesize, is critical to understanding the dynamics and complexity of this ecosystem.Most approaches use amplicon sequencing of the 16S rRNA gene or bulk metagenomic sequencing 58,59 .EASi-seq would provide unique information missed by these methods, including single cell-level heterogeneity and cell-cell interactions.To explore this possibility, cells isolated 60 from the human gut microbiome of a healthy donor were pro led by EASi-seq (Fig. S16a).After quality ltering, we recovered 232,705,096 paired end reads.We grouped reads by barcode and ltered by read count (>1000 reads) and genus purity estimated by Kraken2 (>80%) to remove multiplets and cell aggregates (Fig. S17a-b, Discussion S2, Table S9).The recovered 1118 barcode groups contained ~150,000 reads on average.To increase cell capture e ciency and genome coverage, we also performed metagenomic sequencing of the sample (Table S10) and integrated it into the single cell data as described previously (Fig. S9, Fig. S18a-c).We ltered contigs based on read percentage classi ed by Kraken2 and genus level purity before integration with the EASi-seq data (Figs.S18d-f).We generated a cell atlas, identifying 95 clusters or microbial populations (Fig. 3a) with varied cell numbers and read counts (Fig. S19).The metagenomic data increased the number of unique reads and clustered well with the single cell data (Fig. S21, Table S11).Nevertheless, several genera remain underrepresented in the atlas, including Bacteroides, Phocaeicola, Parabacteroides, Akkermansia, and Alistipes which may be a result of the cell isolation 60 or sample storage artifacts 61 , as has been described previously (Fig. S20, Discussion S3).
The taxonomic level of the clustering depends on the taxonomic level used for the mapping in the TDA.Since we used genus for the analysis so far, clusters in the UMAP most closely represent this level.Thus, some clusters may group cells from multiple species, which may be resolvable by isolating these groups and re-clustering with a TDA analysis that uses species-level Kraken2 estimation (Fig. S22).For example, the two clusters with the most cells (Blautia-A, and Bi dobacterium) can be categorized into 10 and 7 sub-clusters, respectively, corresponding to different populations of these genera coexisting in the sample.

Taxonomic distribution of antibiotic resistance genes
Antibiotics can profoundly impact the gut microbiome in terms of species composition, microbial metabolic activity, and antibiotic-resistant gene (ARG) abundance 62 .To evaluate ARG distribution among taxa in the fecal microbiome, we searched for ARGs in each cluster (Fig. 3b, Table S12) by aligning the reads against the Comprehensive Antibiotic Resistance Database (CARD) 63,64 .We ltered the alignments by mapping score (Bowtie2 output SAM MAPQ >=42), selected the ARGs for protein coding, and identi ed

Functional annotation of gene clusters detected in fecal microbiome genera
Biosynthetic pathways are often encoded as gene clusters that allow cells to acquire the ability to synthesize new molecules 66,67 .Many gene clusters have already been observed and characterized for function, allowing this information to be annotated to single cell datasets based on detection of key pathway genes, such as MetaCyc 68-70 and KEGG [71][72][73] .Using MetaCyc, in the 95 genera groupings found in our fecal microbiome, we identi ed 194 gene clusters belonging to 29 classes in 5 super classes, with biosynthetic functions including generation of energy precursors, degradation utilization and assimilation, transport, and macromolecule modi cation.Additionally, we found that different taxa possess distinct pathways, as might be expected on their unique ecological niches (Fig. 3c, Table S13).
Even within a similar pathway type, different genera have different functions, such as amino acid metabolism.For example, Blautia_A possess the pathway to produce arginine, aspartate, ornithine, lysine, methionine, serine, and tryptophan; Bi dobacterium to synthesize the branched amino acids isoleucine, serine, and valine; Akkermansia to synthesize arginine, isoleucine, valine, and branched amino acid; and Anaerobutyricum to synthesize ornithine and methionine.Different genera also utilize distinct carbohydrate sources, with pathways for glucose, galacturonate, lactose, trehalose, sucrose, galactose, stachyose, rhamnose, and mannose all detected in the microbiome.Glucose degradation was identi ed in Bi dobacterium; sucrose degradation was seen in Agathobacter, Anaerostipes, Coprococcus, Ruminococcus_D and Streptococcus; and, starchyose degradation was detected in Blautia_A, Coprococcus, Fusicatenibacter, KLE1615, and Roseburia.The ability to unambiguously link functional properties to community members is useful for unraveling the web of pathways that comprise all microbiomes and, ultimately, should aid in the engineering of microbiomes to improve gut health.

Single cell gene distribution in Halioglobus
To demonstrate the ability to analyze the gene distribution at single cell level, we analyzed the genus cluster with highest abundance, Halioglobus, which accounts for 810 barcode groups (23.7% of the 3,417 total counts).The genus Halioglobus belongs to the class Gammaproteobacteria and family Halieaceae, which is characterized as Gram-negative, non-endospore-forming, aerobic, oligotrophic, and mesophilic bacteria.The family is exclusively isolated from marine environments and is one of the major bacteria groups in coastal or open ocean environments 91 .Although a few isolated Halioglobus species isolates have been reported [91][92][93][94][95] , the heterogeneity within a Halioglobus population has never been studied.
Within the single cell cluster, we rst annotate genes using HUMAnN3 50 .After ltering based on the gene count and cell counts (minimum cell counts per gene = 10 and minimum gene counts per cell = 10), the gene presence/absence matrix of the Halioglobus barcode groups are grouped into 6 clusters by Leiden algorithms 96 (Fig. 5c).The gene presence/absence matrix of each cluster is shown in Fig. 5d.This result suggests the Halioglobus genus in our sample is a heterogeneous population and indicates that EASi-seq is suitable for the analysis of a heterogeneous population, which could potentially be used for more detailed single-cell-resolution pan-genome analysis.

Discussion
Microbes play key roles in all ecosystems and are important to human health.While they comprise the most diverse forms of life on the planet 97 , there are few tools available for sequencing them at single-cell resolution.Additionally, while tools for single mammalian cell genomics have become widespread 16 , analogous tools for microbes have lagged, due to the technical challenges of isolating and sequencing them in the numbers required to characterize diverse microbiomes.With these realizations in mind, we developed a work ow for e cient microbe sequencing using Mission Bio's commercial single-cell platform.This instrument is broadly distributed and accessible to non-experts, and therefore constitutes an opportunistic foundation on which to build a single microbe sequencing technology.A major element enabling this has been a simple and general bulk technique to purify single-cell genomes in hydrogels that are compatible with the instrument.Our lysis procedure is applicable to all microbe types, including archaea, bacteria, and fungi, and the commercial micro uidics allow high throughput and e cient singlecell barcoding, to obtain unbiased sequencing for tens of thousands of cells in a sample per run.
The data generated by EASi-seq is unique in that reads are grouped at the level of single cells.By contrast, the dominant method of metagenomic sequencing discards single cell information and captures the sequence data as a mixed pool of short reads.This mixed pool output necessitates complex bioinformatic approaches for contig reconstruction that cannot exploit single-cell information.Therefore, in addition to developing a novel approach for obtaining single-cell data, we also develop novel bioinformatic approaches that exploit the data's single-cell structure.These include ways to allow cells to be clustered by similarity, aggregation of the reads within a cluster to increase genome coverage, annotate phylogeny and genes, and to scan genomes for genetic elements of interest.By enabling the construction of detailed cell atlases that capture the overall species demographics of a microbiome, EASiseq affords new opportunities for characterizing the interaction webs inherent to these systems that are near impossible to obtain with metagenomic techniques.With the ability to integrate EASi-seq and metagenomic reads, EASi-seq can provide a complementary viewpoint to metagenomics.
There still remain aspects of the EASi-seq method that can be improved.Importantly, the coverage per barcode is low, which is caused by three reasons.First, before droplet barcoding, the genomic DNA is fragmented by Nextera-like tagmentation, which leads to only 50% of the genomic fragments being viable for barcoding PCR 40 .To overcome this ine ciency, we anticipate that future implementations of EASi-seq can increase the complements of adaptors 98 or use single-adaptor transposition and uracil-based adapter switching within the barcoding PCR step 99 .Second, the heterogeneous genome sizes of microbes require different amounts of transposase to achieve the appropriate fragment size 100 .Although we did extensive optimization, one concentration does not t all needs.For certain genomes, the transposase concentration could either be too high (for smaller genomes) and generate fragments that are below the size-selection threshold or too low (for larger genomes) and produce long fragments incompatible with downstream processing.Third, in adapting the protocol to directly integrate into a commercial device, it was necessary to utilize the barcoding beads from the Tapestri V2 reagent kit.The beads' barcoding primer has a 15 bp constant region with a melting temperature of 48°C.While we used this sequence as the forward priming site in the barcoding PCR, a higher temperature of 55°C was used as the anneal temperature to avoid random priming.This may lower e ciency in the PCR step.Future optimization can involve development of barcoding beads with improved primers having an elevated melting temperature.Finally, we suspect that coverage can be also improved with an additional single genome ampli cation step prior to the tagmentation, which can be achieved either in droplet 33 or in hydrogel beads 26 .Such improved coverage will greatly advance the application of EASi-seq.
Even in its current form, EASi-seq represents a highly accessible platform technology for generating detailed and comprehensive single-cell genome atlases independent of isolation and culturing.Such atlases will have a broad and sustained impact on microbiology, similar to what has been accomplished for mammalian cells.Because we build our work ow on a commercial architecture that is constantly adding features, many of the same improvements and innovations may carry over to microbiomes.For example, after the rst demonstrations of mammalian cell DNA and RNA sequencing, multiomic approaches were built on top of the original technologies.These include the ability to measure surface and internal proteins, characterize epigenetic signatures and genome structure, and integrate spatial data 101 .For example, microbial RNA-seq is possible using universal cDNA methods amenable to single cell barcoding and would thus allow addition of transcriptional state measurements with EASi-seq.Using oligonucleotide-labeled binders like including antibodies, lectins, and aptamers, microbes can be stained prior to EASi-seq, allowing for recording proteomic and serotype signatures in a manner similar to Abseq 102 , DAb-seq 103 , CITE-seq 104 , inCITE-seq 105 , and INS-seq 106 .Similarly, the lysis and molecular biology processes of EASi-seq should carry over to DNA viruses and, with the implementation of reverse transcription, RNA viruses, holding potential for single virus genome atlasing.All twenty-two E. Lenta strains (Table S3 list of E. lenta strains) were cultured in appropriate media 54 and equally mixed based on CFU counting in culture media.The cell mixture is stored at -80°C until use.
Before processing, thawed cells were washed 3 times to remove the storage media and ltered with 5 µm syringe lter to remove cell aggregates.After cell counting, the cells were resuspended to 100 million per mL in PBS.

b. Human microbiome and cell isolation
Fecal sample from health donor is stored at -80 °C until use.Cell isolation was performed according to previously reported protocol 60 .About 0.5 g of fecal sample was homogenized in PBS (10 mL).The suspension is ltered through a 50 µm cell strainer (Corning, 431752) to remove the large fecal particles and loaded into a 15 mL centrifuge tube with 3.5 mL of 80% Nycodenz® solution (Cosmo Bio USA, AXS-1002424).After centrifuge at 4700xg for 40 min at 4°C, the layer corresponding to cells was collected by pipetting.The cells were washed with PBS for 3 times, ltered with 5 µm syringe lter, and then resuspended to 100 million per mL in PBS.
c. Ocean water microbiome and cell isolation Sea water was collected at Paci c coastline near San Francisco (GPS coordinate: 37.7354373 N, 122.5081862W) by submerging a 1000 mL sterile bottle into the ocean.The sea water was transferred to the lab on ice.The cell was isolated according to the published protocol 32 .Brie y, the sea water was rst ltered through a 50 µm cell strainer (Corning, 431752) to remove sands or other large particles.The suspension was then ltered by a 0.45 μm vacuum lter (Millipore, SCHVU01RE) to capture the cells on the membrane.The membrane was cut off from the lter with a sterile razor blade and transferred a 15 mL centrifuge tube with 5 mL PBS.The cells were released from the membrane by vortexing the tube at maximum speed for 2 min.The cells were washed with 10 mL PBS for 3 times and passed through a 5 μm syringe lter to remove remaining virus or large particles.The cells were resuspended to 100 million per mL in PBS.

Micro uidics device fabrication
Micro uidics devices were fabricated with standard photolithography and soft lithography method.
Custom device fabrication is not necessary for the single cell sequencing using Mission Bio Tapestri but used for work ow optimization.Master photomask was designed using AutoCAD and printed at 12,000 DPI (CAD/Art Services, Bandon, OR).To make the master structure, SU8 Photoresist (MicroChem, SU8 3025 and SU8 3050) were spin coated on three-inch silicon wafers (University Wafer), soft baking at 95°C for 10 to 20 min, UV-treated through the photomasks for 3 min, hard baked at 95°C for 5 to 10 min and developed in propylene glycol monomethyl ether acetate (Sigma Aldrich).For the micro uidic devices, poly(dimethylsiloxane) (PDMS) (Dow Corning, Sylgard 184) and curing agent were mixed in 10:1 ratio, degassed and poured over the master structure, baked at 65 °C for 4 h to cure, and peeled off from the wafer.After hole punched with a 0.75 mm biopsy puncher, the devices were plasma treated and bonded to glass slides.The channels were treated with Aquapel (PPG industry) to for hydrophobic surface and dried by baking at 65°C for 10 min.In case of using Tapestri, the MissionBio Tapestri DNA cartridge and a 0.2 mL PCR tube were mounted onto the Tapestri instrument.50 µL beads solution, 50 µL tagmentation reagents, and 200 µL encapsulation oil were load in the cell well (reservoir 1), lysis buffer well (reservoir 2), and encapsulation well (reservoir 3) on the Tapestri DNA cartridge, respectively.The Encapsulation program was used for droplet generation.The droplets were collected into a PCR tube.
For custom build micro uidic device, the beads solution, the tagmentation reagents, and 5% (w/w) PEG-PFPE surfactant (Ran Biotechnologies) in HFE 7500(3M) were loaded into three syringes and placed on syringe pumps.The syringes were connected to the co-ow droplet generator device via PTFE tubing.The pumps were controlled by a Python script (https://github.com/AbateLab/Pump-Control-Program) to pump bead solution at 200 µL/h, tagmentation reagents at 200 µL/h and oil at 600 µL/h to generate droplets.The droplets were collected into PCR tubes.
The droplets generated by either method are incubated at 37°C for 1 h, 50°C for 1h, and hold at 4°C to ensure hydrogel melting and Tn5 complete reacting.

c. Droplet barcoding PCR
The tagmentation droplets from the previous were merged with PCR reagents and barcode beads for barcoding with either Tapestri or custom build micro uidic devices.
In case of using Tapestri, 8 PCR tubes and DNA cartridge were mounted onto the Tapestri instrument.In case of using custom build micro uidics, the device was rst primed by lling electrode solution (2M NaCl solution) into the electrode and the moat channels.500 µL PCR reagents containing 1.67X Q5® High-Fidelity Master Mix (NEB, M0515), 0.625 mg/mL BSA, 1.2 µM reverse primer (GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GTA CCC TTC CAA TTT AAC CCT CCA) were loaded into a 1 mL syringe.200 µL Mission Bio V2 barcoding beads washed with Tris buffer (pH 8.0) and resuspended in 10 mM Tris buffer containing 3.75% tween 20, 2.5 mM MgCl2, 0.625 mg/mL BSA.The beads were centrifuged at 1000 RCF for 1 min and the supernatant was removed.The remaining bead slurry (~110 uL) was loaded into PTFE tubing connected to a 1 mL syringe lled with HFE 7500 oil.The droplets after tagmentation were loaded into a 1 mL syringe.The three syringes and two syringes lled with 10 mL of 5% (w/w) PEG-PFPE surfactant (Ran Biotechnologies) in HFE 7500(3M) HFE 7500 were connected to the micro uidic devices.The follow rates are as follows: tagmenation droplets 55 µL/h, spacer oil 200 µL/h, PCR reagent 280 µL/h, barcode beads 148 µL/h, and droplet generation oil 2000 µL/h.To merge the tagmentation droplet, the electrode near the droplet generation zone was charged with an alternating current (AC) voltage (3 V, 58kHz).And the moat channel was grounded to prevent unintended droplet coalescence at other locations on the device.The merged droplets were collected into PCR tubes.
The droplets collected in the merging step were treated with UV for 8 min (Analytik Jena Blak-Ray XX-15L UV light source) and the bottom layer of oil in each tube were removed using a gel loading tip to leave up to 100 µL of droplets.The tubes were placed on PCR instrument and thermo-cycled with the following program: 10 min at 72°C for 1 cycle, 3 min at 95°C for 1 cycle, (15 s at 95°C, 15 s for 55°C, and 2 min at 72°C) for 20 cycles, and 5 min at 72°C for 1 cycle with the lid set at 105°C.

d. Barcoded Amplicon puri cation
The thermal cycled droplets in the PCR tubes were carefully transferred into two 1.5 mL centrifuge tubes (equal amount in each).If there were visible merged large droplets present, they were carefully removed using a 2 µL pipette.20 µL PFO were added into each tube and mixed well by vortex.After centrifuging at 1000 RCF for 1 min, the top aqueous layers in each tube were transferred into new 1.5 mL tubes without disturbing the bead pellets and water was added to bring the total volume to 400 µL.The barcoding product was puri ed using 0.7X Ampure XP beads (Beckman Coulter, A63882) and eluted into 50 µL H2O and stored at -20°C until next step.The concentrations of the barcoding product were measured with Qubit™ 1X dsDNA Assay Kits (ThermoFisher, Q33230).

Barcoding sequencing library preparation and sequencing a. Library prep and QC
The sequencing library were then prepared by attaching P5 and P7 sequences to the barcoding products using Nextera primers (Table S2).The library PCR reagents containing 25 uL Kapa HiFi Master mix 2X, 5 uL Library P5 index primer (4 uM), 5 uL Library P7 index primer (4 uM), 10 uL puri ed barcoding products (normalized to 0.2 ng/uL), and 5 uL of nuclease free water were thermal cycled with the following program: 3 min at 95°C for 1 cycle, (20 s at 98°C, 20 s for 62°C, and 45 s at 72°C) for 12 cycles, and 2 min at 72°C for 1 cycle.The sequencing library was puri ed with 0.69X Ampure XP beads and eluted into 12 uL nuclease-free water.The library was quanti ed with Qubit™ 1X dsDNA Assay Kits and DNA HS chips on bioanalyzer or D5000 ScreenTape (Agilent, 5067-5588) on Tapestation (Agilent, G2964AA).The libraries were pooled and 300 cycle pair-end sequenced by Illumina MiSeq, NextSeq or NovaSeq platform.

Sequencing le barcode extraction and single cell read le preparation
Raw sequencing FASTQ les were processed using a custom python script (mb_barcode_and_trim.pys) available on GitHub (https://github.com/AbateLab/MissonBioTools)for barcode correction and extraction, adaptor trimming, and grouping by barcodes.For all reads, combinatorial cell barcodes were parsed from Read 1, using Cutadapt (v2.4) 107 109 .The overall alignment rates for each barcode were collected from the log les.The barcode groups less than 50% overall coverage rate were removed.Each barcode group's coverages, numbers of mapped reads, covered bases, and mean depths of 10 corresponding species were calculated using Samtools v1.12 (samtools coverage) with default setting 110 .The purity of each barcode group was calculated as the percentage of reads that aligned to a dominant species.For the rarefaction analysis, 10,000 reads were randomly sampled from the SAM le of each barcode group.The coverage was calculated after each read sampling using Samtools.
b. Strain abundance estimation for synthetic community with 22 E. lenta strains The reference genomes of the 22 E. lenta strains were downloaded from NCBI (Tabel S3).The reads in single-cell FASTQ les were aligned to reference genomes using Bowtie2 (v 2.3.5.1) 109 with -a setting to report all matches.The overall alignment rates for each barcode were collected from the log les.The barcode groups with less than 50% overall coverage rate were removed.The probabilities of each alignment were calculated with parseAlignment command from BitSeq (v 1.16.0) 55using uniform read distribution option (--uniform).The abundances of the 22 strains within each barcode were calculated based on the alignment probabilities using estimateVBExpression command from BitSeq v 1.16.0 with default setting 55 .The abundance output les were combined and analyzed using a Python script.The barcode group stain identity was assigned to the strain with maximum abundance.If the maximum abundance is smaller than 15% in a barcode group, the barcode group is considered as mixed strains.
The UMAP (uniform manifold approximation and projection for dimension reduction) analysis was conducted using the Scanpy toolkit in Python 47,111 .
8. Taxonomic Discovery Algorithm a. TDA validation using simulation data 100 species were randomly selected from the NCBI assembly metadata le (ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt).The reference genome FASTA les were downloaded using the corresponding link in the metadata le (Table S4).Simulated pair-end read les were generated using a Python script according to the following rules.1. 100 barcode groups were generated for each species.2. The reads are 150 bp paired end.3. The amplicon length is in the range of 400-1000 bp. 4. Each barcode group has 0-49% percent of contamination reads. 5.The contamination reads were generated from the other 99 species.6.Each barcode has 1,000-10,000 pairend reads.
3 taxonomic classi ers were chosen for evaluation: Kraken2/Bracken v 48,49 with PlusPF database(https://benlangmead.github.io/aws-indexes/k2,Version: 1/27/2021), Kaiju v 51 with its standard database and MetaPhlAn v 50 with its standard database.All the pair-ended barcode group FASTA les were pro led using the three classi ers.The results were grouped and analyzed in Python.The predicted taxa purity was the abundance of the dominant taxa in each barcode group.The barcode ltering based on purity was performed using thresholds ranging from 50% to 99% purities.
The average after-ltering purity was the mean purity of all the barcodes that passed a certain threshold and after-ltering barcode counts was the barcode count of which passed a certain threshold.The UMAP clustering was performed with the genus abundances of all the barcode groups.The identity of each cluster was assigned with the most abundant taxa.The identi cation accuracy was calculated as the percentage of barcodes with the correct genus identi cation.
b. TDA analysis of single cell sequence data The single cell sequencing barcode group FASTQ les of ZymoBIOMICS, Human microbiome and the sea water microbiome samples were analyzed using TDA with Kraken2/Bracken as the taxonomic identi er.
For the Zymo BIOMICS sample, Kraken2 PlusPF database (https://benlangmead.github.io/awsindexes/k2,Version: 1/27/2021) was used, while for human microbiome and sea water microbiome, Kraken2 GTDB database (https://gtdb.ecogenomic.org/tools,Release 95) was used.The reads in each barcode group were rst classi ed by Kraken2, and the abundances at genus and species level were reestimated with Bracken using default threshold setting.The percentages of the mapped reads were extracted from the Kraken2 output les of barcode groups.The purities were calculated as the abundance of the dominant genus in the barcode groups.The data was ltered according to percentage of mapped reads and genus-level purity.The taxa abundance pro les of the remaining barcodes were combined and UMAP clustering was performed using The Scanpy toolkits 111 in Python script.The taxa of each barcode group were assigned to the most abundant one.

Metagenomic sequencing and assembly a. ZymoBIOMICS community
The metagenomic sequencing data of ZymoBIOMICS Microbial Community Standards D6300 (batch ZRC195925) was provided by Zymo Research Corporation.The reads were assembled using SPAdes-3.15.3 with '--meta' setting 112 .The quality of assembly was accessed by Quast 5.0.2 113 .
10. Comparison between metagenomic and single cell sequencing.
The genus abundances of the human microbiome metagenomic data and the pooled single cell sequence le were analyzed using Kraken2 and Bracken.The results were plotted as a scatter plot with triangle markers.For any genus with a barcode group associated, a round marker of the genus was added and its size is proportional to the barcode counts.

Single cell sequencing data integration with metagenomics
To integrate the metagenomic dataset, the contigs assembled from metagenomic sequencing (ZymoBIOMICS and human microbiome sample) were treated as individual barcodes and processed with TAD.The metagenomic reads were rst aligned to the assembled contigs using Bowtie2 v2.3.5.1 109 .The pair-end reads associated with each contig were extracted using Samtools v1.12 110 ('samtools view -b {BAM le} {Contig header} | samtools fasta > {Extracted_reads.fa}').The short reads from each contig were then evaluated by Kraken2 48 and the genus abundance were generated by Bracken 49 using default threshold setting.The purity was calculated as the abundance of the dominant genus in each contig associated with short reads.The contigs were ltered using the genus level purity.The taxa abundance pro les of the short reads associated with remaining contigs were combined and integrated with the single cell dataset using the Scanpy toolkits 111 in Python script.
12. Clustered barcode groups analysis a. Cluster assembly and evaluation Single cell barcodes of UMAP clusters were combined using concatenate command (cat) in the Linux system into single FASTQ les.The pair-end reads associated with barcodes that belong to the same UMAP clusters were grouped by Seqtk toolkit (https://github.com/lh3/seqtk)(seqtk subseq) into single FASTQ les.The assemblies were conducted with all reads associated to both single cell sequencing and metagenomic contigs of each UMAP cluster using Spades v 3.15.3 112with '--careful' setting.The assembled contigs were evaluated using Quast v 5.0.2 with or without reference genome input.To calculate the clustering error rate, all the reads associated to a cluster were mapped to the corresponding reference genome, the percentage of the reads that were not aligned was considered as the error rate.
Pathway analyses of each cluster was conducted using HUMAnN v 3.0 50 with the default MetaCyc 68,70,117 database.The pathway abundance les of each cluster were combined and plotted as a heatmap using the Seaborn module in Python.
The sub-categorizing of barcode groups in a UMAP cluster was using species abundance estimation.The 2 clusters with the most barcode groups in the human microbiome samples (Blautia_A, and Bi dobacterium) were further divided into sub clusters by UMAP aggregation with the Kraken2 species abundance estimation.

b. Gene association analysis
Comprehensive Antibiotic Resistance Database (CARD) (v 3.1.4) 63(https://card.mcmaster.ca/download)was downloaded and bowtie2 references were built with botie2-build command 109 .The combined reads associated with each UMAP cluster identi ed in the human gut microbiome were mapped to the CARD databases using Bowtie2 (v2.3.5.1) 109 .The mapping reads are ltered for MAPQ >= 42 to select the reads without mismatches using SAMTools (samtools view -bS -q 42).After duplicate reads were removed using SAMTools (samtools rmdup -S), the references sequence name (RNAME) of each alignment were extracted from the bam les.The unique genes associated with each UMAP cluster, and their frequencies were generated from the RNAMEs.The relative abundance antibiotic resistance gene is calculated as the unique ARO read count per million total read count.The resistance mechanism associated with antibiotic resistance ontology (AROs) were downloaded from the Comprehensive Antibiotic Resistance Database.

Data and code access
All sequencing data is accessible at the NCBI Sequence Read Archive (Accession numbers: SUB12874540).Python Jupyter notebooks code used in this paper can be accessed at Abate lab GitHub: (https://github.com/xiangpenglee/EASi-seq.git).

Declarations Figures
EASi b) The hydrogel beads are size selected using differential centrifugation.The hydrogel beads are suspended in a density matching buffer (40% sucrose in PBS with 0.1% Tween 20) and centrifuged.
Particles of different size sediment at different rates, with the larger particles sedimenting faster.After centrifuge at 1000 x g for 10 min, the oversized hydrogel beads are pelted, and the supernatants are subject to a centrifuge at a higher speed (3000xg for 10 min).The pellets are then collected as the sizeselected hydrogel beads.c) Cells are lysed within the hydrogel beads by a two-step enzyme digestion.The beads are rst subjected to a cocktail of 4 different enzymes that digest cell walls before being treated with protease K to digest proteins.The small pore-size of the hydrogel allow proteins and other molecules to freely diffuse, while immobilizing long DNA molecules.After the treatments and washing, only genomic DNA remains in the hydrogel beads.
d) The microbial genomic DNA in each hydrogel bead is tagmented in a droplet ( rst step, bottom) before being subsequently paired with barcode beads for barcoding PCR (second step, top) on using the Tapestri instrument's micro uidic modules.
e) Sequencing of amplicons from the barcoding PCR generates single-cell shotgun reads for thousands of cells.
f) EASi-seq allows high-throughput microbiome genome atlas analysis, as well as cluster-based genome assembly, strain identi cation, and pathway analysis.

Methods 1 .
Microbiome samples processing a. Synthetic community ZymoBIOMICS standard (Zymo, D6300) was stored at -80°C until use. 100 µL of ZymoBIOMICS was washed with 4 mL of PBS for 3 times to remove the storage buffer.The cell density is measured with Countess™ cell counting slides (Thermo Fisher, C10228) using an EVOS microscope.After counting, cells were resuspended to a nal concentration of 100 million per mL in PBS.

3 . 4 .
Single cell genomic DNA isolation in hydrogel beads a. Cell encapsulation in hydrogel beads 500 µL cell suspension (100 million per mL in PBS) was mixed with 500 µL hydrogel precursor solution (12% acrylamide, 1% BAC, 20 mM Tris, 0.6% sodium persulfate, and 20 mM NaCl in H 2 O) in a 15 mL centrifuge tube. 1 mL HFE 7500 with 2% surfactant (008-FluoroSurfactant, RanBiotechnologies) was added to the cell/hydrogel precursor mixture.Emulsion was formed by passing the oil/aqueous mixture 5 times through the needle.20 µL of TMEDA (tetramethylethylenediamine, Sigma) was added into the emulsion and the emulsion was incubated at 70 °C for 30 min and at room temperature for overnight for gelation.The emulsion can be stored at 4°C for up to 1 week.The emulsion was centrifuged at 1000 RCF for 1 min and the bottom oil layer was removed by using a gel loading tip. 1 mL of 20% PFO (1H,1H,2H,2H-per uoro-1-octanol, Sigma, 370533) and 5 mL of PBST buffer (0.4% tween 20 in PBS) were added into the emulsion.The mixture was vortexed at maximum speed for 1 min break the emulsion and centrifuged at 1000 RCF for 5 min.Any remaining oil was removed by pipetting through a gel-loading tip.b.Hydrogel size selectionDifferential velocity centrifugation was performed to select the hydrogel beads from previous step within the diameter between 5 to 15 µm.The hydrogel beads were resuspended in 14 mL high density buffer (40% sucrose in PBS with 0.4% tween 20).First, the beads were centrifuged at 1000 RCF for 5 min to pellet large gels.The supernatant was transferred to a new 15 mL tube and centrifuged at 3000 RCF for 10 min to pellet the right sized beads.The supernatant (still containing beads smaller than 5 µm) was discarded and the pelleted beads were washed 3 times with PBST to remove the high-density buffer.c. Cell lysis in hydrogel beads 100 µL of size selected beads were treated in 1 mL cell wall digestion buffer (TE buffer solution containing 2.5 mM EDTA, 10mM NaCl, 2U zymolyase, 5 U Lysostaphin, 50 U mutanolysin, and 20 mg Lysozyme) at 37 °C overnight.The beads were then pelleted by centrifugated at 3000 RCF for 10 min and washed with PBST for 3 times.The beads were then treated in 1 mL protein digestion solution (TE buffer with 4U of Proteinase K, 1% triton X100 and 100 mM of NaCl) at 55 °C for 30 min.Following lysis, the beads were thoroughly washed with PBST, 100% EtOH, and PBST 3 times to ensure complete removal of proteinase K and other chemicals which may inhibit the downstream reactions.The beads were then ltered with 10 µm cell strainer and ready for droplet tagmentation.Single cell tagmentation and barcoding in droplet micro uidics Micro uidic droplet encapsulation, tagmentation, and barcoding PCR were performed on commercial single-cell DNA genotyping platform (Mission Bio, Tapestri) or custom build micro uidic devices with the same functions.a. Tagmentation reagents 25 µL Tn5-Fwd-oligo GTA CTC GCA GTA GTC AGA TGT GTA TAA GAG ACA G (100 nM, IDT), 25 µL, Tn5-Rev-oligo TAC CCT TCC AAT TTA ACC CTC CAA GAT GTG TAT AAG AGA CAG (100 nM, IDT) , and 25 µL Blocked ME Complement /5Phos/C*T* G*T*C* T*C*T* T*A*T* A*C*A*/3ddC/ (200 nM, IDT) and 25 µLTris buffer were mixed well in a PCR tube by pipetting.The mixture was incubated on a PCR thermal cycler with the following program: 85°C for 2 min, cools to 20 °C with a ramping rate at 0.1 °C/s, 20 °C for 1 min, then hold at 4 °C with lid at 105°C. 100 uL of glycerol was added into the annealed oligo.Unloaded Tn5 protein (1 mg/mL, expressed by QB3 MacroLab, Berkeley, CA), dilution buffer (50% Glycerol, 100 mM NaCl, 0.1 mM EDTA, 1 mM DTT, and 0.1% NP40 in 50 mM Tris-HCl pH 7.5 buffer), and the pre-annealed adapter/glycerol mix were mixed at 1:1:2 ratio by pipetting.The mixture was incubated at room temperature for 30 min then stored at -20 °C until use.For droplet tagmentation, equal amount of assembled Tn5 and tagmention buffer (10 mM MgCl2, 10 mM DTT in 20 mM TAPS pH 7.0 buffer) were mixed.b.Droplet tagmentation In the rst droplet step, the tagmentation reagents (0.125 mg/mL assembled Tn5, 10 mM MgCl2, and 10 mM DTT in 20 mM TAPS pH 7.0 buffer) and the genomic DNA in hydrogel beads (equivalent to 3 million cells per mL) in 10 mM MgCl 2 , 1% NP40, 17% Optiprep, and 20 mM TAPS pH 7.0 buffer were co-owed in the micro uidic devices to form droplets.
Electrode solutions were loaded into electrode wells (200 µL and 500 µL in reservoirs 4 and 5, respectively).After running the Priming program, 5 µL of reverse primer (GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GTA CCC TTC CAA TTT AAC CCT CCA, 100 µM, IDT) was mixed with 295 µL Mission Bio Barcoding Mix and loaded into PCR reagent well (reservoir 8) of the DNA cartridge.The droplets from previous step (~80 µL), 200 µL of V2 barcoding beads, and 1.25 mL of Barcoding oil were loaded into cell lysate well (reservoir 6), barcode bead well (reservoir 7) and barcode oil well (reservoir 9), respectively.The droplets were merged with barcoding beads and PCR reagents by the Cell Barcoding program.The resulting droplets were collected into the 8 PCR tubes.
-seq genome puri cation, micro uidic, and bioinformatic work ow a) Microbial cells suspended in hydrogel precursor (acrylamide monomer and BAC crosslinker) are emulsi ed with a uorinated oil by passing the mixture through a syringe needle.After gelation, the cells are individually embedded in hydrogel beads.

Figure 4 Human
Figure 4

Figure 5 Coastal
Figure 5 14 ARGs from 15 genus clusters, with mechanisms including antibiotic target alteration, protection, replacement, inactivation, and e ux.ARGs with accession numbers ARO:3004480, ARO:3004601, and ARO:3000190 are most prevalent among the 14 ARGs and were identi ed in 10, 9, and 6 genus clusters, respectively.Species from Bi dobacterium, Blautia_A, Collinsella, and Anearobutyricum carry the most ARGs, at respective counts of nine, eight, six and six.Based on those nding, we predict Bi dobacterium potentially has strong resistance (11.2 ARGs read per million reads) to rifampicin and peptide antibiotics, consistent with prior ndings65.Dorea also has high potential resistance to aminoglycoside antibiotic (11.1 ARGs read per million reads) and tetracycline antibiotics (6.7 ARGs read per million reads).
and matched to a barcode whitelist.Barcode sequences within a Hamming distance of 1 from a whitelist barcode were corrected.Reads with valid barcodes were trimmed with Cutadapt to remove 5′ and 3′ adapter sequences and demultiplexed into individual singlecell FASTQ les by barcode sequences using the script demuxbyname.shfrom the BBMap package