A Protocol for Revealing Oral Neutrophil Heterogeneity by Single-Cell Immune Proling in Human Saliva

Neutrophils are the most abundant white blood cells in the human body responsible for ghting viral, bacterial and fungi infections. Out of the 100 billion neutrophils produced daily, it is estimated that 10 % of these cells end up in oral biouids. Because saliva is a uid accessible through non-invasive techniques, it is an optimal source of cells and molecule surveillance in health and disease. While neutrophils are abundant in saliva, scientic advancements in neutrophil biology have been hampered likely due to their short life span, inability to divide once terminally differentiated, sensitivity to physical stress, and low RNA content. Here, we devise a protocol aiming to understand neutrophil heterogeneity by improving isolation methods, single-cell RNA extraction, sequencing and bioinformatic pipelines. Advanced ow cytometry 3D analysis, and machine learning validated our gating system model, by including positive neutrophil markers and excluding other immune cells and uncovered neutrophil heterogeneity. Considering specic cell markers, unique mitochondrial content, stringent and less stringent ltering strategies, our transcriptome single cell ndings unraveled novel neutrophil subpopulations. Collectively, this methodology accelerates the discovery of salivary immune landscapes, with the promise of improving the understanding of diversication mechanisms, clinical diagnostics in health and disease, and guide targeted therapies.


Introduction
Rapid advances in mapping single-cell transcriptional states have been made through the Human Cell Atlas (HCA) 1 and the NIH BRAIN initiatives, providing important insights into human health 2 . By considering transcript and protein compositions, functional categorization of cells is reshaping our models of ontology and helping discover new cell populations and elucidating heterogeneity. Immunity works through orchestrated cellular actions with specialized cellular tasks. In an effort to catalogue immune repertoire, studies are surveying speci c tissues and bio uids revealing unique signals, differentiation patterns, cell activation states and diversity. While a large number of immune cell populations, especially lymphocytes and macrophages, have been widely investigated by single cell analysis there is a lack of investigation in neutrophils, the most abundant myeloid cells [3][4][5][6] . Investigating neutrophil cell diversity is an emerging eld with the potential to reveal novel cell functions and applications 7-10 . Neutrophils are produced at a rate of 10 11 cells/day, comprising 50 to 70 % of all leukocytes 11 and highly active to respond to infection, injury, migrating to sites initiating in ammation 12,13 . Human neutrophils mature in the bone marrow from committed myeloid precursors and through subsequent stages of differentiation they develop into segmented adult mature cells 14 . Characterization of neutrophil diversity by transcriptomic data has been limited due to their complex nature including short lifespan, lower RNA 1.1.1. Sample Handling for Saliva Neutrophil Enrichment: Fresh primary human saliva was harvested and prepared from healthy donors. Each donor rinsed ve times with ~10 mL of 0.9 % NaCl solution (sterile) for 30-60 sec. with a gap of 3 min between each rinse to collect 50 mL total volume. The cells were then pelleted down through centrifugation at 160 x g at room temperature (RT) for 5-10 min. From 50 mL total volume, 40 mL of supernatant were discarded by aspirating carefully without disturbing the cell pellet. It is of utmost importance to maintain each step of primary saliva neutrophil isolation at RT as the cells tend to lose viability faster at lower temperature (e.g. at 4 °C). Cells were then resuspended in the remaining ~10 mL of 0.9 % saline solution and ltered through a 40-micron nylon mesh lter using gravity to remove food particles and eliminate mucus present in the collected oral rinse. 40 µm ltered oral rinse was sequentially ltered through 20 µm and 10 µm pluriStrainer nylon mesh lters to remove epithelial cells. All ltration steps were done by gravity ow or centrifugation at 160 x g at RT for 1 min. Vacuum suction for pluriStrainers should be avoided due to increased contaminants. Viability and cell density were determined through a trypan blue exclusion using the Countess automated cell counter. Cells obtained by this method from a single healthy donor in 50 mL oral rinse of resting saliva was ~1.8 x 10 5 cells. The purity of saliva neutrophils isolated by this enrichment method was veri ed by both microscopy (Fig. 2) and ow cytometry (Fig. 3). To compare the identi cation of neutrophil subpopulation by ow cytometry and scRNA-seq, we looked at the heterogeneity of the immune cell in whole saliva by computational analysis of ow cytometry data using FLOCK 25 unsupervised analysis for identifying both known and novel cell populations from the ow cytometry data (Fig. 4). The expression pro le of neutrophil-speci c genes from the scRNA-seq data also veri ed the purity of the FACS-sorted single cell as neutrophils ( Fig. 5 and Fig. 6).

Flow Cytometry Sorting:
To ensure unbiased transcriptomic patterns, cells were not labeled with antibody/cell-surface-marker prior to sorting. The viability of cells was checked before each sorting and only samples containing > 70 % viable cells were sorted. Cells were sorted using BD FACS Aria II (BD Biosciences) with custom PMT2 using a 130-micron (10 PSI) nozzle size. A nozzle size of 100-micron is desirable due to a ow pressure of 20 PSI and would also be suitable to use. The ow pressure of 70 PSI generated by a 70-micron nozzle during sorting affects both the viability and RNA quality of sorted neutrophil cells. Cell populations were selected by software gating for high-FSC/high-SSC Scatter-gate followed by doublet-discrimination (DD) (FSC-DD and SSC-DD gates) to isolate single cells from any multicellular clusters/clump. Except for control wells, single cells were sorted into each well of a Framestar 384-well plate containing 2 µL of predispensed lysis buffer having ERCC (External RNA Controls Consortium) spike-in RNA standards ( nal concentration of 2 % Triton X-100, 2 U/µL RNase inhibitor, 1:20,000 dil. ERCC and nuclease free water). Blank or 'No Template Control (NTC)' wells contained 2 µL nuclease free water only; 'spike-in' control well contained lysis/FACS sort buffer with ERCC but no cell; and 'UHR' control well contained 2 µL lysis/sort buffer + 10 pg of Universal Human RNA (UHR). Except for the sorting-sample tube, all other reagents and FACS sorting plates were maintained at 4 °C on ice by using chill blocks. The lysis/sort buffer must be prepared fresh. The sorted plates were then frozen on dry ice and stored at -80 °C for downstream processes, including cell lysis, cDNA synthesis and PCR ampli cation according to the SMART-seq2 protocols. Out of three 384 wells, we generated 1084 single-cell cDNA wells and selected for QC-Pass criteria/parameters set to a single cDNA library plate ( Supplementary Fig. 2).
To compare the effect of pressure and electrostatic charge on cells while sorting by BD FACS Aria II, we also tested single cell sorting using other micro uidics based cell sorters, including the On-chip Sortmicro uidic chip cell sorter (On-chip Biotechnologies Co., Ltd, Japan) and the WOLF Cell sorter (NanoCellect Biomedical Inc., CA, USA). With the WOLF cell sorter, the volume requirement was very high and not compatible for use with the Smart-Seq II reaction. On the other hand, the volume needed by Onchip Sort is minimal allowing for its use with the Smart-seq protocols. In addition, On-Chip-SPiS singlecell dispenser captures high quality images providing a visual con rmation of single deposition into the wells ( Supplementary Fig. 1). We compared the purity of bulk sorted cells and quality of RNA of both systems and found that cell viability and cDNA obtained from On-Chip sorters were superior. However, due to its inherent pipetting system, the timing of this procedure was longer than neutrophil handling would allow. BD FACS Aria II work ow provided cell sorting with the volume needed, during a time frame compatible with maintaining good neutrophils viability and RNA quality.

Cell lysis, cDNA Synthesis and Quality Control:
Detailed procedure of cell lysis and cDNA for Smart-seq2 protocol was previously published 26 . In addition to modi cations 24 e.g. increasing cDNA PCR-ampli cation cycle from 18 to 21 cycles to compensate for lower mRNA levels in neutrophil 15,16,20 and modifying Template Switch Oligo (TSO) primer by 5'-biotinylation to reduce non-speci c ampli cation caused by TSO concatemers 24,27 . Before preparing the library for scRNA-seq, we carried out quality control assays of the cDNA samples by quanti cation using a PicoGreen dsDNA quantitation assay kit and qRT-PCR for ACTB expression using a TaqMan gene expression assay. A Hamilton Microlab STAR liquid handling system was used to select samples that passed the QC parameters for both PicoGreen dsDNA assay (RFU > 0.3) and qRT-PCR for ACTB expression (Ct < 35) ( Supplementary Fig. 2).

Preparation of sequencing library and sequencing:
We employed the Illumina Nextera XT DNA library preparation kit and performed multiplexed paired-end sequencing of barcoded libraries using an Illumina MiSeq in order to determine the required sequencing parameters for sequencing on the pooled libraries on a NovaSeq 6000 sequencing system. Samples in each well were barcoded by using a unique combination of Nextera XT Index Kit v2 Set A and Set D to identify sequence data from each single cell. To ensure initial quality of the sequencing libraries, we performed a MiSeq run on a pooled library of 16 randomly selected Nextera XT libraries created from the selected cDNA plates from each donor in order to determine the sequence quality and coverage needed to discover subpopulations among saliva neutrophils. Prior to loading the MiSeq, an Agilent 2100 BioAnalyzer High Sensitivity DNA chip run was performed and insert size of the sequencing library was determined to be between 250-500 bp. For the MiSeq-MO (medium output) run, 270 pM of 16-plex pool library was spiked with 1 % PhiX174. The 150 base paired-end MiSeq run of 16 single-cell data showed an average Q30 of Read 1 (75%), Read 2 (69%), i7 (83%) and i5 (79%) pass lter rate and a sequencing depth of 1.5-2.0 × 10 6 reads per cell. It has been previously shown that read-depth of 1.5-2.0 × 10 6 is adequate for the detection of saturating levels of RNA expression in single cells 28 . This information suggested that we could perform a sequence run of NovaSeq 6000 with a single pool of 384 Nextera XT libraries without over saturation of the RNA-seq read depth and gene counts from each single cell.
Based on the MiSeq run results, a total of 3 NovaSeq 6000 SP ow cell 2x150 XP work ow runs were performed on each of the 3 donors with 3 library pools consisting of 384 Nextera XT libraries from both single cells and controls. Each pool was loaded at 300 pM with a 1 % PhiX174 spike-in. Quality of 3 NovaSeq 384-plex pooled libraries were analyzed by both Agilent HS DNA chip and qPCR-based NGS library quanti cation using KAPA Library Quanti cation Kit -Illumina. FACS sorted single-cell plates were evaluated by PicoGreen dsDNA quanti cation assay (for cDNA concentration) and qRT-PCR TaqMan assay (for expression of housekeeping gene ACTB) for cDNA quality. Each selected cDNA library was used to generate an Illumina Nextera XT library and combined into a 384-plex pool for sequencing on the Illumina NovaSeq 6000 system.

Cell Morphology and Immuno uorescence 3D Holotomographic Microscopy:
Cell diversity of healthy human resting saliva and elimination of each cell type during each ltration steps of 40-, 20-, and 10-micron by our saliva neutrophil enrichment protocol was analyzed by 3Dholotomography imaging of unstained cells and cells stained with uorophore-conjugated cell surface CD-markers using 3D Cell Explorer microscope (Nanolive's 3D Cell Explorer-uo; Model CX-F). Cells were collected after each ltration step of neutrophil enrichment protocol. Unstained and stained cells suspended in 0.9 % saline solution and staining buffer respectively were imaged at 60X magni cation using class 1 low power Laser (λ=520 nm, sample exposure 0.2 mW/mm 2 ) and a depth of eld of 30 µm. For uorescent imaging, cells were stained by a ow cytometry staining protocol and were imaged by the uorescent module of 3D Cell Explorer-uo in DAPI, GFP (or FITC), and OFP (or TRITC) uorescent channels. Exposure of 100/100/100 ms, gains of 35/35/35 and intensities of 35/50/50 were used to capture images in DAPI/GFP/OFP-channels, respectively ( Fig. 2A).
1.1.6. Validation of Neutrophil Morphology by Microscopy and Giemsa Staining: To con rm the identity of unstained neutrophils done by 3D Holotomography microscope, based on their intracellular structure, we stained the same samples with Giemsa stain solution and imaged by Zeiss AxioVision microscope (Carl Zeiss Microscopy, LLC, NY, USA) at 40X objective using Zen Blue software.
For staining, enriched neutrophils obtained after 10 µm ltration step were spread into a cell monolayer on the charged side of the slide by CytoSpin 4 centrifugation. Around 25,000 cells in 50-200 µL volumes of cell suspension are loaded in Cytofunnel with white lter cards and caps (Shandon EZ Single Cytofunnel, cat. no. A78710003) for each slide and centrifuged at 700 RPM for 7 min at medium acceleration. Boundaries were drawn around the cell monolayer by using a hydrophobic pen and allowed to dry at RT inside the hood for ~30 mins. After drying, the cells were methanol xed by incubating for 5-7 min at RT. The slides were carefully washed twice in PBS (-) and dried by draining the PBS (-) completely. At this step, the slides were dried and stored overnight at 4℃. After drying, Giemsa stain solution (1:20 dilution) was added and incubated at RT for 30-60 mins. After incubation, the slides were washed carefully by draining the stain and slowly dipping the slide in an angular position in deionized water to prevent cells from getting washed away. A second wash was performed by dipping the slide in fresh deionized water for 2 min. Slides were dried in the hood and then mounted with coverslip using aqueous based mounting media such as CytoSeal 60 and observed in bright eld under 40X objective of microscope and images. Images were captured and processed by using Zen Blue software (Fig. 2B) and Nanolive 3D-Cell explorer microscope ( Fig. 2C; section 1.9.10).

Immuno uorescence for Microscopy and Flow Cytometric Analysis:
Cells from saliva samples were further processed for immuno uorescence. Brie y, cells were xed by 4 % paraformaldehyde (PFA) on ice for 30 min. After incubation, xed cells were washed at least twice by adding a staining buffer and centrifuged at 160 x g for 10 mins at RT to pellet cells. Because neutrophils don't form a clear pellet, the supernatant is carefully aspirated at slow speed from the top. After cell counting, 1 µL of Fc-block per million cells was added, followed by incubation for 15 mins at RT. The volume was reconstituted to 200 µL cell suspension/tube. Anti-human monoclonal antibodies were added to each tube according to manufacturer instructions. In the master mix, cell surface markers were added and incubated at RT for 1 hr in dark. Suggestive cell surface markers: PerCP/Cy5.5-CD11b (cat. no. analyses were run on a BD FACS Aria II (BD Biosciences) to obtain raw data in FCS format, which were later analyzed by FlowJo v10.6.1 (BD Biosciences) for 2D analysis (Fig. 3) and by FLOw Clustering without K (FLOCK v1) 25 for computational analysis using all markers simultaneously (Fig. 4), which is explained in detail in section 1.1.11. Each sample was analyzed for 'ungated' total cell populations and 'high-SSC gated' populations used for single-cell sorting of saliva neutrophil in this study.
1.1.8. scRNA-Seq data processing and Analysis: Single-cell RNA-seq data were processed according to the procedure described in detail in Krishnaswami et. al. 2016 24 . Brie y, raw sequencing les were demultiplexing using Illumina barcodes, while sequencing primers and low-quality bases were removed using the Trimmomatic package 29 . Trimmed reads were then aligned using HISAT 30 in two steps: rst to a reference of ERCC sequences, and then the remaining reads were mapped to GRCh38 (GENCODE). StringTie 30 was then used to assemble the resulting alignments into gene expression values (TPM) estimated using GENCODE v25 annotation (Ensembl 87; . Expression values for non-control cells were imported into Scanpy for PCA, UMAP, and cluster analysis 31 . Cells were ltered for quality using two different approaches. The rst liberal ltering consisted of removing cells with less than 50 genes per cell or with greater than 10% of the total gene counts being from mitochondrial genes. Genes were required to be present in at least 4 cells, with greater than 50 total counts. From these genes, the top 2500 variable genes were selected (Fig. 5). The second stricter ltering consisted of removing cells with less than 400 genes per cell or with greater than 10% of the total gene count being from mitochondrial genes. In fact, genes were required to be present in at least 4 cells, with greater than 50 total counts. From these genes, the top 2000 variable genes were selected (Fig. 6).
Unsupervised clustering was performed by rst performing PCA to determine principal components, then a neighborhood graph was constructed using those components. Next Louvain clustering was performed using the neighborhood graph. Using the Louvain clustering solution, marker gene determination was performed using logistic regression 31 . The outputs of this computational pipeline produce a set of unbiased cell type clusters, a gene expression matrix with the expression levels of genes in individual single cells grouped into cell type clusters, and a set of marker genes for each cell type cluster.
1.1.9. Publicly available healthy control data In addition to salivary neutrophils, RNA-seq data of immune-cell types and PBMC obtained from blood were acquired from the Gene Expression Omnibus (GEO) database under accession code GSE64655 32 . This data was analyzed by Scanpy in a similar fashion described above. Expression patterns for targeted marker genes, those with known differential expression patterns in the oral and blood neutrophils, were visualized ( Supplementary Fig.6).

Microscopic Analysis:
Raw images obtained for both unstained and stained cells were processed by software (STEVE software v1.6.3496, Nanolive) with similar parameters for the bright eld images which was also used to obtain digital acid-stained and RI 3D-rendering images. Parameters for processing of raw images captures in each uorescent channel (i.e. DAPI, FITC and TRITC) to determine the background cutoff pixel was determined by comparing with images of unstained negative control samples. These parameters were used consistently for all images processed for each uorescent channel.
For unbiased identi cation of cell types based on CD-markers, unstained cells were imaged and stained via digital acid-staining (STEVE software v1.6.3496, Nanolive) according to the refractive index (RI) of the intracellular structures ( Fig. 2A). Four major cell types i.e. epithelial cells, neutrophils, monocytes, and lymphocytes were identi ed based on their size and nuclear structure and quanti ed by manually counting from multiple images obtained from three healthy donors (Fig. 2C). To further verify the neutrophil enrichment and con rm the elimination of other cell types, we did Giemsa staining of 10 µm ltered unstained samples (Fig. 2B insert).
To rule out the biased staining of cell types with cell surface markers, stained samples used in ow cytometry analyses were further imaged for selected CD-markers conjugated with uorophores detectable in the three uorescent channels (i.e. DAPI, FITC, and TRITC channels) available in Nanolive's 3D Cell Explorer microscope. Classical neutrophil positive marker CD-66b conjugated with paci c blue was imaged in the DAPI channel, whereas classical neutrophil negative markers CD14 and CD19 conjugated with FITC and PE, respectively were imaged in the FITC and TRITC channels ( Fig. 2A). Four major CDmarker positive cells i.e. CD-66b-PacBlue for neutrophils, CD14-FITC for monocytes, and CD19-PE for Blymphocytes were identi ed based on obtaining positive signals in the respective channels.

Flow cytometry Data Analysis:
The FCS les from BD FACS Aria II ow cytometry were transformed using FCSTrans 33 on R programming language. FCSTrans applies a logical transformation on uorescence channels and a linear transformation on scattering parameters followed by a min-max linear rescaling applied across all the channels to scale the range to [0, 4095]. After FCSTrans transformation, area parameters of measured channels were selected and saved to tab-delimited text les for downstream data clustering analysis.
Auto-gating is applied using an unsupervised clustering approach -FLOCK 24 (Flow Clustering Without K). FLOCK is publicly available on ImmPort Galaxy (https://immportgalaxy.org/). FLOCK identi ed 23 cell subsets across the 6 stained samples (two replicates for each of the three ltrations: 40 µm, 20 µm and 10 µm) by using all measured parameters simultaneously in density-based clustering analysis. Then each identi ed cell subset was visually examined on dot plots of all 2D marker combinations for manually annotating the phenotype (e.g., CD3 -CD19 -CD14 -CD11b + CD15 + CD66 + neutrophils).
Frequencies of the 23 FLOCK-identi ed cell subsets were quanti ed by their percentages (with the total number of cells as the parent). Mean uorescent intensity (MFI) values of each cell population for each marker were also calculated. Figure 4A visualizes the MFI values of a 40 µm sample (Tube-10_Specimen_001_002) using a heatmap for indicating the phenotype of each identi ed cell population. Based on visual examination of the 2D dot plots, 5 of the 23 FLOCK-identi ed cell subsets are salivary neutrophils. For each neutrophil subpopulation identi ed by FLOCK, mean percentage is calculated for duplicate les in each ltration step. Figure 4B shows the bar graph for mean percentages across the ve FLOCK-identi ed neutrophil subsets, indicating that the major/abundant neutrophil subpopulations consistently increased their percentages as ltration size decreased. For both manual annotation and interpretation of the 5 neutrophil subpopulations, 2D dot plots (Fig. 4C) of the 40 µm sample with each neutrophil subpopulation highlighted in a different color were generated to visualize the phenotype differences of these subpopulations.

Comparison to other methods:
• The key strength of our protocol are as follows: • The protocol overcomes the methodological limitations that produced the false-negative expression of neutrophils in many published studies due to low mRNA content of neutrophils and the different experimentation conditions needed in comparison with PBMCs.
• Saliva is an easily accessible and readily available clinical sample which makes this protocol noninvasive to patients requiring deep sequencing for diagnosis.
• We suggested important experimental conditions for neutrophil that are overlooked by researchers, different from processing of PBMCs. Neutrophils prefer room temperature and lower centrifugation speeds of 160 x g to 300 x g.
• This study presented processing time of primary neutrophils and the rate of RNA degradation. This information is important to consider a sample for further downstream steps of an NGS work ow.
• For deep sequencing, the samples should be collected fresh and processed within 4 hours. Storage of samples at -150 ℃ (liquid N2) or -20 ℃ after collection by DMSO-cryopreservation and methanolxation protocols established in other immune cells are not suitable for deep sequencing of neutrophils 16,34 .
• This protocol demonstrates stringency levels needed for ltering raw sequence data during bioinformatics analysis. It suggests inclusion of mitochondrial gene expression data during analysis that is considered an exclusion criterion for single cell sequence data processing.

Limitations of this protocol:
• This study is aimed for unbiased sequencing of saliva neutrophils in which unlabeled morphology was the gating determinant. While we coupled with speci c pipelines able to detect other cells, it may lead to lack of speci city in the sorting scheme.
• Neutrophils are complex cells that are fragile and get easily activated during handling 15. Thus, personnel need to be trained, and reagents prepared carefully to yield replicable results.
• Neutrophils contain 10-20 times lower amounts of RNA per cell than PBMC 19. Therefore, if another PBMC cell is sorted along with neutrophil by chance during FACS sorting, the probability of ampli cation of non-targeted cell mRNA during cDNA synthesis is higher, which may lead to erroneous sequencing results.
• Due to the small amount of mRNA in neutrophils it may be necessary to optimize the PCR cycles required to obtain su cient cDNA for the NGS-work ow. We ampli ed the single cell neutrophil cDNA with 21 cycles, similar to single nucleus protocols24 because of low amounts of RNA, compared with 18 cycles for other cell types 26. Some low-copy number transcripts may still be di cult to detect in neutrophils. However, increasing the number of PCR cycles could introduce some ampli cation bias in the library by compressing expression values for high-copy number transcripts.
• Small noncoding RNAs (ncRNAs) and other short sequence mRNAs lacking polyA tails would not be detected. The low amounts of RNA contained in the neutrophils may also prevent the detection of some ncRNAs.
• Since the neutrophils are found to have generally lower RIN values compared to other cell types, it is possible that the RNA of the samples are partly degraded and may result in an increasingly 3' biased library preparation as suggested by Chen et al., and therefore losing valuable reads from your data is a possibility 35.

Future Applications:
Comparing Health and Disease. Viral, autoimmune, metabolic, and chronic in ammatory diseases require novel and non-invasive methods to monitor cellular phenotypes from humans, comparing health versus disease states. This protocol provides the experimental conditions and time needed for processing of neutrophils for NGS-work ow to obtain their transcriptomic signatures.
Revealing Oral-Systemic Axis. Emerging evidence demonstrates that markers expressed in bio uids such as saliva are representative of systemic changes. A protocol for the unbiased evaluation of single cells in saliva could yield a better understanding of systemic health through oral sampling.
Longitudinal Monitoring. Sampling saliva is non-invasive and easy to perform. Thus, continuous monitoring of cells, biomarkers and gene expression patterns in saliva provides an effective system for longitudinal survey. In addition to research studies, this system would also be optimal for the development of novel diagnostic systems and drug delivery. • dNTP mix (10 mM each; Thermo Fisher, cat. no. 18427-088) • ProtoScript II reverse transcriptase (New England Biolabs, cat. no. M0368X). Includes: • ProtoScript II buffer (5X) • Ethanol-molecular biology grade (Sigma-Aldrich, cat. no. E7023-500 ml) • Agencourt AMPure XP beads (Beckman Coulter, cat. no. A63881) • Adapter oligos (See cDNA Synthesis on section 3.4). Locked Nucleic Acid (LNA)-modi ed TSO were ordered from QIAGEN (https://www.qiagen.com/). All other oligos were ordered from IDT (https://www.idtdna.com). All oligos were HPLC-puri ed. The identity of LNA-modi ed TSO compounds is also con rmed by MS. • Fluorochrome conjugated anti-human monoclonal antibodies: • PerCP/Cy5. 5    2) Before collecting the rst oral rinse, each donor must clean their oral cavity and wait for 3 min.
3) Each donor must rinse their oral cavity ve times with ~10 mL of 0.9 % NaCl solution (sterile) for 30-60 sec each time with a gap of 3 min between each rinse to collect 50 mL total volume. (CRITICAL: The collection tube should be immediately proceeded for the enrichment process, as neutrophil RNA keeps degrading over time.

cDNA synthesis by Smart-seq2 • TIMING: 1 day all steps
We performed cDNA synthesis by using modi ed Smart-seq2 protocol, previously published by our team.
Cell lysis, cDNA synthesis and Nextera XT library preparation can be performed using any of the currently available methods for single cells [39][40][41]   PicoGreen dsDNA analysis is performed on the whole plate to accurately quantify the cDNA concentration in each well. Finally, TaqMan assay is performed in qRT-PCR to check the expression of housekeeping genes such as β-actin (ACTB) to make sure that each well has an eukaryotic cell.
(OPTIONAL: TaqMan assay for a known cell-type speci c marker gene(s) can be used to con rm the target cell sorted into each well. We didn't use any neutrophil speci c marker as we performed unbiased sorting. 3.6. Illumina Nextera XT Library preparation of HitPicked cDNA-library plates: Illumina Nextera XT library is prepared for the HitPicked cDNA library plate by using 'Nextera XT DNA library preparation kit' and each sample is barcoded by using 'Nextera XT index kit Set A and Set D' following the manufacturers protocol. We used 1/8 th reaction protocol for automated/robotic dispensing system, where 1/8 th the volume of each reagent is used as mentioned in manufacturers protocol for 96well reaction plate. The required target DNA quality for Nextera Library preparation is 1 ng of input DNA with 260/280 ratio of 2.0 -2. 52) The plate is sealed, mixed by brief centrifuge and loaded on the thermocycler to run the tagmentation reaction by incubation at 55 °C for 10 min.  The following sections have been brie y explained in this paper. For details on "RNA-seq analysis" of the fastq data les, please refer to Step 26 of previous publication from our group 24 .
Steps 84-87, RNA-seq analysis: sequence mapping and gene expression analysis: variable Anticipated Results

Anticipated Results
This protocol enables isolation of immune cells and enrichment of human primary salivary neutrophils, for cell isolation, ow cytometry analysis, sorting and scRNA-seq work ow (Fig. 1). Our protocol shows that the repertoire of a myeloid derived cell can be evaluated at a single cell level after saliva collection.
The use size exclusion allowed > 98 % pure enriched neutrophils with viability compatible with the protocols. In order to develop the protocol that is consistent, oral samples were collected from healthy subjects throughout the project for all the experiments including microscopy (Fig. 2), ow cytometry ( Fig.   3 and Fig. 4), and scRNA-seq ( Fig. 5 and 6). We found that neutrophils were sensitive to cold temperature and physical stress employed in cell isolation procedures for NGS work ow. Chen et. al has shown that the integrity of total RNA is a critical parameter for RNA-seq analysis and degraded RNA heavily in uences the gene expression pro les 35 Fig. 4). After RNA sequencing, our liberal ltration criteria of raw data allowed to obtain transcriptomic signature of neutrophils similar to other cell types and identi cation of eight novel sub-populations of neutrophils from healthy human saliva as compared to four sub-populations identi ed by stringent lter criteria typically used for analysis of RNA-seq data ( Fig. 5 and Fig. 6). In addition, both manual gating analysis and the FLOCK-based automated gating analysis of the ow cytometry data con rmed diverse neutrophil subpopulations and a more stringent analysis revealed ve sub-populations based on markers and density (Fig. 4).
Through microscopy, saliva presented four major different cell types (i.e. epithelial cells, neutrophils, monocytes, and lymphocytes) were initially identi ed based on histochemical morphology by cytospin and GIEMSA staining. For veri cation of the purity of neutrophils the enriched samples from healthy donors were also stained (Fig. 2B) and quanti ed from a minimum of 10 slides per donor which shows 98 % neutrophil purity. This was con rmed by immuno uorescence of CD66b, CD14, CD19 markers ( Fig.   2A  lymphocyte, and unidenti ed cell types (Fig. 2C). Morphological analysis was also con rmed by immuno uorescence analysis.
We further compared gating strategies through viability, expression marker and cell size. When utilizing gating for Aqua LIVE/DEAD xable dye ( Fig. 3A and 3C) neutrophils were positive in high abundance, but minimum monocyte marker expression is detachable. We hypothesized that by gating the cells on their size we would further exclude the other immune cell. High-SSC gating demonstrated high purity of neutrophils and exclusion of other immune cells ( Fig. 3B and 3D). Comparatively, live-cell gating and high-SSC gating were quanti ed and plotted for the type of cells (Fig. 3E and 3F). Thus, the employment of high-SSC high-FSC gating to sort unlabeled cells allows for selection of viable cells, prevents activation and cell death of oral neutrophils, minimizing the possibility of sorting monocytes or lymphocytes.
In fact, when staining for most common blood immune-cell markers i.e. CD11b, CD66b, CD15, CD14, CD19, and CD18 oral neutrophils were positive with different levels of expression. Oral neutrophils were identi ed on CD14 -CD19 -CD3 -CD15 + CD66b + CD11b + Aqua- (Fig. 4) whereas monocytes and lymphocytes were identi ed for CD14 + CD11b+ and CD19+CD3+ respectively in live-cell and high-SSC gated populations ( Fig. 3C and 3D). CD15 + neutrophil density increases from 38.30 % of live-cell gated population for 40 µm ltered un-enriched samples to 60.72 % in 10 µm ltered enriched neutrophil. In addition, this increase in neutrophil was also seen to be associated with more CD14 + monocytes in 10 µm ltered samples (Fig. 3C). When using the high-SSC gating (instead of live-cell gating), an increase of CD15 + neutrophil density from 30.10 % of high-SSC gated population in 40 µm ltered sample to 47.35 % in 10 µm ltered enriched sample. Though this increase in density of neutrophil is not very high, the CD14 + monocytes were completely eliminated on 10 µm ltered enriched samples (Fig. 3D). Similar to monocytes, CD3 + T cells and CD19 + B cells were totally eliminated even in 40 µm ltered samples when selected for high-SSC gate instead of live-cell gate. Similarly, when the frequency of each immune-cell types in the live-cell gated population (Fig. 3E) and high-SSC gated population (Fig. 3F) compared to the total ungated cell population, we see an increase in the neutrophil population and decrease in monocytes,

B cells and T cells.
To characterize heterogeneity of neutrophils in a data-driven way, we employed an unsupervised data clustering method 24 (FLOCK, http://immportgalaxy.org) to understand cell phenotype differences and cell surface markers expression levels (Fig. 4). We rst applied the FLOCK method to identify cell populations from the 40 µm ltered sample, before applying the identi ed cluster centroids across all 6 samples for cross-sample identi cation and comparison of the 23 cell populations. The advantage of unsupervised clustering analysis is being data-driven, without requiring or being limited by prede ned cellular phenotype. Therefore, manual annotation of each identi ed data cluster for identifying the cellular phenotype and interpreting the phenotype difference between the identi ed cell populations is usually required. Percentages of identi ed cell populations, mean uorescence intensities (MFI) for each marker, expression pro les across the markers (levels 1 to 4, from negative, low, positive to high) as well as 2D dot plots of samples with cell populations highlighted in different colors are automatically generated or calculated by the FLOCK procedure. Heatmap of MFI (Fig. 4A) and bar graphs of population percentages ( Fig. 4B) are commonly used to visualize the characteristics of the FLOCK-identi ed cell populations.
From the FACS Aria II data, the FLOCK-identi ed neutrophil phenotype included: • Pop10: CD14 -CD19 -CD3 -CD15+CD66b + CD11b + Aqua-CD18 - • Pop12: CD14intCD19-CD3-CD15 + CD66b + CD11b + Aqua-CD18 -• Pop14 and Pop15: CD14intCD19-CD3 -CD15hiCD66bhiCD11b + Aqua-CD18int • Pop17: CD14intCD19intCD3-CD15hiCD66bhiCD11b+Aqua-CD18int When examining the percentage values of the 5 neutrophil subpopulations, we found that the abundant/known neutrophil subpopulations (Pop10 and Pop12, with different expression levels of CD14) increased frequency as the ltration size decreased, which con rmed the nding in Fig. 3. However, FLOCK also identi ed three other rare salivary neutrophil subpopulations (Pop14, Pop15, and Pop17) that were not in the region of the "classical" neutrophils. These rare neutrophil subpopulations have larger size and complexity (based on FSC.A and SSC.A, Fig. 4C) as well as slightly higher expression on CD15/C66b/CD18 than the classical neutrophils (Fig. 4C). The frequencies of these rare neutrophil cell populations did not increase as the ltration size decreased. Limited by the small number of markers measured in ow cytometry, this nding further emphasizes the necessity of performing a single cell RNA-seq assay to elucidate transcriptional pro les of these interesting neutrophil subpopulations.
Because we determined the optimal gating for oral neutrophils that maintain their viability and purity, we pursued sorting single neutrophils for RNA-seq. After raw sequencing les were demultiplexing using Illumina barcodes, and processed 29 , we further evaluated expression values for non-control cells were imported into Scanpy for PCA, UMAP, and cluster analysis 31 . The liberal ltering 8 neutrophil subpopulations and top 2500 variable genes were selected (Fig. 5). In fact, half of the subpopulations presented differential expression of surface markers when compared to other clusters. In contrast the strict ltering showed 4 subpopulations and top 2000 variable genes were selected (Fig. 6). Unsupervised clustering determined the differential expression of each cell population provided individual single cells grouped into cell type clusters, and a set of sensitive and speci c marker genes. Logistic regression showed the comparison of gene levels in each cluster ( Fig.5B and 6B). Oral neutrophils highly expressed the following gene markers CD11c, CD14, CD16a, CD16b, CD32, CD55, CD62L, CD141, Lysozyme, BEST1, FTH1, with moderately levels of CD10, CD11b, CD18, CD31, CD50, CD63, CD85D and low levels of CD11a, CD13, CD19, CD43, CD170, CD172A, CHEMR23 ( Fig. 5E and 6E). These gene signatures are important to understand future neutrophil functions.
Previous single-cell sequencing of immune cells was unsuccessful in generating desirable transcriptomic pro le/signature of neutrophils among PBMCs 43 . We have evaluated a dataset that is publicly available ( Supplementary Fig. 6). Evaluation following our oral immune-cell pipeline applied to the publicly available dataset accession number GSE64655. Similar to oral neutrophils, blood derived neutrophils presented strong RNA expression levels of CD55, CD16b, CD15, and not CD66b. Blood neutrophils however presented much higher expression levels on CD10, CD11a and CD11b. We believe that the cell collection procedures of PBMCs such as the Ficoll-Hypaque method are not suitable for neutrophils as the NDNs found in the granulocyte layers at the interface of red blood cells and the gradient layer are excluded during the sample collection. In health and disease this is extremely important. SLE is characterized by neutrophil subsets known as low-density granulocytes (LDGs). When compared to other immune cell subtypes, LDGs showed the highest expression of interferon gamma genes, CD10 with subpopulations that were speci cally positive correlation with disease severity and coronary patterns 44 .. In addition, neutrophils patterns were also described in the emerging COVID-19 infectious disease.
Bronchoalveolar single cell analysis demonstrated that myeloid cells such as macrophages were highly in amed expressing high levels of CD68, while neutrophils from infected lungs were highly expressed (FCGR3B or CD16b) 45 . Interestingly, similar to lung neutrophils, oral neutrophils highly expressed Cd16b as part of our cell clustering system. Whereas similar to SLE cells, oral cells expressed moderate levels of CD10, revealing that oral neutrophils present shared markers with systemic cells. Future studies should implement similar protocols for systemic neutrophils compared to oral cells to uncover methods and heterogeneity.
Altogether, we developed a protocol to survey neutrophil cell heterogeneity by improving isolation methods, ow cytometry evaluation, single-cell RNA extraction, sequencing and bioinformatic pipeline.
Our ndings suggest novel transcriptomic signatures with identi cation of novel sub-populations. By combining ow cytometry with machine learning, we validated our model and gating strategies which led