Single-cell RNA expression profiling shows that ACE2, the putative receptor of COVID-2019, has significant expression in nasal and mouth tissue, and is co-expressed with TMPRSS2 and not co-expressed with SLC6A19 in the tissues

A novel coronavirus (COVID-2019) was first identified in Wuhan, Hubei Province, and then spread to the other provinces of China. COVID-2019 was reported to share the same receptor, angiotensin-converting enzyme 2 (ACE2), with SARS-CoV, but the infection rate of COVID-2019 is much higher than that of SARS-CoV. Biophysical and structural evidence showed that COVID-2019 binds ACE2 with 10- to 20-fold higher affinity than SARS-CoV. TMPRSS2 cleaves ACE2 and facilitates the entry of the virus into host cells. The presence of SLC6A19 may block the access of TMPRSS2 to the cleavage site on ACE2 and weaken the entry of COVID-2019 into host cells. Here, based on public single-cell RNA-Seq datasets, we analyzed ACE2 expression in nasal, mouth, lung, and colon tissues. We find that the number of ACE2-expressing cells in the nasal and mouth tissues is comparable to that in the lung and colon tissues. We also find that ACE2 tends to be co-expressed with TMPRSS2 and not co-expressed with SLC6A19 in the nasal and mouth tissues. From these results, we infer that nasal and mouth tissues may harbor the first host cells of COVID-2019 infection. Our previous report in medRxiv and a recent report in the New England Journal of Medicine found that the COVID-2019 load tends to be higher in nasal swabs than in throat swabs.


Introduction
Severe infection by COVID-2019 could result in acute respiratory distress syndrome (ARDS) and sepsis, causing death in approximately 2% of infected individuals 1 . Epidemiological evidence showed that the infection rate of COVID-2019 is much higher than that of SARS-CoV. Yang found that ACE2 is abundantly present in humans in the epithelia of the lung and small intestine. They also found ACE2 expression in the basal layer of the non-keratinizing squamous epithelium in the nasal and oral mucosa and the nasopharynx. However, that experiment was conducted at the bulk level, so the number of cells expressing ACE2 in each tissue could not be estimated. In previous work, we found that a significant number of epithelial cells from nasal tissue express ACE2, and that nasal swabs tend to have a higher COVID-2019 load than throat swabs 9 . Lirong Zou et al. also found that the COVID-2019 load tends to be higher in nasal swabs than in throat swabs in asymptomatic or minimally symptomatic patients 10 . Here, we analyzed ACE2 single-cell expression profiles in the non-immune cells of the nasal, mouth, lung, and colon tissues. We find that 2.5% of non-immune nasal tissue cells, 2% of non-immune mouth tissue cells, 5.6% of non-immune lung tissue cells, and 2.8% of colon epithelial cells express ACE2. Among non-immune cells, the percentage of ACE2-expressing cells in the nasal and mouth tissues is comparable to that in the lung and colon tissues. We also find that the non-immune cells expressing ACE2 tend to express TMPRSS2 but not SLC6A19 in the nasal and mouth tissues, which suggests that they are susceptible to COVID-2019 infection. From these results, we infer that the nasal and mouth tissues may harbor the first host cells of COVID-2019 infection. More attention should be paid to protecting the nose and mouth from COVID-2019 infection.
Using Seurat, we performed unsupervised graph-based clustering on these single-cell RNA-Seq datasets.
Then, we ran the Uniform Manifold Approximation and Projection (UMAP) dimensional reduction technique to visualize the data. Next, we used violin plots to find the clusters with significant PTPRC (CD45) expression in each dataset and filtered out those clusters, leaving the non-immune cells. We calculated the percentage of ACE2-expressing cells in each dataset (Table 1). In nasal tissue, 2.5% of non-immune cells from nasal brushing and 1.7% of non-immune cells from turbinate express ACE2. In mouth tissue, 2% of non-immune malignant and normal cells show ACE2 transcription. In lung tissue, 0.2% of non-immune cells from the bronchial biopsy, 5.6% of non-immune cells from bronchial brushings, and 1.1% of non-immune cells from bronchioli terminales show ACE2 transcription. In colon tissue, 2.8% of epithelial cells express ACE2.
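As an illustration of the percentage calculation, here is a minimal Python sketch. It is not the authors' R code. It assumes a hypothetical data layout (a dict mapping each cell ID to its gene counts) and assumes that a cell counts as "expressing" a gene when its raw count is greater than zero, a threshold the text does not state.

```python
def percent_expressing(counts, gene):
    """Return the percentage of cells with a non-zero count for `gene`.

    `counts` maps cell ID -> {gene symbol: raw count}. This layout and the
    "count > 0" rule are assumptions made for illustration only.
    """
    if not counts:
        raise ValueError("no cells supplied")
    positive = sum(1 for genes in counts.values() if genes.get(gene, 0) > 0)
    return 100.0 * positive / len(counts)


# Toy data: four non-immune cells, one of which has ACE2 counts.
toy_cells = {
    "cell_1": {"ACE2": 2, "TMPRSS2": 5},
    "cell_2": {"TMPRSS2": 1},
    "cell_3": {"KRT5": 7},
    "cell_4": {"ACE2": 0, "KRT5": 3},
}
print(percent_expressing(toy_cells, "ACE2"))  # 25.0
```

A stricter expression threshold would lower every percentage, so the cutoff should be fixed before comparing tissues.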

ACE2 tends to be co-expressed with TMPRSS2 in the nasal, mouth, lung, and colon tissues and co-expressed with SLC6A19 in the colon tissue at single-cell resolution
We employed the hypergeometric test to assess the co-occurrence of ACE2, TMPRSS2, and SLC6A19 expression in the non-immune cells. We find that ACE2-expressing cells in all the single-cell RNA-Seq datasets tend to express TMPRSS2 at single-cell resolution (Table 1; P-value <0.05). We detected no or few SLC6A19-expressing cells in the single-cell RNA-Seq datasets from nasal, mouth, and lung tissues. In contrast, ACE2-expressing cells from colon tissue tend to express SLC6A19 at single-cell resolution (Table 1; P-value <0.05). To check whether this difference reflects data quality, we plotted the distribution of the number of detected genes per cell in each single-cell RNA-Seq dataset. The median number of detected genes in the colon tissue dataset is around 1000, which is lower than the median in most of the other datasets from nasal, mouth, and lung tissues (Figure 1). Since SLC6A19-expressing cells were still detected in the colon dataset despite its lower gene detection, we believe that the single-cell RNA-Seq datasets detected enough genes per cell, and that the absence or scarcity of SLC6A19-expressing cells in the nasal, mouth, and lung tissues should not be attributed to data quality.
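The data-quality check above relies on the number of detected genes per cell. A minimal Python sketch of that statistic follows, assuming a hypothetical dict-of-dicts count layout and treating a gene as "detected" when its count is non-zero. The toy values are invented for illustration and are not taken from the datasets.

```python
from statistics import median


def detected_genes_per_cell(counts):
    """Map each cell ID to the number of genes with a non-zero count."""
    return {
        cell: sum(1 for value in genes.values() if value > 0)
        for cell, genes in counts.items()
    }


# Toy data (hypothetical): three cells with 2, 1, and 3 detected genes.
toy_cells = {
    "cell_1": {"ACE2": 1, "TMPRSS2": 3, "KRT5": 0},
    "cell_2": {"KRT5": 4},
    "cell_3": {"ACE2": 2, "TMPRSS2": 1, "KRT5": 6},
}
per_cell = detected_genes_per_cell(toy_cells)
print(median(per_cell.values()))  # 2
```

Comparing this median across datasets gives a rough indication of whether a gene is missing because of shallow detection or because it is not expressed.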

Discussion
With single-cell RNA-Seq technology, we estimated the percentage of ACE2-expressing cells in nasal, mouth, lung, and colon tissues and find that the percentage of ACE2-expressing cells in nasal and mouth tissues is comparable to that in the lung and colon tissues. We also found that ACE2 tends to be co-expressed with TMPRSS2 in the nasal and mouth tissues. Since no or few SLC6A19-expressing cells were detected in the single-cell RNA-Seq datasets from nasal and mouth tissues, we infer that ACE2 tends not to be co-expressed with SLC6A19 in these tissues. It has been reported that TMPRSS2 facilitates the entry of the virus into host cells, while SLC6A19 hinders it. We therefore believe that the nasal and mouth tissues could harbor the first host cells of COVID-2019 infection.
Here, we emphasize the importance of wearing a mask to protect people from COVID-2019 infection. We further emphasize the necessity of testing for COVID-2019 in both nasal and throat samples, given that a large number of infected people have no or few clinical symptoms. We also think it is not a good idea to let close contacts isolate themselves at home, because their family members may be infected if masks are not worn at all times. We admit that single-cell profiling of mouth tissue with squamous cell carcinoma is not an ideal model for studying ACE2 expression in mouth tissue. However, it is the only single-cell RNA-Seq dataset we could find that profiles mouth tissue at single-cell resolution. Because the malignant cells are derived from normal epithelial cells of the mouth, we believe that most of them still maintain their tissue specificity.

Method
The single-cell RNA-Seq datasets of mouth, bronchial brushing, lung (bronchioli terminales), and colon epithelial tissues were downloaded from GSE103322, GSE131391, GSE122960, and SCP259 (Single Cell Portal), respectively. The single-cell RNA-Seq datasets of turbinate, nasal brushing, and bronchial biopsy were downloaded from GSE121600.
Single-cell RNA-Seq dataset pre-processing
We employed Seurat (3.1.4) to process the single-cell RNA-Seq datasets. First, we filtered out the cells that (1) expressed fewer than 200 genes, or (2) highly expressed mitochondrial genes, defined as cells in which reads from mitochondrial genes account for more than 25% of the total reads. We then filtered out the genes expressed in fewer than three samples. This yielded the processed single-cell RNA-Seq datasets.
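The filtering rules above can be sketched in plain Python as follows. This is an illustration, not the authors' Seurat code. It assumes a hypothetical dict-of-dicts count layout, assumes mitochondrial genes are recognised by the human "MT-" symbol prefix, interprets "samples" as cells, and applies the cell filter before the gene filter because that is the order in which the text lists them.

```python
def qc_filter(counts, min_genes=200, max_mito_fraction=0.25,
              min_cells=3, mito_prefix="MT-"):
    """Apply cell-level and then gene-level QC to `counts`.

    `counts` maps cell ID -> {gene symbol: raw count} (assumed layout).
    Returns a new dict holding only the retained cells and genes.
    """
    kept_cells = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        n_detected = sum(1 for value in genes.values() if value > 0)
        mito = sum(v for g, v in genes.items() if g.startswith(mito_prefix))
        if total == 0 or n_detected < min_genes:
            continue  # too few detected genes
        if mito / total > max_mito_fraction:
            continue  # mitochondrial reads dominate the cell
        kept_cells[cell] = genes

    # Count, for every gene, the retained cells in which it is detected.
    cells_per_gene = {}
    for genes in kept_cells.values():
        for gene, value in genes.items():
            if value > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}

    return {
        cell: {g: v for g, v in genes.items() if g in kept_genes}
        for cell, genes in kept_cells.items()
    }


# Toy run with lowered thresholds so that four cells are enough to show
# each rule; the defaults mirror the values quoted in the text.
toy_cells = {
    "cell_1": {"ACE2": 1, "KRT5": 3, "MT-CO1": 1},
    "cell_2": {"KRT5": 1, "MT-CO1": 9},
    "cell_3": {"KRT5": 2},
    "cell_4": {"ACE2": 2, "KRT5": 1, "TMPRSS2": 1},
}
filtered = qc_filter(toy_cells, min_genes=2, min_cells=2)
print(sorted(filtered))  # ['cell_1', 'cell_4']
```

In the toy run, cell_2 is removed for its mitochondrial fraction, cell_3 for having too few detected genes, and genes seen in only one retained cell are dropped.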
Single-cell RNA-Seq dataset clustering and visualization
We employed Seurat in default mode to cluster the cells and visualize the cell clusters (see supplemental file S1.txtc for the R code).
Identification of the non-immune cells
We used violin plots to check PTPRC expression in each cluster of each single-cell RNA-Seq dataset. Clusters whose PTPRC expression showed a spindle-shaped body in the violin plot were filtered out, leaving the non-immune cells of each single-cell RNA-Seq dataset.
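The step above is a visual inspection of violin plots, which cannot be reproduced exactly in code. As a rough stand-in, the sketch below substitutes a different, numeric criterion: a cluster is treated as immune when more than a chosen fraction of its cells have non-zero PTPRC counts. The 0.5 cutoff, the function name, and the data layout are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def non_immune_cells(counts, cluster_of, marker="PTPRC",
                     max_positive_fraction=0.5):
    """Drop every cell belonging to a cluster dominated by `marker`.

    `counts` maps cell ID -> {gene: raw count}; `cluster_of` maps cell ID ->
    cluster label. The fraction-based rule is an assumed substitute for the
    visual violin-plot criterion described in the text.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for cell, genes in counts.items():
        cluster = cluster_of[cell]
        total[cluster] += 1
        if genes.get(marker, 0) > 0:
            positive[cluster] += 1

    immune = {
        cluster for cluster in total
        if positive[cluster] / total[cluster] > max_positive_fraction
    }
    return {
        cell: genes for cell, genes in counts.items()
        if cluster_of[cell] not in immune
    }


# Toy data: cluster 0 is mostly PTPRC-positive, cluster 1 is not.
toy_cells = {
    "c1": {"PTPRC": 3}, "c2": {"PTPRC": 1}, "c3": {"KRT5": 2},
    "c4": {"ACE2": 1}, "c5": {"PTPRC": 1}, "c6": {"KRT5": 4},
    "c7": {"TMPRSS2": 2},
}
toy_clusters = {"c1": 0, "c2": 0, "c3": 0, "c4": 1, "c5": 1, "c6": 1, "c7": 1}
print(sorted(non_immune_cells(toy_cells, toy_clusters)))
# ['c4', 'c5', 'c6', 'c7']
```

Filtering is done per cluster rather than per cell, so an isolated PTPRC-positive cell in an otherwise negative cluster (c5 here) is retained, mirroring the cluster-level decision in the text.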
Test the significance of enrichment of ACE2-expressing cells in TMPRSS2-expressing cells and SLC6A19-expressing cells.
We employed the hypergeometric test to assess the significance of enrichment of ACE2-expressing cells in TMPRSS2-expressing cells. Suppose N is the number of total sequenced cells, M is the number of TMPRSS2-expressing cells, K is the number of ACE2-expressing cells, and x is the number of cells expressing both ACE2 and TMPRSS2. We calculated the probability (p) of finding x or more cells expressing both ACE2 and TMPRSS2 when K cells are randomly picked from the total sequenced cells ( $p = P(X \geq x)$ ). We used the R function phyper to calculate p.
We calculated the P-value of the hypergeometric test to measure the significance of enrichment of ACE2-expressing cells in TMPRSS2-expressing cells and in SLC6A19-expressing cells. However, we omitted the data in which the number of ACE2-expressing cells, TMPRSS2-expressing cells, or SLC6A19-expressing cells is less than 4.
Figure 1 The distribution of the number of detected genes per cell in the seven single-cell RNA-Seq datasets
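To make the tail probability in the hypergeometric test concrete, the following Python sketch computes P(X >= x) directly from binomial coefficients, using the symbols N, M, K, and x as defined in the Method text. It is an illustration only. In R, the same upper tail is commonly obtained with `phyper(x - 1, M, N - M, K, lower.tail = FALSE)`; whether the authors used exactly this call is not stated in the text.

```python
from math import comb


def hypergeom_upper_tail(x, N, M, K):
    """Return P(X >= x) for a hypergeometric variable X.

    N: total sequenced cells
    M: TMPRSS2-expressing cells
    K: ACE2-expressing cells (the cells "drawn")
    x: cells expressing both genes
    """
    if not (0 <= M <= N and 0 <= K <= N):
        raise ValueError("require 0 <= M <= N and 0 <= K <= N")
    lowest = max(x, 0, K - (N - M))   # smallest overlap that is possible
    highest = min(M, K)               # largest overlap that is possible
    favourable = sum(
        comb(M, i) * comb(N - M, K - i) for i in range(lowest, highest + 1)
    )
    return favourable / comb(N, K)


# Toy numbers (hypothetical): 10 cells, 4 TMPRSS2+, 3 ACE2+, 2 double-positive.
p = hypergeom_upper_tail(x=2, N=10, M=4, K=3)
print(round(p, 4))  # 0.3333
```

For the toy numbers, the tail is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120. Exact integer arithmetic avoids the rounding problems of summing many small probabilities, at the cost of speed on very large datasets.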