Smoking Modulates Different Secretory Subpopulations Expressing SARS-CoV-2 Entry Genes in the Nasal and Bronchial Airways

Background: SARS-CoV-2 infection and disease severity are influenced by viral entry (VE) gene expression patterns in airway epithelium. The similarities and differences of VE gene expression (ACE2, TMPRSS2, and CTSL) across nasal and bronchial compartments has not been fully characterized using matched samples from large cohorts. Results: Gene expression data from 793 nasal and 1,673 bronchial brushes obtained from individuals participating in lung cancer screening or diagnostic workup revealed that smoking was the only clinical factor significantly and reproducibly associated with VE gene expression. ACE2 and TMPRSS2 expression were higher in smokers in the bronchus but not in the nose. scRNA-seq of nasal brushings indicated that ACE2 co-expressed genes were highly expressed in club and C15orf48+ secretory cells while TMPRSS2 co-expressed genes were highly expressed in keratinizing epithelial cells. In contrast, these ACE2 and TMPRSS2 modules were highly expressed in goblet cells in scRNA-seq from bronchial brushings. Cell-type deconvolution of the RNA-seq confirmed that smoking increased the abundance of several secretory cell populations in the bronchus, but only goblet cells in the nose. Conclusions: The association of ACE2 and TMPRSS2 with smoking in the bronchus is due to their high expression in goblet cells which increase in abundance in current smoker airways. In contrast, in the nose these genes are not predominantly expressed in cell populations modulated by smoking. Smoking-induced VE gene expression changes in the nose likely has minimal impact on SARS-CoV-2 infection, but in the bronchus, smoking may lead to higher viral loads and more severe disease.


Background
As of August 14th, 2021, over 205 million con rmed cases and 4.3 million deaths have been reported globally for COVID-19 (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situationreports). SARS-CoV-2 infects cells by utilizing host cell-surface proteins for viral entry (VE). VE proteins include angiotensin-converting enzyme 2 (ACE2), which provides a binding site for the virus; proteases such as transmembrane protease serine 2 (TMPRSS2) that can cleave the viral S glycoprotein; and cathepsin L (CTSL), which can also aid with S glycoprotein priming [1][2][3][4] . Prior studies have attempted to establish associations between the expression of VE genes and clinical variables to identify factors that may in uence infection incidence or COVID-19 severity.
Several studies have reported con icting ndings regarding associations between VE gene expression and clinical factors due to small sample sizes and exclusion of important covariates. Smith et al 5 10 reported that ACE2 expression was associated with age; however, Smith et al 5 found that ACE2 expression was equivalent between young and elderly individuals. Moreover, earlier studies with limited numbers of subjects led to the conclusion that race was associated with ACE2 expression 11 , which was not found by Smith et al in a larger dataset 5 . Lung airway expression of both ACE2 and TMPRSS2 was also found to be signi cantly up-regulated in patients with chronic obstructive pulmonary disease (COPD) compared with healthy subjects or in smokers compared with non-smokers 12 without adjustment of important factors such as age and sex. The goal of this study is to establish associations between VE gene expression and clinical factors leveraging a large number of airway samples with extensive clinical data, matched between nasal and bronchial compartment from multiple independent cohorts, and to identify the biological processes and cell types associated with the expression of VE genes in the nasal and bronchial compartments.

Results
Gene expression pro les of 796 nasal and 1,673 bronchial brushing samples were generated using bulk RNA-seq from 4 cohorts, DECAMP 13 (Table 1). Study participants were current or former smokers undergoing bronchoscopy as part of either a diagnostic workup or a screening process for lung cancer. Clinical data collected on samples included smoking status, sex, age, forced expiratory volume during the rst second (FEV1) percent Predicted, and, in the AEGIS study, lung cancer status (Supplemental Tables 1). Notably, we observed different patterns of association with smoking between ACE2 and TMPRSS2 gene expression between the nasal and bronchial epithelium. ACE2 was down-regulated in the nasal epithelium of current smokers in 1 out of 2 cohorts (p < 0.01) but strongly up-regulated in the bronchus in all four cohorts (p < 0.01; Figs. 1a and 1b; Supplemental Fig. 1a and 1b). TMPRSS2 expression was not associated with smoking in the nose but its expression was up-regulated in current smokers in the bronchus in 3 out of 4 cohorts (p < 0.001). CTSL expression was down-regulated in current smokers in the nasal epithelium in 1 out of 2 cohorts (p < 0.01) and the bronchial epithelium in 3 out of 4 cohorts (p < 0.05). These genes were not strongly associated with other clinical variables (sex, age, FEV1% Predicted) across cohorts in either the nasal or bronchial samples in contrast to some previous studies 5,8,11,12 . Adjusting for lung cancer diagnosis in the AEGIS cohort did not change these observed correlations, although positive lung cancer status negatively correlated with ACE2 expression in the nasal epithelium (p<0.05, Supplemental Fig. 1b). Upon examining paired nasal and bronchial samples from the same subjects, a lack of correlation in the expression of each VE gene was observed between the two tissues (p > 0.05; Fig. 1c Table 3), suggesting that the biological processes that in uence the expression of these genes may differ across airway sites. To further explore the biological pathways associated with VE genes in each site of the airway, we developed co-expression networks using Weighted Gene Correlation Network Analysis (WGCNA) 17 . We identi ed 58 and 48 gene modules within the nasal and bronchial cohorts, respectively ( Fig. 2a and Fig. 3b) to those presented in Figure 1 to those presented in Figure 1.
The nasal VE gene module eigengenes were not highly correlated, however, the ACE2-and TMPRSS2associated module eigengenes in the bronchus were highly correlated across all four datasets (Supplemental Figs. 2a and 2b), suggesting that similar biological processes control these two gene modules in the bronchial compartment.
Comparing biological pathways enriched among VE gene modules between the nose and bronchus revealed ( Fig. 2c; Supplemental Table 3) that genes of the nasal ACE2 module were enriched in the in ammatory response, interferon-alpha signaling, and interferon-gamma signaling pathways while the bronchial ACE2 gene module was enriched in genes associated with protein secretion and androgen response. Genes of the TMPRSS2 module in the nose were enriched in estrogen response and KRAS signaling pathways while the module in the bronchial epithelium was enriched in MTORC1 and TNFα-NFκB signaling pathways. The lack of shared biological pathways mirrors the small number of genes shared by the nasal and bronchial modules for ACE2 (n = 5) and TMPRSS2 (n = 12). In contrast, the nasal and bronchial CTSL modules shared 177 genes (Fisher's exact test, p = 2.23E-10) and were both enriched in genes related to in ammatory response and TNF-alpha signaling pathways. Furthermore, in the bronchus, ACE2 and TMPRSS2-associated modules share enrichment in genes involved in the p53 pathway, adipogenesis, androgen response, MTORC1 signaling, and estrogen response (Supplemental Fig. 3c). We also did not observe signi cant correlations between the eigengenes of each VE gene module in the paired nasal and bronchial samples ( Fig. 2d and Supplemental Fig. 3d). The VE gene module and single-gene analyses are consistent and suggest that there are important differences between VE gene expression and biology between the upper and lower airways.
To determine if different cell populations contribute to the difference in VE gene expression between airway compartments, we leveraged single-cell RNA sequencing (scRNA-seq) data to characterize the expression patterns of individual VE genes and VE gene modules across major epithelial and immune cell types. We pro led 34,833 cells from 9 nasal brushings (4 collected from two volunteers and 5 from 5 patient undergoing lung cancer screening) and 2,075 cells from 17 bronchial brushings from patients undergoing bronchoscopy for suspicion of lung cancer (Supplemental Tables 6, 7). A total of 17 and 15 transcriptionally distinct cell clusters were identi ed in the nose and bronchus, respectively, many of which expressed similar marker genes: KRT5, KRT15 (basal cells); FOXJ1, C20orf85, CDC20B (ciliated cells); SCGB1A1, SERPINB3 (club cells); MUC5AC, TFF1, TFF3 (goblet cells); FOXI1, CFTR (ionocytes); CD3D, CD8A (T cells). We also identi ed two bronchial goblet-like secretory populations: one characterized by high expression of CEACAM5 but not MUC5AC, previously described as peri-goblet cells 18 , and the other by high expression of HLA-DQA1 and MHC class II genes (Figs. 3a-b, Supplemental Fig. 4). In the nose, we discovered two club-cell-like secretory cell clusters (STATH+ and C15orf48+), as well as a novel cell cluster of "keratinizing epithelial cells" expressing genes involved in corni cation, epidermis development, and keratinocyte differentiation, such as SPRR3 and SPRR2A. Immune populations represent <1% and 26% of total cell populations in the nasal and bronchial epithelium, respectively. In both nasal and bronchial datasets, the cell subpopulations were consistently observed across multiple patients (Supplemental Fig. 5).
We found low expression of the VE genes in the single cell data, consistent with other reports (Supplemental Fig. 6) 19,20 , so we calculated VE gene module metagene scores in each cell to understand how biological processes associated with the VE genes in the bulk RNA-seq data are distributed across nasal and bronchial cell populations (Figs. 3c-e). In the nose, the ACE2 module was moderately expressed across many cell types but was most highly expressed in C15orf48+ secretory cells and club cells. The TMPRSS2 module was predominantly expressed in the keratinizing epithelial cells, followed by C15orf48+ secretory cells. The expression of gene modules of ACE2 and TMPRSS2 is different from the gene expression pattern reported by Sungnak et al in that TMPRSS2 gene expression was limited to ACE2+ cells 21 . In the bronchus, both the ACE2 and TMPRSS2 modules were expressed at the highest levels in the goblet cell population, in agreement with previous ndings 22 . The highest median metagene score for the CTSL module was in the neutrophils and macrophages in both the nose and bronchus; however, expression was also high in T cells in the nose and dendritic cells in the bronchus. We found very few cells co-expressing ACE2, CTSL, and TMPRSS2 at high levels -114 out of 34833 cells in the nose and no cells in the bronchus. On the other hand, we found that ACE2 and TMPRSS2 modules were more likely to be highly co-expressed in nasal keratinizing epithelial cells (odds ratio = 7.56), nasal C15orf48+ secretory cells (N = 518, odds ratio = 7.36), and bronchial goblet cells (odds ratio = 16.80) (N = 806, Supplemental Table 6, FDR q < 0.001). Thus, ACE2 and TMPRSS2 were found to be expressed in different nasal and bronchial cell populations, which may be in uenced by smoking in ways that lead to their divergent correlation with smoking in the two airway compartments.
To investigate how smoking modulates different cell populations, we computationally deconvolved cell population proportions in the bulk RNA-seq data using gene markers identi ed from the single-cell data (Supplemental Table 7). Higher proportions of goblet cells were observed in current smokers in both the nose and bronchus in all 6 cohorts, consistent with the previous studies 18 . Increased proportions of ionocytes in current smokers were also observed in the nose (1 out of 2 cohorts) and bronchus (all 4 cohorts). The proportion of ciliated cells was signi cantly lower in smokers in all 6 cohorts (Figs. 4a-b, Supplemental Figs. 7a-d, p < 0.05). While goblet cell proportions were signi cantly higher in smokers in both the nose and bronchus, the ACE2 module was highly expressed only in bronchial, not nasal goblet cells. These results may explain the lack of association of ACE2 expression with smoking in the bulk RNA-seq nasal data. Additionally, in the AEGIS nasal data (Supplemental Fig. 7 ) the fraction of keratinizing epithelial cells is increased and STATH+ secretory cells is decreased in current smokers, suggesting that shifts in these populations may be partially responsible for the differential correlation with smoking between ACE2 and TMPRSS2 in the nose. For CTSL, both nasal datasets showed a signi cant decrease in the proportion of the neutrophil/macrophage population with respect to smoking.
While this may partially explain the down-regulation of CTSL in smokers in the nasal bulk RNA-seq data, the overall proportions of these populations estimated in the bronchial data by the deconvolution were close to zero for most samples, which could have prevented us from con rming the decrease of this population with smoking in the bronchus.

Discussion
The current study investigated the relationship between the expression of genes encoding proteins important for SARS-CoV-2 entry (ACE2, CTSL, and TMPRSS2) and clinical factors including age, sex, COPD, and smoking in both the nose and bronchus. In our datasets, only smoking status showed a consistent association with the expression of VE genes across cohorts within each airway compartment.
In the bronchus, current smoking status was associated with higher ACE2 and TMPRSS2 expression; however, in the nose, current smoking status was inversely correlated with ACE2 expression and was not associated with TMPRSS2 expression as in prior studies [5][6][7]10 . The cohorts used in this study contain subjects at high risk for developing lung cancer and are thus composed of older subjects (mean age = 63, SD = 8); therefore, we did not nd an association between nasal ACE expression and age as previously reported by Bunyavanich et al 8 in a study comparing children and adults. We also did not nd sex to be associated with ACE2 expression as was reported by another study with a lower sample size 9 and could not con dently assess differences in expression pro les between racial subgroups as our cohorts are predominantly (> 70%) composed of White subjects. Our analyses did not nd signi cant VE gene expression correlations between paired nasal and bronchial samples suggesting that there are differences in biological pathways or cell types between the airway compartments.
Our WGCNA analysis demonstrated that genes co-expressed with ACE2 and TMPRSS2 in the bulk RNAseq data were more highly correlated to one another in the bronchus than in the nose and that pathway enrichment was different for these modules in the upper and lower airways. Similarly, scRNA-seq data showed that ACE2 and TMPRSS2 modules were co-expressed in bronchial goblet cells but had distinct patterns of expression across nasal cell populations. Our high-resolution nasal scRNA-seq dataset containing over 30,000 cells allowed us to differentiate between secretory cell populations to show that C15orf48+ and keratinizing epithelial cells had the highest expression of ACE2 and TMPRSS2 modules, respectively. In contrast, we found that bronchial goblet cells had the highest expression of both ACE2 and TMPRSS2 modules, and only the goblet cell population was increased in smokers in our deconvolution analysis. Of note, ACE2 and TMPRSS2 modules were also co-expressed highly within the peri-goblet cells in the bronchus, which was also observed by Lukassen et al 20 .
Interestingly, the nasal and bronchial CTSL modules showed enrichment of immune-associated biological programs and were highly expressed in similar immune cell populations. Despite these similarities, CTSL gene or module expression was not correlated between paired bronchial and nasal samples potentially due to the low abundance of immune populations in the scRNA-seq and bulk RNAseq data.

Conclusions
Our study leveraged bulk and single-cell gene expression pro les from large cohorts to show that current cigarette smoking status is consistently associated, in a site-speci c manner, with the expression of genes required for SARS-CoV-2 entry in the nose and bronchus. This difference of the association of ACE2 and TMPRSS2 expression with smoking between the nasal and bronchial compartments is likely due to the expression patterns of these genes in distinct cell subpopulations in the nose and bronchus, as well as the way the proportion of these subpopulations change with respect to smoking between the two compartments. Future work investigating other putative viral entry genes such as FURIN and ATRNL1 20-22 may build upon these ndings to enhance our understanding of the effect of smoking on SARS-CoV-2 infection. The results of our study of the expression of ACE2 and TMPRSS2 suggest that smoking is unlikely to impact the likelihood of SARS-CoV-2 infection in the upper airways, but that it may play a signi cant role in COVID-19 disease progression and severity.

Methods
Datasets with gene expressions pro led by bulk RNAsequencing or microarray. Single-cell analysis of nasal and bronchial samples.
Preprocessed count matrices of the nasal and bronchial samples were pre-processed using the Scruff package 27 and analyzed using the Seurat 3.0 28 with standard settings. Quality Control of the nasal and bronchial count matrices was performed using SCTK-QC pipeline 29,30,31 . Uniform Manifold Approximation and Projection (UMAP) was used for dimension reduction and visualizing relationships amongst cells. Canonical marker genes were utilized to identify cell types when possible. Library normalized counts were used for generating plots with single gene expression. And the meta-gene score for each VE module was calculated by averaging the module genes across normalized expressions. A high expression for a gene module is de ned by an expression greater than one standard deviation above the mean.
Deconvolution of bulk RNA-seq gene expression.
Estimated proportions of cells in each of the bulk RNA samples were computed using the Autogenes Package 32 with reference gene expression pro les (GEPs) derived from the nasal and bronchial scRNAseq datasets.

Declarations
Ethics approval and consent to participate The DECAMP study was approved by the Human Research Protection O ce (HRPO) of the Department of Defense, and the individual site IRBs for every participating site. All subjects were approved for written informed consent to participate in the study in accordance with IRB regulations. The generation of nasal scRNA-seq data was approved by the IRBs at Boston University School of Medicine and Lahey hospital. The generation of bronchial scRNA-seq data was approved by the IRB at Boston University School of Medicine. Samples were collected according to the study protocol. Written informed consent was obtained from all participants.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and analyzed during the current study are being prepared for submission to the Gene Expression Omnibus (GEO).   Consensus nasal and bronchial VE gene modules are associated with different biological pathways. (a-b) WGCNA was used to identify consensus co-expression modules in nasal (a) and bronchial (b) samples from the DECAMP cohort (n=58 and 48 modules, respectively). A heatmap of the correlation of module eigengenes computed from each module demonstrates that ACE2 and TMPRSS2 modules were more highly correlated with each other in the bronchus than in the nose. The CTSL module was not correlated with ACE2 or TMPRSS2 modules in either the nose or bronchus. (c) Scatterplots comparing the overrepresentation of MSigDB Hallmark pathway gene sets in each VE module in the nose and bronchus.

Competing interests
The -log10(FDR q) values denoting the degree of overrepresentation of each gene set in each module in the nose and bronchus are shown on the x-and y-axes, respectively. The overrepresentation of gene sets in the ACE2 and TMPRSS2 modules is largely discordant between the nose and bronchus, whereas several immune-related pathways are overrepresented in the CTSL module in both sites. (d) VE module eigengenes were not signi cantly correlated (Pearson p > 0.05) between paired nasal (x-axis) and bronchial (y-axis) samples in DECAMP (N = 114). The blue line is the line of best t and the gray shading represents the 95% con dence level interval for predictions from the linear model.  Signi cant cell proportion differences between current and former smokers were determined by the Wilcoxon test (* indicates FDR q < 0.05). Bottom: the heatmap displays the average VE module score for each cell population from the single-cell RNA-seq data ( rst three rows), as well as the proportion of each cell type that is ACE2+/TMPRSS2+ (i.e., the expression of both genes is one standard deviation above the average). Cell types that express both ACE2 and TMPRSS2 modules are in red.
Prevalence of cell types enriched for ACE2 and TMPRSS2 expression shows positive correlation with smoking only in the bronchus, which may explain the lack of association of ACE2/TMPRSS2 expression between the bronchial and nasal bulk RNA-seq data.