Identication of Type 2-pediatric Asthma Based on Single-cell Transcriptomic Analysis

Type 2-pediatric asthma characterized by T2 cytokine-driven airway inammation is the most common type of asthma. Recently, T2 cytokine inhibitors have reduced the exacerbation rates of asthma, but their ability to improve lung function is limited. Screening novel therapeutic strategies for Type 2-pediatric asthma patients is imperative. We obtained single-cell RNA sequencing (scRNA-seq) describing the chronic stimulation GSE145013 dataset with IL-13. Consensus clustering was performed to classify pediatric asthmatic patients from validation datasets GSE65204 and GSE40888, based on the cell marker genes. We found three cellular subtypes including ciliated cells, secretory cell 1, and secretory cell 2. The expression of CCL26, PRB1, and SLC9B2 was higher in secretory cell 1, while SCGB3A1 and BPIFA1 were higher in secretory cell 2. Consensus clustering based on the ve cell marker genes produced two patient subtypes (C1 and C2). The expression of SCGB3A1 and BPIFA1 was higher in C2 subtypes, while CCL26, PRB1, and SLC9B2 was higher in C1 subtypes. Patients in C2 subtypes may more secretory cell 2, while the patients in C1 may have higher secretory cell 1 in the inltrate. More Type 2 T helper cells were in the inltrate in the C2 subtype, while type 1 T helper cells were higher in the C1 subtype. T2 cytokines (IL-13, IL-33, IL-3, IL-4, and TSLP) were expressed more in the C2 subtype, corresponding to Type 2-pediatric asthma. This study identied ve cell marker genes to screen Type 2-pediatric asthma that could potentially be therapeutic targets for Type 2-pediatric asthma.


Introduction
Bronchial asthma is a common respiratory disease in children, mainly related to environmental and genetic factors. It is more common in spring and autumn owing to increased airway reactivity, resulting in airway constriction [1]. Type 2-pediatric asthma as the most common type of asthma, is characterized by in ltration of in ammatory cells in the lung tissue, including eosinophils, mast cells, lymphocytes, accompanied by excessive differentiation of Type 2 T helper cells. The in ltrating Type 2 T helper cells secrete cytokines (IL-4, IL-5, and IL-13), which initiate and exacerbate allergic in ammation [2]. Recently, T2 cytokine inhibitors have been shown to reduce the exacerbation rates of Type 2-pediatric asthma, but their ability to improve lung function is limited. Therefore, further study on the mechanism of Type 2pediatric asthma at the molecular level and the search for biomarkers related to Type 2-pediatric asthma will be bene cial to the development of new targeted drugs for Type 2-pediatric asthma.
Single-cell RNA sequencing (scRNA-seq) is a new technology for the ampli cation and sequencing of the whole transcriptome at the single-cell level and can obtain the mRNA expression information of each cell and determine different cell subsets [3]. ScRNA-seq has been used in the study of asthma, and some progress has been made. Tibbitt et al. [4] revealed a novel gene expression signature of airway epithelium Th2 cells in asthma patients, which provides a basis for the study of the mechanism of Th2 cells involved in asthma. Jiang et al. [5] screened speci c genes to characterize and quantify mast cells in asthma using scRNA-sEq. Ailu Chen et al. [6] identi ed six main cell subsets in severe asthma (SA), containing CD4 + T cells, natural killer (NK) cells, CD8 + T cells, dendritic cells (DCs), B cells, and monocytes by scRNA-sEq. Among them, CD4 + T cells were the dominant cell subtype in SA. In addition, they also found that SA patients have a strong response to the stimulation of polyinosine-polycytidylic acid (poly I:C). Therefore, sRNA-seq data can help us to dissect the cells and genes that play a key role in the development of asthma.
In our study, to reveal the mechanism and nd new therapeutic strategies for Type 2-pediatric asthma, we obtained the scRNA-seq data describing chronic stimulation with IL-13 from the Gene Expression Omnibus (GEO) database. Seurat software was used for cell cluster analysis and produced three cellular subtypes, including ciliated cells, secretory cell 1, and secretory cell 2. The expression of cell marker genes CCL26, PRB1, SLC9B2, SCGB3A1, and BPIFA1 were reversed in secretory cell 1 and secretory cell 2. Consensus clustering was performed based on the ve cell marker genes in the validation dataset and produced two patient subtypes (C1 and C2). Type 2 T helper cells were higher in the in ltrate in the C2 subtype, while type 1 T helper cells were higher in the in ltrate in the C1 subtype. T2 cytokines (IL-13, IL-33, IL-3, IL-4, and TSLP) were expressed more in the C2 subtype. We hypothesized that the patients in the C2 subtype corresponded to type 2-pediatric asthma.

Data acquisition
We downloaded the pediatric asthma scRNA-seq describing chronic stimulation with IL-13 in the GSE145013 dataset from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE145013) [7]. Single cell sequencing analysis was performed on epithelial cells from tracheal donors after chronic (11 days) IL-13 stimulation. A total of 802 single cell data were obtained. In addition, we obtained the transcriptome expression pro les of GSE65204 [8], and GSE40888 [9] datasets containing 36 and 65 pediatric asthma patients, respectively, from the GEO database.

Data processing
The 'Seurat' package was used to process the scRNA-seq data. The PercentageFeatureset function was used to calculate the percentage of mitochondrial genes. Correlation analysis was used to analyze the relationship between sequencing depth and mitochondrial gene sequences, and/or intracellular total sequences. We then removed the cells with mitochondrial gene content > 5%, sequencing number < 50, and gene number < 100. Variance analysis was used to screened the rst 1500 genes with highly variable characteristics [10].

Dimensionality reduction and cell annotation
Principal Component Analysis (PCA) was used for dimensionality reduction of the sRNA-seq data. Cluster analysis was performed using the t-distributed stochastic neighbor embedding (t-SNE) algorithm based on the principal components (PCs) with p < 0.05. In addition, P < 0.05, and log 2 |FC|> 0.5 were used as cut-off criteria to screen out marker genes in each cluster group. The top 10% marker genes in the cluster were listed in the heat map. The "Singler" package was performed for clustering annotation according to the marker genes [11,12]. The clusterPro ler package was performed for Gene Ontology (GO) enrichment analysis based on the marker genes of each cluster. Finally, the "Monocle" algorithm was carried out for pseudotime and trajectory analysis [13].

Consensus clustering algorithm based on the cell marker genes
The "ConsensusClusterPlus" package was used to classify pediatric asthmatic patients in the validation datasets (GSE65204 and GSE40888) based on the selected cell marker genes and 50 repetitions with maxK = 9 were performed to ensure stable subtypes [14].

Single-sample gene set enrichment analysis
The "GSVA," "limma" and "GSEABase" packages were used to calculate the abundance of the 28 Tumor in ltrating immune cells (TIICs) types in pediatric asthmatic patients from the GSE65204 and GSE40888 datasets. We searched for the marker genes of each immune cell type in the transcriptome expression pro les of patients with pediatric asthma and calculated the abundance of the 28 immune cell types in each sample based on the expression levels of the marker genes.

Statistical analysis
The Wilcoxon rank sum test was used to determine whether the difference between two groups was statistically signi cant. All statistical analyses were performed in R Studio (version 4.0.3) software, and P < 0.05 was used to indicate statistically signi cant differences.

Quality control and normalization of scRNA-seq data
The scRNA-seq data were qualitatively controlled using the 'Seurat' package, and the number of genes in a cell (nFeature_RNA), transcription number (nCont_RNA), and mitochondrial gene ratio (percent.mt) of the cells in the samples were calculated ( Figure 1A). After processing, 756 single cells were selected from 802 cells. There was no correlation between nCont_RNA and percent.mt ( Figure 1B), while nCont_RNA and nFeature_RNA had a strong positive correlation (R = 0.94, Figure 1C). A total of 1500 genes with high variation were extracted from 31920 genes ( Figure 1D).

Dimensionality reduction and cell annotation
Dimensionality reduction of the scRNA-seq data using the PCA algorithm indicated that the epithelial cells had clearly separated ( Figure 1E). The six PCs with p < 0.05 were used for further analysis ( Figure  1F). The t-SNE algorithm divides the epithelial cells into three clusters ( Figure 2B), including ciliated cells, secretory cell 1, and secretory cell 2 ( Figure 2C). Differential analysis between the three clusters identi ed 121 cell marker genes (Supplementary Table 1) and the top 10% cell marker genes of each cluster were showed by heat map (Figure 2A). The expression of CCL26, PRB1, and SLC9B2 marker genes were higher in secretory cell 1, while SCGB3A1 and BPIFA1 were higher in secretory cell 2 ( Figure 3). Pseudotime and trajectory analysis revealed that the ciliated cells could gradually differentiate into secretory cell 1 and secretory cell 2 under the action of IL-3 ( Figure 2D and E). Figure 2F indicates that there are three transition cell types before the ciliated cells differentiate into secretory cell 1 and secretory cell 2. To investigate the functions of the ciliated cells, secretory cell 1 and secretory cell 2, GO enrichment analysis based on the marker genes was performed. The function of secretory cell 1 may be related to glutathione derivative biosynthetic and metabolic process ( Figure 4A). Secretory cell 2 may be related to cytokine activity, cation antiporter activity, proton antiporter activity, and chemokine receptor binding ( Figure 4B). Ciliated cells may be related to cilium movement and microtubule movement ( Figure 4C).

Consensus clustering based on the ve cell marker genes
Considering that the expression of cell marker genes CCL26, PRB1, SLC9B2, SCGB3A1, and BPIFA1 were reversed in secretory cell 1 and secretory cell 2, we performed consensus clustering based on the ve cell marker genes to investigate whether the asthma patients could be classi ed. We rstly downloaded the transcriptome expression pro les of the GSE65204 and GSE40888 datasets from the GEO database and extracted the ve cell marker genes. Then, consensus clustering algorithm was performed, and two asthma subtypes (C1 and C2) were obtained ( Figure 5A-5C, Figure 6A-6C). The PCA analysis indicated that the asthma patients could be completely separated based on the expression of the ve cell marker genes ( Figure 5D, Figure 6D). We found that the expression level of SCGB3A1 and BPIFA1 were higher in the C2 subtypes, while CCL26, PRB1, and SLC9B2 were higher in the C1 subtypes ( Figure 5E, Figure 6E). We suggest that type 2 T helper cells were higher in the in ltrate in the C2 subtype, while type 1 T helper cells were higher in the in ltrate in the C1 subtype. It is known that chronic IL-3 stimulation can produce a Type 2-asthma model [7]. We hypothesized that one of the two secretory cells (secretory cell 1 and secretory cell 2) may be associated with Th2 in ammation. To verify our hypothesis, we compared the expression of T2 cytokines and the abundance of immune cell in ltration between the asthma subtypes.
We found that the T2 cytokines (IL-13, IL-33, IL-3, IL-4, and TSLP) were highly expressed in the C2 subtype ( Figure 5F, Figure 6F). Type 2 T helper cells were higher in the in ltrate in the C2 subtype, while type 1 T helper cells were higher in the in ltrate in the C1 subtype ( Figure 5G, Figure 6G). Thus, we speculate that the patients in the C2 subtype correspond to type 2-pediatric asthma, and secretory cell 2 is associated with Th2 in ammation.

Discussion
Asthma is a heterogeneous disease, and transcriptome sequencing is often used to study the pathogenesis of asthma and the e cacy of drug intervention in asthma [15][16][17]. Owing to the heterogeneity of asthma, traditional transcriptome sequencing methods based on blood or tissue samples obtain macroscopic results of the cell population, showing the average expression level of the same gene in all cells, rather than the differences between individual cells of the same gene and its role in the pathogenesis of the disease. Therefore, the pathogenesis of heterogeneous asthma and the therapeutic effect of drugs cannot be completely revealed. ScRNA-seq is a rapidly developing frontier technology in life science in recent years and can reveal the gene expression status of cells at the singlecell level, re ect the heterogeneity between cells, and provide a new research idea and technical means for the study of heterogeneous diseases [18][19][20]. This study explored the pathogenesis of Type 2-pediatric asthma based on single-cell sequencing data and provides a direction for nding novel targeted therapies.
Asthma affects health of all age groups, and its prevalence is increasing year by year, especially in children. Asthma has different disease processes and can be divided into different asthma phenotypes according to the patient's history, clinical manifestations, pathophysiological characteristics, treatment feedback, and prognostic difference [21,22]. Lymphocytes play an important role in chronic in ammation in asthma, and the enhanced immune response of type 2 helper T cells (Th2) (type 2 immune response) is considered an important in ammatory phenotype in asthma [23]. At the initial contact with the allergen, epithelial cells detect the allergen, release pro-in ammatory factors, and promote the recruitment and activation of lung dendritic cells, which absorb the allergen and transport the allergen to re ux lymph nodes to induce the activation of allergen-speci c Th2 cells. When exposed to the same allergen again, Th2 effector cells are stimulated and activated to secrete type 2 in ammatory factors such as IL-4, IL-5, and IL-13, causing allergic airway in ammation [24]. Currently, T2 cytokine inhibitors reduced the exacerbation rates of asthma, but their ability to improve lung function is limited. Therefore, we need to explore the pathogenesis of Type 2-pediatric asthma at the molecular level to provide a basis for nding new therapeutic targets.
Considering that Type 2-pediatric asthma is an airway obstructive disease driven by IL-13, we obtained an in vitro model of Type 2-pediatric asthma in the GSE145013 dataset, which described the epithelial cells from tracheal donors after chronic (11 days) IL-13 stimulation. Single cell sequencing analysis was performed on the in vitro model and 756 single cells were obtained through quality control and normalization. We then acquired three cell subtypes (ciliated cells, secretory cell 1, and secretory cell 2) and the cell marker genes (CCL26, PRB1, SLC9B2, SCGB3A1, and BPIFA1) were reverse expressed in secretory cell 1 and secretory cell 2. We suspect that the ve cell marker genes may be involved in the pathogenesis of Type 2-pediatric asthma. Consensus clustering was performed based on the ve cell marker genes in the validation dataset and produced two patient subtypes (C1 and C2). Type 2 T helper cells were higher in the in ltrate in the C2 subtype, while type 1 T helper cells were higher in the in ltrate in the C1 subtype. T2 cytokines (IL-13, IL-33, IL-3, IL-4, and TSLP) were expressed more in the C2 subtype. We speculate that the patients in the C2 subtype correspond to Type 2-pediatric asthma. According to the above results, we have reason to believe that the ve cell marker genes are involved in the pathogenesis of Type 2-pediatric asthma and may be new therapeutic targets for Type 2-pediatric asthma in the future.

Conclusions
In conclusion, our research found ve cell marker genes involved in the pathogenesis of Type 2-pediatric asthma based on scRNA-seq data and transcriptome expression pro les through a series of bioinformatics analysis methods, which provide new direction for precise treatment of Type 2-pediatric asthma. Our study was based on a public database, and further in vitro and in vivo studies are needed to verify the results.