Prioritization of candidate causal genes in GWAS signals of asthma in UK Biobank

To identify susceptibility loci and candidate causal genes of asthma, we performed a genome-wide association study (GWAS) in UK Biobank on a broad asthma denition (n = 56,167 asthma cases and 352,255 controls). We then carried out functional mapping through transcriptome-wide association studies (TWAS) and Mendelian randomization in lung (n = 1,038) and blood (n = 31,684) tissues. The GWAS revealed 72 asthma-associated loci from 116 independent signicant variants (P GWAS <5.0E-8). As expected, the yield of exonic variants associated with asthma was low, but nine were identied as potentially deleterious (CADD > 20) including a stop-gain mutation in the laggrin (FLG) gene. The top lung TWAS gene on 17q12-q21 was GSDMB (P TWAS =1.42E-54). Other TWAS genes of interest include TSLP on 5q22, RERE on 1p36, CLEC16A on 16p13, and IL4R on 16p12, which all replicated in GTEx lung (n = 515). A novel risk locus was also revealed by the lung asthma TWAS on 1q23.3 with the putative gene encoding the gamma chain of the high-anity IgE receptor (FCER1G, P TWAS =2.13E-6), which was also replicated in GTEx lung (P TWAS =3.71E-7). By testing a comprehensive set of cells and tissues, we then demonstrated that the largest fold enrichment of regulatory and functional annotations among asthma-associated variants was in the blood. We mapped 485 eQTL-regulated genes associated with asthma in the blood and 50 of them were shown to be causally associated with asthma by Mendelian randomization. Prioritization of druggable genes revealed known (IL4R, TSLP, IL6, TNFSF4) and potentially new therapeutic targets for asthma. hypomethylation its been in monocytes of patients with dermatitis, the of high anity IgE receptors in these cells we found that higher expression of FCER1G lung (z 4.74, P TWAS =2.13E-6), a nding replicated in GTEx lung (z = 5.08, P TWAS =3.71E-7). This new asthma locus may thus mediate its effect by upregulating FCER1G, which may then lead to inammatory cells with greater surface expression of IgE receptors, that are more prone to allergic prospectively evaluated for a range of health-related outcomes 37 . The denition of asthma in this study is based on the UK Biobank Outcome Adjudication Group and relies on hospital, death, primary care and self-reported related codes (Phase 2 code list for asthma). Asthma cases include patients with a diagnosis from hospital record (ICD-9 or ICD-10 codes) or primary care medical record as well as those with self-reported asthma (data-eld 20002 in UK Biobank). Genotyping data are derived from the Affymetrix UK BiLEVE or UK Biobank Axiom Arrays. Phasing and imputation were performed centrally using the Haplotype Reference Consortium (HRC) and merged UK10K and 1000 Genomes phase 3 reference panels 38,39 . Samples with call rate < 95%, outlier heterozygosity rate, sex mismatch, non-white British ancestry, samples with excess third-degree relatives (> 10), or not used for relatedness calculation were excluded. Variants with an imputation quality score (INFO) ≤ 0.3 or minor allele frequency (MAF) < 0.0001 were removed. Using the aforementioned denition of asthma and quality control lters, 56,167 asthma cases were compared to 352,255 controls. The genetic association analysis was performed using SAIGE (Scalable and Accurate Implementation of GEneralized mixed model, version 0.36.3.1, https://github.com/weizhouUMICH/SAIGE) 40 . SAIGE is a two-step method to perform generalized mixed model GWAS analysis that is robust to unbalanced case-control ratios, sample relatedness and low-frequency variants. In step 1, we t a null logistic mixed model with 93,511 independent, high-quality genotyped variants, which were used by the UK Biobank data group to estimate the kinship coecients between samples 39,40 . The following covariates were added: age, sex, and the rst 20 ancestry-based principal components. In step 2, we performed association tests between each genetic variant (genotyped The TWAS was performed using S-PrediXcan 45 . The lung eQTL dataset was used as the training set to derive the expression weights. Gene expression normalized for age, sex and smoking status from Laval, UBC, and Groningen were combined with ComBat 46 . Gene expression traits were then trained with elastic net linear models (alpha = 0.5, n_k_folds = 10, window = 500 Kb). Models with false-discovery rate (FDR) < 0.05 as implemented in S-PrediXcan were obtained for 19,918 probe sets. Predicted expression levels from the lung in the UK Biobank participants were then tested for association with asthma 45 . The Bonferroni correction was used to claim transcriptome-wide signicance (S-PrediXcan P TWAS =0.05/19,918 = 2.51E-6).


Introduction
Asthma is still causing 420,000 deaths per year and a icts 300 million individuals worldwide 1 . Our understanding of the genetics of asthma has progressed following the completion of large GWAS by international consortia, namely GABRIEL 2 , EVE 3 , TAGC 4 , and CAAPA 5 . More recently, two groups of investigators tapped into the UK Biobank resource to delineate the genetics of childhood-vs. adult-onset asthma 6,7 . Together, approximately 200 genetic loci have been associated with asthma through GWAS. The objective of this study was to map candidate causal genes of asthma in lung and blood tissues. This was achieved in two steps. First, performing a case-control GWAS on a broad asthma de nition in UK Biobank in order to physically de ne chromosome regions associated with asthma. Second, prioritizing candidate genes by mapping the effects of asthma-associated variants on protein-coding genes, gene expression and chromatin interaction sites using multiple approaches such as transcriptome-wide association study, colocalization, and Mendelian randomization.

Asthma GWAS in UK Biobank
In total, 56,167 asthma cases and 352,255 controls of White British ancestry were selected from UK Biobank (see Methods). Demographics and clinical characteristics of cases and controls are in Table 1. The number of cases corresponds to an asthma population prevalence of 13.8%, which is consistent with the UK lifetime prevalence of patient-reported clinician-diagnosed asthma of 15.6% 8 . The granularity of asthma cases de ned based on self-reported questionnaires, hospital records (ICD-9 and ICD-10), and primary care records is provided in Supplementary Fig. 1. For GWAS analysis, 35,270,583 SNPs ( ltered by MAF > 0.0001 and imputation info score > 0.3) were available for genetic association testing following standard quality controls and imputation. We observed no evidence of in ation in the test statistics with λ = 1.029 ( Supplementary  Fig. 2). The SNP-heritability on the liability scale was estimated at 11.3%. In total, 14,742 SNPs reached genome-wide signi cance (P GWAS <5.0E-8) at 73 physically de ned loci. Figure 1 shows the Manhattan plot and individual loci are listed in Supplementary Table 1. Nine of these loci are novel, with no genetic variant associated with asthma in the literature published before January 1st, 2020. Regional plots for these 9 loci are provided in Fig. 2. The locus 7p14 is characterized by only one rare SNP that passed the signi cance threshold (rs576468798, P GWAS =2.00E-8, imputation info = 0.61). Allele frequencies in asthma cases (0.00033) and controls (0.00013) range within those observed in reference populations (TOPMed = 0.00015, 1000G European = 0.0006). Nevertheless, we discarded this locus as more validation is needed to robustly establish its association with asthma. We also evaluated the number of independent association signals within the 72 loci by conditional analysis. Sixteen loci had more than one independent association signals, ranging from 2 to 9 independent signals by locus, except for the MHC locus where we observed 12 independent signals. In total, 116 independent associations with asthma risk at a P GWAS <5.0E-8 were observed (Supplementary Table 2). GWAS sensitivity analysis GWAS-nominated loci were re-evaluated by changing exclusion criteria to de ne asthma cases and controls. The rationale was to evaluate the potential confounding effect of other lung diseases, smoking, and allergy. Genetic association analyses were thus performed in three case-control subsets. First, asthma cases and controls with other lung diseases were excluded. Individuals were excluded if they had self-reported or medical records consistent with the presence of chronic obstructive pulmonary disease (COPD), emphysema, chronic bronchitis, interstitial lung disease or alpha-1 antitrypsin de ciency. This results in the exclusion of 20,998 individuals and genetic analysis performed in 47,391 asthma cases and 340,033 controls. Second, we excluded all asthma cases and controls with a positive smoking history (i.e. former and current smokers). This results in the exclusion of 250,739 individuals and genetic analysis performed in 21,097 asthma cases and 136,586 controls. Third, we excluded control individuals with atopy, including hay fever, allergic rhinitis, and eczema/atopic dermatitis. This results in the exclusion of 84,113 individuals and genetic analysis with 268,142 controls (and the same number of asthma cases, n = 56,167). Note that the three lists of exclusion criteria were applied separately (not cumulatively) and speci c UK Biobank data elds and codes used for excluding individuals in each case-control subset are provided in Supplementary Table 14.

Positional mapping of deleterious coding SNPs
Our rst strategy to prioritize target genes within GWAS-nominated asthma loci was to map deleterious coding variants. A total of 354 exonic variants located in 27 loci were associated with asthma (P GWAS <5.0E-8) or in LD (r 2 ≥ 0.6, 1000G EUR) with asthma-associated variants (Supplementary Table 3). Two-thirds (236 out of 354) of these variants were located in the MHC locus. The top deleterious variants at this locus were rs9269958 in HLA-DRB1 and rs2855430 in COL11A2 with CADD scores of 57 and 33, respectively. However, association signals for these variants (P GWAS =6.43E-5 and 6.83E-7) were much smaller compared to the sentinel variant (rs9273386, P GWAS =2.11E-48). The extent of LD at this locus precluded rm conclusion. Outside of the MHC locus, we identi ed 8 nonsynonymous variants and 1 stop-gain variant with CADD score > 20 located at 7 loci ( Table 2). Genes of known biological relevance were identi ed including laggrin (FLG) on 1p21 and toll like receptor 10 (TLR10) on 4p14. On 17q12-q21, three potential target genes were identi ed, namely, ERBB2, STARD3, and GSDMA. Overall, the yield of candidate genes by mapping of deleterious coding variants was relatively low. This is consistent with previous GWAS results on asthma that showed more genetic associations in noncoding regions of the genome and suggests that most of the risk loci are likely to act through gene regulation. showing the colocalization events for these TWAS genes on 17q12-q21 are provided in Supplementary Fig. 5 and show that the P value distribution of eQTL for GSDMB colocalized better with that of the GWAS. The direction of effects, i.e. whether lower or higher predicted expression of these genes increased asthma risk are presented in Table 3, along with other TWAS genes found at asthma-associated loci. TWAS genes of known  repl.: no model for C9orf38, KIAA1432 (+) and (−) indicate predicted gene expression positively or negatively associated with asthma risk. For loci with more than one TWAS genes, the genes are ordered by their level of signi cance and separated by arrows.
*All Bonferroni-corrected TWAS genes per loci found in GTEx lung are indicated as well as the results of TWAS genes identi ed in the lung eQTL dataset in order to seek for replication (P TWAS <0.05) in GTEx lung.  repl.: no model for PHF5A (+) and (−) indicate predicted gene expression positively or negatively associated with asthma risk. For loci with more than one TWAS genes, the genes are ordered by their level of signi cance and separated by arrows.
*All Bonferroni-corrected TWAS genes per loci found in GTEx lung are indicated as well as the results of TWAS genes identi ed in the lung eQTL dataset in order to seek for replication (P TWAS <0.05) in GTEx lung.
TWAS can also reveal novel risk loci owing to the resulting power of combining GWAS and eQTL. In this study, two TWAS genes are located in genomic loci that did not reach statistical signi cance in the GWAS. This GTEx lung was used to validate the TWAS results. For the two novel asthma risk loci, FCER1G was replicated on 1q23.3 (z = 5.08, P TWAS =3.71E-7), but not DMPK on 19q13.32 (z = 1.78, P TWAS =0.075). Table 3 shows replication of TWAS results in GTEx lung for the 21 asthma-associated loci. For asthma loci with a single TWAS gene, consistency was observed for RERE on 1p36, CLEC16A on 16p13, and IL4R on 16p12. On 5q22, TSLP was the top TWAS gene in both our lung eQTL set and GTEx lung. On 17q12-q21, GSDMB and ORMDL3 were switched as the top TWAS gene. In general, for asthma loci with multiple TWAS genes in our lung eQTL dataset (the MHC locus for example), some of the genes were replicated in GTEx lung, but the ranking of genes based on level of signi cance changed, and sometimes different TWAS genes were observed in GTEx lung. Six TWAS genes were replicated, but with a different direction of effect, SMAD3 on 15q22-q23 is an example. Finally, replication was not feasible for 24 TWAS genes observed in our lung eQTL dataset as they did not yield signi cant gene expression models in GTEx lung (Table 3).
Cell and tissue functional enrichment of asthma-associated SNPs We used GARFIELD 10 to evaluate the enrichment of asthma-associated loci in regulatory and functional annotations derived from ENCODE and the Roadmap Epigenomics Project. Figure 4 shows functional enrichment within DNase I hypersensitivity site (DHS) hotspots at two GWAS P value cut-offs. The largest fold enrichment values were in the blood. All results are summarized in Supplementary Table 6, along with other annotation types.

Functional mapping and annotation in blood
We used the FUMA platform 50 to functionally annotate our GWAS ndings. The summary statistics of the asthma GWAS in UK Biobank were uploaded in FUMA. The SNP2GENE function was used to map GWAS SNPs to 1) deleterious coding SNPs (positional mapping), 2) blood eQTL (eQTL mapping), and 3) chromatin contact interactions (chromatin interaction mapping). Positional mapping was performed by selecting exonic variants directly associated with asthma (P GWAS <5E-8) or in LD with asthma-associated variants using a LD r 2 threshold of 0.6 based on the 1000 Genomes EUR reference panel. Protein coding variants (excluding synonymous) with CADD score > 20 were further prioritized. Blood cis-eQTL mapping was performed using a publicly available dataset of 31,684 samples 11 . Signi cant SNP-gene pairs (P FDR <0.05) were identi ed and then mapped to genetically expressed genes associated with asthma, or eGenes. Chromatin interaction mapping was performed using Hi-C data of a lymphoblastoid B cell line (GM12878). Results of eQTL and chromatin mapping were visualized using circos plots generated by FUMA.
Mendelian randomization in blood with asthma Two-sample summary-level Mendelian randomization (MR) analyses were performed to infer causal associations between blood eGenes and asthma. The genetic effects on asthma risk were derived from the current GWAS in UK Biobank and the genetic effects on gene expression in blood were derived from a published eQTL dataset 11 . MR was performed using the inverse-variance weighted (IVW) and Egger methods as implemented in the MendelianRandomization package in R. SNPs were selected within a window of 500 Kb around the transcription start site of each blood eGene. SNPs associated with gene expression (P < 0.001 corresponds to ~ F statistics > 10) and independent (r 2 < 0.1 based on the 1000 Genomes EUR reference panel) were selected as instrumental variables. We requested at least 3 instrumental variables per gene to perform

Drug targets
Target genes of the asthma-associated variants identi ed in previous sections were then integrated to prioritize druggable genes. In total, we identi ed 55 lung TWAS genes (Supplementary Table 4), 485 blood eGenes (Supplementary Table 8), and 563 chromatin contacts mapped genes (Supplementary Table 10). Together, 806 unique target genes were identi ed with overlap across methods shown in Fig. 6. According to the Open Targets Platform 12 , 13 of them are the targets of investigational or approved asthma drugs (Table 4). All target genes were also interrogated using the Open Targets Platform 12 for their overall association score with asthma. Results for all target genes are in Supplementary Table 13. The 806 target genes were also overlaid with the known druggable genes derived from the drug-gene interaction database (DGIdb) 13 and the druggable genome 14 . Drug-gene interactions were identi ed for 182 target genes in DGIdb and 201 target genes were part of the druggable genome (Supplementary Table 13), which offer numerous opportunities for drug repurposing.
We further focused on 29 target genes that were consistently identi ed by TWAS, eQTL, and chromatin interactions (Fig. 6). Ten of them have known drug targets. Table 5 shows these 10 druggable target genes for asthma and their direction of effect on asthma risk in lung tissue as well as the candidate drugs, interaction types and clinical indications. Target-asthma associations of 1, which is the highest possible score in Open Targets, were observed for two genes including IL4R that is the therapeutic target of dupilumab used to treat uncontrolled persistent asthma 15 and SMAD3 involved in airway remodeling 16 and that may mediate some actions of corticosteroids, which are the cornerstone of asthma treatment. Finally, we ltered the 806 target genes based on three cumulative criteria: 1) those identi ed by at least two out of three approaches (lung TWAS genes, blood eGenes, and Hi-C genes), 2) those with asthma score of at least 0.5 in Open Targets, and 3) those that are druggable in either the DGIdb or the druggable genome. By excluding the HLA molecules, this strategy revealed 21 prioritized therapeutic targets for asthma (Table 6). In addition to IL4R and SMAD3, these prioritized genes are known targets of existing asthma drugs including IL6 (clazakizumab, sirukumab), TNFSF4 (oxelumab) and TSLP (tezepelumab).   *Overall association score for asthma from the Open Targets Platform 12 .

Discussion
An important genetic susceptibility to develop asthma has long been demonstrated by genetic epidemiology studies. However, the predisposing genetic variants have been di cult to identify until the realization of the recent large-scale GWAS. Now, a large number of genetic loci are robustly associated with asthma. The new challenge is to identify the candidate causal genes and best therapeutic targets underpinning GWASnominated loci. Here, we leveraged lung and blood transcriptome as well as epigenetic marks to map the most likely causal genes within asthma susceptibility loci derived from UK Biobank. Using a broad asthma de nition, we identi ed 72 physically-de ned asthma loci containing 116 independent genetic variants with P GWAS < 5.0E-8. The effect size estimates were robust to more strict asthma de nitions excluding other lung diseases, smoking history or allergy within controls. As expected, the yield of deleterious coding variants was low, and we thus focused most analyses on regulatory elements. The UK Biobank asthma GWAS was integrated with the largest lung eQTL study available. Fifty-ve signi cant TWAS genes located in 21 asthma loci were found candidate causal genes identi ed in this study based on consistency across methods, druggability, and prior asthma association led to 21 genes prioritized as therapeutic targets for asthma.
Among them, there are three members of the toll-like receptor family: TLR1, TLR6, and TLR10. These three TLRs are located at the same 4p14 locus, are phylogenetically related, and require the formation of heterodimers with TLR2 for recognition of invading microbes 18 . We found a strong colocalization signal between the blood eQTL for TLR10 and the GWAS for asthma (PP4 = 0.84). In MR, the blood expression of TLR10 was positively associated with the risk of asthma (P IVW =4.34E-6). We also identi ed two missense mutations in TLR10, C-rs4129009 (p.Ile775Val) (P GWAS =1.49E-10) and G-rs11096957 (p.Asn241His) (P GWAS =2.49E-10), which are in moderate LD (r 2 = 0.42 in CEU) and associated with the risk of asthma. The CADD score for G-rs11096957 is 21.9 and is predicted to be "possibly damaging" and "deleterious" by PolyPhen and SIFT, respectively. The alleles Crs4129009 (p.Ile775Val) and G-rs11096957 (p.Asn241His), which decrease the risk of asthma, have been associated with elevated blood cytokine responses to a TLR1/2 agonist, most speci cally Pam 3 CSK 4 -induced interleukin 6 (IL6) 19 . We recently demonstrated that genetically predicted levels of circulating IL6R, a negative regulator of IL6 signaling, are positively associated with the risk of asthma and atopic disorders 20 . These data suggest a complex interaction between TLR10 and IL6 on the risk of asthma and warrant further investigation. One line of enquiry could examine the possibility that TLR10 dampens TLR2 signaling and IL6 production, thereby increasing the risk of asthma. In support of the latter hypothesis, in an ovalbumin-induced asthma mouse model, IL6 lowered Th2 cytokines and decreased bronchial hyperresponsiveness 21 .
Other gene targets of interest include cytokine/chemokine receptors (IL7R, IL1RL1, CCR7), two members of the EGFR family of receptor tyrosine kinases (ERBB2 and ERBB3), two G protein-coupled receptors (GPR18 and GPR183), an antigen recognition molecule (CD247), a member of the protein tyrosine phosphatase family (PTPRC), a transcription factor (RORC), a member of the NOTCH family (NOTCH4), a member of the plexin family (PLXNC1), and galactosidase beta 1 (GLB1). All these targets have drug-gene interactions in DGIdb 13 and/or are present in the list of genes encoding druggable human proteins 14 . They are thus representing drug repurposing/development opportunities. Further experimental research will be needed to screen these putative novel therapeutic targets for asthma.
We have been searching for asthma genes for a long time 22 . The last decade of GWAS research has been particularly exciting, witnessing the identi cation of genetic variants robustly associated with this disease 23,24 .
Ten years ago, nding these variants was almost seen as an insurmountable task. Three main factors have allowed us to overcome this challenge: 1) our evolving understanding of the human genome, 2) progress in genotyping technologies, and 3) international collaboration to allow su ciently large sample size to be analyzed. Now, with advanced bioinformatics skills and resources such as the UK Biobank, we can rediscover the genetic variants associated with asthma. This is remarkable. Obviously, any milestone comes with new challenges. Translating new genetic ndings into better management and/or treatment for patients is still falling short. The effect sizes of all genetic variants associated with asthma are relatively small. Although 116 independent variants reached genome-wide signi cance (P GWAS <5.0E-8) in this study, the ORs range from 1.03 to 3.59 (median = 1.06, note that ORs lower than 1 were converted into their reciprocal (1/OR)). As an example to appreciate the effect size that we are detecting, the top associated asthma variant on chromosome 9 near the IL33 gene has a p value of 1.25E-56 and an OR of 1.13, which is the result of allele frequencies of 0.73 in cases and 0.75 in controls. One hundred out of the 116 independent variants were common, with minor allele frequency greater than 5% in cases and controls combined. So most risk alleles are common with small effects that we are able to detect owing to the large sample size. Cumulatively, all genetic variants discovered by GWAS explained about 8-9% of the total heritability 17 , suggesting much more work is needed to elucidate the full genetic architecture of asthma. More work is also needed to move discovered genetic factors underlying asthma down the clinical translation pipeline. We believe that the current study represents an important step beyond GWAS data. By combining different data sources (eQTL and Hi-C in disease-relevant tissues) and advanced bioinformatics approaches (TWAS, MR, colocalization), we were able to reveal relevant genes and putative therapeutic targets for asthma.
As observed in previous asthma GWAS studies, we had limited success in mapping asthma-associated variants to deleterious coding SNPs. One of our most interesting hit is the stop-gain mutation in the laggrin  25 where the risk variants are believed to disrupt the skin barrier, allowing allergen sensitization and then promoting the development of asthma 26 . rs61816761 was also found in previous GWAS of asthma in UK Biobank 6,7,17,27,28 and with greater effect on atopic dermatitis than asthma 29 , which supports skin barrier dysfunction as a cause of asthma. Another deleterious coding variant (rs2230624, Cys273Tyr) was identi ed in TNFRSF8 (also known as CD30), which was previously reported 30 and characterized as a loss of function variant that decreased asthma risk by reducing the tra cking of the CD30 protein on cell surface 29 .
As GWAS on asthma in UK Biobank accumulate 6,7,17,28−30 , the next milestone will be to identify the function units, most intuitively genes, underlying the GWAS loci. In this study, we have combined the UK Biobank GWAS data with the largest lung eQTL available to perform a TWAS. Plausible causal genes in lung tissues were revealed for 21 asthma loci. On 17q12-q21.2, the rst discovered 31 and the most replicated 32 GWAS asthma locus, GSDMB was the top TWAS gene in our lung eQTL dataset. Although other TWAS genes were observed in that locus, the p value distributions of GWAS and eQTL colocalized better with GSDMB. Using blood as eQTL source, we also identi ed GSDMB as the most likely causal gene on 17q12-q21.2. These results are consistent with eQTL analysis showing that SNPs associated with asthma susceptibility and severity at 17q12-q21.2 are correlated with GSDMB expression in cells from human bronchial epithelial biopsy and bronchial alveolar lavage 33,34 . GSDMB is thus a gene to focus on in future functional studies. Other lung TWAS genes prioritized by our study for further functional characterization are RERE on 1p36, TLR1 on 4p14, SLC22A5 or RAD50 on 5q31, RBM17 on 10p15, UBAC2 on 13q32, SMAD3 on 15q22-q23, CLEC16A on 16p13, IL4R on 16p12, KPNB1 on 17q21.32, and PHF5A on 22q13.
The lung TWAS revealed a novel asthma risk locus at 1q23.3 with the putative gene encoding the gamma chain of the high-a nity IgE receptor (FCER1G). Note that the FCER1A gene at 1q23.2 (approximately 2 Mb away from FCER1G) has been associated with total serum IgE levels 35 . Concerning FCER1G, hypomethylation at its promoter has been reported in monocytes of patients with atopic dermatitis, resulting in the overexpression of high a nity IgE receptors in these cells 36 . Here, we found that higher expression of FCER1G in lung tissue is associated with asthma (z = 4.74, P TWAS =2.13E-6), a nding replicated in GTEx lung (z = 5.08, P TWAS =3.71E-7). This new asthma locus may thus mediate its effect by upregulating FCER1G, which may then lead to in ammatory cells with greater surface expression of IgE receptors, that are more prone to allergic reaction.
This study has limitations. We used the best possible bioinformatics approaches to identify causality genes.
However, there is no functional validation using experimental models or human studies to con rm the biological role of these genes in the pathogenesis of asthma. Further studies are needed to demonstrate causality. Our analyses are largely based on European-descent individuals. Inference to other ethnic groups is thus a concern and the lack of similarly powerful resources (e.g. UK Biobank) for other ancestries represent a missed opportunity to identify other relevant asthma genes. Environmental risk factors and the speci c period of exposures during the lifespan play an important role in the development of asthma. Our genomic datasets (GWAS and eQTL) are retrospective in nature and we have not taken into account likely modi ers of genetic risk. Finally, we used whole lung and blood tissues, which contain heterogeneous cell populations, limiting our ability to identify genes affecting asthma risk through gene regulation. Progress in single-cell transcriptomic is promising for future studies.
In conclusion, this study expands our understanding of the regulatory and functional mechanisms underlying GWAS asthma risk loci in lung and blood tissues. The candidate causal genes identi ed are key to understand disease etiology, interpret GWAS results and prioritize follow-up functional studies. Our top therapeutic targets represent new opportunities for drug repositioning and testing in pre-clinical models.

Genome-wide association study on asthma in UK Biobank
UK Biobank is an open access resource of nearly 500,000 participants enrolled at the age of 40-69 and prospectively evaluated for a range of health-related outcomes 37 . The de nition of asthma in this study is based on the UK Biobank Outcome Adjudication Group and relies on hospital, death, primary care and selfreported related codes (Phase 2 code list for asthma). Asthma cases include patients with a diagnosis from hospital record (ICD-9 or ICD-10 codes) or primary care medical record as well as those with self-reported asthma (data-eld 20002 in UK Biobank). Genotyping data are derived from the Affymetrix UK BiLEVE or UK Biobank Axiom Arrays. Phasing and imputation were performed centrally using the Haplotype Reference Consortium (HRC) and merged UK10K and 1000 Genomes phase 3 reference panels 38,39 . Samples with call rate < 95%, outlier heterozygosity rate, sex mismatch, non-white British ancestry, samples with excess thirddegree relatives (> 10), or not used for relatedness calculation were excluded. Variants with an imputation quality score (INFO) ≤ 0.3 or minor allele frequency (MAF) < 0.0001 were removed. Using the aforementioned de nition of asthma and quality control lters, 56,167 asthma cases were compared to 352,255 controls. The genetic association analysis was performed using SAIGE (Scalable and Accurate Implementation of GEneralized mixed model, version 0.36.3.1, https://github.com/weizhouUMICH/SAIGE) 40 . SAIGE is a two-step method to perform generalized mixed model GWAS analysis that is robust to unbalanced case-control ratios, sample relatedness and low-frequency variants. In step 1, we t a null logistic mixed model with 93,511 independent, high-quality genotyped variants, which were used by the UK Biobank data group to estimate the kinship coe cients between samples 39,40 . The following covariates were added: age, sex, and the rst 20 ancestry-based principal components. In step 2, we performed association tests between each genetic variant (genotyped and imputed) and asthma. We applied the leave-one-chromosome-out (LOCO) scheme (LOCO = TRUE). The quantile-quantile plot was generated ( Supplementary Fig. 2). The genomic in ation factor was computed by converting P values into chi-squared values and then dividing the median of the resulting chisquared statistics by the expected median of the chi-squared distribution. The present analyses were conducted under UK Biobank data application number 25205.

GWAS sensitivity analysis
GWAS-nominated loci were re-evaluated by changing exclusion criteria to de ne asthma cases and controls.
The rationale was to evaluate the potential confounding effect of other lung diseases, smoking, and allergy. Genetic association analyses were thus performed in three case-control subsets. First, asthma cases and controls with other lung diseases were excluded. Individuals were excluded if they had self-reported or medical records consistent with the presence of chronic obstructive pulmonary disease (COPD), emphysema, chronic bronchitis, interstitial lung disease or alpha-1 antitrypsin de ciency. This results in the exclusion of 20,998 individuals and genetic analysis performed in 47,391 asthma cases and 340,033 controls. Second, we excluded all asthma cases and controls with a positive smoking history (i.e. former and current smokers). This results in the exclusion of 250,739 individuals and genetic analysis performed in 21,097 asthma cases and 136,586 controls. Third, we excluded control individuals with atopy, including hay fever, allergic rhinitis, and eczema/atopic dermatitis. This results in the exclusion of 84,113 individuals and genetic analysis with 268,142 controls (and the same number of asthma cases, n=56,167). Note that the three lists of exclusion criteria were applied separately (not cumulatively) and speci c UK Biobank data elds and codes used for excluding individuals in each case-control subset are provided in Supplementary Table 14.

Number of loci associated with asthma
After the GWAS analysis, we assessed the number of loci that were associated with asthma based on two methods. First, we counted the number of loci based on physical distance only. All SNPs associated with asthma (P < 5.0E-8) were ranked by chromosome order and by position on build 37. Two subsequent SNPs on this list located on the same chromosome and separated by more than 1 Mb were considered distinct loci. The physical boundaries of asthma-associated loci were then de ned by adding 500 Kb downstream and upstream of the most 5' and 3' asthma-associated variants (P GWAS <5.0E-8), respectively, within each locus. One exception was the extended MHC region on chromosome 6 that was counted as a single locus and delimited at 25,726,000 to 33,378,000 bp (GRCh37) based on the positions of two genes (HIST1H2AA and KIFC1). Second, we identi ed the number of independent variants, as some physically de ned loci will contain signi cant SNPs that are not in LD. This was performed using a stepwise conditional analysis (--cojo-slct) 42 . The procedure consists of a rst round of analysis that is conditioned on the top asthma-associated variant at each locus derived from the original GWAS. If signi cantly associated variants remain, a second round of analysis is conditioned on the top asthma-associated variant from the rst round. Subsequent rounds are carried out until no more variants reach P GWAS <5.0E-8.

The lung expression quantitative trait loci
The lung eQTL dataset consists of whole-genome genotyping (Illumina Human1M-Duo BeadChip) and gene expression (Affymetrix) in non-tumor lung tissues from patients who underwent lung surgery at three academic sites, Laval University, University of British Columbia, and University of Groningen, henceforth referred to as Laval, UBC, and Groningen, respectively. All lung specimens from Laval were obtained from patients undergoing lung cancer surgery and were harvested from a site distant from the tumor. At UBC, the majority of samples were from patients undergoing resection of small peripheral lung lesions. Additional samples were from autopsy and at the time of lung transplantation. At Groningen, the lung specimens were obtained at surgery from patients with various lung diseases, including patients undergoing therapeutic resection for lung tumors, harvested from a site distant from the tumor, and lung transplantation. Lung tissue processing and storage, DNA and RNA extraction, genotyping, microarray-based gene expression and lung cis-eQTL analyses have been described previously 43,44 . Following standard microarray and genotyping quality controls, data on 1,038 patients were available. At Laval and UBC, written informed consent was obtained from all subjects and the study was approved by their respective ethics committee. At Groningen, lung specimens were provided by the local tissue bank of the Department of Pathology and the study protocol was consistent with the Research Code of the University Medical Center Groningen and Dutch national ethical and professional guidelines ("Code of conduct; Dutch federation of biomedical scienti c societies"; http://www.federa.org).
Transcriptome-wide association study The TWAS was performed using S-PrediXcan 45

TWAS replication in GTEx lung
Lung eQTL data from 515 individuals available in the Genotype-Tissue Expression (GTEx) project (version 8) 47 were used for TWAS replication. The TWAS was performed using SPrediXcan as described above. In GTEx lung, models were obtained for 11,518 genes. Bonferroni-corrected TWAS gene was thus set at P TWAS <4.34E-6.
We also sought replication of TWAS genes identi ed in our lung eQTL dataset. Signi cant replication was considered for genes with the same direction of effect and with P TWAS <0.05 in GTEx lung.

Bayesian colocalization
For speci c asthma-associated loci and genes, we evaluated whether the asthma GWAS and lung eQTL signals shared the same causal variants using the COLOC package in R 48 . For the loci of interest, summary statistics from the asthma GWAS in UK Biobank were combined with our lung eQTL results using a window of 1 Mb up-and downstream of the TWAS genes. We considered colocalization events when the posterior probability of shared eQTL and GWAS associations (PP4) was greater than 60%. The colocalization analysis for the 485 blood eGenes were performed using the HyPrColoc package. HyPrColoc analysis was performed in a window of 500 kb up-and downstream of the transcription start site of each eGene. LocusCompare 49 was used to visualize GWAS and eQTL colocalization events.
Cell type and tissue enrichment of asthma-associated loci We used GARFIELD 10 to overlap our GWAS ndings with regulatory and functional annotations derived from ENCODE, GENCODE and Roadmap Epigenomics projects. A total of 1,005 annotation features were considered including chromatin states, histone modi cations, genic annotations, transcription factor binding sites and open chromatin data (FAIRE, DHS hotspots, peaks and footprints), which were evaluated in different cell types and tissues. LD pruning of GWAS SNPs was performed at r 2 > 0.8 and fold enrichment was evaluated at two GWAS signi cance thresholds: 1.0E-5 and 1.0E-8.

Functional mapping and annotation in blood
We used the FUMA platform 50 to functionally annotate our GWAS ndings. The summary statistics of the asthma GWAS in UK Biobank were uploaded in FUMA. The SNP2GENE function was used to map GWAS SNPs to 1) deleterious coding SNPs (positional mapping), 2) blood eQTL (eQTL mapping), and 3) chromatin contact interactions (chromatin interaction mapping). Positional mapping was performed by selecting exonic variants directly associated with asthma (P GWAS <5E-8) or in LD with asthma-associated variants using a LD r 2 threshold of 0.6 based on the 1000 Genomes EUR reference panel. Protein coding variants (excluding synonymous) with CADD score > 20 were further prioritized. Blood cis-eQTL mapping was performed using a publicly available dataset of 31,684 samples 11 . Signi cant SNP-gene pairs (P FDR <0.05) were identi ed and then mapped to genetically expressed genes associated with asthma, or eGenes. Chromatin interaction mapping was performed using Hi-C data of a lymphoblastoid B cell line (GM12878). Results of eQTL and chromatin mapping were visualized using circos plots generated by FUMA.

Pathway analysis
Pathway analysis was performed using the Enrichr web server 51 . Blood eGenes discovered in this study were uploaded in Enrichr and enrichment was assessed using the combined score method for gene sets available in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
Mendelian randomization in blood with asthma Two-sample summary-level Mendelian randomization (MR) analyses were performed to infer causal associations between blood eGenes and asthma. The genetic effects on asthma risk were derived from the current GWAS in UK Biobank and the genetic effects on gene expression in blood were derived from a published eQTL dataset 11 . MR was performed using the inverse-variance weighted (IVW) and Egger methods as implemented in the MendelianRandomization package in R. SNPs were selected within a window of 500 Kb  Fig. 2. Quantile-quantile plot of test statistics generated by the GWAS in UK Biobank including 56,167 asthma cases and 352,255 controls. Genomic in ation factor λ = 1.029. Supplementary Fig. 3. Results of sensitivity analysis evaluating the potential confounder effects of other lung diseases, smoking and allergy. The three scatter plots compared the effect size estimates and SE of the main study design (x axis) with the three alternative study designs (y axis). The identity line is shown in blue. The single and most extreme outlier in the upper right corner of scatter plots consists of the sentinel variant on 1q21.3.
Supplementary Fig. 4. The top lung TWAS genes identi ed per asthma-associated loci. Previously known and new asthma loci are illustrated in blue and green, respectively.
Supplementary Fig. 5. LocusCompare plots for four signi cant TWAS genes on chromosome 17q12-q21. Association signals for SNPs within 50 Kb up and downstream of target genes are illustrated for GSDMB, ORMDL3, GSDMA, and PNMT.
Supplementary Fig. 6. The top causally associated blood eGene identi ed per asthma-associated loci.
Previously known and new asthma loci are illustrated in blue and green, respectively. Figure 1 Manhattan plot of the GWAS on asthma in UK Biobank. The GWAS was performed in 56,167 asthma cases and 352,255 controls. The y axis represents P value in -log10 scale. The horizontal blue and magenta lines indicate P value of 1x10-5 and 5x10-8, respectively. Novel asthma loci are in red. Genetic variants with P value > 0.05 were removed to limit the digital information of the Figure   Manhattan plot of the TWAS on asthma integrating the UK Biobank GWAS and the lung eQTL dataset. Each dot represents the association between predicted gene expression and asthma for a speci c probe/transcript. P values for gene expression-asthma associations are on the y axis in -log10 scale. The blue, green and magenta horizontal lines represent PTWAS of 0.05, 0.0001 and 2.51E-6 (Bonferroni), respectively. Annotations for genome-wide signi cant probes/transcripts that passed Bonferroni correction are indicated.

Figure 4
GARFIELD functional enrichment analyses. The wheel plot shows functional enrichment for asthma variants within DHS hotspot regions in ENCODE and Roadmap Epigenomics data. The radial axis represents the enrichment (OR) for each of 424 cell types that are sorted by tissue along the outside edge of the plot. Tissues are labeled with font size proportional to the number of cell types. Boxes forming the edge are colored by tissue. Enrichment is calculated for two GWAS signi cance thresholds: 1.0E-5 and 1.0E-8, which are plotted in blue and black, respectively, inside the plot. Dots along the inside edge of the plot are colored with respect to tissue and represent signi cant enrichment for a speci c cell type (one dot = P < 1E-5 and two dots = P < 1E-6).

Figure 5
Blood eQTLs and chromatin interactions in GM12878 at asthma risk loci on chromosome 17. Top asthma GWAS SNPs at independent loci are indicated at the most outer border of the circos plot. The subsequent layers show 1) genetic association results from the asthma GWAS in UK Biobank for SNPs with P value < 0.05 with the color of dots re ecting the level of LD with the sentinel variant at each locus (red: r2 > 0.8, orange: r2 > 0.6), 2) the chromosome coordinate and the asthma GWAS loci highlighted in dark blue, 3) blood eQTL genes, genes mapped by Hi-C, and genes mapped by both eQTL and Hi-C labelled in green, orange, and red, respectively, 4) green and orange lines link the position of eQTLs and chromatin interactions, respectively.

Figure 6
Venn diagram showing the number of target genes that overlapped among 55 lung TWAS genes, 485 blood eGenes, and 563 chromatin contacts mapped genes.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.