Study design and data preparation
The UK Biobank (UKB) cohort is a major data resource that contains genetic as well as a wide range of phenotypic data of ~500,000 participants of European ancestry aged 39-73 years at recruitment . Genotyping was conducted using the Affymetrix UK BiLEVE Axiom or Affymetrix UK Biobank Axiom array. These arrays were augmented by imputation of ~90 million genetic variants from the 1000 Genomes and UK10K projects, Haplotype Reference Consortium. Detailed information of cohorts, genotyping, imputation, quality control approaches, and association analysis please refer to the published studies [19, 20].
We paid attention to the susceptibility mechanisms modified by genetic mutations for chronic lower respiratory diseases. Phenotypes with more than 5,000 cases were selected for subsequent analyses, including asthma with 28,628 cases, COPD with 9,266 cases and pneumonia with 9,774 cases. All individuals clinically defined from hospital episode statistics were coded as cases, while all other individuals were considered as controls .
GWAS summary datasets
We downloaded the summarized data of genome-wide imputed variants from GeneALTAS website (http://geneatlas.roslin.ed.ac.uk). We included variants with the quality score of imputation >0.9 and 13,324,371 SNPs remained. We further performed the quality control based on European population from 1000 Genomes Project (Phase 3) with the following criteria: (i) variants having a minor allele frequency (MAF) >0.01; (ii) variants deviated from the Hardy-Weinberg equilibrium (P>1.0×10-6). After the procedure of quality control, 7,330,104 variants were finally retained for the following analysis. The study design is shown in Figure 1.
We calculated the genetic correlations (rg) between diseases using linkage disequilibrium score regression (LDSC), which requires only GWAS summarized statistics and is not biased by sample overlap . LDSC quantifies the extent to which two phenotypes share genetic etiology based on the patterns of LD found across the genome (https://github.com/bulik/ldsc). A conservative Bonferroni-corrected method was used to determine significant correlations (Pbonferroni<0.05).
TWAS analysis using FUSION
We downloaded FUSION  software (http://gusevlab.org/projects/fusion/) along with its prepackaged weights for gene expression data in lung tissue derived from the Genotype-Tissue Expression Project (GTEx). FUSION estimated the heritability of genes explained by cis-SNPs (SNPs within 1 Mb region surrounding the TSS) and restricted TWAS analysis to include cis-heritable genes (P<0.01). Then, the effect sizes of cis-SNPs on gene expression (i.e. expression weights) were estimated using predictive linear models (Elastic Net, LASSO, GBLUP, and BSLMM). For each gene, FUSION estimated the z-score of the expression and a complex trait (ZTWAS) as a linear combination of the vector of GWAS summary Z scores at a given cis-locus with weight vector W derived from the reference panels. However, the imputed z-score of expression and trait (WZ) has variance WVWt, where V is a covariance matrix across SNPs at the locus (i.e., LD), as:
MAGMA’s gene association test
The SNP-based P values were used for gene-based analysis using MAGMA  software (http://ctg.cncr.nl/software/magma), a novel tool for gene and gene-set analysis. Total 19,427 protein-coding genes from the database (NCBI 37.3) were used for SNP annotation. Then, MAGMA used a multiple regression approach to properly incorporate LD between markers and to detect multi-marker effects for a genome-wide gene association analysis. For both two gene-based methods, we applied a stringent Bonferroni correction to account for multiple testing and associations with Pbonferroni <0.05 were considered as statistically significant.
Functional enrichment of defined CRVs
We first defined credible risk variants (CRVs) as SNPs located within 500 kb upstream or downstream of the associated genes and with P values within two orders of magnitude of the most significant SNP in each locus. To investigate the enrichment of CRVs in chromatin regulatory regions, Fisher’s exact test was used to estimate the distribution of the above CRVs in active promoter and enhancer regions defined in NHLF and A549 cell types by calculating the fold-enrichment against the background of 1,000 genomes (other SNPs in the defined locus). Chromatin state data in four human cell types of other tissues (HSMM, HESC, NHEK, and GM12878) were also included in our analysis for comparison. All the histone modifications of promoter and enhancer marks (H3K4me3, H3K9ac, H3K4me1, and K3K27ac) were downloaded from the UCSC Genome Browser.
Functional annotation using INQUISIT algorithm
To further detect the regulation mechanisms underlying identified genes, we performed functional annotation with the INQUISIT algorithm . We calculated a score for each gene by assessing the potential impact of each CRV on regulatory (proximal or distal gene regulation) or coding features. The INQUISIT score was contributed with multiple lines of evidence including Hi-C chromatin interaction information, enhancer-target gene predictions, topologically associated domains, histone modification marks, transcription factor binding sites and expression quantitative trait loci (eQTLs) in lung-related tissues. Details of the algorithm and scoring strategy have been described previously elsewhere .
Functional exploration for the best GWAS SNP
To investigate the genetic effects of variants on gene regulation, we conducted functional annotation for the best hit in each novel susceptibility region. The best hit was referred to the most significant GWAS SNP within cis-locus (500 kb upstream or downstream of the susceptibility gene). We performed annotations for variants within promising genes using ANNOVAR software . The functional effects of missense variations were predicted using the SIFT  and PolyPhen  databases. To investigate the potential function of the association at non-coding regions, we utilized data from the Genotype-Tissue Expression (GTEx, http://www.gtexportal.org/, version 7) to perform the expression quantitative trait loci (eQTL) analyses in 383 lung tissue samples. To further map the variants to potential regulatory elements, we annotated SNPs according to the histone Chip-seq (H3K27AC, H3K4ME1, H3K4ME3) peaks and DNase I hypersensitivity sites (DHS) from ENCODE Project Consortium (http://genome.ucsc.edu/ENCODE). We downloaded these features measured in NHLF cell lines from the UCSC genome browser (http://genome.ucsc.edu/).
Pathway enrichment analysis
We performed pathway enrichment analysis on genes with integrated score ≥1 defined by functional annotation to further explore the biological process. The analysis was conducted with the combined genes of three traits as input considering the high genetic correlations between traits. We used the Reactome Pathway Database  as a reference, which was implemented in R package “clusterProfiler” . Bonferroni method was used for multiple correction and pathways with adjusted P-value <0.05 were retained.