Aims
The first aim of this study was to replicate the association between measured serum urate and respiratory outcomes. The second aim was to examine whether there was any evidence supporting a causal association between genetically predicted serum urate and respiratory outcomes (Mendelian randomization) in people with a history of smoking cigarettes. We also examined the associations between urate and respiratory function as a phenotype that might also be influenced by endogenous antioxidant activity.
Data source
We used the UK Biobank Resource, a prospective cohort study of over 500,000 participants aged 40–69 years, recruited between 2006 and 2010 from around the UK12. Further information on UK Biobank, such as the processing of biological samples including DNA, is available at https://www.ukbiobank.ac.uk/. The quality control and imputation of SNPs, indels and structural variants are reported elsewhere13.
Study design
The methods we report are similar to our earlier study on urate using UK Biobank14. In brief, we analysed the longitudinal relationship between serum urate levels and lung cancer and the cross-sectional relationship between serum urate and forced expiratory volume in 1 second (FEV1). We estimated the causal relationships between urate levels and these outcomes by applying Mendelian randomization (MR) to individual-level data. The protocol was approved by UK Biobank in July 2018 (ID:5167) and we checked the sample size using online tools (http://cnsgenomics.com/shiny/mRnd/).
Inclusion/exclusion criteria
We excluded people who no longer wished to participate in UK Biobank up to August 2020 and applied several genetic exclusions including outliers for genotype missingness or excess heterozygosity, sex aneuploidy and sex discordance (n = 2200). We used a published algorithm to retain unrelated participants15 (n = 39,642) and finally restricted the sample to “white British” participants based on self-reported ethnic identity and principal components available in the dataset (n = 88,341)13. We set the cohort start date at the date when the participant attended the research centre and the exit date was the earliest date of lung cancer diagnosis, loss to follow-up, death or end of the follow-up period. At the time of analysis, the most recent date for complete follow-up for incident cancers was March 2016 for England and Wales and October 2015 for Scotland. Prevalent lung cancer cases were excluded (n = 512).
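The follow-up rule described above (exit at the earliest of diagnosis, loss to follow-up, death or end of registry follow-up) can be sketched as follows. This is a minimal illustration, not the study code: the function and field names are hypothetical, and the exact day within March 2016 and October 2015 is an assumption because the text gives only the month.

```python
from datetime import date

# End of complete cancer follow-up by registry region. Months are from the
# text; using the last day of each month is an assumption.
FOLLOW_UP_END = {
    "England": date(2016, 3, 31),
    "Wales": date(2016, 3, 31),
    "Scotland": date(2015, 10, 31),
}


def exit_date(country, diagnosis=None, lost=None, death=None):
    """Earliest of diagnosis, loss to follow-up, death or end of follow-up."""
    candidates = [FOLLOW_UP_END[country]]
    candidates += [d for d in (diagnosis, lost, death) if d is not None]
    return min(candidates)


def is_incident_case(entry, country, diagnosis=None, lost=None, death=None):
    """True if a diagnosis falls after entry and on or before the exit date."""
    if diagnosis is None or diagnosis <= entry:
        return False  # no diagnosis, or a prevalent case
    return diagnosis <= exit_date(country, diagnosis, lost, death)


def person_years(entry, country, diagnosis=None, lost=None, death=None):
    """Time at risk in years between the assessment visit and the exit date."""
    return (exit_date(country, diagnosis, lost, death) - entry).days / 365.25


# A participant in Scotland diagnosed after the registry cut-off is censored
# at the cut-off and does not count as an incident case.
entry = date(2008, 6, 1)
late_dx = date(2016, 1, 15)
censored_exit = exit_date("Scotland", diagnosis=late_dx)
censored_case = is_incident_case(entry, "Scotland", diagnosis=late_dx)
```

Taking the minimum over the available dates keeps the censoring rule in one place, so the same exit date feeds both the event indicator and the person-time denominator.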
Exposures
Almost all participants provided blood samples at the initial assessment centre visit. Serum urate was assayed in these samples by Uricase PAP (Beckman Coulter AU5800). We selected 31 SNPs for estimating genetically predicted urate levels based on the results of a large-scale genome-wide association study (GWAS) of European people16. The two lead GWAS SNPs (rs12498742 and rs2231142) are located in renal and gut urate transporters17 and we analysed these separately as well as in combination with the 28 weaker variants.
Outcomes
The primary outcome was a new lung cancer diagnosis recorded after study recruitment. Cancer diagnoses in UK Biobank are provided by the NHS Central Register for participants living in Scotland and the Health & Social Care Information Centre for participants living in England and Wales. Diagnoses are coded using the International Classification of Diseases (ICD) versions 9 and 10. We selected malignant neoplasms of the trachea and bronchus (ICD10: C33-C34) as the cancers where smoking has the strongest pathophysiological role and highest attributable risk18. In addition to the national cancer registries, we used self-reported cancer diagnosis to identify prevalent cancers.
Other risk factors for lung cancer are potentially on the causal pathway between urate antioxidant activity and lung cancer. We examined family history of lung cancer and comorbidity for chronic obstructive pulmonary disease (COPD) or emphysema separately as potential mediators of the relationship with lung cancer.
Other variables
We included important predictors of lung cancer in analyses, including age, calendar year, genetic sex, population sub-structure (first 40 principal components), recruitment centre, height, weight and self-reported smoking status19, 20. Weight is strongly associated with urate levels and there is evidence that weight is causally associated with lung cancer21. In addition, the lead GWAS SNP (rs12498742) is located in a gene with a role in glucose homeostasis (SLC2A9) that could potentially influence weight. Therefore, we examined models with and without weight. In a subset of people with a history of regular smoking, we further adjusted for waist circumference, exposure to smoke at home, Townsend social deprivation index, antioxidant supplements, alcohol intake and nitrogen dioxide air pollution.
Interactions
We fitted models separately for men and women given the different average urate levels as well as evidence of differential genetic effects of SNPs on urate levels. We previously reported strong interactions between urate and smoking status, with no clear association in non-smokers but strong negative associations in current smokers22. We therefore estimated associations by self-reported smoking status (never, former and current) and smoking intensity (1–19 cigarettes per day or 20 or more cigarettes per day) by including multiplicative interaction terms in the models for each sex. Pack-years of smoking was available for a subset of participants and we described continuous-by-continuous interactions with urate.
Statistical analyses
Serum urate levels were divided into sex-specific quintile categories to describe the univariable associations with other covariates. We identified and excluded outlier values for continuous variables using a multivariate approach (blocked adaptive computationally efficient outlier nominators algorithm) including age and sex with a 15% threshold of the chi-squared distribution23. To estimate the observational incidence rate ratios (IRRs) per 100 µmol/L increase in serum urate, we used multivariable Poisson regression with robust standard errors and age as the time scale. We explored non-linear relationships by applying restricted cubic spline interpolation using Harrell’s default percentiles and selecting the transformation that minimised the Akaike and Bayesian information criteria (AIC/BIC). To visualise non-linear transformations and interactions, we calculated the margins of response as adjusted incidence rates at different levels of urate while holding all other variables at their observed values. We applied a user-written programme for data visualisation24 and standard errors for marginal effects were calculated using the delta method. We checked for proportionality of associations with age by testing interaction terms. All continuous variables were parameterised as linear and Wald tests were used for calculating p-values for categorical variables and spline transformations.
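To illustrate how an IRR per 100 µmol/L is obtained from a Poisson rate model, the sketch below fits log(rate) = b0 + b1·x with a log person-years offset by Newton-Raphson, where x is urate in units of 100 µmol/L. It is a deliberately reduced example under stated assumptions: one covariate, no robust standard errors, no age time scale and no splines, so it does not reproduce the multivariable Stata analysis. The data are invented so that the true IRR is 0.8, and the function name is hypothetical.

```python
import math


def fit_poisson_rate(events, person_years, x, iterations=50):
    """Fit log(rate) = b0 + b1 * x by Newton-Raphson and return (b0, b1)."""
    b0 = math.log(sum(events) / sum(person_years))  # start at the crude rate
    b1 = 0.0
    for _ in range(iterations):
        # Expected events given current coefficients (offset = log person-years).
        mu = [t * math.exp(b0 + b1 * xi) for t, xi in zip(person_years, x)]
        # Score vector (first derivatives of the log-likelihood).
        s0 = sum(y - m for y, m in zip(events, mu))
        s1 = sum((y - m) * xi for y, m, xi in zip(events, mu, x))
        # Information matrix (negative second derivatives).
        i00 = sum(mu)
        i01 = sum(m * xi for m, xi in zip(mu, x))
        i11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


# Invented grouped data: three urate levels with equal person-time and event
# counts chosen so the rate falls by 20% per 100 µmol/L.
urate = [250.0, 350.0, 450.0]                  # µmol/L
x = [(u - 350.0) / 100.0 for u in urate]       # centred, per 100 µmol/L
events = [125, 100, 80]
person_years = [10000.0, 10000.0, 10000.0]

b0, b1 = fit_poisson_rate(events, person_years, x)
irr_per_100 = math.exp(b1)    # incidence rate ratio per 100 µmol/L
baseline_rate = math.exp(b0)  # rate at 350 µmol/L
```

Scaling urate by 100 before fitting means the exponentiated coefficient is directly the IRR per 100 µmol/L, which matches how the estimates are reported.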
We estimated the IRRs for lung cancer per 100 µmol/L increase in genetically predicted urate using one-sample MR and the two-stage predictor substitution (2SPS) method19. We used a similar approach, the two-stage least squares (2SLS) method, to estimate the causal cross-sectional relationship with FEV1 per 100 µmol/L increase in genetically predicted urate19. FEV1 was missing not at random for approximately 25% of participants and we used inverse probability weighting in an attempt to reduce the impact of any selection bias. After applying the ERS/ATS criteria for FEV1 reproducibility, FEV1 was missing for 50% of smokers and we decided against this analysis.
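The two-stage logic can be sketched with a single genetic instrument and simulated data. In this illustration the first stage regresses the exposure on the instrument and the second stage regresses the outcome on the first-stage fitted values. Everything here is an assumption made for the example: the simulated effect sizes, the single 0/1/2 allele count standing in for the SNP set, and the absence of covariates and weights. The 2SPS approach follows the same pattern with a Poisson model in the second stage.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (intercept, slope) from a least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """Stage 1: exposure on instrument. Stage 2: outcome on fitted exposure."""
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]  # genetically predicted exposure
    _, beta = ols(x_hat, y)
    return beta


# Simulated one-sample data with an unmeasured confounder u (all values invented).
rng = random.Random(1)
true_effect = 0.5
z, x, y = [], [], []
for _ in range(5000):
    u = rng.gauss(0, 1)               # confounder of exposure and outcome
    zi = rng.randint(0, 2)            # allele count, independent of u
    xi = 10 * zi + 5 * u + rng.gauss(0, 1)
    yi = true_effect * xi - 3 * u + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

naive_slope = ols(x, y)[1]                    # biased by the confounder
iv_slope = two_stage_least_squares(z, x, y)   # close to the simulated effect
```

The second-stage regression gives a valid point estimate, but its naive standard errors ignore first-stage uncertainty, so dedicated instrumental-variable routines are normally used for inference.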
Relatives were excluded using an algorithm in R (v.3.5.1)15 and all other analyses were done using Stata v.16.1 (Stata Corporation, College Station, Texas).