2.1 UK Biobank samples and mental phenotypes
The phenotypic and genotypic data of this study were derived from the UK Biobank health resource under application 46478. UK Biobank is a population-based prospective cohort study. Between 2006 and 2010, UK Biobank recruited 502,656 participants aged 40 to 69 at recruitment. The cohort has collected a rich variety of phenotypic and health-related information on each participant, including physical and biological measurements, lifestyle indicators, imaging of the body and brain, and genome-wide genotyping. Longitudinal follow-up for a wide range of health-related information is provided by linkage to health and medical records.
Several potential measures of smoking behavior were selected to define the phenotype of ever smoking. UK Biobank data field 20432 was described as ongoing behavioural or miscellaneous addiction. Anxiety and depression were defined according to the previous study, based on the Generalized Anxiety Disorder scale (GAD-7) and the Patient Health Questionnaire (PHQ-9) [30, 31]. The fluid intelligence score was a simple unweighted sum of the number of correct answers given to the 13 fluid intelligence questions. The maximum number of reported past or current cigarettes (or pipes/cigars) consumed per day was used to define the frequency of smoking (UK Biobank data fields: 20116, 2887 and 3456). In addition, the frequency of alcohol consumption (UK Biobank data field: 20117) was defined as the sum of all alcoholic beverages per week. Ethical approval of the UK Biobank study was granted by the National Health Service National Research Ethics Service (reference 11/NW/0382). The detailed definitions of the phenotypes are shown in Additional file 1.
2.2 UK Biobank genotyping, imputation and quality control
A total of 488,377 middle-aged adults had genome-wide genotype data, which were assayed on two similar genotyping arrays. DNA was extracted from stored blood samples collected when participants visited a UK Biobank assessment centre. Genotyping was carried out with the Affymetrix UK BiLEVE Axiom array or the Affymetrix UK Biobank Axiom array (Santa Clara, CA, USA), which share 95% of marker content. Imputation was carried out with IMPUTE4 (https://jmarchini.org/software/) in chunks of approximately 50,000 imputed markers with a 250 kb buffer region. Routine quality checks were carried out throughout the process, including sample retrieval, DNA extraction and genotype calling. Statistical tests were performed to identify poor-quality markers by checking for consistency of genotype calling across experimental factors, including batch effects, plate effects, departures from Hardy-Weinberg equilibrium (HWE), sex effects, array effects, and discordance across control replicates. Based on self-reported ethnicity (UK Biobank data field: 21000), the individuals were restricted to “white British” only. Finally, participants whose self-reported gender was inconsistent with their genetic gender, who were genotyped but not imputed, or who withdrew their consent were excluded from the current study. Detailed descriptions of array design, genotyping and quality control procedures can be found in previous studies [32, 33].
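The sample-level exclusions described above can be sketched as a simple filter. This is a minimal illustration only: the record layout and the key names (`ethnicity`, `reported_sex`, `genetic_sex`, `imputed`, `withdrawn`) are hypothetical and do not correspond to UK Biobank field names or to the code used in this study.

```python
def keep_participant(p):
    """Return True if a participant record passes the sample-level exclusions.

    `p` is a dict with hypothetical keys chosen for this sketch;
    they are not UK Biobank data-field names.
    """
    return (
        p["ethnicity"] == "white British"          # self-reported ethnicity
        and p["reported_sex"] == p["genetic_sex"]  # no sex inconsistency
        and p["imputed"]                           # genotyped and imputed
        and not p["withdrawn"]                     # consent not withdrawn
    )


# Toy records, invented purely for illustration.
participants = [
    {"id": "A", "ethnicity": "white British", "reported_sex": "F",
     "genetic_sex": "F", "imputed": True, "withdrawn": False},
    {"id": "B", "ethnicity": "white British", "reported_sex": "M",
     "genetic_sex": "F", "imputed": True, "withdrawn": False},
    {"id": "C", "ethnicity": "white British", "reported_sex": "M",
     "genetic_sex": "M", "imputed": False, "withdrawn": False},
    {"id": "D", "ethnicity": "white British", "reported_sex": "F",
     "genetic_sex": "F", "imputed": True, "withdrawn": True},
]

kept = [p["id"] for p in participants if keep_participant(p)]
print(kept)
```

Only the first toy record satisfies all four conditions; each of the others fails exactly one of them.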
2.3 GWAS data of sex hormone traits
The SNPs associated with sex hormone traits were derived from a published GWAS. Briefly, the published GWAS analyzed four sex hormone traits: sex hormone-binding globulin (SHBG), testosterone, bioavailable testosterone and estradiol. Association tests were conducted using linear mixed models implemented in BOLT-LMM (v2.3.2) to account for cryptic population structure and relatedness. Genotypic data were derived from the ‘v3’ release of UK Biobank, which contained the full set of Haplotype Reference Consortium (HRC) and 1000 Genomes imputed variants. SNPs reaching the significance threshold of p value < 5 × 10⁻⁸ were selected to calculate PRSs. Detailed descriptions of sample characteristics, array design, quality control and statistical analysis can be found in the previous study.
2.4 PRS of sex hormone traits
Using the genotype data of the UK Biobank cohort, PRS calculation was performed with PLINK’s “--score” command. Briefly, let $\mathrm{PRS}_i$ denote the PRS of a sex hormone trait for the ith subject, defined as
$\mathrm{PRS}_i = \sum_{n=1}^{t} \beta_n \, \mathrm{SNP}_{ni}$
where n (n = 1, 2, 3, …, t) indexes the genetic markers and i (i = 1, 2, 3, …, k) indexes the subjects, so that t and k denote the number of genetic markers and the sample size, respectively. $\beta_n$ is the effect parameter of the risk allele of the nth significant SNP related to the sex hormone trait, which was obtained from the published study. $\mathrm{SNP}_{ni}$ is the dosage (0 to 2) of the risk allele of the nth SNP for the ith subject. In addition, SNPs in linkage disequilibrium (LD) were pruned before calculating PRSs, using the “--indep-pairwise” command implemented in PLINK with a window size of 500 kb, a step size of 5 variants and an r² threshold of 0.5.
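The weighted sum that defines the PRS can be illustrated with a short sketch. This is not the PLINK implementation: the function name `compute_prs` and the toy effect sizes and dosages are assumptions made for illustration, and the code simply sums effect size times risk-allele dosage over the selected SNPs for each subject.

```python
def compute_prs(betas, dosages):
    """Return one polygenic risk score per subject.

    betas:   list of t effect sizes, one per selected SNP.
    dosages: list of k rows (one per subject), each holding t
             risk-allele dosages between 0 and 2.
    """
    scores = []
    for row in dosages:
        if len(row) != len(betas):
            raise ValueError("each subject needs one dosage per SNP")
        # PRS_i = sum over SNPs of beta_n * dosage_ni
        scores.append(sum(b * d for b, d in zip(betas, row)))
    return scores


# Toy values, invented for illustration only.
betas = [0.5, -0.25, 1.0]
dosages = [
    [0, 1, 2],  # subject 1
    [2, 2, 0],  # subject 2
]
prs = compute_prs(betas, dosages)
print(prs)  # [1.75, 0.5]
```

Note that PLINK's “--score” reports a per-allele average by default rather than a plain sum, so scores from PLINK may differ from this sketch by a scaling factor unless the sum option is requested.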
2.5 Statistical analysis
Four serum sex hormone traits (SHBG, testosterone, bioavailable testosterone and estradiol) were analyzed both within and across sexes, with the exception of estradiol, for which analyses were performed only in men. Logistic regression models were used to assess the associations of individual PRSs of sex hormone traits with ever smoking and with ongoing behavioural or miscellaneous addiction. Correspondingly, linear regression models were used to evaluate the associations of individual PRSs of sex hormone traits with anxiety score, depression score, fluid intelligence score, and the frequency of alcohol consumption and smoking. The regression analyses were conducted in R (version 3.5.3). Sex, age, and 10 principal components of population structure were included as covariates in the regression models.
2.6 Genome-wide genetic interaction study
Based on the results of the regression models, a GWGIS was performed to assess the interaction effects between genetic factors and sex hormone traits on fluid intelligence, the frequency of smoking per day and alcohol consumption per week in the UK Biobank cohort. The GWGIS was conducted with PLINK 2.0 [34, 35]. Letting D be the disease outcome variable, the penetrance model takes the following form:
$\operatorname{logit}[P(D = 1 \mid G, E)] = \beta_0 + \beta_g G + \beta_e E + \beta_{ge} G E$
where G denotes the genetic factors and E denotes the environmental factors. In this study, the outcome variables were fluid intelligence score, the frequency of smoking per day and alcohol consumption per week. The instrumental variables were the PRSs of serum sex hormone traits. For quality control, SNPs with HWE p value < 0.001, minor allele frequency (MAF) < 0.01 or call rate < 0.90 were excluded from the current study. To evaluate the potential influence of population stratification, cryptic relatedness and null model misspecification on our results, we calculated the inflation factor of the GWGIS. Significant interactions were identified at p value < 5.0 × 10⁻⁸ in this study. Rectangular Manhattan plots were generated using the “CMplot” R script (https://github.com/YinLiLin/R-CMplot).
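The text does not specify how the inflation factor was computed. A minimal sketch, assuming the standard genomic-control definition (the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null, about 0.4549), is given below. The function name `inflation_factor` and the example p values are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(p_values):
    """Genomic inflation factor from two-sided p values.

    Each p value is converted back to a 1-df chi-square statistic
    (the squared normal quantile), and the median of these statistics
    is divided by the median expected under the null hypothesis.
    """
    norm = NormalDist()
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p values must lie in (0, 1]")
        z = norm.inv_cdf(p / 2.0)  # lower-tail quantile; sign is irrelevant
        chi2.append(z * z)
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p values mimic a well-calibrated (null) set of tests.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = inflation_factor(null_like)
print(round(lam, 3))
```

Under this definition, a value close to 1 suggests little inflation of the test statistics, whereas values well above 1 point to possible stratification, relatedness or model misspecification.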