Study design
This study is based on large-scale GWAS summary datasets, eQTLs datasets, expression datasets, and case-control expression datasets. All participants gave informed consent in all these corresponding original studies. All relevant data, analytic methods, and study materials are within the paper. In addition, this study does not involve animal models.
GWAS datasets
We first selected three different IS GWAS dataset resources. The first IS GWAS dataset resource is from the largest multiancestry meta-analysis of stroke GWAS datasets conducted by MEGASTROKE [22]. Here, we limited our analysis in participants of European ancestry including 34,217 IS cases and 406,111 controls. According to the Trial of Org 10172 in Acute Stroke Treatment classification criteria, IS consisted could be mainly divided into three subtypes including large artery atherosclerotic stroke (LAS, 4,373 cases and 406,111 controls), cardioembolic stroke (CES, 7,193 cases and 406,111 controls), and small vessel stroke (SVS, 5,386 cases and 406,111 controls) [22]. The second IS GWAS dataset resource is from the UK Biobank including 1,501 IS cases with cerebral artery occlusion, and 399,017 controls [23]. More detailed information is provided in PheWeb (http://pheweb.sph.umich.edu/SAIGE-UKB/). The third IS GWAS dataset resource is from the Million Veteran Program (MVP) including 1,198 IS cases with cerebral artery occlusion, and 331,601 controls, as described in a recent study [17]. To be a comparison, we also selected the large-scale GWAS datasets for RA and CAD. The RA GWAS dataset is from a large-scale meta-analysis in Eurpean ancestry including 14,361 RA cases and 43,923 conrols [29]. The CAD GWAS dataset is from the CARDIoGRAMplusC4D consortium ((Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) plus The Coronary Artery Disease (C4D) Genetics) including 60,801 CAD cases, and 123,504 controls, most of which are of Eurpean ancestry [30].
eQTLs datasets
The rs7529229 mutation is a no-coding variant in IL-6R gene, and may regulate the expression of IL-6R gene. Hence, we examine the association between rs7529229 variant and IL-6R gene expression using multiple eQTLs dataset resources. The first eQTLs dataset resource is from the UK Brain Expression Consortium (UKBEC), which is publicly available in Brain eQTL Almanac (Braineac) database [24]. The gene expression levels were measured using the Affymetrix GeneChip Human exon 1.0 ST arrays [24]. Braineac included 10 eQTLs datasets in 10 brain tissues from 134 neuropathologically normal individuals with European descent [24].
The second eQTLs resource is from the Genotype-Tissue Expression (GTEx) project (version 8) including 49 tissues (number of samples with genotype >= 70), 828 donors and 15201 samples [27]. The gene expression levels were measured using the Illumina TruSeq RNA sequencing and Affymetrix Human Gene 1.1 ST Expression Array (V3; 837 samples) [27].
The third eQTLs resource is from the eQTLGen Consortium [26]. This consortium conducted a large-scale meta-analysis in 31,684 human whole blood samples from 37 cohorts and the majority individuals were of European ancestry [26]. The gene expression levels were profiled by Illumina, Affymetrix U291, Affymetrix HuEx v1.0 ST expression arrays and RNA-seq [26].
IS case-control gene expression dataset
To evaluate the potential differential expression of IL-6R gene, we performed a IS case-control gene expression analysis in whole blood using a gene expression dataset from the Gene Expression Omnibus (GEO) database (GSE16561). In GSE16561 dataset, gene expression profiling was measured in the peripheral whole blood of 39 IS patients (17 males and 22 females) and 24 healthy control subjects (10 males and 14 females) using Illumina microarrays [28]. All these 63 participants are of European ancestry [28].
Genetic association analysisof IL-6R rs7529229
We first extracted the corresponding summary statistics of rs7529229 variant in three IS GWAS dataset resources including MEGASTROKE, UK Biobank and MVP. We then conducted a meta-analysis to evaluate the association between rs7529229 variant and IS using R Package (meta: General Package for Meta-Analysis). The overall odds ratio (OR) is calculated by the fixed effect model (Mantel-Haenszel) or random-effect model (DerSimonian-Laird), which is determined by the heterogeneity among these three resources [31]. We further investigated the association of rs7529229 variant with the IS subtypes (LAS, CES, and SVS), RA, and CAD using the corresponding GWAS summary statistics. The statistical significance for the association between rs7529229 and one specific phenotype is a Bonferroni-corrected threshold 0.05/6=0.0083. Meanwhile, the original P values between 0.0083 and 0.05 were considered to be suggestive association.
eQTLs analysis
In Braineac, we first downloaded the IL-6R gene expression data and the genotype data of generic variants with 1Mb upstream of transcription start site and 1Mb downstream of transcription end site [24]. We then evaluated the potential association between rs7529229 variant and IL-6R expression using a linear regression analysis under an additive model by adjusting for several critical covariates including the brain bank, gender and batch effects in Partek’s Genomics Suite v6.6 [24].
In GTEx, eQTLs analysis was performed using FastQTL with following covariates: top 5 genotyping principal components, a set of covariates identified using the Probabilistic Estimation of Expression Residuals (PEER) method (the number of PEER factors was determined as function of sample size (N): 15 factors for N<150, 30 factors for 150≤ N<250, 45 factors for 250≤ N<350, and 60 factors for N≥350), sequencing platform (Illumina HiSeq 2000 or HiSeq X), sequencing protocol (PCR-based or PCR-free), and the gender [27]. Detailed information for the laboratory methods and analysis methods was provided in the original paper and the GTEx website (https://www.gtexportal.org/home) [27].
In eQTLGen, a data-driven method was used to integrate the gene expression data from platforms [26]. For a given SNP, genes within 1Mb upstream and 1Mb downstream around this SNP were selected according the center position of the gene [26]. Then the eQTLs analysis was conduced by a Spearman correlation [26].
Gene expression analysis of IL-6R in GTEx
We conduct a gene expression analysis to investigate the potential IL-6R expression difference in different human tissues using the gene expression data in GTEx (version 8). The gene expression level was quantified by transcripts per million (TPM) based on the GENCODE 26 annotation, collapsed to a single transcript model for each gene using a custom isoform collapsing procedure [27]. Here, we selected the T test or analysis of variance (ANOVA) method to evaluate the potential difference of IL-6R expression in different human tissues. The statistical significance is P < 0.05.
IS case-control gene expression analysis
Here, we performed a differential expression analysis using the NCBI web application GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) [32]. GEO2R could invoke the Bioconductor R packages to transform and analyze GEO datasets [32]. Evidence has showed the sex differences in IS epidemiology, presentations, and outcomes [33]. Hence, we further conducted a subgroup analysis in females and males, respectively. Here, we define P < 0.05 to be the significance level of differential expression of IL-6R gene in IS patients and healthy control subjects.