Study design
This study is based on large-scale GWAS summary datasets, eQTLs datasets, expression datasets, and case–control expression datasets. All participants gave their informed consent in the corresponding original studies. All relevant data, analytic methods, and study materials are within the paper. This study does not use animal models.
GWAS datasets
We selected three different IS GWAS dataset resources. The first is from the largest multi-ancestry meta-analysis of stroke GWAS datasets conducted by MEGASTROKE [38]. Here, we limited our analysis to participants of European ancestry including 34,217 IS cases and 406,111 controls. These include the following numbers of IS subtypes, based on Trial of Org 10172 in Acute Stroke Treatment classification criteria: large artery atherosclerotic stroke (LAS, 4,373 cases and 406,111 controls), cardioembolic stroke (CES, 7,193 cases and 406,111 controls), and small vessel stroke (SVS, 5,386 cases and 406,111 controls) [38]. The MEGASTROKE IS GWAS dataset is publicly available from http://www.megastroke.org/index.html. The second IS GWAS dataset resource is from UK Biobank and is publicly available from PheWeb (http://pheweb.sph.umich.edu/SAIGE-UKB/). It includes 1,501 IS cases with cerebral artery occlusion and 399,017 controls [39]. The third IS GWAS dataset resource is from the Million Veteran Program (MVP), and includes 1,198 IS cases with cerebral artery occlusion and 331,601 controls [19].
For comparison, we selected large-scale GWAS datasets for RA and CAD. The RA GWAS dataset is from a large-scale meta-analysis of individuals of European ancestry including 14,361 RA cases and 43,923 controls [45], and is publicly available from http://plaza.umin.ac.jp/~yokada/datasource/software.htm. The CAD GWAS dataset is from the CARDIoGRAMplusC4D consortium (Coronary ARtery DIsease Genome wide Replication and Meta-analysis [CARDIoGRAM] plus The Coronary Artery Disease [C4D] Genetics), including 60,801 CAD cases and 123,504 controls, most of European ancestry [46]. It is publicly available from http://www.cardiogramplusc4d.org/data-downloads/. The C4D GWAS is a meta-analysis of GWAS studies of individuals of European and South Asian descent (PROCARDIS, HPS, PROMIS, and LOLIPOP) involving 15,420 CAD cases and 15,062 controls [47].
eQTLs datasets
We examined the association between rs7529229 and IL-6R expression using multiple eQTLs dataset resources. The first eQTLs dataset resource is from the UK Brain Expression Consortium, which is publicly available from the Brain eQTL Almanac (Braineac) database [40]. Gene expression levels were measured using Affymetrix GeneChip Human exon 1.0 ST arrays [40]. Braineac includes 10 eQTLs datasets of 10 brain tissues from 134 neuropathologically healthy individuals of European descent [40].
The second eQTLs resource is from the Genotype-Tissue Expression (GTEx) project (version 8), including 49 tissues (each with at least 70 samples with genotype data), 828 donors, and 15,201 samples [43]. Gene expression levels were measured using Illumina TruSeq RNA sequencing and the Affymetrix Human Gene 1.1 ST Expression Array (V3; 837 samples) [43].
The third eQTLs resource is from the eQTLGen Consortium [42]. This consortium conducted a large-scale meta-analysis in 31,684 human whole blood samples from 37 cohorts, with the majority of European ancestry [42]. Gene expression levels were profiled by Illumina, Affymetrix U291, Affymetrix HuEx v1.0 ST expression arrays, and RNA-seq [42].
IS case–control gene expression dataset
To evaluate the differential expression of IL-6R, we performed an IS case–control gene expression analysis in whole blood using a gene expression dataset from the Gene Expression Omnibus (GEO) database (GSE16561). In this dataset, gene expression profiling was measured in the peripheral whole blood of 39 IS patients (17 men and 22 women) and 24 healthy controls (10 men and 14 women) using Illumina microarrays [44]. All 63 participants were of European ancestry [44].
Genetic association analysis of IL-6R rs7529229
We first extracted the summary statistics for the rs7529229 variant from the three IS GWAS dataset resources (MEGASTROKE, UK Biobank, and MVP). We then conducted a meta-analysis to evaluate the association between rs7529229 and IS using the R package meta (General Package for Meta-Analysis). The overall odds ratio (OR) was calculated using a fixed effects model (Mantel–Haenszel) or a random effects model (DerSimonian–Laird), chosen according to the level of heterogeneity among the three resources [48]. We further investigated the association of rs7529229 with IS subtypes (LAS, CES, and SVS), RA, and CAD using the corresponding GWAS summary statistics. Statistical significance for the association between rs7529229 and each phenotype was defined by a Bonferroni-corrected threshold of P < 0.05/6 = 0.0083. Uncorrected P values between 0.0083 and 0.05 were considered suggestive of an association.
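The pooling step can be illustrated with a minimal sketch. It is not the meta package itself: it assumes each resource contributes a log OR and its standard error, and it uses generic inverse-variance weighting for the fixed effects estimate (Mantel–Haenszel pooling needs per-study 2×2 counts, which summary statistics do not provide). The function names are hypothetical, and the divisor of 6 is taken from the threshold stated above.

```python
import math

def pool_log_odds_ratios(log_ors, ses):
    """Pool per-study log odds ratios (illustrative sketch).

    Returns the inverse-variance fixed effects estimate, Cochran's Q,
    I^2, the DerSimonian-Laird tau^2, and the random effects estimate.
    """
    if len(log_ors) != len(ses) or not log_ors:
        raise ValueError("need one standard error per log odds ratio")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Fixed effects: weight each study by the inverse of its variance.
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw

    # Heterogeneity: Cochran's Q and I^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects: add tau^2 to every study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    random_eff = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)

    return {
        "fixed_or": math.exp(fixed),
        "fixed_se": math.sqrt(1.0 / sw),
        "random_or": math.exp(random_eff),
        "random_se": math.sqrt(1.0 / sum(w_re)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }

# Threshold as stated in the text: 0.05 divided by 6.
BONFERRONI_THRESHOLD = 0.05 / 6

def classify_p(p_value):
    """Label a P value using the significance rules described in the text."""
    if p_value < BONFERRONI_THRESHOLD:
        return "significant"
    if p_value < 0.05:
        return "suggestive"
    return "not significant"
```

When the studies agree, tau^2 is zero and the two pooled estimates coincide; as heterogeneity grows, the random effects weights become more equal across studies.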
eQTLs analysis
In Braineac, we first downloaded IL-6R expression data and genotype data of variants within 1 Mb upstream of the transcription start site and 1 Mb downstream of the transcription end site [40]. We then evaluated the potential association between rs7529229 and IL-6R expression using linear regression under an additive model in Partek’s Genomics Suite v6.6, adjusting for brain bank, gender, and batch effects [40].
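The variant window described above can be sketched as a simple coordinate filter. This is an illustration only: the function name, the tuple layout, and the coordinates in the usage note are hypothetical, and the gene is assumed to be given by its lowest and highest genomic positions so that the window is the same on either strand.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb on each side of the gene, as in the text

def select_cis_variants(variants, chrom, gene_start, gene_end,
                        window=CIS_WINDOW_BP):
    """Return variants lying within `window` bp of a gene.

    `variants` is an iterable of (variant_id, chromosome, position)
    tuples; `gene_start` and `gene_end` are the lowest and highest
    genomic coordinates of the gene on `chrom` (1-based, inclusive).
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    lower = max(1, gene_start - window)
    upper = gene_end + window
    return [
        (vid, vchrom, pos)
        for vid, vchrom, pos in variants
        if vchrom == chrom and lower <= pos <= upper
    ]
```

For example, with a made-up gene spanning positions 5,000,000 to 5,060,000 on chromosome "1", a variant at 4,000,000 is kept and a variant at 3,999,999 is dropped.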
In GTEx, eQTLs analysis was performed using FastQTL with the following covariates: top five genotyping principal components, a set of covariates identified using the Probabilistic Estimation of Expression Residuals (PEER) method (the number of PEER factors was determined as a function of sample size [N]: 15 factors for N<150, 30 factors for 150≤ N<250, 45 factors for 250≤ N<350, and 60 factors for N≥350), sequencing platform (Illumina HiSeq 2000 or HiSeq X), sequencing protocol (PCR-based or PCR-free), and sex [43]. Detailed information for laboratory and analytical methods was provided in the original paper and the GTEx website (https://www.gtexportal.org/home) [43].
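The sample-size rule for the number of PEER factors can be written as a small lookup. The function name is hypothetical; the cut-offs are those given above.

```python
def peer_factor_count(n_samples):
    """Number of PEER factors for a tissue with `n_samples` samples,
    following the sample-size thresholds described in the text."""
    if n_samples < 150:
        return 15
    if n_samples < 250:
        return 30
    if n_samples < 350:
        return 45
    return 60
```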
In eQTLGen, a data-driven method was used to integrate gene expression data from different platforms [42]. For a given single nucleotide polymorphism (SNP), genes within 1 Mb upstream or downstream were selected according to the central position of the gene [42]. eQTLs analysis was conducted using Spearman correlation [42].
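As an illustration of the correlation step only, Spearman's rho between genotype dosage and expression can be computed by ranking both variables (averaging ranks over ties, which are common for dosages coded 0, 1, and 2) and taking the Pearson correlation of the ranks. This sketch does not reproduce the eQTLGen pipeline, its normalization, or its meta-analysis across cohorts; the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(vx * vy)
```

A call such as spearman_rho(dosages, expression) takes one dosage and one expression value per individual, in the same order.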
Gene expression analysis of IL-6R in GTEx
We conducted gene expression analysis to investigate differences in IL-6R expression across human tissues using gene expression data in GTEx (version 8). The gene expression level was quantified as transcripts per million (TPM) based on the GENCODE 26 annotation, collapsed to a single transcript model for each gene using a custom isoform collapsing procedure [43]. We used the t test or analysis of variance to evaluate differences in IL-6R expression between tissues. Statistical significance was set at P < 0.05.
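For readers unfamiliar with the unit, the generic definition of TPM can be sketched as follows: read counts are divided by gene length, and the resulting rates are rescaled to sum to one million within a sample. This is the textbook formula only, not the GTEx quantification pipeline; the function name and the toy inputs are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    `counts` and `lengths_bp` are parallel lists for one sample, where
    each length is that of the gene's collapsed transcript model.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("need one length per gene")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("gene lengths must be positive")
    # Length-normalized read rate for each gene.
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that the values in the sample sum to one million.
    return [r / total * 1e6 for r in rates]
```

Because the values in each sample sum to the same total, TPM makes the relative abundance of a gene comparable across samples and tissues.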
IS case–control gene expression analysis
We performed a differential expression analysis using the NCBI web application GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/), which invokes Bioconductor R packages to transform and analyze GEO datasets [49]. Evidence has shown sex differences in IS epidemiology, presentation, and outcomes [50]. Hence, we further conducted a subgroup analysis by sex. We defined P < 0.05 as the significance level for differential expression of IL-6R between IS patients and healthy controls.