Study population
A two-stage case‐control study was conducted. In the discovery stage participants were recruited from two independent case-control studies in the urban areas of Beijing, China. First, the study of Adolescent Lipids, Insulin Resistance and Candidate Genes (ALIR) included 151 normal-weight, 400 overweight and 386 obese children aged 7-18 years old. Second, the baseline of Comprehensive Prevention Project for Overweight and Obese Adolescents (CPOOA) collected 456 normal-weight, 318 overweight and 319 obese children aged 14-17[22, 23]. Individuals with any cardiovascular or metabolic related diseases were excluded as well. BMI is calculated by dividing weight (Kg) by square of height (m2). According to the BMI percentile criteria, children with an age- and sex-specific BMI ≥95th percentile were classified into obese group, and those with a BMI between 85th to 95th percentile were classified into overweight group, whereas those with a BMI between 15th to 85th percentile were normal-weight. The children with BMI ≥97th percentile were defined as severely obese [24]. Both studies were approved by the Ethics Committee Board of Peking University Health Science Center.
In the replication stage, information was extracted from UKB database. The UKB database is a cohort of approximately half a million individuals aged 40 to 69 years across the United Kingdom. The UKB data are available on application to the UKB (www.ukbiobank.ac.uk/). This research was conducted using the UKB data under Application Number 44430. A BMI between 18.5-25 is classified as normal, 25-30 as overweight, 30-35 as obesity and more than 35 as severely obesity[25]. The study only included Caucasian people and proposed individuals with no blood relationship. Appling the 1st, 2nd and 3rd Principal Component (PC) filtering and selecting those with information of age, sex, BMI and Townsend deprivation index individuals. Finally, 264,838 non-related individuals of self-reported British descent from the UK Biobank were included in the present study. The summary-level of GWAS data used in present study are publicly available. Therefore, no specific ethical approval is needed.
Genotyping quality control and imputation
In the discovery stage, 9 variants located in MAP2K5 genes, with 2 variants from published literatures[5, 26] and 7 Tag SNPs base on CHB database from 1000 Genomes Project were selected for genotyping. Genotyping was performed with the matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS, Agena, San Diego, CA, USA). The call rates were above 99.4% and 3 variants were excluded for strong linkage disequilibrium (r2>0.80) in the present study. The remained 6 variants were shown for the basic information of genotyping (Table S1).
In the replication stage, the genotype data of SNPs located in the MAP2K5 region were extracted from UKB GWAS database. The SNPs selection was conducted under the following criteria: (i) imputation quality score (INFO) ≥0.9; (ii) genotyping call rate≥ 95%; (iii) minor allele frequency (MAF) in controls ≥0.01; or (iv) Hardy–Weinberg equilibrium (HWE) ≥1 × 10−6. Finally, 2,994 SNPs were included into the subsequent analyses.
eQTL analyses and in-silico functional annotations
eQTL analyses was performed with Genotype-Tissue Expression (GTEx; http://www.gtexportal.org/home/) Project expression quantitative trait locus (eQTL) database (V8 release) [27]. Variant‐gene paired eQTL analysis results were conducted in the subcutaneous adipose and whole blood tissues, individually.
To explore the potential molecular functions of the colocalization genes and corresponding variants, we performed in-silico functional annotations with several prediction aspects, including histone modification sites (H3K4me1, H3K4me3 and H3K27ac) from the Encyclopedia of DNA Elements (ENCODE). All outcomes were visualized in UCSC browser[28]. Transcription factor binding site analyses were performed with the JASPAR 2020[29] and enhancer element locator algorithm[30]. Finally, pathway enrichment analyses were adopted to explore which signaling pathway did the transcription factor involve via GO, KEGG and REAC pathways[31].
Dual luciferase reporter assays (DLRA)
Luciferase constructs were generated to encompass corresponding variants. The rs7175517 (chr15:68077130-68078130 GRCh37) is located in the intro region of MAP2K5 and was cloned into the pGL3-Promoter vector (Promega, Madison, WI, USA). The constructed plasmids were sequenced to confirm the accuracy. HEK293T cells were plated into 24-well plates into each well (7.5 × 104) and co-transfected the plasmids (100ng/well) of interest next day with pRL-SV40 Renilla Luciferase Control Vector (10ng/well, Promega, Madison, WI, USA) using Lipofectamine 2000 reagent (Invitrogen, Carlsbad, CA, USA). After 48 hours of culturing, the cells were lysed, and luciferase activity was measured using the Dual-Luciferase Reporter Assay System (Promega, Madison, WI, USA). Relative luminescent signals were calculated by normalizing luciferase signals with Renilla signals. In total, 3 independent transfection experiments with triplicates for each condition were conducted.
Electrophoretic mobility shift assay (EMSA)
Nuclear extracts were prepared from HEK293T using the NE-PER Nuclear and Cytoplasmic Extraction kit (Thermo Scientific, Waltham, MA, USA). DNA oligonucleotides for each variant were synthesized with 5’-biotin labeling and HPLC purified by Genscript (Nanjing, China; probe sequences are listed in Table S5). Double-stranded DNA probes were prepared by combining sense and antisense oligonucleotides, heat annealing, and slow cooling. Probes and HEK293T cell nuclear extracts were then incubated by using the LightShift EMSA Optimization & Control Kit (Thermo Scientific) at 4°C for 20 mins. For competition assays, unlabeled competitors at 25-fold, 50-fold or 100fold excess oligonucleotides were added to the reaction mixture 10 mins before the addition of labeled probes. After incubation, binding reactions were separated on a 6% polyacrylamide gel and transferred blots were developed using the Chemiluminescent Nucleic Acid Detection Module (Thermo Scientific) and signals were visualized with the ChemiDoc XRS+ scanner (BIO-RAD, Louisville, KY, USA).
DNA Pull-down and Protein mass spectrometry
The biotin labeled probe and magnetic beads were placed on the 4 ℃ freezer turn over incubation for 6 ~ 8 hours, the nuclear extracts were incubated with the magnetic beads DNA probe complexes placed on the 4 ℃ freezer turn over incubation overnight, after washing to remove nonspecific binding proteins. Finally, the eluate was subjected to elution to obtain the product of interest, which was then subjected to protein mass spectrometry to identify the protein. Protein mass spectrometry was conducted in the central lab of Nanjing Medical University.
Statistical analyses
Hardy-Weinberg equilibrium of all the genotypes were analyzed with χ2 test. Logistic regression and linear regression analyses were conducted to analyze the effect of genetic variants on overweight and obesity (categorical variables) or BMI, individually. The adjusted covariables were age, sex and study population. For the UKB data analysis, age, sex, income, educational attainment (income and educational attainment was replaced by Townsend deprivation index for the forward step wise regression with 2,297 SNPs) and the top three principal components were adjusted for logistic and linear regressions. All genetic regression analyses were performed under an additive model and conducted with Plink software (v.1.9). The linkage disequilibrium (LD) between variants was tested by calculating r2 with Haploview 4.2. In terms of genetic associations analyses, statistical significance were considered when P values <5×10−8. For the in vitro experiments, an unpaired student t test was used to compare the mean value of different conditional triplicates and two-sided P values <0.05 were considered significant unless otherwise specified. Then pathway enrichment analyses were performed for the proteins identified by mass spectrometry using ‘clusterProfiler’ package in R software (version 3.5.1) (http://bioconductor.org/packages/clusterProfiler/, version 3.6.1)