Participants and setting
The institutional review board (IRB) of the Korean National Institute of Health and the IRB of Hoseo University approved the Korean Genome and Epidemiology Study (KoGES) and the present study (KBP-2015-055 and 1041231-150811-HR-034-01, respectively). All participants signed a written informed consent form. In this study, 58,701 participants aged 40–74 years were recruited from a large city hospital-based cohort that formed part of the KoGES conducted during 2010–2014 [9]. Adults aged over 40 years (n = 8842) who participated in the Ansan/Ansung plus rural cohorts during 2001–2002 served as the replication cohort for exploring fat mass-related genetic variants. People with a history of diseases that might influence energy metabolism, such as cancers, thyroid diseases, chronic kidney disease, and brain-related diseases, were excluded from the study. In total, 4,802 and 591 participants were excluded from the city hospital-based cohort and the Ansan/Ansung plus rural cohorts, respectively.
Demographic, anthropometric, and biochemical parameters of the participants
Information on age, gender, education, income, alcohol and coffee consumption, physical activity, and smoking history was collected during a health interview. Alcohol intake was calculated as the daily amount (g/day) by multiplying the drinking frequency by the amount consumed per occasion [10]. The smoking status was categorized into current smoker (smoked at least 20 cigarettes in the past six months), past smoker (no smoking for at least six months), and never-smoker [10]. The coffee intake was assessed as the weekly drinking frequency and was categorized into three groups by the tertiles of daily coffee intake. Regular physical activity was defined as more than 30 min of moderate physical activity on three or more days per week.
The anthropometric characteristics (height and weight) were measured at the initial visit, as described previously [10]. In the Ansan/Ansung cohort, body fat and skeletal muscle masses were determined using the Inbody 3.0 (Cheonan, Korea) based on bioelectrical impedance analysis (BIA). Because the appendicular skeletal muscle and fat masses were not measured in the city hospital-based cohort, they were estimated in that cohort with a machine learning prediction model generated from the Ansan/Ansung cohort, as described in a previous study [11]. A doctor measured blood pressure three times with a sphygmomanometer under resting conditions, and the average systolic blood pressure (SBP) and diastolic blood pressure (DBP) were reported. After fasting for more than 12 h, blood was collected in heparin-treated and non-treated tubes, and the plasma glucose concentrations were measured using a Hitachi 7600 Automatic Analyzer (Hitachi, Tokyo, Japan). HbA1c in heparin-treated blood was measured using an automatic analyzer (ZEUS 9.9; Takeda, Tokyo, Japan). Serum total cholesterol, high-density lipoprotein cholesterol (HDL-C), triglyceride, alanine aminotransferase (ALT), aspartate aminotransferase (AST), and creatinine levels were assessed using a Hitachi 7600 Automatic Analyzer. The serum high-sensitivity C-reactive protein (hs-CRP) concentrations were measured using a high-sensitivity ELISA kit (Thermofisher, Waltham, MA, USA). White blood cells (WBCs) were counted in EDTA-treated blood.
Definition of obesity and metabolic syndrome
Obesity was defined according to the total body fat percentage: men with ≥ 25% (n = 3298) and women with ≥ 30% (n = 7204) were considered obese and assigned to the high body fat (High-BF) group (cases), in accordance with the Korean definitions of obesity. Metabolic syndrome (MetS) is a cluster of energy, glucose, and lipid metabolism disorders and was defined according to the 2005 revised National Cholesterol Education Program-Adult Treatment Panel III criteria for Asia [12, 13]. The criteria for MetS were as follows: 1) abdominal obesity (waist circumference ≥ 90 cm for men and ≥ 85 cm for women), 2) elevated fasting blood glucose level (≥ 100 mg/dL) or current use of anti-diabetic medication, 3) elevated blood pressure (average systolic blood pressure ≥ 130 mmHg or diastolic blood pressure ≥ 85 mmHg) or current blood pressure medication use, 4) low HDL-C level (< 40 mg/dL for men and < 50 mg/dL for women), and 5) elevated serum triglyceride level (≥ 150 mg/dL) or current use of anti-dyslipidemic medication. Participants meeting three or more criteria were considered to have MetS.
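The five MetS criteria and the three-or-more rule can be sketched as a small classifier. This is only an illustration: the `Participant` record and its field names are hypothetical, and the glucose and triglyceride cutoffs are assumed to be in mg/dL, the units used by the NCEP-ATP III criteria.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record; field names are illustrative only.
    is_male: bool
    waist_cm: float
    fasting_glucose_mg_dl: float
    sbp_mmhg: float
    dbp_mmhg: float
    hdl_mg_dl: float
    triglyceride_mg_dl: float
    on_antidiabetic: bool = False
    on_antihypertensive: bool = False
    on_antidyslipidemic: bool = False


def mets_criteria_count(p: Participant) -> int:
    """Count how many of the five MetS criteria a participant meets."""
    waist_cut = 90 if p.is_male else 85   # cm, sex-specific
    hdl_cut = 40 if p.is_male else 50     # mg/dL, sex-specific
    criteria = [
        p.waist_cm >= waist_cut,
        p.fasting_glucose_mg_dl >= 100 or p.on_antidiabetic,
        p.sbp_mmhg >= 130 or p.dbp_mmhg >= 85 or p.on_antihypertensive,
        p.hdl_mg_dl < hdl_cut,
        p.triglyceride_mg_dl >= 150 or p.on_antidyslipidemic,
    ]
    return sum(criteria)


def has_mets(p: Participant) -> bool:
    """MetS is assigned when three or more criteria are met."""
    return mets_criteria_count(p) >= 3
```

For instance, a man with a 92 cm waist, fasting glucose of 105 mg/dL, and triglycerides of 160 mg/dL meets three criteria even with normal blood pressure and HDL-C.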
Usual food intake using a semi-quantitative food frequency questionnaire (SQFFQ)
The usual food intake of each participant over the 12 months preceding the interview was determined using an SQFFQ designed for the Korean diet, the accuracy and reproducibility of which were validated against three-day food records collected in four seasons from Koreans [14]. The SQFFQ includes 106 food items that Koreans commonly consume, and the food frequencies were categorized into never or seldom, once per month, two to three times monthly, once or twice weekly, three or four times weekly, five or six times weekly, daily, twice daily, and ≥ 3 times daily. The food amount at a meal was scored as more than, equal to, or less than the regular portion size visualized by photographs of 103 foods. The participants checked the frequency and portion size of each of the 106 food items in the SQFFQ. The daily food intake (g/day) was calculated by multiplying the median of the weekly consumption frequency by the portion size. The daily energy, carbohydrate, fat, protein, vitamin, and mineral intakes were calculated from the SQFFQ results using Can-Pro 2.0 nutrient assessment software designed by the Korean Nutrition Society.
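The frequency-times-portion calculation can be illustrated with a short sketch. The category keys, the conversion of each frequency category to its midpoint in servings per day, and the portion multipliers (0.5, 1.0, 1.5) are all assumptions made for this example; the source does not state the values that were actually used.

```python
# Midpoint of each frequency category, converted to servings per day.
# These conversions are assumptions for illustration.
FREQ_PER_DAY = {
    "never_or_seldom": 0.0,
    "once_monthly": 1 / 30,
    "2-3_monthly": 2.5 / 30,
    "1-2_weekly": 1.5 / 7,
    "3-4_weekly": 3.5 / 7,
    "5-6_weekly": 5.5 / 7,
    "daily": 1.0,
    "twice_daily": 2.0,
    "3+_daily": 3.0,
}

# Assumed multipliers for "less than", "equal to", and "more than"
# the regular portion size shown in the photographs.
PORTION_MULTIPLIER = {"less": 0.5, "regular": 1.0, "more": 1.5}


def daily_intake_g(frequency: str, portion: str, regular_portion_g: float) -> float:
    """Daily intake (g/day) = servings per day x portion multiplier x regular portion (g)."""
    return FREQ_PER_DAY[frequency] * PORTION_MULTIPLIER[portion] * regular_portion_g
```

Under these assumptions, a food with a 100 g regular portion eaten three or four times weekly in larger-than-regular portions contributes 0.5 × 1.5 × 100 = 75 g/day.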
Dietary patterns by principal components analysis
For dietary pattern analysis, the 103 food items in the SQFFQ were classified into 30 predefined food groups, as reported previously [15]. The dietary patterns were derived by principal component analysis of the 30 food groups, and the four patterns with eigenvalues > 1.5 were retained [15]. The orthogonal rotation procedure (varimax) yielded four dietary patterns uncorrelated with each other, and foods with factor-loading values ≥ 0.40 were considered the predominant contributors to the assigned dietary pattern [15]. Supplemental Table 1 lists the foods in each dietary pattern. According to the foods with factor-loading values ≥ 0.40, the patterns were named the Korean-balanced diet (KBD), plant-based diet (PBD), Western-style diet (WSD), and rice-based diet (RBD), and they indicate the participants’ diet types.
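A minimal NumPy sketch of this procedure follows: principal components of the food-group correlation matrix, retention by an eigenvalue cutoff, and varimax rotation. It is not the software used in the study; the function names are hypothetical, and the rotation is the commonly used SVD-based varimax iteration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonally rotate a (variables x factors) loading matrix by varimax."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Term whose SVD gives the next orthogonal rotation.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no further meaningful improvement
        criterion = new_criterion
    return loadings @ rotation


def dietary_pattern_loadings(food_group_intakes, eig_min=1.5):
    """Rotated loadings of the components whose eigenvalue exceeds eig_min.

    food_group_intakes: array of shape (participants, food groups).
    """
    corr = np.corrcoef(food_group_intakes, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    keep = eigvals > eig_min
    # Scale eigenvectors by sqrt(eigenvalue) to obtain component loadings.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return varimax(loadings)
```

Food groups whose rotated loading on a pattern reaches 0.40 would then be read off as that pattern's predominant contributors. The sign of a principal component is arbitrary, so a pattern may need to be sign-flipped before the threshold is applied.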
Dietary inflammatory index (DII)
The DII, an index of the pro-inflammatory potential of the diet, was calculated from the intakes of the assigned foods and nutrients and their dietary inflammatory weights; the components comprise energy, 32 nutrients, four food products, four spices, and caffeine, as described previously [16]. Because the SQFFQ did not include garlic, ginger, saffron, and turmeric, these four spices were excluded from the DII calculation. The DII was calculated by multiplying the dietary inflammatory score of each of the 38 food and nutrient components by its daily intake, and the sum of the 38 component scores was divided by 100.
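The calculation as worded above (a weighted sum of intakes divided by 100) can be sketched as follows. The component names and weights in the usage example are placeholders, not the published inflammatory scores, and the sketch follows only the simplified description given here; the published DII method also standardizes each intake against a global reference database before weighting, which is omitted.

```python
def dii_score(intakes: dict, weights: dict) -> float:
    """Sum of (inflammatory weight x daily intake) over all components, divided by 100.

    intakes: daily intake per component (hypothetical keys).
    weights: inflammatory weight per component (placeholder values).
    """
    missing = set(weights) - set(intakes)
    if missing:
        raise ValueError(f"missing intakes for: {sorted(missing)}")
    return sum(weights[name] * intakes[name] for name in weights) / 100


# Illustrative call with two made-up components and made-up weights.
example = dii_score(
    intakes={"fiber": 20, "saturated_fat": 10},
    weights={"fiber": -0.5, "saturated_fat": 0.4},
)
```

A negative result, as in this made-up example, corresponds to a diet weighted toward anti-inflammatory components.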
Genotyping using a Korean Chip and quality control
The Center for Genome Science at the Korea National Institute of Health determined the participants' genotypes in the Ansan/Ansung and city hospital-based cohorts. Genomic DNA was isolated from whole blood, and genotypes were measured using a Korean Chip (Affymetrix, Santa Clara, CA, USA) designed to examine disease-related single nucleotide polymorphisms (SNPs) in Koreans [9]. The genotyping accuracy was estimated using the Bayesian Robust Linear Modeling with Mahalanobis Distance (BRLMM) genotyping algorithm. The inclusion criteria for genotyping accuracy, missing genotype call rate, and heterozygosity were ≥ 98%, < 4%, and < 30%, respectively, and the data showed no gender bias. The included genetic variants satisfied Hardy-Weinberg equilibrium (HWE) at P > 0.05 and had a minor allele frequency (MAF) > 5% [17]. Manhattan and quantile-quantile (Q-Q) plots, generated with the fastman library in R, were used to check the quality of the GWAS data [17]. The Manhattan plot displayed the negative logarithms of the association P-values of the genetic variants for obesity. A Q-Q plot is a probability plot that shows the goodness of fit of the observed data distribution to the theoretical distribution; here, it displayed the quantiles of the observed P-values (y-axis) against the quantiles of the expected P-values (x-axis). A lambda value close to 1 in the Q-Q plot was taken to indicate that the GWAS results were not inflated. The pathways linked to the genetic variants associated with obesity were selected at Bonferroni-corrected P-values < 0.05 using the MAGMA gene-set analysis in SNP2GENE of the FUMA web application, available through its git repository (https://github.com/Kyoko-wtnb/FUMA-webapp/).
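The MAF and HWE filters can be illustrated from the genotype counts of a single biallelic SNP. This sketch assumes a one-degree-of-freedom chi-square goodness-of-fit test for HWE; the study does not state which HWE test was used, and genetic toolkits often apply an exact test instead. The function names are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from the counts of the AA, AB, and BB genotypes."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)


def hwe_chi2_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of the chi-square (1 df) test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_variant_qc(n_aa, n_ab, n_bb, maf_min=0.05, hwe_p_min=0.05) -> bool:
    """Keep a variant only if MAF > maf_min and HWE P > hwe_p_min."""
    return (
        minor_allele_frequency(n_aa, n_ab, n_bb) > maf_min
        and hwe_chi2_p_value(n_aa, n_ab, n_bb) > hwe_p_min
    )
```

A SNP with genotype counts of 250/500/250 matches HWE expectations exactly and passes, whereas one with 500/0/500 has the same allele frequency but no heterozygotes and is rejected by the HWE filter.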
Selection of the genetic variants that influence obesity defined by fat mass and the best model with SNP-SNP interactions
Figure 1 presents the procedure for selecting the genetic variants for high body fat risk and identifying the best model of SNP-SNP interactions. A GWAS was conducted to explore the genetic variants associated with obesity risk in the city hospital-based cohort, and 1992 genetic variants were selected at P < 5 × 10−5. We eliminated 387 genetic variants that had a MAF < 1% or deviated from HWE (P < 0.05). In the gene name search using g:Profiler (https://biit.cs.ut.ee/gprofiler/snpense), 178 SNPs could not be assigned gene names, and 126 gene names were identified for the remaining 1427 genetic variants. Linkage disequilibrium (LD) analyses were performed on the 1427 genetic variants using Haploview 4.2 in PLINK. The candidate genetic variants retained on the same chromosome were not strongly correlated (D’ < 0.2); SNPs with high D’ values were not included in the generalized multifactor dimensionality reduction (GMDR) analysis because they provide the same information on the genetic impact. Finally, 19 SNPs in 18 genes were selected, and the genes associated with fat mass were identified using HuGE Navigator (https://phgkb.cdc.gov/PHGKB/hNHome.action).
Among the 14 genetic variants selected for obesity risk, ten SNPs with SNP-SNP interactions were selected automatically by GMDR. The best SNP-SNP interaction model was chosen by a sign rank test of the training balanced accuracy (TRBA) and testing balanced accuracy (TEBA), with adjustment for covariates, using a GMDR program and a P-value threshold of 0.05 [10]. The covariates were age, gender, residence area, education, and income in model 1; model 2 additionally included energy intake, alcohol intake, regular exercise, and smoking status. Cross-validation consistency (CVC) was assessed by ten-fold cross-validation because the sample size was larger than 1000 [18], and a CVC of 10 out of 10 met the criterion for perfect cross-validation.
The number of risk alleles of each SNP was counted to generate the polygenic risk score (PRS) of the best model. For example, when the A allele was the risk allele of an SNP, the genetic score was 2, 1, and 0 for participants with the AA, AG, and GG genotypes, respectively. The PRS of the best model was calculated by summing the number of risk alleles of each selected SNP in the best gene-gene interaction model [3]. The PRS in the three- and six-SNP models was divided into three categories according to the number of risk alleles: Low-PRS, Middle-PRS, and High-PRS corresponded to 0–2 (n = 19,686), 3–4 (n = 30,513), and ≥ 5 (n = 3,629) risk alleles in the three-SNP model and to 0–5 (n = 27,212), 6–7 (n = 20,375), and ≥ 8 (n = 1,822) risk alleles in the six-SNP model, respectively. Among the best models that met the P-value of the sign test and the CVC criterion, the model with the fewest SNPs (the three-SNP model) was used to examine the interactions with lifestyle parameters.
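The unweighted PRS and its three-SNP categories can be sketched as below. The SNP identifiers (`snp1` to `snp3`) and the choice of A as the risk allele are placeholders for illustration; the category cutoffs are those stated for the three-SNP model.

```python
def risk_allele_count(genotype: str, risk_allele: str) -> int:
    """Number of risk alleles (0, 1, or 2) in a two-letter genotype such as 'AG'."""
    return genotype.count(risk_allele)


def polygenic_risk_score(genotypes: dict, risk_alleles: dict) -> int:
    """Unweighted PRS: the sum of risk-allele counts over the model's SNPs."""
    return sum(
        risk_allele_count(genotypes[snp], allele)
        for snp, allele in risk_alleles.items()
    )


def prs_category_three_snp(prs: int) -> str:
    """Categories of the three-SNP model: 0-2 low, 3-4 middle, 5 or more high."""
    if prs <= 2:
        return "Low-PRS"
    if prs <= 4:
        return "Middle-PRS"
    return "High-PRS"


# Placeholder SNP names and risk alleles, for illustration only.
risk_alleles = {"snp1": "A", "snp2": "A", "snp3": "A"}
genotypes = {"snp1": "AA", "snp2": "AG", "snp3": "GG"}
score = polygenic_risk_score(genotypes, risk_alleles)  # 2 + 1 + 0 = 3
category = prs_category_three_snp(score)               # "Middle-PRS"
```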
Expression quantitative trait locus (eQTL) analysis
eQTL analysis is a direct approach to estimating how the genetic variants at risk loci affect the expression of candidate genes. Allelic variants can alter the expression of the corresponding genes, and the expression of candidate susceptibility genes carrying risk alleles is expected to influence various diseases. The gene expression associated with the genetic variants related to abdominal obesity risk was identified by eQTL analysis using the Genotype-Tissue Expression (GTEx) eQTL calculator (https://gtexportal.org/home/testyourown, accessed on July 19, 2021). Gene expression in the subcutaneous and visceral adipose tissues, skeletal muscles, liver, and brain was calculated using the GTEx eQTL calculator.
Statistical analysis
The statistical analysis was performed using SAS (version 9.3; SAS Institute, Cary, NC, USA). According to the G*Power calculator, a sample size of 53,828 was sufficient to achieve significance at α = 0.05 and a power (1 − β) of 0.99 at an odds ratio of 1.05 in the logistic analysis. Descriptive statistics for categorical variables, such as gender and dietary habits, were obtained as frequency distributions, which were compared among the body fat groups using a chi-square test. Continuous variables were expressed as adjusted means with standard deviations after adjusting for the covariates. Statistical differences among the gender and body fat groups were compared using a two-way analysis of covariance (ANCOVA) [19]. Multiple comparisons of the groups were performed using Tukey’s test.
The association of high body fat with the metabolic parameters was examined by logistic regression analysis with the Low-BF group as the reference after adjustment for covariates. The results are presented as odds ratios (ORs) and 95% confidence intervals (CIs) of each biochemical parameter for the High-BF group relative to the Low-BF group. The first model was adjusted for age, residence area, survey year, lean body mass, education, and income. The second model was adjusted for the covariates in model 1 plus energy intake, physical activity, smoking status, and alcohol consumption.
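To show how an OR and its 95% CI are read, the sketch below computes an unadjusted OR with a Wald interval from a 2 × 2 table. It is a simplification: the analysis described above used covariate-adjusted logistic regression in SAS, which this example does not reproduce, and the function name and cell labels are hypothetical.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2 x 2 table.

    a: comparison group with the outcome    b: comparison group without it
    c: reference group with the outcome     d: reference group without it
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2 x 2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that excludes 1 indicates an association at the 5% level in this unadjusted setting; the adjusted ORs reported from logistic regression are interpreted in the same way.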
To determine the interactions between fat mass and lifestyle parameters, the lifestyle-related parameters were categorized into high or low groups using predesignated cutoffs, such as the dietary reference intake or the 30th percentile of each variable. Two-way ANCOVA was used to analyze the interactions between the fat mass groups and the lifestyle parameters, including dietary intake, smoking, and physical activity. The main effects were the fat mass group and the lifestyle-related parameter, and the model included their interaction term after adjusting for covariates. The ORs and 95% CIs of fat mass were also calculated by logistic regression analysis within the low and high groups of each lifestyle-related parameter. Differences in the proportion of High-BF participants among the PRS groups were tested with the χ2 test within the low and high groups of each lifestyle-related parameter.