By using the most recently released imputation data of more than 93 million variants for 408,093 participants in the large-scale UK Biobank, we identified 399 genomic risk loci for self-reported traits reflecting daily consumption of red meat, processed meat, poultry, fish, milk, cheese, fruits, vegetables, coffee, and tea. Of these, 231 SNPs were either unavailable or did not reach a significant level in a previous study (19). Overall, the heritability of these foods ranged between 3.5% and 10.5%, which reflected the proportion of dietary intake variation explained by genetic factors. Our gene set enrichment analysis found several significant functional pathways for the intake of total fish, milk, cheese, fruits, coffee, tea, and alcohol.
When we searched PubMed up to September 2022 for GWAS of dietary traits, a total of 23 GWAS were identified, of which seven included the UK Biobank population (12–34) (Supplementary Table S12). The traits of interest in these studies were diverse, including bitter and sweet beverages (coffee, tea, alcohol, and juices), nutrients (carbohydrates, fats, and proteins), dietary patterns (meat-related diet and fish and plant-related diet), a dietary index (Dietary Approaches to Stop Hypertension) (12, 15, 16, 20, 29, 31), or all 85 single item traits in the food frequency questionnaire and their corresponding 85 PC diets (19). In Cole et al.’s study, some single food item quantitative traits shared a substantial number of significant loci (Supplementary Table S13) (19). Therefore, it was more feasible to combine single food items that might exert similar patterns of genetic effects (beef, lamb, and pork as red meat; oily and non-oily fish as total fish; fresh and dried fruits as total fruits; and cooked and salad/raw vegetables as total vegetables).
In a large-scale population-based study, the presence of relatedness among individuals may confound the association between exposures and outcomes (35). Among the seven GWAS analyzing the UK Biobank data, relatedness was considered only in Cornelis et al.’s and Zhong et al.’s studies, which excluded participants with kinship coefficients > 0.0442 (20, 31). According to data released for the entire UK Biobank, only 0.04% of participants were identified as having ten or more third-degree relatives, whereas more than 30% of participants were identified as having at least one relative among the other participants (36). The cutoff value of 0.0442, which refers to pairs of individuals with third-degree or closer relationships (37), therefore did not fully account for the relatedness among study participants. In Cole et al.’s study, the mixed model approach implemented in the BOLT-LMM software fully adjusted for cryptic relatedness (19). By using the dense genetic relatedness matrix (GRM) and a leave-one-chromosome-out scheme, BOLT-LMM generally showed higher power than FastGWA, which uses a sparse GRM (38). However, FastGWA was suggested to show greater robustness because its estimate of the ‘genetic variance’ also captures the variance attributable to common environmental effects, resulting in a higher genetic variance than that estimated by BOLT-LMM (38). In the present study, we further excluded participants who were genetically identified as having non-White British ethnic backgrounds to control for population stratification. Further studies may compare the performance of FastGWA and BOLT-LMM in identifying genetic determinants of food intake.
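The 0.0442 cutoff discussed above can be illustrated with a minimal Python sketch. It assumes the commonly used KING-style convention in which the lower kinship bound for a relationship of degree d is 2^-(d + 1.5); the cited studies may have applied a different convention, and the function names `kinship_lower_bound` and `classify_pair` are hypothetical, introduced only for illustration.

```python
# Minimal sketch, assuming KING-style kinship cutoffs (an assumption,
# not necessarily the procedure used in the cited studies).

def kinship_lower_bound(degree: int) -> float:
    """Lower kinship bound for a relationship of the given degree.

    degree 0 = duplicate / monozygotic twin, 1 = first-degree,
    2 = second-degree, 3 = third-degree.
    """
    return 2.0 ** -(degree + 1.5)


# Labels ordered from closest to most distant relationship.
LABELS = ["duplicate/MZ twin", "first-degree", "second-degree", "third-degree"]


def classify_pair(kinship: float) -> str:
    """Assign a pair to the closest relationship whose lower bound it exceeds."""
    for degree, label in enumerate(LABELS):
        if kinship > kinship_lower_bound(degree):
            return label
    return "unrelated or more distant"


# Print the cutoff for each degree; the third-degree bound rounds to 0.0442.
for degree, label in enumerate(LABELS):
    print(f"{label}: kinship > {kinship_lower_bound(degree):.4f}")
```

Under this convention, excluding pairs with kinship > 0.0442 removes third-degree or closer relatives only; more distant (cryptic) relatedness remains in the sample, which is the gap that mixed model approaches aim to address.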
Using dietary habits obtained from the questionnaire, we treated the amount of food consumed as a continuous variable and applied a linear mixed model. A previous study converted food-liking traits into numerical values (range 0–9) without justification (39). Transforming food preference phenotypes measured on a hedonic scale into numeric values is not appropriate; instead, the proportional odds logistic mixed model (POLMM) has been shown to handle ordinal categorical phenotypes, especially when the phenotype is extremely imbalanced (40). The authors further applied the POLMM to the consumption frequency of food items (never or almost never, once every few months, once a month, once a week, 2–4 times per week, and almost daily) in the UK Biobank without converting the categories into numeric values, and identified loci in the top 10 genes that were replicated in our current study (e.g., CCDC171 for beef, pork, and lamb, XKR6 for processed meat, LY6H for poultry, and MLLT10 for oily fish) (40).
In this study, we found some variants associated with more than one dietary trait, primarily milk, coffee, tea, and alcohol consumption. In particular, variants in the ABCG2, SLC35D3, GCKR, and AC003075.4 genes were identified as genomic risk loci for all three dietary factors. Cornelis et al. also detected variants at the ABCG2 gene in relation to coffee consumption; however, Zhong et al. found variants at the ABCG2 gene associated with total bitter beverage intake but not coffee or tea consumption (20, 31). ABCG2, which is metabolized by cytochrome P450s, is expressed in the apical membranes of several organs, such as the liver, kidney, intestine, and brain, and is involved in preventing the absorption and excessive accumulation of xenobiotic and endogenous substrates in certain tissues (20, 31, 41). ABCG2 is highly induced during lactation, plays an important role in the transport of uric acid into milk, and might affect the redox potential of milk (42). Although the influence of the SLC35D3 and GCKR genes on dietary habits is unknown, SLC35D3 was suggested to regulate dopamine signaling and to be involved in metabolic control in the central nervous system (43). GCKR polymorphisms were linked to lactate levels, multiple lipids, and metabolic traits (44–46). AC003075.4 was suggested to have a negative feedback mechanism with the expression of AHR, which is involved in caffeine metabolism and is suppressed by catechins in tea (20, 21, 31, 47, 48).
The identification of the rs8103840 variant, near the FUT1 and FGF21 genes, is consistent with the findings of Niarchou et al., in which FGF21 reached genome-wide significance for both meat-related and fish- and plant-related dietary patterns (15). The FGF21 gene exerts its endocrine action in both the central nervous system and adipose tissue and is involved in the metabolism of glucose, lipids, and proteins (49, 50). We found decreased consumption of processed meat and fish and increased consumption of fruits in individuals carrying the C allele of rs8103840.
By obtaining summary statistics of more than 11 million SNPs from the most recent comprehensive GWAS for dietary intake (19), we identified 231 variants associated with the intake of red meat (n = 9), processed meat (n = 7), poultry (n = 1), total fish (n = 12), milk (n = 50), cheese (n = 38), total fruits (n = 41), total vegetables (n = 42), coffee (n = 11), tea (n = 13), and alcohol (n = 16) which were either not available in the previous study or did not reach the significance level (p < 5e-8) (Supplementary Tables S1-S11) (19). Of these, almost half of SNPs (n = 107) were not available in Cole et al.’s study due to their smaller scale of imputed genetic data, and 70 SNPs reached the threshold for suggestive significance (5e-8 ≤ p < 1e-5). Among 54 loci that did not reach the suggestive significance level (p ≥ 1e-5), 41 loci were for milk intake (Supplementary Table S14). However, the amount of daily milk consumption was not evaluated in the previous study, and we identified the remaining 12 novel loci for red meat (rs12144834, AL592205.1; rs150877559, DLEU1; rs12938702, ST6GALNAC1; and rs7251466, ZNF574), poultry (rs34473833, CDH11), total fish (rs4600686, RNU6-812P), cheese (rs12472445, RBMS1; rs4886168, RNU7-88P; and rs276950, LINC01082), total fruits (rs34156224, AQP4-AS1:CHST9), and coffee (rs2682909, RP11-307C19.2). Since the previous study performed genome-wide association analysis for types of milk only, all genomic risk loci for our estimated milk intake were either unavailable or did not reach a significance level of 5e-08, and thus were determined to be novel loci in this study.
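The significance tiers used in the comparison above can be expressed as a short Python sketch. The thresholds (p < 5e-8 for genome-wide significance and 5e-8 ≤ p < 1e-5 for suggestive significance) and the three counts are taken from the text; the function name `significance_tier` and the dictionary labels are hypothetical and serve only as an illustration.

```python
# Minimal sketch of the p-value tiers described in the text.
GENOME_WIDE = 5e-8   # genome-wide significance threshold
SUGGESTIVE = 1e-5    # upper bound of the suggestive range


def significance_tier(p: float) -> str:
    """Classify a p-value from the previous study into one of three tiers."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not suggestive"


# Counts reported in the text for the 231 variants.
counts = {
    "not available in the previous study": 107,
    "suggestive in the previous study": 70,
    "not suggestive in the previous study": 54,
}
total = sum(counts.values())
print(total)
```

The three reported groups sum to the 231 variants described in the text.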
Functional annotations of dietary traits have not been elucidated in previous GWASs. In this study, our gene set enrichment test identified several gene ontology terms and WikiPathways related to the consumption of fish, milk, cheese, fruit, coffee, tea, and alcohol. None of the pathways for red meat, processed meat, poultry, and vegetable intake were significant. However, among 122 novel variants that were not significant in previous studies, some SNPs related to red meat intake were linked to pathogenesis. Variants rs150877559 (DLEU1), rs12938702 (ST6GALNAC1), and rs7251466 (ZNF574) were found in genes involved in the pathogenesis of colorectal, gastric, and ovarian cancers (51–54). This may suggest the role of red meat intake-related genes in cancer cell proliferation and migration. Furthermore, rs150877559 (DLEU1) and rs7251466 (ZNF574) were identified as novel loci in our present study. However, the underlying mechanisms of novel variants involved in dietary habits of food intake remain unclear. For processed meat intake, we identified the rs17676243 variant, which is in the NR3C2 gene. However, the expression of NR3C2 was found to be upregulated by the consumption of red meat but not processed meat in a gene expression study. The inconsistency may be due to the limited sample size and the low consumption of meats in study participants in the previous study (55).
By including more than twice as many SNPs as the previous study and adjusting for familial relatedness, we obtained point estimates of heritability from summary statistics that were slightly lower than those calculated in Cole et al.’s study (processed meat, 5.42% vs. 6.6%; poultry, 3.50% vs. 4.9%; cheese, 10.48% vs. 10.8%; coffee, 6.26% vs. 7.9%; tea, 8.34% vs. 9.1%; and alcohol, 9.71% vs. 12.1%; Table 2 and Supplementary Table S13) (19). Nevertheless, no statistical tests were available to determine whether these differences were significant. The heritability of food groups (red meat, total fish, total fruits, and total vegetables) appeared to be in the range of the heritability of the corresponding food items. However, we were unable to compare the heritability of milk because milk intake was not assessed as a quantitative trait in the previous study.
Although the present GWAS included much more genetic information of imputed SNPs compared to earlier GWAS (12–34) and applied the recent methodology to account for confounding effects of both population stratification and cryptic relatedness in large-scale biobank data, our results were limited to the White British population only. Given that disparities in dietary intake according to different ethnic groups may exist due to cultural knowledge and food-related skills (56, 57), analyses for individuals from ethnic backgrounds other than White British require additional investigations. Furthermore, due to the lack of replication samples, our findings need to be validated in other independent studies.
In summary, the present study comprehensively assessed the influence of genetic variants and their functional mechanisms on the dietary behaviors of participants in the UK Biobank. By carefully accounting for population stratification and cryptic relatedness in this large-scale, recently released imputation dataset, we identified several novel loci for food consumption. In terms of implications, genetic variants associated with dietary intake may converge into groups of variants that are associated differently with diseases via several biological mechanisms. Furthermore, the summary statistics of our GWAS provided accurate estimates and can be used as a source of instrumental variables in the Mendelian randomization framework to address the causal relationship between dietary intake and health outcomes.