Using the most recently released imputation data of more than 93 million variants in the large-scale UK Biobank, we identified 399 genomic risk loci for self-reported traits reflecting daily consumption of food items included in the WCRF report for CRC prevention (Additional file 2: Figure S9). Using these genomic risk loci in the one-sample MR framework, we found that genetically predicted fruit intake was associated with a lower risk of CRC, with an inverse association of similar magnitude for colon cancer. Additionally, marginally inverse associations of vegetable intake with CRC and colon cancer were observed in the total study population. When compared with our observational analysis of a prospective cohort study design, these associations appeared to be weaker and did not reach the level of significance (Additional file 2: Figure S10).
When we searched PubMed up to September 2023 for GWAS of dietary traits, a total of 23 GWAS were identified, seven of which included the UK Biobank population (Additional file 2: Table S8). The traits of interest in these studies were diverse, including bitter and sweet beverages (coffee, tea, alcohol, and juices), nutrients (carbohydrates, fats, and proteins), dietary patterns (meat-related diet and fish and plant-related diet), a dietary index (Dietary Approaches to Stop Hypertension) [29–34], or all 85 single item traits in the FFQ and their corresponding 85 PC diets [12].
In a large-scale population-based study, the presence of relatedness among individuals may confound the association between exposures and outcomes [21]. In particular, population stratification (ancient relatedness) refers to any differences that may lead to systematic differences in allele frequencies and thus spurious associations, whereas familial relatedness (recent relatedness) refers to the presence of related individuals within the sample, which may violate assumptions of common analytical tools and artificially inflate test statistics, leading to false-positive findings [21]. Among the seven GWAS analysing the UK Biobank data, relatedness was considered in Cornelis et al.’s and Zhong et al.’s studies, which excluded participants with kinship coefficients > 0.0442 [32, 34]. According to data released for the entire UK Biobank, only 0.04% of participants were identified as having ten or more third-degree relatives, whereas more than 30% of participants were identified as having at least one relative in the cohort [35]. The cutoff value of 0.0442, which refers to pairs of individuals with third-degree or closer relationships [36], therefore did not fully account for the relatedness among study participants. In Cole et al.’s study, the mixed model approach implemented in the BOLT-LMM software fully adjusted for cryptic relatedness [12]. By using the dense genetic relatedness matrix (GRM) and a leave-one-chromosome-out scheme, BOLT-LMM generally achieved higher power than the sparse GRM used in the FastGWA tool [20]. However, FastGWA was suggested to show greater robustness because its estimate of the ‘genetic variance’ also captures the variance attributable to common environmental effects, resulting in a higher genetic variance than that of BOLT-LMM [20]. Further studies may compare the performance of FastGWA and BOLT-LMM in identifying genetic determinants of food intake.
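As a rough illustration of how a kinship cutoff such as 0.0442 can be applied, the following minimal sketch removes individuals until no retained pair exceeds the cutoff. The sample identifiers, the pair list, and the greedy removal rule are hypothetical and are not the procedure used in the cited studies.

```python
from collections import defaultdict

KINSHIP_CUTOFF = 0.0442  # third-degree or closer, as cited in the text


def unrelated_subset(sample_ids, kinship_pairs, cutoff=KINSHIP_CUTOFF):
    """Greedily drop individuals until no retained pair exceeds the cutoff.

    kinship_pairs is an iterable of (id_a, id_b, kinship_coefficient).
    """
    related = defaultdict(set)
    for a, b, phi in kinship_pairs:
        if phi > cutoff:
            related[a].add(b)
            related[b].add(a)

    kept = set(sample_ids)
    while kept:
        # Retained individual with the most retained relatives
        # (ties broken by identifier so the result is deterministic).
        worst = max(kept, key=lambda s: (len(related[s] & kept), s))
        if not related[worst] & kept:
            break  # nobody left has a retained relative
        kept.remove(worst)
    return sorted(kept)
```

With pairs A-B (0.25), B-C (0.06), and C-D (0.03), only B is removed, because the C-D pair falls below the cutoff. A mixed model approach, by contrast, retains all individuals and models their relatedness instead of excluding them.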
Because dietary habits were obtained from the questionnaire, we treated the amount of food consumption as a continuous variable and applied the linear mixed model. A previous study converted food-liking traits into numerical values (range 0–9) without justification [37]. Because converting food preference phenotypes measured on a hedonic scale into numeric values may not be appropriate, the proportional odds logistic mixed model (POLMM) has been shown to handle ordinal categorical phenotypes, especially when the phenotype is extremely imbalanced [38]. The authors applied the POLMM to the frequency of consumption of food items (never or almost never, once every few months, once a month, once a week, 2–4 times per week, and almost daily) in the UK Biobank without converting the categories into numeric values [38]. Their findings on the top 10 genes were similar to those identified in our current study (e.g., CCDC171 for beef, pork, and lamb, XKR6 for processed meat, LY6H for poultry, and MLLT10 for oily fish).
The anti-cancer effects of fruits and vegetables have been attributed to their bioactive compounds, such as fiber, folate, vitamins, minerals, and flavonoids [39]. Of these, fiber is fermented by several bacteria to produce short-chain fatty acids (SCFAs), including acetate (central appetite regulation), propionate (gluconeogenesis and satiety signaling regulation), and butyrate (a main energy source for human colonocytes) [40, 41]. Higher fiber intake was associated with an increase in SCFAs and SCFA-producing bacteria, which regulate the immune system and metabolism and reduce the CRC risk [41]. According to the WCRF/AICR, there was limited evidence for the effect of fruit and non-starchy vegetable intake on CRC prevention [42]. According to pooled estimates from prospective cohort studies, each 100 g/day increment in fruit and vegetable intake was associated with a 4% (relative risk (RR) = 0.96, 95% CI = 0.93–0.99) and 2% (RR = 0.98, 95% CI = 0.96–0.99) lower risk of CRC, respectively [43]. However, individual studies tended to show null associations. A previous case-control analysis of nine observational studies within the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry did not observe any significant associations of fruit (odds ratio (OR) = 1.04, 95% CI = 0.93–1.15) or vegetable (OR = 0.92, 95% CI = 0.82–1.03) intake with overall CRC risk [44]. Similarly, nonsignificant associations of fruit (HR = 1.00, 95% CI = 0.94–1.05) and vegetable (HR = 1.01, 95% CI = 0.93–1.11) intake with CRC risk were recently reported in a prospective cohort analysis of the UK Biobank [19]. The inconsistency between these findings and our MR estimates may be partly due to differences in study design and analytical framework. In general, observational studies are more prone to residual confounding, reverse causation, and measurement error than MR analyses, in which the IVs related to the exposure of interest are randomly assigned among individuals [4, 26].
Such sources of bias may attenuate associations toward the null [4, 26]. Furthermore, while the MR estimates reflect the effect of lifelong perturbations in risk factors (e.g., genetically predicted dietary intake), observational results may reflect more acute effects (e.g., during the follow-up period since enrollment in a cohort) [45]. Our present observational analysis, with a longer follow-up period (12.4 vs. 5.7 years), suggested stronger favorable effects of fruits (HR = 0.99, 95% CI = 0.91–1.01) and vegetables (HR = 0.99, 95% CI = 0.98–1.00), thus supporting the evidence of long-term beneficial effects [19].
Among dietary factors, the International Agency for Research on Cancer classified processed meat as a human carcinogen (Group 1) and red meat as a probable carcinogen (Group 2A) [46]. The carcinogenic effects of red meat and processed meat are thought to arise from several chemicals, such as N-nitroso compounds, heterocyclic aromatic amines, and polycyclic aromatic hydrocarbons, formed in red meat and when cooking meat at high temperatures [47]. The WCRF/AICR also reported probable to convincing evidence of red meat and processed meat intake in association with CRC risks [42]. However, our present study observed associations of red meat and processed meat with CRC risk in observational analyses only. Besides differences in study design and analytical framework, the proportion of variation in the exposure of interest explained by the IVs may affect our estimates. Although the allele score IVs explained variations of dietary intake (F-statistics greater than 90), the number of SNPs used for the calculation of allele scores for red meat and processed meat was relatively small, which may not have allowed us to detect any significant associations. We further observed an inverse association between processed meat intake and rectal cancer risk. These findings disappeared in sex-specific subgroups and need to be interpreted cautiously, possibly due to the small proportion of rectal cancer cases among all study participants.
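The instrument strength criterion mentioned above can be illustrated with a minimal sketch. The code builds a weighted allele score and computes the first-stage F-statistic for a single instrument as F = R²(n − 2)/(1 − R²). The function names, simulated genotypes, allele frequency, and per-allele weights are all hypothetical and do not reproduce the analysis pipeline of this study.

```python
import numpy as np


def allele_score(genotypes, weights):
    """Weighted allele score: sum over SNPs of allele count (0/1/2) x weight."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


def first_stage_f(score, exposure):
    """F-statistic from regressing the exposure on a single allele score."""
    n = len(exposure)
    r2 = np.corrcoef(score, exposure)[0, 1] ** 2
    return r2 / (1.0 - r2) * (n - 2)


def simulated_example(n=5000, n_snps=20, seed=0):
    """Synthetic data only: allele frequency 0.3 and weight 0.05 are arbitrary."""
    rng = np.random.default_rng(seed)
    genotypes = rng.binomial(2, 0.3, size=(n, n_snps))
    weights = np.full(n_snps, 0.05)
    score = allele_score(genotypes, weights)
    exposure = score + rng.normal(size=n)  # exposure = genetic part + noise
    return first_stage_f(score, exposure)
```

A commonly used rule of thumb treats an F-statistic above 10 as indicating an adequately strong instrument. A strong instrument does not by itself guarantee sufficient power to detect an association with the outcome.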
To date, very few MR studies have reported the effect of dietary factors on CRC risk. Most of them considered blood concentrations of nutrients (carotenoids, calcium, copper, fatty acids, folate, iron, magnesium, methionine, phosphorus, selenium, sodium, vitamin B6, vitamin B12, vitamin D, vitamin E, and zinc) as the exposures of interest [8, 48–52]. Only the MR study conducted by Cornish et al. examined the causal estimate between coffee consumption and CRC risk. Although we used many more SNPs in the allele score calculation, our study revealed a similar direction of the estimates (33 SNPs, HR = 1.16, 95% CI = 0.96–1.40 in the current study vs. 4 SNPs, OR = 1.17, 95% CI = 0.88–1.55 in the previous study) [8].
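The precision of the two coffee estimates can be compared approximately by recovering the standard error on the log scale from each reported 95% CI. The sketch below assumes symmetric Wald-type intervals on the log scale. The function name is hypothetical, and an HR and an OR are not identical measures, so the comparison is indicative only.

```python
import math


def log_scale_se(lower, upper, z=1.96):
    """Approximate SE of a log ratio estimate from its 95% CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


current_se = log_scale_se(0.96, 1.40)   # 33 SNPs, current study
previous_se = log_scale_se(0.88, 1.55)  # 4 SNPs, previous study [8]
```

Under these assumptions, the standard error is roughly 0.10 for the current estimate and 0.14 for the previous one, consistent with the narrower interval obtained with the larger allele score.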
Furthermore, we found inconclusive evidence of the MR estimates of total fish, milk, cheese, coffee, tea, and alcohol consumption on CRC. Of these, pooled estimates from observational studies showed significantly or suggestively inverse associations of fish (RR = 0.89, 95% CI = 0.80–0.99), milk (RR = 0.94, 95% CI = 0.92–0.96), cheese (RR = 0.94, 95% CI = 0.87–1.02), coffee (RR = 1.00, 95% CI = 0.99–1.02), tea (RR = 0.99, 95% CI = 0.97–1.01), and alcohol (RR = 1.07, 95% CI = 1.05–1.08) intake with CRC risk [43]. Compared to observational analysis, estimates from MR commonly have wider CIs and thus tend toward null findings [45].
This study has several strengths. Having large-scale individual-level data with considerably more genetic information from imputed SNPs than earlier GWAS, we applied a recent methodology to account for the confounding effects of both population stratification and cryptic relatedness when identifying loci associated with food intake. We also performed a comprehensive MR analysis to provide evidence for the causal estimate of dietary intake on CRC risk. The genetic variants had adequate strength; thus, bias due to small F-statistics or small sample size can be minimised. Because we undertook sensitivity analyses to evaluate the plausibility of the IV assumptions and robustness to pleiotropy and outliers, our findings from MR analyses may be less biased by residual confounding and reverse causation than observational results. Additionally, combining many SNPs into a single allele score may increase the power of the analysis and reduce the risk of bias from possible weak instruments [26]. Furthermore, the data available for one-sample MR analysis allowed us to consider the effect estimate in several subgroups, such as sex and CRC subsites.
Despite providing new evidence about the causal effect of dietary intake on CRC risk, this study has some limitations that need to be addressed. Given that disparities in dietary intake across ethnic groups may exist due to cultural knowledge and food-related skills [53, 54], analyses for individuals from ethnic backgrounds other than White British require additional investigation. In addition, we derived SNPs and weights for IVs in all participants after quality control and performed the two-stage least squares analysis in participants without any cancer at baseline. There could still be a winner’s curse on our estimate due to the overlap between the dataset in which genetic variants were selected and the dataset in which genetically predicted associations were determined [55]. However, the winner’s curse bias in our study may have been mitigated by selecting SNPs more stringently, based not only on the significance threshold but also on linkage disequilibrium among variants.
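The logic of a two-stage least squares analysis can be sketched as follows, assuming a continuous outcome and a single allele score as the instrument. This is a simplification: the present study reports hazard ratios, so its second stage presumably differs from the linear regression shown here, and all variable names and simulated effect sizes are hypothetical.

```python
import numpy as np


def ols(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


def two_stage_least_squares(instrument, exposure, outcome):
    """Stage 1: predict exposure from instrument. Stage 2: regress outcome on it."""
    a0, a1 = ols(instrument, exposure)
    predicted = a0 + a1 * instrument
    _, beta = ols(predicted, outcome)
    return beta


def simulated_comparison(n=20000, true_effect=0.3, seed=1):
    """Synthetic data with an unmeasured confounder u affecting x and y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # allele score (instrument)
    u = rng.normal(size=n)                 # unmeasured confounder
    x = 0.5 * z + u + rng.normal(size=n)   # exposure
    y = true_effect * x + u + rng.normal(size=n)
    naive = ols(x, y)[1]                   # confounded observational slope
    return naive, two_stage_least_squares(z, x, y)
```

In this synthetic setting, the naive regression slope is biased upward by the confounder, whereas the instrumented estimate is expected to lie close to the simulated effect of 0.3. The sketch does not address winner’s curse, which arises when the same data are used both to select the variants and to estimate their effects.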