2.1 Two-sample MR study design
The validity of causal inference from Mendelian randomization (MR) rests on three core assumptions: (1) the genetic variants are robustly associated with the exposure of interest; (2) the variants are not associated with confounders of the exposure-outcome relationship; and (3) the variants affect the outcome only through the exposure, with no independent effect.
As shown in Fig. 1, we used three complementary MR approaches to examine the relationship between educational attainment and lung cancer. The primary analysis, univariable MR, estimated the total effect of educational attainment on lung cancer risk. In Additional Analysis 1, multivariable MR was used to estimate the independent effect of educational attainment after accounting for potential confounders. In Additional Analysis 2, a two-step MR approach was used to examine the pathways through which educational attainment might influence lung cancer risk.
2.2 Data sources and instruments
Information about the data sources and sample sizes used in this study is summarized in Table 1. The study relied on summary-level data that have been made publicly available.
Table 1
Details on the characteristics of each included dataset.
Phenotype | Data source | Total sample size | Population | # SNPs |
Lung cancer, Lung squamous cell carcinoma, Lung adenocarcinoma | Wang Y, et al. Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer. Nat Genet. 2014;46(7):736–741. | 11,348 cases, 15,861 controls (27,209 in total) | European | 8.9M |
Educational attainment | Lee JJ, Wedow R, Okbay A, et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet. 2018;50:1112–1121. | 1,100,000 | European | 8.1M |
BMI | Pulit SL, Stoneman C, Morris AP, et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum Mol Genet. 2019 Jan 1;28(1):166–174. | 694,648 | European | 27.4M |
Smoking, Drinking | Liu M, Jiang Y, Wedow R. et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat Genet. 2019 Feb;51(2):237–244. | 1,232,091 | European | 11.8M |
SNP: single nucleotide polymorphism.
2.3 Lung cancer and its subtypes
Wang et al. [16], on behalf of the International Lung Cancer Consortium (ILCCO), carried out an inverse-variance-weighted meta-analysis of four European lung cancer genome-wide association studies (GWAS): the MDACC GWAS, the ICR GWAS, the NCI GWAS, and the IARC GWAS. The analysis covered overall lung cancer and its subtypes, in particular lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC).
GWAS summary data for overall lung cancer (LUCA) and for the LUAD and LUSC subtypes were obtained from the IEU-OpenGWAS online platform using the dataset identifiers "ieu-a-966" (LUCA), "ieu-a-967" (LUAD), and "ieu-a-965" (LUSC).
2.4 Educational attainment
Summary statistics for educational attainment were taken from the GWAS conducted by Lee JJ et al. [17].
After quality control, the final dataset included more than 1.1 million individuals of European descent, in whom 1,271 independent single nucleotide polymorphisms (SNPs) reached genome-wide significance.
2.5 BMI
Potential mediators were drawn from GWAS of obesity-related phenotypes, selected to maximize sample size while excluding duplicated samples. Body mass index (BMI) was examined as a mediator, using summary statistics from the meta-analysis of 694,648 individuals of European descent by Pulit SL et al. [18].
2.6 Smoking (SMK) and Drinking (DRK)
Liu M et al. [19] conducted a meta-analysis of the genetic factors influencing tobacco and alcohol use in up to 1.2 million individuals of European descent.
That study reported 378 genetic variants associated with the initiation of regular smoking (SmkInit; n = 1,232,091) and 99 variants associated with alcohol consumption, measured as the number of drinks per week (DrinkWk; n = 941,280). Both traits were considered for their potential role in the etiology of lung cancer in the context of educational disparities.
2.7 Testing instrument strength and statistical power
The F-statistic was calculated using the formulas F = R²(n − 2)/(1 − R²) and R² = 2 × MAF × (1 − MAF) × β², where R² denotes the proportion of phenotypic variance explained by a genetic instrument, n is the sample size, β is the estimated association of the single nucleotide polymorphism (SNP) with the exposure, and MAF is the minor allele frequency. An F-statistic of 10 or above is conventionally taken to indicate a low risk of weak instrument bias in MR analyses.
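The instrument-strength calculation can be sketched as follows. This is a minimal illustration rather than the study's code: it assumes the common approximation R² = 2 × MAF × (1 − MAF) × β² with β expressed in standard-deviation units of the exposure, and the function names and the example SNP values are hypothetical.

```python
def variance_explained(beta: float, maf: float) -> float:
    """Approximate R² for one SNP (assumes beta is in SD units of the exposure)."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def f_statistic(r2: float, n: int) -> float:
    """F = R²(n - 2) / (1 - R²)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return r2 * (n - 2) / (1.0 - r2)


# Hypothetical SNP (values are illustrative, not taken from the study).
r2 = variance_explained(beta=0.02, maf=0.30)
f = f_statistic(r2, n=1_100_000)
print(f"R2 = {r2:.6f}, F = {f:.1f}, below threshold of 10: {f < 10}")
```

SNPs whose F-statistic falls below 10 would typically be flagged as potentially weak instruments.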
We also evaluated the statistical power of the study using the method described by Burgess et al., which combines the sample size from the European Population Genome Study with the proportion of variance in the exposure explained by the genetic instruments.
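As a rough illustration of how sample size and variance explained combine into power, the sketch below uses a normal-approximation formula for a binary outcome. Whether this matches the exact calculation performed in the study is an assumption. The function name, the odds ratio of 0.8, and the R² of 0.03 are hypothetical; the case and control counts are those listed in Table 1.

```python
from math import log, sqrt
from statistics import NormalDist


def mr_power_binary(log_or, r2, n_total, case_fraction, alpha=0.05):
    """Approximate power to detect a causal log odds ratio per SD of exposure.

    Assumes a normal approximation with non-centrality parameter
    |log_or| * sqrt(r2 * n_total * case_fraction * (1 - case_fraction)).
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    ncp = abs(log_or) * sqrt(r2 * n_total * case_fraction * (1.0 - case_fraction))
    return normal.cdf(ncp - z_crit)


n_cases, n_controls = 11_348, 15_861          # from Table 1
n_total = n_cases + n_controls
power = mr_power_binary(
    log_or=log(0.8),                          # hypothetical effect size
    r2=0.03,                                  # hypothetical variance explained
    n_total=n_total,
    case_fraction=n_cases / n_total,
)
print(f"approximate power: {power:.2f}")
```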
2.8 Univariable MR
For each exposure, the principal MR analysis used the inverse-variance weighted (IVW) method under a multiplicative random-effects model. This approach combines the Wald ratio estimates from individual SNPs, each calculated by dividing the SNP-outcome association by the SNP-exposure association, into a single causal estimate for each risk factor. Because the outcome is binary, effect estimates were converted to odds ratios (ORs) to represent the relationship between educational attainment and lung cancer risk more intuitively.
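The IVW calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the software used in the study: the Wald ratio variance uses the first-order approximation (outcome standard error divided by the SNP-exposure association), the random-effects standard error is not allowed to fall below its fixed-effect value, and the summary statistics are hypothetical.

```python
from math import exp, sqrt


def ivw_mre(bx, by, se_y):
    """IVW estimate with a multiplicative random-effects standard error.

    bx, by: SNP-exposure and SNP-outcome associations; se_y: SEs of by.
    """
    if len(bx) < 2:
        raise ValueError("need at least two SNPs")
    ratios = [y / x for x, y in zip(bx, by)]            # Wald ratio per SNP
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # inverse first-order variance
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    # Cochran's Q measures heterogeneity between the Wald ratios.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    # Assumption: the SE is never shrunk below its fixed-effect value.
    phi = max(1.0, q / (len(bx) - 1))
    return estimate, sqrt(phi / total_w)


# Hypothetical summary statistics for four SNPs (not taken from the study).
bx = [0.10, 0.08, 0.12, 0.05]
by = [-0.030, -0.020, -0.040, -0.012]
se_y = [0.010, 0.012, 0.009, 0.015]

est, se = ivw_mre(bx, by, se_y)
odds_ratio = exp(est)
ci_low, ci_high = exp(est - 1.96 * se), exp(est + 1.96 * se)
print(f"log OR = {est:.3f} (SE {se:.3f}); OR = {odds_ratio:.2f}, "
      f"95% CI {ci_low:.2f} to {ci_high:.2f}")
```

Exponentiating the pooled log odds ratio and its confidence limits gives the OR scale used for reporting.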
Because IVW estimates can be biased by pleiotropic instrumental variables, we carried out a series of sensitivity analyses. Specifically, we used MR-Egger regression to test for horizontal pleiotropy, focusing on its intercept term: an intercept that differed from zero at P < 0.05 was interpreted as evidence of directional pleiotropy.
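A minimal sketch of the MR-Egger intercept test is given below. It fits a weighted regression of the SNP-outcome associations on the SNP-exposure associations with an intercept. Two simplifications are assumptions of this sketch: the P value uses a normal approximation rather than a t distribution, and the residual dispersion is not allowed to fall below 1. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept (MR-Egger).

    Returns (intercept, intercept_se, intercept_p, slope).
    """
    if len(bx) < 3:
        raise ValueError("need at least three SNPs")
    # Orient every SNP so that its exposure association is positive.
    signs = [1.0 if v >= 0 else -1.0 for v in bx]
    x = [sg * v for sg, v in zip(signs, bx)]
    y = [sg * v for sg, v in zip(signs, by)]
    w = [1.0 / s ** 2 for s in se_y]

    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2

    slope = (sw * swxy - swx * swy) / det
    intercept = (swxx * swy - swx * swxy) / det

    # Residual dispersion; assumption: it is not allowed to fall below 1.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, y))
    phi = max(1.0, rss / (len(x) - 2))
    intercept_se = sqrt(phi * swxx / det)
    # Assumption: normal approximation instead of a t distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(intercept / intercept_se)))
    return intercept, intercept_se, p_value, slope


# Hypothetical summary statistics (not taken from the study).
bx = [0.10, 0.08, 0.12, 0.05, 0.09]
by = [-0.030, -0.020, -0.040, -0.012, -0.025]
se_y = [0.010, 0.012, 0.009, 0.015, 0.011]

intercept, intercept_se, p_value, slope = mr_egger(bx, by, se_y)
print(f"Egger intercept = {intercept:.4f} (SE {intercept_se:.4f}), P = {p_value:.3f}")
print("directional pleiotropy suggested" if p_value < 0.05 else "no evidence of directional pleiotropy")
```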
Finally, a leave-one-SNP-out analysis was performed to assess the influence of individual genetic variants on the estimated associations.
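The leave-one-SNP-out procedure amounts to repeating the pooled estimate with each variant removed in turn. The sketch below uses a fixed-effect IVW estimate for brevity; the SNP identifiers and summary statistics are hypothetical.

```python
def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW estimate (weighted mean of Wald ratios)."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def leave_one_out(snp_ids, bx, by, se_y):
    """Re-estimate the IVW effect with each SNP dropped in turn."""
    results = {}
    for i, snp in enumerate(snp_ids):
        keep = [j for j in range(len(snp_ids)) if j != i]
        results[snp] = ivw_estimate(
            [bx[j] for j in keep],
            [by[j] for j in keep],
            [se_y[j] for j in keep],
        )
    return results


# Hypothetical SNP identifiers and summary statistics (not from the study).
snp_ids = ["snp1", "snp2", "snp3", "snp4"]
bx = [0.10, 0.08, 0.12, 0.05]
by = [-0.030, -0.020, -0.040, -0.012]
se_y = [0.010, 0.012, 0.009, 0.015]

overall = ivw_estimate(bx, by, se_y)
for snp, est in leave_one_out(snp_ids, bx, by, se_y).items():
    print(f"without {snp}: {est:.3f} (all SNPs: {overall:.3f})")
```

An estimate that shifts markedly when one SNP is removed suggests that the pooled result depends heavily on that variant.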
2.9 Multivariable MR
Because SNPs associated with educational attainment may also be associated with smoking, alcohol consumption, and body mass index (BMI), we performed a multivariable MR analysis to estimate the effect of educational attainment on lung cancer risk while accounting for these three factors.
We used the multivariable extension of the IVW method, which the referenced literature describes as addressing both measured and unmeasured pleiotropic effects. The aim was to clarify the etiological pathways linking educational attainment, smoking, alcohol consumption, BMI, and lung cancer risk.
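Multivariable IVW can be viewed as a weighted regression, without an intercept, of the SNP-outcome associations on the SNP-exposure associations for all exposures at once. The sketch below illustrates this with two exposures and a small hand-written matrix inverse so that it needs only the standard library. The function names and the summary statistics are hypothetical, and the dispersion floor of 1 is an assumption.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mvmr_ivw(bx_rows, by, se_y):
    """Multivariable IVW; bx_rows[j][k] is SNP j's association with exposure k."""
    n_snps, n_exp = len(by), len(bx_rows[0])
    if n_snps <= n_exp:
        raise ValueError("need more SNPs than exposures")
    w = [1.0 / s ** 2 for s in se_y]
    xtwx = [[sum(w[j] * bx_rows[j][a] * bx_rows[j][b] for j in range(n_snps))
             for b in range(n_exp)] for a in range(n_exp)]
    xtwy = [sum(w[j] * bx_rows[j][a] * by[j] for j in range(n_snps))
            for a in range(n_exp)]
    inv = invert(xtwx)
    betas = [sum(inv[a][b] * xtwy[b] for b in range(n_exp)) for a in range(n_exp)]
    resid = [by[j] - sum(betas[k] * bx_rows[j][k] for k in range(n_exp))
             for j in range(n_snps)]
    # Assumption: residual dispersion is not allowed to fall below 1.
    phi = max(1.0, sum(w[j] * resid[j] ** 2 for j in range(n_snps)) / (n_snps - n_exp))
    ses = [sqrt(phi * inv[k][k]) for k in range(n_exp)]
    return betas, ses


# Hypothetical associations with two exposures (e.g. education, smoking).
bx_rows = [[0.10, 0.02], [0.05, 0.08], [0.12, 0.01], [0.03, 0.09], [0.07, 0.05]]
by = [-0.020, 0.025, -0.031, 0.036, 0.004]
se_y = [0.010, 0.011, 0.009, 0.012, 0.010]

betas, ses = mvmr_ivw(bx_rows, by, se_y)
for name, b, s in zip(["exposure 1", "exposure 2"], betas, ses):
    print(f"{name}: direct effect = {b:.3f} (SE {s:.3f})")
```

Each coefficient is interpreted as the effect of one exposure with the others held fixed.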
2.10 Mediation analysis
To examine potential mediating effects underlying significant MR associations, a two-step MR strategy was used. In the first step, genetic instruments for educational attainment were used to estimate its causal effect on each potential mediator. In the second step, genetic instruments for the mediators were used to estimate the effect of each mediator on lung cancer risk.
We used the "product of coefficients" approach to estimate the indirect effect of educational attainment on lung cancer risk through each potential mediator. This method was applied only when there was evidence that educational attainment affected the mediator and that the mediator in turn affected lung cancer risk. Standard errors for the indirect effects were calculated using the delta method.
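The product-of-coefficients calculation can be sketched as follows, where a is the effect of educational attainment on the mediator and b is the effect of the mediator on lung cancer risk. The first-order delta-method standard error treats the two estimates as independent, which is an assumption of this sketch. The function name and numeric inputs are hypothetical.

```python
from math import sqrt


def indirect_effect(a, se_a, b, se_b):
    """Indirect effect a * b with a first-order delta-method standard error.

    Assumes the estimates of a and b are independent.
    """
    effect = a * b
    se = sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return effect, se


# Hypothetical step-one and step-two estimates (not taken from the study).
a, se_a = -0.30, 0.03      # education -> mediator
b, se_b = 0.50, 0.08       # mediator -> lung cancer (log odds scale)

effect, se = indirect_effect(a, se_a, b, se_b)
low, high = effect - 1.96 * se, effect + 1.96 * se
print(f"indirect effect = {effect:.3f} (SE {se:.3f}), 95% CI {low:.3f} to {high:.3f}")
```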