This investigation was intended to detect associations of the Wnt-genes DKK4 (8p11.21), DKK3 (11p15.3), DKK2 (4q25), FRZB (2q32.1, also known as sFRP3), SFRP4 (7p14.1) and Axin2 (17q24.1), and a potential interaction with AhR-genes, with LC in a large sample of 26,458 individuals of European descent. No marginal association of AhR/Wnt-markers with overall LC was observed. Interestingly, an accumulation of associated markers was observed when the sample was split by smoking status; in ever smokers, the respective markers are assigned to SFRP4. The associations within never smokers may reflect complex gene-gene interactions, since the single-marker associations disappear after adjustment for known LC-markers.
Recently, marginal associations of the AhR/Wnt-markers were reported for subjects from North India, although in a much smaller sample of about 600 individuals [21, 22]. A notable association with LC was observed, e.g. for the SFRP4 variant rs1802073 (OR = 3.19; 95%-CI 1.81–5.63). Classification and Regression Tree (CART) analysis revealed an interaction of DKK2 and SFRP4 polymorphisms to be the best predictor of LC among all those investigated, especially within smokers. The authors also reported several high-risk subgroups in smokers, e.g. characterised by DKK2 (rs17037102 / rs419558) and Axin2 (rs9915936). A similar picture was observed in a sample of 270 subjects from Istanbul, Turkey [23].
We failed to directly replicate the single-marker associations reported by Bahl et al. [21, 22] (North India) and Yilmaz et al. [23] (Turkey). The Indian population is known to be a mixture of several subpopulations [40], which can result in spurious associations. For example, for rs7396187 assigned to DKK3, Bahl et al. reported a protective effect (OR GC+CC vs CC = 0.63, 95%-CI: 0.44–0.91, p = 0.01), but this was accompanied by a significant departure from HWE in controls (χ² = 15.11, df = 2, p = 0.001). In contrast, the sample analysed here was carefully examined for ethnic homogeneity, and principal components were used to adjust for population stratification. Moreover, the reports by Bahl et al. and Yilmaz et al. are themselves contradictory in some details.
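To make the HWE argument above concrete, the following minimal sketch computes a Pearson chi-square statistic for departure from Hardy-Weinberg equilibrium from genotype counts. The genotype counts are hypothetical and are not taken from Bahl et al.; the function names are ours. The conventional HWE test uses 1 degree of freedom, whereas the text quotes df = 2, so the helper accepts either value.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic.

    Closed forms are used, so only df = 1 and df = 2 are supported.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for HWE at a biallelic marker.

    Assumes the marker is polymorphic (all expected counts > 0).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical control genotype counts with a heterozygote deficit.
chi2_hyp = hwe_chi_square(40, 20, 40)
p_hyp = chi2_sf(chi2_hyp, df=1)

# Tail probability for the statistic quoted in the text (df as stated there).
p_reported = chi2_sf(15.11, df=2)
```

With df = 2 as stated, χ² = 15.11 gives a tail probability of about 0.0005, which is consistent with the reported p = 0.001 after rounding.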
Yilmaz et al. reported a two-way interaction between DKK3 (rs3206824) and SFRP4 (rs1802074) to be predictive of LC. Among other constellations, Bahl et al. reported that DKK3 and SFRP4 were placed close to each other by multifactor dimensionality reduction (MDR) for overall LC, while two markers of SFRP4 were placed close together within smokers. In contrast, markers assigned to AXIN2, but also to AHR, FRZB and DKK2, were observed as associated within never smokers. According to Bahl et al., markers of AXIN2 and DKK2 were also placed close together by MDR in never smokers. The discrepancy between the association estimates in the total sample and in the subsamples points to smoking-mediated associations.
We agree with both previous studies that complex interaction patterns between the investigated genes contribute to LC susceptibility, either overall or within specific subgroups. To discover patterns of AhR/Wnt-genes involved in LC genesis, we shifted the focus from significance of association to inclusion in prediction models and followed two approaches. First, we searched for polygenic risk scores (PRSs). To this end, we added up marker main effects to construct multidimensional scores, selecting markers by optimising model fit (rather than by preselecting markers with a p-value below some threshold) in order to discriminate cases from controls as well as possible. Complex gene x gene (GxG) interactions are not modelled in these scores.
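The additive construction described above can be sketched as a weighted sum of risk-allele counts. This is only an illustration under stated assumptions: the marker names and weights below are hypothetical placeholders (not estimates from this study), the weights are assumed to be per-allele log odds ratios, and the model-fit-based marker selection itself is not shown.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum over markers of weight * risk-allele count.

    dosages: dict marker -> allele count (0, 1 or 2) for one individual
    weights: dict marker -> per-allele log odds ratio (assumed, hypothetical)
    Markers missing from `dosages` are treated as dosage 0, which is a
    simplification made for this sketch.
    """
    return sum(w * dosages.get(marker, 0) for marker, w in weights.items())

# Hypothetical weights for three placeholder markers.
example_weights = {"marker_a": 0.10, "marker_b": -0.05, "marker_c": 0.20}

# One hypothetical individual: two risk alleles at marker_a, one at marker_b.
example_dosages = {"marker_a": 2, "marker_b": 1, "marker_c": 0}

example_score = polygenic_risk_score(example_dosages, example_weights)
```

Because the score contains only main effects, two individuals with the same allele counts receive the same score regardless of how their genotypes are combined across genes, which is why GxG patterns require a different model.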
Nevertheless, the proportion of AhR/Wnt-genes entering the scoring models was remarkably large, given that these markers, unlike all other candidates, are not genome-wide significantly associated with LC. This was particularly noticeable for SCLC, where AhR/Wnt-markers contribute more than twice as much to the score as LC-markers. It is known that, among current smokers, tobacco consumption is most strongly associated with SCLC [41]. Moreover, within never smokers, a stringently defined score is made up of only two AhR/Wnt-markers, assigned to AXIN2 and SFRP4. However, the discriminative ability of PRSs for LC, whose contributing markers have main effects at different levels of significance, is in general poor. The AUC of the BICLC score for overall LC (0.58 in the test set and 0.55 in the extra test set) is comparable to the AUC = 0.54 based on four top LC-genes in a simulated population, as given by the GWAS-ROCS Database (https://gwasrocs.ca/). This may be due to other overpowering risk factors, since models including e.g. age, sex and smoking variables achieve higher AUCs (0.62 to 0.79) [42].
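For readers less familiar with the AUC values quoted above, the following sketch computes the AUC as the probability that a randomly chosen case has a higher score than a randomly chosen control, counting ties as one half. The scores are hypothetical; the function name is ours and this is not the evaluation code used in the study.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    Ties contribute 0.5. This pairwise form is quadratic in sample size and
    is meant for illustration, not for large data sets.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

# Hypothetical risk scores.
auc_example = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

# A score that does not separate the groups at all yields 0.5.
auc_uninformative = auc_from_scores([0.5, 0.5], [0.5, 0.5])
```

On this scale an AUC of 0.5 corresponds to no discrimination, so values such as 0.55 to 0.58 mean that a case outranks a control only slightly more often than chance.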
Recently, two polygenic risk scores (PRSs) for overall LC were developed, validated and assessed with respect to improving eligibility for low-dose computed tomography (LDCT), the only recommended screening test for lung cancer. Jia et al. [43, 44] built a PRS on 19 genome-wide associated SNPs (p < 0.5 × 10⁻⁸). Hung et al. [45] integrated their PRS on 128 SNPs, including established LC-related loci and suggestively associated loci selected by a LASSO regression model, into the PLCOall2014 risk model. Both approaches have been validated using data from the UK Biobank. While no substantial increase in discriminability was reported for either PRS, both studies showed that the age at which a smoker crosses the recommended screening threshold of 1.5% for the 5- or 6-year LC risk depends on the genetic background, which is sufficiently quantified by the PRS examined. Some smokers will be eligible before 50 years of age, others only after 60 years of age. Hence, constructing reliable PRSs, even with small discriminability, may help to improve the performance of LDCT.
Two- and multiway GxG interactions can also contribute to LC susceptibility, rather than just markers with observed (marginal) main effects. GxG interaction is in general less commonly investigated, not least because it requires much larger samples. However, Li et al. [46] found RGL1:RAD51B in overall LC and non-SCLC, SYNE1:RNF43 in adenocarcinoma and FHIT:TSPAN8 in SqCLC to interactively contribute to LC susceptibility. As in the presented data analysis, the impact of these genes would have been overlooked had only main effects been considered. Another reason for the poor performance could be that LC itself is just a generic term for several subcategories that differ in terms of LC initiation and require separate PRSs [42, 47]. A third reason may be the exclusive concentration on genetic effects, rather than also modelling lifelong interaction with the environment. For example, GxE interaction effects for LC have been observed for smoking [48], exposure to asbestos fibres [49, 50] and exposure to radon [51, 52].
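The point that a GxG interaction can be invisible to main-effect analysis can be illustrated with a small constructed example. The counts below are hypothetical and deliberately balanced; they do not come from Li et al. or from this study. Carrier status of two markers (g1, g2) is coded 0/1, and the interaction is expressed as the ratio of the stratum-specific odds ratios.

```python
def odds_ratio(cases_exp, controls_exp, cases_unexp, controls_unexp):
    """Odds ratio of exposed vs unexposed from a 2x2 table."""
    return (cases_exp * controls_unexp) / (controls_exp * cases_unexp)

# Hypothetical counts: (g1, g2) -> (cases, controls).
counts = {
    (0, 0): (30, 70),
    (0, 1): (70, 30),
    (1, 0): (70, 30),
    (1, 1): (30, 70),
}

def marginal_or_g1(table):
    """Odds ratio for g1 carriers vs non-carriers, ignoring g2."""
    cases_1 = sum(table[(1, g2)][0] for g2 in (0, 1))
    ctrls_1 = sum(table[(1, g2)][1] for g2 in (0, 1))
    cases_0 = sum(table[(0, g2)][0] for g2 in (0, 1))
    ctrls_0 = sum(table[(0, g2)][1] for g2 in (0, 1))
    return odds_ratio(cases_1, ctrls_1, cases_0, ctrls_0)

def or_g2_within(table, g1):
    """Odds ratio for g2 carriers vs non-carriers within one g1 stratum."""
    return odds_ratio(table[(g1, 1)][0], table[(g1, 1)][1],
                      table[(g1, 0)][0], table[(g1, 0)][1])

or_marginal = marginal_or_g1(counts)  # no main effect of g1
# Ratio of stratum-specific odds ratios; 1.0 would mean no interaction.
or_interaction = or_g2_within(counts, 1) / or_g2_within(counts, 0)
```

In this constructed table the marginal odds ratio of g1 is exactly 1, while the effect of g2 reverses direction between the two g1 strata, so a scan restricted to main effects would miss both markers.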
With this in mind, the present data analysis shows that the complex interaction of Wnt-related genes has the potential to be part of an adequate risk assessment for never smokers or in relation to certain histological subtypes of LC.
As a second approach, we constructed decision trees, which mainly depict GxG interaction patterns. Although the ability to discriminate cases from controls is again poor, CHRNA5 was in general the most important first node for overall LC as well as in many subgroups. AhR/Wnt-genes play a complex but important role in at least one quarter of never smokers, as seen before. Remarkably, TERT was central in the branch relevant for the remaining three quarters of never smokers. This corresponds to a concentration of relevant genes for this subgroup in the CLPTM1L-TERT region on chromosome 5, as previously reported by Hung et al. [53]. Our observations confirm the suspicion that LC in never smokers is a different entity, previously justified by differences in epidemiological, clinical and molecular characteristics [47].
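How a marker becomes a node in such a tree can be sketched with a single split evaluation. The sketch below assumes a CART-style criterion (reduction in Gini impurity), which is our assumption and not necessarily the criterion used for the trees in this study; the genotypes and case/control labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def gini_gain(genotypes, labels, threshold):
    """Impurity reduction when splitting on genotype < threshold vs >= threshold.

    genotypes: allele counts (0, 1 or 2) at one marker, one entry per subject
    """
    left = [y for g, y in zip(genotypes, labels) if g < threshold]
    right = [y for g, y in zip(genotypes, labels) if g >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Hypothetical marker genotypes and disease status for six subjects.
genotypes_hyp = [0, 0, 0, 1, 1, 2]
labels_hyp = [0, 0, 0, 1, 1, 1]

gain_carrier = gini_gain(genotypes_hyp, labels_hyp, threshold=1)     # carriers vs non-carriers
gain_homozygous = gini_gain(genotypes_hyp, labels_hyp, threshold=2)  # homozygotes vs rest
```

A tree-growing procedure of this kind would pick, at each node, the marker and threshold with the largest gain and then repeat within each branch, which is how marker combinations along a branch come to represent interaction patterns.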
We would like to emphasize that this study was not intended to provide a definitive and reliable risk assessment, but rather aimed to examine in depth the LC-relevant complex interaction pattern of AhR/Wnt-genes hypothesised by Bahl et al. Admittedly, considering prediction instead of association provides weaker evidence for this pattern, but the approach is valid in view of the large amount of external evidence. The importance of the Wnt-signalling pathway and its antagonists sFRP, DKKs and Axin2 for cancer is outlined in the introduction. One can also assume a connection with molecular functionality, since the genes involved are expressed ubiquitously or in lung tissues. In summary, we were unable to replicate previously reported associations of Wnt/AhR-markers with LC. However, we observed a small but significant impact of these genomic variants on PRSs or decision trees to predict LC.