General characteristics
The basic characteristics of the case and control groups are shown in Table 1. This study involved 1015 subjects, including 509 patients (354 males and 155 females; age at diagnosis: 58.53 ± 10.12 years) and 506 healthy controls (355 males and 151 females; age: 61.43 ± 9.47 years). There were no significant differences in age, sex or smoking status between lung cancer patients and healthy controls, but there was a significant difference in alcohol consumption.
Hardy-Weinberg equilibrium and SNP alleles
The minor allele frequency (MAF) distribution of the six selected SNPs among all subjects is summarized in Table 2. The allele frequency of each SNP in controls was consistent with that of the CHB population (Han Chinese in Beijing, China) in the 1000 Genomes Project. Furthermore, all six SNP loci in the control subjects conformed to Hardy-Weinberg equilibrium (p > 0.05). By chi-square test, no SNP was significantly associated with lung cancer risk.
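The Hardy-Weinberg check described above can be illustrated with a minimal sketch of the Pearson chi-square test (1 degree of freedom) on genotype counts. The function name `hwe_chi_square` and the counts in the usage note are hypothetical and are not taken from Table 2; this is not necessarily the exact procedure or software used in the study.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes
    (hypothetical inputs; both alleles must be present in the sample).
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of the first allele
    q = 1.0 - p                          # frequency of the second allele
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `hwe_chi_square(25, 50, 25)` describes a sample whose genotype counts match the Hardy-Weinberg expectation exactly, so the statistic is 0 and the p-value is 1; a p-value above 0.05 in controls is read as no evidence of departure from equilibrium.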
Association of SNPs with lung cancer risk
Four genetic models (co-dominant, dominant, recessive and log-additive) were applied to assess the association between each variant and lung cancer risk. As shown in Table 3, the "A/C" genotype of rs6771238 was associated with an increased risk of lung cancer under the co-dominant model (OR = 1.57, 95% CI = 1.01-2.42, p = 0.044), and the "C/A-A/A" genotype of rs6771238 was associated with an increased risk under the dominant model (OR = 1.68, 95% CI = 1.05-2.68, p = 0.031), with power values of 0.534 and 0.684, respectively. Rs6771238 was also associated with an increased lung cancer risk under the log-additive model (OR = 1.66, 95% CI = 1.06-2.58, p = 0.026), with a power value of 0.500.
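To make the reported quantities concrete, the following minimal sketch computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval after collapsing genotype counts under the dominant model. The helper names and all counts are hypothetical; the ORs in Table 3 may come from covariate-adjusted logistic regression, which this sketch does not reproduce.

```python
import math

def dominant_counts(genotype_counts):
    """Collapse (ref/ref, ref/alt, alt/alt) counts into (carriers, non_carriers).

    Under the dominant model, carriers are ref/alt plus alt/alt.
    """
    ref_ref, ref_alt, alt_alt = genotype_counts
    return ref_alt + alt_alt, ref_ref

def odds_ratio_wald(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.959964):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    z = 1.959964 gives a 95% interval. All four cells must be non-zero.
    Returns (odds ratio, lower bound, upper bound).
    """
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    # Standard error of log(OR)
    se = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical genotype counts, for illustration only
case_carriers, case_non = dominant_counts((80, 15, 5))
ctrl_carriers, ctrl_non = dominant_counts((90, 8, 2))
result = odds_ratio_wald(case_carriers, case_non, ctrl_carriers, ctrl_non)
```

An interval whose lower bound is above 1 corresponds to a significantly increased risk at the 5% level, which is how the estimates reported for rs6771238 are interpreted.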
Further, we stratified the samples according to pathological classification, clinical stage, lymph node metastasis and other characteristics. In the analysis by pathological type, the "A" allele of rs6771238 was associated with an increased risk of lung squamous cell carcinoma under the allele model (OR = 1.90, 95% CI = 1.07-3.38, p = 0.025, power = 0.642). Rs6771238 was also significantly associated with an increased risk of lung squamous cell carcinoma under the log-additive model (OR = 1.98, 95% CI = 1.01-3.87, p = 0.045, power = 0.704) and with an increased risk of lung adenocarcinoma under the log-additive model (OR = 1.79, 95% CI = 1.01-3.18, p = 0.047, power = 0.549) (Table 4).
Stratified analysis by clinical stage showed that the "A/G" and "A/G-G/G" genotypes of rs1077868 were significantly associated with lung cancer stage under the co-dominant (OR = 1.96, 95% CI = 1.05-3.64, p = 0.034, power = 0.998) and dominant (OR = 2.03, 95% CI = 1.11-3.73, p = 0.022, power = 0.993) models, respectively. Rs1077868 was also significantly associated with lung cancer stage under the log-additive model (OR = 1.92, 95% CI = 1.10-3.35, p = 0.021, power = 0.970) (Table 5).
In the analysis stratified by lymph node metastasis, rs9835916 was associated with the risk of lymph node metastasis in patients with lung cancer. The "C" allele increased the risk of lymphatic metastasis under the allele model (OR = 1.56, 95% CI = 1.08-2.26, p = 0.018, power = 0.940); the "T/C" genotype increased the risk under the co-dominant model (OR = 2.49, 95% CI = 1.36-4.55, p = 0.003, power = 0.956); and the "T/C-C/C" genotype was associated with an increased risk under the dominant model (OR = 2.40, 95% CI = 1.37-4.21, p = 0.002, power = 0.998). Rs9835916 was also significantly associated with an increased risk of lymphatic metastasis under the log-additive model (OR = 1.66, 95% CI = 1.11-2.48, p = 0.014, power = 0.978) (Table 6).
Association of haplotypes with lung cancer risk
Linkage disequilibrium (LD) and haplotype-based association analyses were performed to assess the degree of linkage among the SNPs in CMTM8 and the association between haplotypes and lung cancer risk. As shown in Figure 1, strong linkage was observed between rs9853415 and rs6796318, between rs6771238 and rs9835916, and between rs1077868 and rs6802418, forming three haplotype blocks; however, none of these haplotype blocks was significantly associated with lung cancer risk.
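For readers unfamiliar with how the degree of linkage between two SNPs is quantified, the sketch below computes the standard pairwise linkage disequilibrium measures D, D' and r² from one haplotype frequency and the two allele frequencies. The function name `ld_stats` and the example frequencies are hypothetical; which measure underlies Figure 1 is not stated here, and in practice haplotype frequencies are first estimated from unphased genotypes by dedicated software.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    Returns (D, D', r^2).
    """
    d = p_ab - p_a * p_b
    # D' scales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies: allele A always travels with allele B
d, d_prime, r2 = ld_stats(0.3, 0.3, 0.3)
```

Values of D' and r² close to 1 indicate that the two alleles are almost always inherited together, which is the situation in which neighbouring SNPs are grouped into a haplotype block; values near 0 indicate independent segregation.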
SNP functional evaluation
To evaluate the possible function of the six selected variants in the CMTM8 gene, we performed a bioinformatics analysis using the HaploReg v4.1 database. All six variants were predicted to be regulatory SNPs with different biological functions (Supplementary Table S2).
GEPIA database analysis on gene expression
Furthermore, analysis of the GEPIA database showed that the expression level of CMTM8 in lung adenocarcinoma was lower than that in normal tissues, suggesting that this gene may be related to the occurrence of lung cancer (Supplementary Figure S1).