General characteristics
The basic characteristics of the case and control groups are shown in Table 1. This study involved 1015 subjects, including 509 patients (354 males and 155 females; age at diagnosis: 58.53 ± 10.12 years) and 506 healthy controls (355 males and 151 females; age: 61.43 ± 9.47 years). There was no significant difference between the lung cancer patients and healthy controls in terms of age, gender or smoking status.
Hardy-Weinberg equilibrium and SNP alleles
The minor allele frequency (MAF) distributions of the six selected SNPs among all subjects are summarized in Table 2. The allele frequencies of each SNP in the controls were consistent with those of the HapMap CHB population. Furthermore, all six SNP loci in the control subjects conformed to Hardy-Weinberg equilibrium (p > 0.05). In the chi-square test, no SNP was significantly associated with lung cancer risk.
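The Hardy-Weinberg check mentioned above can be illustrated with a minimal sketch. It assumes a one-degree-of-freedom chi-square goodness-of-fit test on the genotype counts of the controls. The function name and the genotype counts are hypothetical and are not taken from Table 2, and the software actually used for this step is not stated in this section.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    Assumes both alleles are observed, so every expected count is > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control genotype counts, for illustration only
chi2, p_value = hwe_chi_square(360, 130, 16)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A locus would be regarded as conforming to Hardy-Weinberg equilibrium when the resulting p value exceeds 0.05, which matches the threshold used above.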
Association of SNPs with lung cancer risk
Four genetic models (co-dominant, dominant, recessive and log-additive) were applied to assess the association between each variant and lung cancer risk. As shown in Table 3, the "A/C" genotype of rs6771238 was associated with an increased risk of lung cancer under the co-dominant model (OR = 1.57, 95% CI = 1.01-2.42, p = 0.044), and the "A/C-A/A" genotype of rs6771238 was associated with an increased lung cancer risk under the dominant model (OR = 1.54, 95% CI = 1.01-2.36, p = 0.047).
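To make the genetic models concrete, the following sketch shows how genotype counts can be collapsed into 2x2 tables under the co-dominant, dominant and recessive models, and how a crude odds ratio with a Wald 95% CI is obtained from each table. This is an unadjusted illustration only. This section does not state how the reported ORs were estimated, the function names and counts are hypothetical, and the log-additive model (which requires a regression on the number of variant alleles coded 0/1/2) is not covered.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval.

    a = exposed cases, b = exposed controls,
    c = reference cases, d = reference controls (all counts must be > 0).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

def model_tables(cases, controls):
    """Build (a, b, c, d) tables for each model.

    cases and controls are tuples of counts:
    (reference homozygote, heterozygote, variant homozygote).
    """
    ref_ca, het_ca, hom_ca = cases
    ref_co, het_co, hom_co = controls
    return {
        "co-dominant, het vs ref": (het_ca, het_co, ref_ca, ref_co),
        "co-dominant, hom vs ref": (hom_ca, hom_co, ref_ca, ref_co),
        "dominant, het+hom vs ref": (het_ca + hom_ca, het_co + hom_co, ref_ca, ref_co),
        "recessive, hom vs ref+het": (hom_ca, hom_co, ref_ca + het_ca, ref_co + het_co),
    }

# Hypothetical genotype counts, for illustration only
cases = (300, 80, 20)
controls = (320, 65, 15)
for name, table in model_tables(cases, controls).items():
    or_, lo, hi = odds_ratio_wald(*table)
    print(f"{name}: OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

The dominant model pools heterozygotes with variant homozygotes, which is why a combined genotype label such as "A/C-A/A" appears for that model.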
We further stratified the samples by pathological classification, clinical stage, lymph node metastasis and other characteristics. In the lung squamous cell carcinoma subgroup, the "A/C" genotype of rs6771238 was associated with an increased risk under the co-dominant model (OR = 2.07, 95% CI = 1.08-3.97, p = 0.028). The "A/C-A/A" genotype of rs6771238 was also significantly associated with an increased risk of lung squamous cell carcinoma under the dominant model (OR = 2.07, 95% CI = 1.10-3.89, p = 0.025), as was rs6771238 under the log-additive model (OR = 1.90, 95% CI = 1.06-3.38, p = 0.030). In the lung adenocarcinoma subgroup, the "A/G" genotype of rs6802418 may increase the risk of lung adenocarcinoma under the co-dominant model (OR = 1.43, 95% CI = 1.00-2.03, p = 0.049) (Table 4).
In the analysis stratified by clinical stage, the "T/C" genotype of rs9835916 under the co-dominant model (OR = 1.75, 95% CI = 1.00-3.05, p = 0.031) and rs1077868 under the log-additive model (OR = 1.71, 95% CI = 1.02-2.88, p = 0.043) were significantly associated with an increased risk in lung cancer staging (Table 5).
In the analysis stratified by lymph node metastasis, rs9835916 was associated with the risk of lymph node metastasis in patients with lung cancer (Table 6). The "C" allele may increase the risk of lymphatic metastasis under the allele model (OR = 1.56, 95% CI = 1.08-2.26, p = 0.018), and the "T/C" genotype may increase the risk under the co-dominant model (OR = 2.51, 95% CI = 1.42-4.44, p = 0.002). The "T/C-C/C" genotype was associated with an increased risk of lymphatic metastasis under the dominant model (OR = 2.39, 95% CI = 1.40-4.07, p = 0.001), and rs9835916 was also associated with an increased risk under the log-additive model (OR = 1.64, 95% CI = 1.11-2.41, p = 0.013).
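The allele model referred to above compares allele counts rather than genotype counts between two groups. A minimal sketch of that comparison, assuming a Pearson chi-square test without continuity correction on the 2x2 allele table, is given below. The function names and genotype counts are hypothetical and do not reproduce Table 6.

```python
import math

def allele_counts(n_ref_hom, n_het, n_var_hom):
    """Convert genotype counts to (reference allele, variant allele) counts."""
    return 2 * n_ref_hom + n_het, 2 * n_var_hom + n_het

def allele_model(group_a, group_b):
    """Allele-level odds ratio and Pearson chi-square test (1 df).

    group_a and group_b are genotype-count tuples
    (reference homozygote, heterozygote, variant homozygote);
    group_a is the group of interest (e.g. patients with metastasis).
    """
    a_ref, a_var = allele_counts(*group_a)
    b_ref, b_var = allele_counts(*group_b)
    odds_ratio = (a_var * b_ref) / (a_ref * b_var)
    n = a_ref + a_var + b_ref + b_var
    chi2 = (n * (a_var * b_ref - a_ref * b_var) ** 2
            / ((a_ref + a_var) * (b_ref + b_var)
               * (a_ref + b_ref) * (a_var + b_var)))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value

# Hypothetical genotype counts, for illustration only
odds_ratio, chi2, p_value = allele_model((120, 90, 20), (150, 70, 15))
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Because every subject contributes two alleles, the allele table has twice as many observations as the genotype table, which is one reason allele-level and genotype-level results can differ.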
Association of haplotypes with lung cancer risk
A haplotype-based association analysis was performed to assess the association between CMTM8 haplotypes and lung cancer risk. In the subpopulation stratified by clinical stage, two SNPs (rs1077868 and rs6802418) formed a linkage disequilibrium (LD) block (Fig. 1). The distribution of haplotype frequencies in the case and control groups is presented in Table 7. The haplotypes "GG" and "AG" were found to significantly increase the risk in lung cancer staging (OR = 1.71, 95% CI = 1.02-2.88, p = 0.043).
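For readers unfamiliar with how an LD block between two SNPs is quantified, the sketch below computes the standard pairwise measures D' and r² from the four haplotype frequencies of two biallelic loci. The function name and the frequencies are hypothetical and are not those of rs1077868 and rs6802418. The tool used to define the block in Fig. 1 is not stated in this section.

```python
def ld_measures(h11, h12, h21, h22):
    """Pairwise LD measures D' and r^2 for two biallelic SNPs.

    h11, h12, h21, h22 are the frequencies of the four haplotypes
    (allele 1 or 2 at SNP 1, followed by allele 1 or 2 at SNP 2);
    they are assumed to sum to 1 and both SNPs are assumed polymorphic.
    """
    p1 = h11 + h12            # frequency of allele 1 at SNP 1
    q1 = h11 + h21            # frequency of allele 1 at SNP 2
    d = h11 - p1 * q1         # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p1 * (1 - p1) * q1 * (1 - q1))
    return d_prime, r2

# Hypothetical haplotype frequencies, for illustration only
d_prime, r2 = ld_measures(0.45, 0.05, 0.10, 0.40)
print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

Values of D' close to 1 indicate that little historical recombination separates the two SNPs, which is the usual basis for grouping them into one block.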
SNP functional evaluation
To evaluate the possible function of the six selected variants in the CMTM8 gene, we performed a bioinformatics analysis using the HaploReg v4.1 database. The results showed that all the variants were predicted to be regulatory SNPs with different biological functions (Supplementary Table S2).
GEPIA database analysis on gene expression
Furthermore, analysis of the GEPIA database showed that the expression level of CMTM8 in lung adenocarcinoma was lower than that in normal tissues, suggesting that this gene may be related to the occurrence of lung cancer (Supplementary Figure S1).