3.1 Characteristics of cases and controls
The basic clinical information of LC patients and controls is shown in Table 1. The 510 patients were distributed by age (≤ 61 years, 266 cases; > 61 years, 244 cases), gender (male, 355 cases; female, 155 cases), pathological type (lung squamous cell carcinoma [LUSC], 120 patients; lung adenocarcinoma [LUAD], 188 patients), tumor stage (Ⅰ–Ⅱ, 84 cases; Ⅲ–Ⅳ, 261 cases), and lymph node metastasis (LNM) status (positive, 215 cases; negative, 84 cases).
Table 1
The comparison of basic characteristics between cases and controls
Characteristics | Category | Case (n = 510) | Control (n = 495) |
Age | ≤ 61 | 266 | 224 |
| > 61 | 244 | 271 |
| Mean ± SD | 60.78 ± 9.96 | 61.94 ± 7.72 |
Gender | Male | 355 | 346 |
| Female | 155 | 149 |
Pathological type | LUSC | 120 | |
| LUAD | 188 | |
| Unknown | 202 | |
Tumor stage | Ⅰ–Ⅱ | 84 | |
| Ⅲ–Ⅳ | 261 | |
| Unknown | | |
LNM | Positive | 215 | |
| Negative | 84 | |
| Unknown | | |
LUSC = lung squamous cell carcinoma; LUAD = lung adenocarcinoma; LNM = lymph node metastasis |
Basic information and allele frequencies of the COL6A4P2 polymorphisms are presented in Table 2. The genotype distributions of all SNPs in the control subjects conformed to HWE (p > 0.05). HaploReg functional annotation predicted biological functions for the SNPs associated with LC risk. Under the allele model (Table 2), rs34445363 was associated with an increased LC risk (OR = 1.26, 95%CI: 1.01–1.58, p = 0.038), whereas the other four SNPs in COL6A4P2 (rs7625942, rs77941834, rs61733464, and rs11914893) showed no significant association with LC risk (p > 0.05).
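The HWE check on the controls can be illustrated with a Pearson chi-square goodness-of-fit test with 1 degree of freedom. This is a minimal sketch that assumes that particular test; the section does not say which HWE test produced the Table 2 p-values (exact tests are also widely used), so for the rs34445363 control counts taken from Table 3 it gives p ≈ 0.81 rather than the reported 0.879. The function name is hypothetical.

```python
import math


def hwe_chi_square(n_hom_major: int, n_het: int, n_hom_minor: int) -> tuple[float, float]:
    """Pearson chi-square test for Hardy-Weinberg equilibrium (1 df).

    Expected counts use the allele frequency estimated from the same
    genotype counts. Returns (chi2, p_value).
    """
    n = n_hom_major + n_het + n_hom_minor
    minor_alleles = 2 * n_hom_minor + n_het
    major_alleles = 2 * n_hom_major + n_het
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2
    exp_hom_minor = minor_alleles ** 2 / (4 * n)
    exp_het = minor_alleles * major_alleles / (2 * n)
    exp_hom_major = major_alleles ** 2 / (4 * n)
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs, exp in [
            (n_hom_major, exp_hom_major),
            (n_het, exp_het),
            (n_hom_minor, exp_hom_minor),
        ]
    )
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# rs34445363 genotype counts in controls (GG, GA, AA), copied from Table 3
chi2, p = hwe_chi_square(329, 146, 15)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A p-value above 0.05 means no detectable departure from HWE, which is the criterion stated in the text.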
Table 2
Basic information on the COL6A4P2 SNPs and their association with lung cancer risk under the allele model
Gene | SNP ID | Chr. | Alleles (A/B) | MAF (case) | MAF (control) | p-value for HWE | OR (95% CI) | p | Function |
COL6A4P2 | rs34445363 | 3q22.1 | A/G | 0.217 | 0.180 | 0.879 | 1.26 (1.01–1.58) | 0.038 | Selected eQTL hits |
COL6A4P2 | rs7625942 | 3q22.1 | A/G | 0.223 | 0.225 | 0.608 | 0.98 (0.80–1.21) | 0.915 | Motifs changed, Selected eQTL hits |
COL6A4P2 | rs77941834 | 3q22.1 | A/T | 0.122 | 0.097 | 0.798 | 1.29 (0.97–1.71) | 0.086 | Motifs changed, Selected eQTL hits |
COL6A4P2 | rs61733464 | 3q22.1 | A/G | 0.186 | 0.213 | 0.346 | 0.85 (0.68–1.06) | 0.146 | DNAse, Motifs changed, Selected eQTL hits |
COL6A4P2 | rs11914893 | 3q22.1 | A/C | 0.108 | 0.115 | 0.825 | 0.93 (0.70–1.23) | 0.620 | Motifs changed, GRASP QTL hits |
SNP = single nucleotide polymorphism; Chr. = chromosome; A/B = minor/major allele; MAF = minor allele frequency; HWE = Hardy–Weinberg equilibrium; OR = odds ratio; 95% CI = 95% confidence interval. |
p < 0.05 indicates statistical significance. |
Bold values indicate a significant difference. |
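The allele-model OR in Table 2 can be recovered from the genotype counts in Table 3 by counting minor and major alleles in cases and controls and forming a 2 × 2 table. The sketch below assumes a Woolf (log-scale) confidence interval and a Wald p-value; the section does not name the method, but under these assumptions the rs34445363 counts reproduce the reported OR = 1.26 (1.01–1.58), p = 0.038. The helper name is hypothetical.

```python
import math


def allele_odds_ratio(case_counts, control_counts, z=1.96):
    """Allele-model OR with a Woolf (log-scale) CI and a two-sided Wald p-value.

    Each argument is a tuple of genotype counts (hom_major, het, hom_minor);
    the minor allele is treated as the risk allele.
    """
    def allele_totals(counts):
        hom_major, het, hom_minor = counts
        return 2 * hom_minor + het, 2 * hom_major + het  # (minor, major)

    a, b = allele_totals(case_counts)     # case minor, case major
    c, d = allele_totals(control_counts)  # control minor, control major
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    # Two-sided Wald test of log(OR) = 0
    p_value = math.erfc(abs(log_or) / se_log_or / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# rs34445363 genotype counts (GG, GA, AA) from Table 3
cases = (313, 173, 24)
controls = (329, 146, 15)
or_, lo, hi, p = allele_odds_ratio(cases, controls)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), p = {p:.3f}")
# prints: OR = 1.26 (95% CI 1.01-1.58), p = 0.038
```

The same counts also give the MAFs in Table 2 (221/1020 ≈ 0.217 in cases, 176/980 ≈ 0.180 in controls).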
3.2 Association between the COL6A4P2 gene and the risk of LC
Four genetic models (codominant, dominant, recessive, and log-additive) were applied to the genotype frequencies to further assess associations between the SNPs and LC risk. rs34445363 in COL6A4P2 was associated with a significantly increased LC risk under the log-additive model (adjusted for age and gender, OR = 1.26, 95%CI: 1.01–1.58, p = 0.041; Table 3), whereas no significant differences between cases and controls were found for the other SNPs (all p > 0.05).
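The four genetic models differ only in how each genotype is coded before it enters the regression. The following is a minimal sketch of one common coding, counting copies of the minor allele (e.g. A for rs34445363); the function name is hypothetical and the paper does not show its own coding. In the adjusted analyses described here, these covariates would be entered together with age and gender in a logistic regression, which this sketch does not fit.

```python
def encode_genotype(minor_allele_count: int, model: str) -> list[int]:
    """Covariate value(s) contributed by one genotype under a genetic model.

    minor_allele_count: 0 (e.g. GG), 1 (GA) or 2 (AA) copies of the minor allele.
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "codominant":
        # Two indicators (GA, AA), each compared with the GG reference
        return [int(minor_allele_count == 1), int(minor_allele_count == 2)]
    if model == "dominant":
        # Carriers (GA/AA) vs. non-carriers (GG)
        return [int(minor_allele_count >= 1)]
    if model == "recessive":
        # Minor-allele homozygotes (AA) vs. all others (GG/GA)
        return [int(minor_allele_count == 2)]
    if model == "log-additive":
        # Allele dosage: each extra minor allele multiplies the odds by the same OR
        return [minor_allele_count]
    raise ValueError(f"unknown model: {model}")


for model in ("codominant", "dominant", "recessive", "log-additive"):
    print(model, [encode_genotype(k, model) for k in (0, 1, 2)])
```

Because the log-additive model uses a single dosage term, it yields one OR per additional minor allele, which is why Table 3 reports no genotype-specific counts for that row.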
Table 3
Genotype distribution of COL6A4P2 polymorphisms and their association with lung cancer risk and its histological subtypes
SNP ID | Model | Genotype | Control | LC | | | LUSC | | | LUAD | | |
| | | | Case | OR (95%CI) | p | Case | OR (95%CI) | p | Case | OR (95%CI) | p |
rs34445363 | Codominant | GG | 329 | 313 | 1.00 | | 72 | 1.00 | | 112 | 1.00 | |
| | GA | 146 | 173 | 1.25 (0.10–1.64) | 0.102 | 43 | 1.27 (0.82–1.96) | 0.278 | 66 | 1.39 (0.96–2.00) | 0.082 |
| | AA | 15 | 24 | 1.63 (0.84–3.17) | 0.151 | 5 | 1.52 (0.52–4.46) | 0.442 | 10 | 1.88 (0.81–4.36) | 0.144 |
| Dominant | GG | 329 | 313 | 1.00 | | 72 | 1.00 | | 112 | 1.00 | |
| | GA/AA | 161 | 197 | 1.29 (0.99–1.67) | 0.056 | 48 | 1.29 (0.85–1.97) | 0.229 | 76 | 1.43 (1.01–2.04) | 0.046 |
| Recessive | GG/GA | 475 | 486 | 1.00 | | 115 | 1.00 | | 178 | 1.00 | |
| | AA | 15 | 24 | 1.51 (0.78–2.92) | 0.220 | 5 | 1.40 (0.48–4.06) | 0.533 | 10 | 1.68 (0.73–3.87) | 0.223 |
| Log-additive | -- | -- | -- | 1.26 (1.01–1.58) | 0.041 | -- | 1.26 (0.88–1.80) | 0.212 | -- | 1.38 (1.02–1.86) | 0.034 |
rs61733464 | Codominant | GG | 310 | 340 | 1.00 | | 82 | 1.00 | | 133 | 1.00 | |
| | GA | 158 | 150 | 0.86 (0.66–1.13) | 0.278 | 33 | 0.79 (0.50–1.24) | 0.299 | 46 | 0.65 (0.44–0.96) | 0.031 |
| | AA | 26 | 20 | 0.70 (0.38–1.28) | 0.246 | 5 | 0.75 (0.27–2.07) | 0.581 | 9 | 0.76 (0.34–1.69) | 0.504 |
| Dominant | GG | 310 | 340 | 1.00 | | 82 | 1.00 | | 133 | 1.00 | |
| | GA/AA | 184 | 170 | 0.84 (0.65–1.09) | 0.181 | 38 | 0.78 (0.51–1.21) | 0.265 | 55 | 0.66 (0.46–0.96) | 0.031 |
| Recessive | GG/GA | 468 | 490 | 1.00 | | 115 | 1.00 | | 179 | 1.00 | |
| | AA | 26 | 20 | 0.73 (0.40–1.34) | 0.310 | 5 | 0.81 (0.30–2.21) | 0.683 | 9 | 0.87 (0.39–1.91) | 0.724 |
| Log-additive | -- | -- | -- | 0.85 (0.68–1.05) | 0.139 | -- | 0.82 (0.57–1.18) | 0.285 | -- | 0.74 (0.55–1.01) | 0.059 |
SNP = single nucleotide polymorphism; LC = lung cancer; LUSC = lung squamous cell carcinoma; LUAD = lung adenocarcinoma; OR = odds ratio; 95%CI = 95% confidence interval. |
p < 0.05 indicates statistical significance. |
Bold values indicate a significant difference. |
Stratification by histological subtype further showed that the rs34445363 variant was associated with a significantly increased risk of LUAD under the dominant model (adjusted for age and gender, GA/AA vs. GG, OR = 1.43, 95%CI: 1.01–2.04, p = 0.046) and the log-additive model (adjusted for age and gender, OR = 1.38, 95%CI: 1.02–1.86, p = 0.034). In contrast, rs61733464 in the COL6A4P2 gene was associated with a reduced risk of LUAD, both for the GA genotype under the codominant model (adjusted for age and gender, GA vs. GG, OR = 0.65, 95%CI: 0.44–0.96, p = 0.031) and under the dominant model (adjusted for age and gender, GA/AA vs. GG, OR = 0.66, 95%CI: 0.46–0.96, p = 0.031).
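As a quick plausibility check on the subtype results, the dominant-model counts in Table 3 can be collapsed into a 2 × 2 table and a crude (unadjusted) OR computed. This sketch, with hypothetical helper names, gives about 1.39 for rs34445363 and 0.70 for rs61733464 in LUAD. These differ from the reported 1.43 and 0.66 because the published estimates are adjusted for age and gender (presumably by logistic regression), which a crude 2 × 2 calculation does not do.

```python
def collapse(counts, model):
    """Collapse (hom_major, het, hom_minor) counts into (reference, exposed) groups."""
    hom_major, het, hom_minor = counts
    if model == "dominant":   # GG vs. GA/AA
        return hom_major, het + hom_minor
    if model == "recessive":  # GG/GA vs. AA
        return hom_major + het, hom_minor
    raise ValueError(f"unsupported model: {model}")


def crude_odds_ratio(case_counts, control_counts, model):
    """Unadjusted OR of the exposed genotype group, cases vs. controls."""
    case_ref, case_exposed = collapse(case_counts, model)
    control_ref, control_exposed = collapse(control_counts, model)
    return (case_exposed * control_ref) / (case_ref * control_exposed)


# Genotype counts (GG, GA, AA) for controls and LUAD cases, copied from Table 3
table3 = {
    "rs34445363": {"control": (329, 146, 15), "LUAD": (112, 66, 10)},
    "rs61733464": {"control": (310, 158, 26), "LUAD": (133, 46, 9)},
}

for snp, counts in table3.items():
    crude = crude_odds_ratio(counts["LUAD"], counts["control"], "dominant")
    print(f"{snp}: crude dominant OR for LUAD = {crude:.2f}")
```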
3.3 Relationship between COL6A4P2 polymorphism and clinicopathological features
To evaluate the association of COL6A4P2 SNPs with clinicopathological features, we stratified patients by clinical stage (Ⅰ–Ⅱ vs. Ⅲ–Ⅳ) and LNM status (positive vs. negative). No significant correlation was found between LNM status and COL6A4P2 polymorphisms (Supplementary table 1). However, rs77941834 was associated with significantly lower odds of stage Ⅲ–Ⅳ disease, compared with stage Ⅰ–Ⅱ disease, under the codominant model (adjusted for age and gender, TA vs. TT, OR = 0.52, 95%CI: 0.29–0.94, p = 0.030), the dominant model (adjusted for age and gender, TA/AA vs. TT, OR = 0.49, 95%CI: 0.28–0.86, p = 0.013), and the log-additive model (adjusted for age and gender, OR = 0.55, 95%CI: 0.34–0.87, p = 0.011) (Table 4). No statistically significant association with tumor stage was observed for the other four SNPs (rs34445363, rs7625942, rs61733464, and rs11914893).
Table 4
Relationship between COL6A4P2 polymorphism and tumor staging of lung cancer
SNP ID | Model | Genotype | Stage Ⅰ–Ⅱ | Stage Ⅲ–Ⅳ | OR (95%CI) | p |
rs77941834 | Codominant | TT | 57 | 210 | 1.00 | |
| | TA | 23 | 46 | 0.52 (0.29–0.94) | 0.030 |
| | AA | 4 | 5 | 0.33 (0.09–1.31) | 0.116 |
| Dominant | TT | 57 | 210 | 1.00 | |
| | TA/AA | 27 | 51 | 0.49 (0.28–0.86) | 0.013 |
| Recessive | TT/TA | 80 | 256 | 1.00 | |
| | AA | 4 | 5 | 0.39 (0.10–1.49) | 0.167 |
| Log-additive | | | | 0.55 (0.34–0.87) | 0.011 |
SNP = single nucleotide polymorphism; OR = odds ratio; 95%CI = 95% confidence interval. |
p < 0.05 indicates statistical significance. |
Bold values indicate a significant difference. |
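The stage analysis is a comparison within patients: the reference group is the stage Ⅰ–Ⅱ patients (57 + 23 + 4 = 84, matching Table 1) and the comparison group is the stage Ⅲ–Ⅳ patients (210 + 46 + 5 = 261). The sketch below computes crude codominant ORs with Woolf confidence intervals from these counts; the function name is hypothetical and the method is an assumption, since the published values are adjusted for age and gender. The crude estimates come out at about 0.54 (TA vs. TT) and 0.34 (AA vs. TT), close to the adjusted 0.52 and 0.33.

```python
import math


def genotype_odds_ratios(reference_group, comparison_group, z=1.96):
    """Crude per-genotype ORs (with Woolf CIs) against the first genotype.

    Both arguments are genotype counts in the same order, e.g. (TT, TA, AA);
    the first genotype is the reference category.
    """
    ref_base, *ref_others = reference_group
    cmp_base, *cmp_others = comparison_group
    results = []
    for ref_n, cmp_n in zip(ref_others, cmp_others):
        odds_ratio = (cmp_n * ref_base) / (cmp_base * ref_n)
        se_log_or = math.sqrt(1 / cmp_n + 1 / cmp_base + 1 / ref_n + 1 / ref_base)
        log_or = math.log(odds_ratio)
        results.append(
            (odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
        )
    return results


# rs77941834 genotype counts (TT, TA, AA) from Table 4
stage_1_2 = (57, 23, 4)    # reference group: stage I-II patients
stage_3_4 = (210, 46, 5)   # comparison group: stage III-IV patients
for label, (or_, lo, hi) in zip(("TA vs. TT", "AA vs. TT"),
                                genotype_odds_ratios(stage_1_2, stage_3_4)):
    print(f"{label}: crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An OR below 1 here means the variant genotype is relatively less common among advanced-stage patients; it says nothing about LC risk relative to healthy controls.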
3.4 Stratified analysis by age and gender
Stratified analyses under multiple genetic models showed that age and gender significantly affected the association between COL6A4P2 SNPs and LC risk. Among people aged ≤ 61 years, rs34445363 was associated with a higher LC risk for the AA genotype under the codominant model (adjusted for gender, AA vs. GG, OR = 2.62, 95%CI: 1.00–6.85, p = 0.049) and under the log-additive model (adjusted for gender, OR = 1.42, 95%CI: 1.03–1.95, p = 0.033), whereas rs61733464 was associated with a decreased LC risk under the dominant model (adjusted for gender, GA/AA vs. GG, OR = 0.68, 95%CI: 0.46–0.99, p = 0.048) and the log-additive model (adjusted for gender, OR = 0.72, 95%CI: 0.52–0.99, p = 0.048) (Table 5).
Table 5
Distribution of COL6A4P2 polymorphisms across age and gender groups and their relationship with lung cancer risk
SNP ID | Model | Genotype | Age > 61 | | | | Age ≤ 61 | | | |
| | | Control | Case | OR (95%CI) | p | Control | Case | OR (95%CI) | p |
rs34445363 | Codominant | GG | 179 | 152 | 1.00 | | 150 | 161 | 1.00 | |
| | GA | 82 | 86 | 1.24 (0.85–1.81) | 0.254 | 64 | 87 | 1.29 (0.87–1.92) | 0.210 |
| | AA | 9 | 6 | 0.76 (0.26–2.19) | 0.606 | 6 | 18 | 2.62 (1.00–6.85) | 0.049 |
| Dominant | GG | 179 | 152 | 1.00 | | 150 | 161 | 1.00 | |
| | GA/AA | 91 | 92 | 1.20 (0.83–1.72) | 0.340 | 70 | 105 | 1.41 (0.96–2.06) | 0.079 |
| Recessive | GG/GA | 261 | 238 | 1.00 | | 214 | 248 | 1.00 | |
| | AA | 9 | 6 | 0.70 (0.24–2.02) | 0.513 | 6 | 18 | 2.41 (0.93–6.24) | 0.070 |
| Log-additive | -- | -- | -- | 1.11 (0.81–1.53) | 0.524 | -- | -- | 1.42 (1.03–1.95) | 0.033 |
rs61733464 | Codominant | GG | 174 | 159 | 1.00 | | 136 | 181 | 1.00 | |
| | GA | 81 | 74 | 0.98 (0.67–1.44) | 0.923 | 77 | 76 | 0.70 (0.47–1.03) | 0.073 |
| | AA | 15 | 11 | 0.82 (0.36–1.86) | 0.636 | 11 | 9 | 0.58 (0.23–1.46) | 0.249 |
| Dominant | GG | 174 | 159 | 1.00 | | 136 | 181 | 1.00 | |
| | GA/AA | 96 | 85 | 0.96 (0.66–1.38) | 0.812 | 88 | 85 | 0.68 (0.46–0.99) | 0.048 |
| Recessive | GG/GA | 255 | 233 | 1.00 | | 213 | 257 | 1.00 | |
| | AA | 15 | 11 | 0.83 (0.37–1.86) | 0.642 | 11 | 9 | 0.66 (0.26–1.63) | 0.365 |
| Log-additive | -- | -- | -- | 0.95 (0.70–1.28) | 0.713 | -- | -- | 0.72 (0.52–0.99) | 0.048 |
SNP ID | Model | Genotype | Male | | | | Female | | | |
| | | Control | Case | OR (95%CI) | p | Control | Case | OR (95%CI) | p |
rs34445363 | Codominant | GG | 225 | 220 | 1.00 | | 104 | 92 | 1.00 | |
| | GA | 110 | 118 | 1.10 (0.80–1.52) | 0.547 | 36 | 55 | 1.73 (1.04–2.86) | 0.034 |
| | AA | 11 | 17 | 1.47 (0.67–3.24) | 0.334 | 4 | 7 | 1.98 (0.56–6.98) | 0.289 |
| Dominant | GG | 225 | 220 | 1.00 | | 104 | 92 | 1.00 | |
| | GA/AA | 121 | 135 | 1.14 (0.84–1.55) | 0.411 | 40 | 62 | 1.75 (1.08–2.85) | 0.024 |
| Recessive | GG/GA | 335 | 338 | 1.00 | | 140 | 147 | 1.00 | |
| | AA | 11 | 17 | 1.43 (0.65–3.11) | 0.372 | 4 | 7 | 1.67 (0.18–5.82) | 0.423 |
| Log-additive | -- | -- | -- | 1.15 (0.88–1.49) | 0.314 | -- | -- | 1.60 (1.05–2.44) | 0.028 |
rs77941834 | Codominant | TT | 279 | 284 | 1.00 | | 124 | 112 | 1.00 | |
| | TA | 63 | 61 | 0.94 (0.64–1.39) | 0.763 | 23 | 39 | 1.88 (1.06–3.34) | 0.032 |
| | AA | 4 | 10 | 2.42 (0.75–7.84) | 0.141 | 1 | 2 | 2.21 (0.20–24.76) | 0.519 |
| Dominant | TT | 279 | 284 | 1.00 | | 124 | 112 | 1.00 | |
| | TA/AA | 67 | 71 | 1.03 (0.71–1.50) | 0.878 | 24 | 41 | 1.89 (1.07–3.33) | 0.027 |
| Recessive | TT/TA | 342 | 345 | 1.00 | | 147 | 151 | 1.00 | |
| | AA | 4 | 10 | 2.44 (0.76–7.91) | 0.136 | 1 | 2 | 1.94 (0.17–21.66) | 0.590 |
| Log-additive | -- | -- | -- | 1.11 (0.80–1.53) | 0.547 | -- | -- | 1.81 (1.06–3.08) | 0.030 |
SNP = single nucleotide polymorphism; OR = odds ratio; 95%CI = 95% confidence interval. |
p < 0.05 indicates statistical significance. |
Bold values indicate a significant difference. |
Gender also significantly affected the association between COL6A4P2 SNPs and LC risk (Table 5). In females, the rs34445363 variant was associated with a significantly increased LC risk for the GA genotype under the codominant model (adjusted for age, GA vs. GG, OR = 1.73, 95%CI: 1.04–2.86, p = 0.034), as well as under the dominant model (adjusted for age, GA/AA vs. GG, OR = 1.75, 95%CI: 1.08–2.85, p = 0.024) and the log-additive model (adjusted for age, OR = 1.60, 95%CI: 1.05–2.44, p = 0.028). Women carrying the rs77941834 variant likewise had a higher LC risk for the TA genotype under the codominant model (adjusted for age, TA vs. TT, OR = 1.88, 95%CI: 1.06–3.34, p = 0.032), the dominant model (adjusted for age, TA/AA vs. TT, OR = 1.89, 95%CI: 1.07–3.33, p = 0.027), and the log-additive model (adjusted for age, OR = 1.81, 95%CI: 1.06–3.08, p = 0.030).
3.5 Association of COL6A4P2 haplotypes with the risk of LC
The SNPs genotyped in this study were in linkage disequilibrium in the study population (Fig. 1). However, none of the COL6A4P2 haplotype frequencies differed significantly between cases and controls (Supplementary table 2).
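Fig. 1 summarizes pairwise LD. The usual pairwise measures, D' and r², follow from one two-locus haplotype frequency and the two allele frequencies, as sketched below. The frequencies in the example are hypothetical and are not taken from this study; in practice haplotype frequencies are estimated from unphased genotypes (for example, with an EM algorithm), a step this sketch omits. The function name is also hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # deviation from independence of the two loci
    # D is normalized by its maximum attainable value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies, for illustration only (not data from this study)
d, d_prime, r2 = ld_measures(p_ab=0.15, p_a=0.20, p_b=0.25)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

D' reaches 1 whenever one of the four possible haplotypes is absent, while r² reaches 1 only when the two SNPs are perfect proxies for each other, so the two measures can disagree for SNPs with different allele frequencies.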
3.6 Expression of COL6A4P2 and SNPs
Database analysis showed that, compared with healthy subjects, expression of the COL6A4P2 gene was significantly higher in both LUAD (p = 1.62 × 10⁻¹²) and LUSC (p = 2.44 × 10⁻⁷) (Fig. 2A and 2C). OncoLnc analysis showed that COL6A4P2 expression was significantly associated with survival in LUAD patients (p = 4.25 × 10⁻³, Fig. 2B), whereas it had no significant effect on the prognosis of LUSC (p = 3.00 × 10⁻¹, Fig. 2D). Furthermore, GTEx data indicated that four SNPs in the COL6A4P2 gene (rs34445363, p = 5.80 × 10⁻¹⁴; rs7625942, p = 8.90 × 10⁻⁸; rs77941834, p = 1.60 × 10⁻⁵; rs61733464, p = 1.00 × 10⁻⁹) are significantly associated with COL6A4P2 expression in normal lung tissue (Fig. 3).