Impact of COL6A4P2 gene polymorphisms on the risk of lung cancer in a Chinese Han population

Background: The aim of this study was to investigate the effects of COL6A4P2 polymorphisms on lung cancer (LC) in a Chinese Han population. Methods: To examine whether variants of COL6A4P2 contribute to LC, five single nucleotide polymorphisms (SNPs) of COL6A4P2 were genotyped by Agena MassARRAY in 510 LC patients and 495 controls. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated by logistic regression. Results: COL6A4P2 rs34445363 significantly increased the risk of LC in the allele model (OR = 1.26, 95% CI: 1.01-1.58, p = 0.038). In the multiple genetic model analysis, rs34445363 also increased LC risk under the log-additive model (OR = 1.26, 95% CI: 1.01-1.58, p = 0.041). Stratification analysis showed that, in people aged ≤ 61 years, rs34445363 increased LC risk under the log-additive model (OR = 1.42, 95% CI: 1.03-1.95, p = 0.033), whereas rs61733464 was associated with a decreased LC risk under the log-additive model (OR = 0.72, 95% CI: 0.52-0.99, p = 0.048). In females, rs34445363 and rs77941834 were associated with increased LC risk under the codominant model (rs34445363, GA vs. GG, OR = 1.73, 95% CI: 1.04-2.86, p = 0.034; rs77941834, TA vs. TT, OR = 1.88, 95% CI: 1.06-3.34, p = 0.032). Conclusions: This study provides evidence that polymorphisms of the COL6A4P2 gene are associated with the development of LC and offers new insight into the etiology of LC.


Introduction
Lung cancer (LC) is the malignant tumor with the fastest growth in morbidity and mortality and the greatest threat to people's health and life (1). According to the Global Cancer Observatory database (http://gco.iarc.fr/) (2), there were 2,093,876 new cases of LC worldwide in 2018, accounting for 11.6% of all cancers, and 1,761,007 deaths from LC, accounting for 17.9% of all cancer deaths. Among them, the incidence and mortality of LC in females were 13.1% and 6.9%, respectively. LC has become the malignant tumor with the highest incidence and mortality (3)(4)(5). In China, LC also has high incidence and mortality, and morbidity and mortality in men are more than twice those in women (6). Most studies suggest that the occurrence of LC is related to environmental factors (smoking, occupational exposure, and air pollution) and genetic factors (7,8), with genetic factors playing an important role. Li et al. (9) revealed that LC susceptibility in the Chinese Han population is related to HOTAIR gene mutation. Dimitrakopoulos et al. (10) reported that NF-kB2 gene mutation is significantly associated with LC risk. However, the correlation between COL6A4P2 gene polymorphisms and LC susceptibility has not been reported.
COL6A4P2 (Collagen Type VI Alpha 4 Pseudogene 2), also known as COL6A4, is located on Chr. 3q22 in humans. The COL6A4 gene encodes type VI collagen (COL6), an extracellular matrix protein that plays an important role in maintaining the integrity of lung tissue. Chiu et al. (11) showed by quantitative secretion cleavage that COL6 is a protein involved in tumor metastasis. Voiles et al. (12) demonstrated that the expression of COL6 protein is upregulated in LC. Thus, we suspected that the COL6A4 gene may be associated with LC.
COL6A4 is reported to be transcribed as an unprocessed pseudogene because of multiple stop codons in the gene sequence (13). Many studies have shown that pseudogenes play an important role in the development of cancer. Cheng et al. (14) found that pseudogenes affect the occurrence and development of cancer by forming lncRNA-pseudogene-mRNA competitive triples. Lynn et al. (15) confirmed that polymorphisms of the MYLKP1 pseudogene are associated with an increased risk of colon cancer. Wei et al. (16) found that the pseudogene DUXAP10 promotes the invasiveness of LC. Therefore, we speculated that the COL6A4P2 gene may play a role in cancer development.
In this study, we explored for the first time the association between the COL6A4P2 gene and LC risk, revealing the relationship between COL6A4P2 gene polymorphisms and LC susceptibility in a Chinese Han population.

Study participants
Using a case-control design, 510 LC patients (mean age: 60.78 ± 9.96 years) and 495 controls (mean age: 61.94 ± 7.72 years) were enrolled. All patients were recruited from Shaanxi Provincial Cancer Hospital (Xi'an, Shaanxi, China). Patient inclusion criteria were: 1) newly diagnosed LC, 2) histopathological diagnosis of LC by an experienced pathologist, 3) no previous radiation therapy or chemotherapy, and 4) no history of cancer or metastatic carcinoma. Patients with asthma, bronchitis, pneumonia, lung abscess, tuberculosis or other lung diseases, autoimmune diseases, trauma, or other tumors were excluded from the study. We then collected clinical indicators of the LC patients, including gender, age, histological classification, tumor stage, and lymph node metastasis status.
The controls were healthy volunteers recruited from Shaanxi Provincial Cancer Hospital (Xi'an, Shaanxi, China) during the same period. The inclusion criterion for the control group was no medical or family history of cancer or any pulmonary disease. At the time of recruitment, each subject was personally interviewed by trained personnel using a structured questionnaire to obtain demographic information. This study was approved by the ethics committee of the Shaanxi Provincial Cancer Hospital and conformed to the ethical principles for medical research involving humans of the World Medical Association Declaration of Helsinki. All participants signed informed consent forms before participating in this study.
Subsequently, approximately 5 mL of venous blood was obtained from each participant and collected into tubes containing ethylenediaminetetraacetic acid for anticoagulation. Genomic DNA was extracted from peripheral blood samples using a Whole-Blood Genomic DNA Extraction Kit (GOLDMAG, Xi'an, China) according to the manufacturer's instructions. The purity and concentration of the DNA samples were evaluated using a NanoDrop 2000C system (Thermo Scientific, Waltham, MA, USA). Isolated DNA was stored at −80 °C until analysis.

SNP genotyping
Five candidate SNPs in the COL6A4P2 gene with a minor allele frequency (MAF) > 0.05 in the global population of the 1000 Genomes Project (http://www.internationalgenome.org/) were selected. We then used HaploReg v4.1 (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php) to predict the possible functions of the SNPs. The primers for amplification and single-base extension were designed using the Assay Design Suite, V2.0 (https://agenacx.com/online-tools/). Genotyping of the five SNPs was carried out on the MassARRAY iPLEX platform (Agena Bioscience, San Diego, CA, USA) using matrix-assisted laser desorption ionization-time of flight mass spectrometry (17). Genotyping results were generated using Agena Bioscience TYPER software, version 4.0. Genotyping was carried out by laboratory personnel in a double-blinded fashion.
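As an illustration of the MAF > 0.05 selection criterion, the following minimal Python sketch computes a minor allele frequency from diploid genotype counts and applies the threshold. The function names and the example counts are hypothetical and are not taken from the study; the actual selection used frequencies reported by the 1000 Genomes Project.

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Return the minor allele frequency from diploid genotype counts.

    n_ref_hom: individuals homozygous for the reference allele
    n_het:     heterozygous individuals
    n_alt_hom: individuals homozygous for the alternate allele
    """
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotyped samples")
    alt_freq = (2 * n_alt_hom + n_het) / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(n_ref_hom, n_het, n_alt_hom, threshold=0.05):
    """True if the SNP's MAF is strictly above the threshold (0.05 here)."""
    return minor_allele_frequency(n_ref_hom, n_het, n_alt_hom) > threshold


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(minor_allele_frequency(400, 90, 10))
    print(passes_maf_filter(490, 10, 0))
```

Taking the minimum of the two allele frequencies keeps the result independent of which allele is labeled as the reference.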

Analysis of COL6A4P2 and SNPs expression
Data on the expression of COL6A4P2 in LC were obtained from the UALCAN online database (http://ualcan.path.uab.edu/analysis.html), a web server providing customizable functions. Tumor and normal samples in the UALCAN database were derived from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) projects. We then predicted the effect of COL6A4P2 gene expression on LC prognosis using the OncoLnc database (http://www.oncolnc.org/). We also predicted the expression of SNPs in the COL6A4P2 gene in normal lung tissues using the GTEx database (https://gtexportal.org/home/).

Statistical analyses
An independent-samples t-test was used to assess differences in the demographic and clinical characteristics of the study participants. Hardy-Weinberg equilibrium (HWE) among the controls was assessed with Fisher's exact test by comparing the observed and expected genotype frequencies. Pearson's χ² test was used to compare the allele and genotype frequencies of each SNP between LC patients and controls. Multiple genetic model analyses (codominant, dominant, recessive, and log-additive) were performed using PLINK software (http://zzz.bwh.harvard.edu/plink/ld.shtml) to assess the association between SNPs and LC risk. Furthermore, we performed stratified analyses by gender and age to account for possible confounders. Finally, we used Haploview software (version 4.2) to construct haplotypes and to estimate pairwise linkage disequilibrium, and the SHEsis software platform (http://analysis.biox.cn/myAnalysis.php) to estimate the correlation between haplotypes and LC risk. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using logistic regression analyses adjusted for gender and age (18), with the wild-type allele used as a reference. Statistical analyses were performed using SPSS software (version 21.0, IBM Corporation, Armonk, NY, USA). All p-values were two-sided, and p < 0.05 was considered statistically significant.
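The genetic models and the OR with its 95% CI can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: it computes an unadjusted OR with a Wald interval from a 2×2 table, whereas the study used logistic regression adjusted for gender and age. The function names, the genotype coding as a count of minor alleles, and the example counts are all hypothetical.

```python
import math


def encode_genotype(minor_allele_count, model):
    """Map the number of minor alleles (0, 1, or 2) to a model predictor."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if model == "dominant":
        # Carriers of at least one minor allele vs. wild-type homozygotes.
        return int(minor_allele_count >= 1)
    if model == "recessive":
        # Minor-allele homozygotes vs. all other genotypes.
        return int(minor_allele_count == 2)
    if model == "log-additive":
        # Each additional minor allele multiplies the odds by the same factor.
        return minor_allele_count
    if model == "codominant":
        # Two indicators: (heterozygote, minor-allele homozygote).
        return (int(minor_allele_count == 1), int(minor_allele_count == 2))
    raise ValueError(f"unknown model: {model}")


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Unadjusted OR and Wald 95% CI from a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_wald_ci(20, 80, 10, 90))
```

An interval that includes 1 corresponds to a two-sided p-value of at least 0.05 for the Wald test, which matches the significance threshold stated above.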

Characteristics of cases and controls
The basic clinical information of the LC patients and controls is shown in Table 1. Bold values indicate a significant difference.
Also, we found that gender significantly affects the association between SNPs of the COL6A4P2 gene and LC risk (

Association of COL6A4P2 haplotypes with the risk of LC
The SNPs in the current study were in linkage disequilibrium in the study population (Fig. 1). However, there was no statistically significant difference in any of the COL6A4P2 haplotype frequencies between cases and controls (Supplementary Table 2).
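For readers unfamiliar with pairwise linkage disequilibrium, the following Python sketch computes the two standard measures, D′ and r², for a pair of biallelic SNPs. It assumes that phased haplotype frequencies are already known; Haploview estimates them from unphased genotypes, which this sketch does not attempt. The function name and the example frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    for freq in (p_a, p_b):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D_max is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    print(pairwise_ld(0.5, 0.5, 0.5))   # alleles always inherited together
    print(pairwise_ld(0.25, 0.5, 0.5))  # alleles independent
```

Both measures range from 0 (independence) to 1; D′ reaches 1 whenever at least one of the four possible haplotypes is absent, while r² reaches 1 only when the two SNPs are perfectly correlated.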

Discussion
In this study, we analyzed the association between COL6A4P2 gene polymorphisms and susceptibility to LC. We identified that rs34445363 in the COL6A4P2 gene was associated with an increased risk of LC. Our results also suggested that mutation at the rs34445363 site increases the risk of lung adenocarcinoma (LUAD), whereas mutation of rs61733464 significantly decreases LUAD risk. These findings suggest an association between genetic polymorphisms of COL6A4P2 and susceptibility to LC.
Numerous studies have shown that collagen levels play an important role in the development of LC (19,20). Naveen et al. (21) identified collagen VI as a potential biomarker for early diagnosis of LC by proteomic analysis, suggesting that LC is associated with collagen-encoding genes. The COL6A4P2 gene is a pseudogene formed by a chromosomal break in the collagen-encoding gene COL6A4 (13,22), so we speculated that the COL6A4P2 gene may be related to LC. Our results suggested that the rs34445363 mutation in the COL6A4P2 gene significantly increases the risk of LC, which supports our conjecture and is consistent with previous studies.
Our results also showed that the relationship between COL6A4P2 gene polymorphisms and LC risk was influenced by gender and age. A retrospective analysis by Oh et al. (23) assessed the important effects of gender and age in the development of LC. Aareleid et al. (24) revealed that LC has different incidence rates in different genders and ages. These studies are consistent with our results and enhance the credibility of our findings.
Further, we predicted the differential expression of the COL6A4P2 gene in normal lung tissues and LC tissues using a database. Voiles et al. (12) found that collagen VI protein levels were increased in lung tumor tissue, which suggests that the expression of the COL6A4P2 gene in lung tumor tissue is variable. This coincides with our predictions. Fagerberg et al. (25) found that the COL6A4P2 gene is specifically expressed in human lung tissue by genome-wide integration analysis of transcriptomics and antibody proteomics. These findings indicate the research significance of the COL6A4P2 gene in the development of LC and suggest that the gene deserves further study.
In conclusion, the present study is the first to investigate the relationship between the COL6A4P2 gene and LC, and it indicates that COL6A4P2 gene polymorphisms are associated with LC risk in a Chinese Han population. However, further studies in larger cohorts and in other ethnic groups are warranted to confirm our results.

Declarations
Ethics approval and consent to participate
This study was approved by the ethics committee of the Shaanxi Provincial Cancer Hospital, and conformed to the ethical principles for medical research involving humans of the World Medical Association Declaration of Helsinki. All participants signed informed consent forms before participating in this study.

Consent to publish
All the authors agreed to publish the manuscript.

Availability of data and materials
The datasets used and analyzed in the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no conflict of interest.