Associations of the Polymorphisms of the NHEJ Pathway Genes With HIV-1 Infection and AIDS Progression Among Men Who Have Sex With Men in Northern China


Background: Men who have sex with men (MSM) are at high risk of HIV infection. The non-homologous end joining (NHEJ) pathway is the main route of double-stranded DNA break (DSB) repair in higher eukaryotes, and can repair DSBs promptly at any stage of the cell cycle. The objective of this study was to investigate the association of SNPs in the NHEJ pathway genes with susceptibility to HIV-1 infection and AIDS progression among MSM residing in northern China. Results: In the present study, a total of 481 HIV-1 seropositive men and 493 HIV-1 seronegative men were included. Genotyping of 22 SNPs in NHEJ pathway genes was performed using the SNPscan™ Kit. Our results disclosed significant associations of XRCC6 rs132770 and XRCC4 rs1056503 genotypes with susceptibility to HIV-1 infection. The generalized multifactor dimensionality reduction (GMDR) analysis found a significant SNP-SNP interaction between the XRCC6 and XRCC4 variants in the risk of HIV-1 infection. In stratified analysis, the effects of XRCC5 rs16855458 and LIG4 rs1805388 on the CD4+ T cell count and clinical phase of disease were validated. Conclusions: Our results confirmed that the NHEJ gene polymorphisms played an important role in HIV-1 infection and AIDS progression in the northern Chinese MSM population.

Double-stranded DNA breaks (DSBs) are one of the main causes of gene mutation and chromosome breakage, and play an important role in tumorigenesis and tumor progression [3]. The non-homologous end joining (NHEJ) pathway is the main route of DSB repair (DSBR) in higher eukaryotes, and can repair DSBs promptly at any stage of the cell cycle [4,5]. There are five core genes (XRCC7, XRCC6, XRCC5, XRCC4 and LIG4) in the NHEJ pathway, which encode five proteins (DNA-PK, Ku70, Ku80, XRCC4 and LIG4, respectively). Studies have shown that NHEJ gene polymorphisms are associated with susceptibility to a wide variety of cancers and with disease progression. For instance, XRCC7 gene polymorphisms play an important role in prostate cancer [6], bladder cancer [7], liver cancer [8], thyroid cancer [9] and lung cancer [10]. Polymorphisms in the other genes (XRCC4, XRCC5, XRCC6 and LIG4) are also associated with many different types of cancer [11-15].
On the basis of previous functional studies of NHEJ genes, we hypothesized that the NHEJ pathway is associated with HIV-1 infection, because DSBs occur in host genomic DNA during HIV-1 integration. For example, the DNA-PK protein interacts with HIV-1 Tat to regulate HIV-1 replication and transcription [16,17]. HIV-1 proviral DNA integration triggers cell death during HIV-1 infection through the activation of DNA-PK, which causes phosphorylation of p53 and histone gamma-H2AX [18]. The Ku70 and Ku80 proteins are closely associated with HIV-1 integrase, which is beneficial to virus integration and replication [19,20], and protect cells against toxicity induced by HIV-1 integrase or integration [21]. However, functional studies of the other NHEJ genes and association studies of NHEJ gene polymorphisms in HIV-1 infection have not been reported, so the role of SNPs in NHEJ genes in HIV-1 infection and AIDS progression remains unclear. Therefore, to assess the involvement of NHEJ gene polymorphisms, we performed an association study of 22 SNPs in the XRCC7, XRCC6, XRCC5, XRCC4 and LIG4 genes in 974 northern Chinese individuals. Participants were genotyped to investigate whether the polymorphisms in these five genes were associated with susceptibility to HIV-1 infection and the progression of AIDS.

Hardy-Weinberg equilibrium test
In this study, 479 HIV-1-infected and 487 HIV-1-uninfected individuals from northern China were genotyped for 22 SNPs in NHEJ genes. The genotyping success rates were > 98% for all SNPs, and a mismatch rate of 0% was detected in 50 replicate samples. As shown in Table 1, none of the 22 SNPs deviated from Hardy-Weinberg equilibrium in the control group (P > 0.05).
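As an illustration of the test reported here, the following is a minimal sketch of a chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at one biallelic SNP. The function name `hwe_chi_square` and the genotype counts in the usage note are hypothetical and are not taken from Table 1; the study itself ran its tests in SPSS, so this is not the authors' implementation.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed counts of the two homozygotes and the
    heterozygote at a biallelic SNP. Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNP) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With hypothetical control counts of 100, 200 and 100, the observed and expected counts coincide, so the statistic is 0 and the SNP would be retained (P > 0.05).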

Associations of NHEJ gene polymorphisms with HIV-1 infection
To explore possible associations, the genotype distribution of the 22 SNPs was investigated and the differences in genotype frequencies between cases and controls were analyzed under three genetic models (codominant, dominant and recessive). As shown in Fig. 1 However, no association with HIV-1 infection was observed under any genetic model for the remaining 20 SNPs (P > 0.05).
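The three genetic models can be read as different ways of collapsing the three genotype counts into 2x2 tables. The sketch below is one such reading, assuming the third genotype column is the reference homozygote (as in the Fig. 1 legend for the codominant comparisons) and that the dominant and recessive models are defined relative to the allele of the first-column homozygote. The function name `model_tables` and all counts are hypothetical.

```python
def model_tables(case_counts, control_counts):
    """Collapse genotype counts into 2x2 tables for each genetic model.

    Each argument is a tuple (hom1, het, hom2), where hom2 is the reference
    homozygote (assumption). Each returned table has the layout
    ((case_exposed, case_reference), (control_exposed, control_reference)).
    """
    c1, ch, c2 = case_counts
    k1, kh, k2 = control_counts
    return {
        # Codominant 1: first-column homozygote vs reference homozygote
        "codominant_1": ((c1, c2), (k1, k2)),
        # Codominant 2: heterozygote vs reference homozygote
        "codominant_2": ((ch, c2), (kh, k2)),
        # Dominant: carriers of the tested allele vs reference homozygote
        "dominant": ((c1 + ch, c2), (k1 + kh, k2)),
        # Recessive: first-column homozygote vs all other genotypes
        "recessive": ((c1, ch + c2), (k1, kh + k2)),
    }
```

Each resulting table can then be passed to a chi-square test or an odds ratio calculation.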

Analysis of the SNP-SNP interaction
Next, the GMDR method was used to test 10 SNPs within the XRCC6 and XRCC4 genes for high-order interactions affecting HIV-1 infection. Through 10-fold cross-validation, the best four-locus model, involving XRCC6 (rs2267437) and XRCC4 (rs10040363, rs963248 and rs1056503), was found. The model had a testing balanced accuracy of 53.42%, the maximum cross-validation consistency of 10/10, and a sign test P-value of 0.010 (Fig. 2). To obtain the ORs for the joint effects of the four SNPs on HIV-1 infection, traditional statistical methods were applied to this four-locus model to aid interpretation, which identified three significant genotype combinations among all possible high-risk genotype combinations. In this four-locus (rs1056503-rs2267437-rs10040363-rs963248) model, the ORs for the three signi  Table 2).
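For readers unfamiliar with the balanced accuracy figure quoted for the GMDR model, it is the mean of sensitivity and specificity, so a value of 50% corresponds to chance-level classification of cases and controls. The sketch below uses a hypothetical function name and hypothetical counts; in GMDR the reported value is averaged over the cross-validation test folds, which is not reproduced here.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity for a binary classifier.

    tp: cases classified as high risk      fn: cases classified as low risk
    tn: controls classified as low risk    fp: controls classified as high risk
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0
```

Because it weights cases and controls equally, this measure is not inflated when the two groups differ in size.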

Analysis of haplotype associations
Analysis of LD between the SNPs in NHEJ genes was performed with HaploView software. There was strong LD among the four SNPs in the XRCC6 gene, the eight SNPs in the XRCC5 gene, the six SNPs in the XRCC4 gene and the two SNPs in the LIG4 gene. There were no significant differences in haplotype frequencies between HIV-1-infected individuals and healthy controls, and no haplotype was associated with susceptibility to HIV-1 infection (P > 0.05). Table 3 shows all blocks and haplotypes identified and the frequencies of these haplotypes.

Association analysis for NHEJ gene SNPs with CD4+ T cell count and clinical phase in AIDS patients
To examine the relationship between the NHEJ gene polymorphisms and the progression of AIDS, differences in allele frequencies were analyzed between case subgroups defined by CD4+ T-lymphocyte count and by clinical stage. The CD4+ T cell counts of the study participants ranged from 3 to 1038 cells/µl (mean ± SD, 335.57 ± 198.79). The associations between SNPs and CD4+ T cell counts were used to assess the influence of these polymorphisms on immune status. As shown in Table 4, there were significant differences in genotype frequencies for XRCC5 rs16855458 and LIG4 rs1805388 between the case subgroups (P < 0.05). Subjects carrying the AA or AC genotype of rs16855458 had significantly lower CD4+ T-lymphocyte counts than subjects with the CC genotype (P = 0.025, OR = 1.538, 95% CI 1.054-2.243). In addition, subjects carrying the AA or AG genotype of rs1805388 had a higher risk of AIDS progression than subjects with the GG genotype (P = 0.036, OR = 1.506, 95% CI 1.027-2.209). The other SNPs were not associated with CD4+ T-lymphocyte count or clinical stage (P > 0.05). These results suggest that rs16855458 and rs1805388 are associated with clinical features and that polymorphisms in the XRCC5 and LIG4 genes likely play an important role in the progression of AIDS in the northern Chinese population.

Discussion
According to the molecular mechanism of HIV-1 infection, viral DNA is inserted into the host genomic DNA during HIV-1 integration. The integration process effectively introduces DSBs into the genomic DNA of host cells, and the resulting damage signal would activate the NHEJ pathway. Thus, we hypothesized that the NHEJ genes are involved in HIV-1 infection and disease progression. To the best of our knowledge, this comprehensive study is the first to systematically evaluate the association between polymorphisms in NHEJ genes and susceptibility to HIV-1 infection and the progression of AIDS.
In our study, differences in the genotype frequencies of XRCC6 rs132770 and XRCC4 rs1056503 were seen between the cases and the controls under different genetic models. Our results imply an association of polymorphisms in NHEJ genes with susceptibility to HIV-1 infection in the northern Chinese MSM population. The XRCC6 gene encodes the Ku70 protein, which functions as a single-stranded DNA- and ATP-dependent helicase and may be involved in the repair of non-homologous DNA ends such as that required for DSB repair. The Ku70 protein also interacts with HIV-1 integrase during HIV-1 infection [19-21]. The rs1056503 is a synonymous codon in XRCC6 gene. The association may arise because rs1056503 affects mRNA expression through alternative splicing and thereby regulates XRCC6 protein function, or because this SNP is closely linked to another SNP that is associated with HIV-1 infection. Similar to our findings, it was also reported that different XRCC6 genotypes could contribute to susceptibility to another infectious disease, namely hepatocellular carcinoma [22,23].
The XRCC4 gene encodes the XRCC4 protein, which activates and enhances the activity of the LIG4 protein and plays an important role in the NHEJ repair pathway [24]. XRCC4 gene mutations can lead to small head dwarfism [25]. Although SNPs in the XRCC4 gene can influence the susceptibility to and progression of infectious disease such as liver cancer [26,27], their effects on HIV-1 infection have not yet been reported. Our study suggested that XRCC4 gene polymorphisms were associated with HIV-1 infection, which is in accordance with the above reports. Thus, the association in this study could be explained as follows. The
As an indicator of the clinical characteristics of AIDS, the CD4+ T cell count reflects the number of immune cells in the patient's body. According to the World Health Organization (WHO), AIDS patients with a CD4+ T cell count below 350 cells/µl should be given antiretroviral therapy or other treatments [28-30]. In our study, we found a significant difference in the genotype frequency of XRCC5 rs16855458 between the two case subgroups, and the AA and AC genotypes were associated with lower CD4+ T cell counts. This result showed that XRCC5 rs16855458 was associated with the progression of AIDS. The XRCC5 gene encodes the Ku80 protein, which forms the Ku heterodimer with the Ku70 protein. Functional studies showed that changes in the expression level of the Ku80 protein are a main cause of tumor development and can be used as a predictor of patient survival as well as treatment outcome [31,32].
During HIV-1 infection, the XRCC5 gene is closely related to HIV-1 integration and translation [33-35]. We propose that rs16855458, located in an intron of the XRCC5 gene, may regulate the transcription and expression of the XRCC5 gene through alternative splicing; the gene product interacts with HIV-1 to promote its integration and translation, leading to a decrease in the CD4+ T-lymphocyte count and accelerated progression of AIDS. Similar to our findings, polymorphisms of the XRCC5 gene were also reported to be associated with viral disease such as liver cancer [23].
In this study, we also divided the cases into two subgroups by clinical stage, which is a clinical feature of AIDS and directly reflects disease progression. Patients in phases I and II have mild clinical symptoms and show only HIV-1 antibody positivity. In contrast, patients in phases III and IV have serious clinical symptoms, such as nervous system lesions, continuous fever and diarrhea, sepsis and various kinds of tumors caused by the loss of immune function, and should promptly be given antiretroviral therapy or other treatments. Our results revealed a significant difference in the genotype frequency of LIG4 rs1805388 between MSM cases in clinical phase I + II and those in clinical phase III + IV, and the AA/AG genotypes were significantly associated with faster progression of AIDS. The LIG4 gene encodes the LIG4 protein, which joins the DSB ends and thereby completes NHEJ repair. A previous study showed that LIG4 gene polymorphisms were associated with clinical features of cancer such as treatment outcome, progression-free survival and overall survival [36]. LIG4 gene mutations can not only lead to abnormal development and immune defects but can also cause severe combined immunodeficiency disease [37]. The rs1805388 polymorphism is located in an exon of the LIG4 gene and is a missense variant involving threonine and isoleucine. Here we propose that the association reflects functional changes in the LIG4 protein that directly affect the clinical stage of AIDS.

Conclusions
In conclusion, our results confirmed that the NHEJ gene polymorphisms play an important role in HIV-1 infection and AIDS progression among MSM in northern China. The mechanism underlying the interaction between the NHEJ genes and HIV-1/AIDS remains unclear, and our study opens a new field for further study of the functional significance and underlying mechanism of the association between NHEJ gene polymorphisms and HIV-1/AIDS. However, the results and conclusions of this study need to be tested and verified in more association studies and subsequent functional research in different ethnic groups.

Subjects
In the present study, a total of 481 HIV-1 seropositive men were recruited from Heilongjiang Center for Disease Control and
Genomic DNA was extracted from 200 µl of peripheral blood from each participant using the QIAamp blood kit (Qiagen, Germany) according to the manufacturer's protocol. All 22 SNPs were genotyped using a custom-designed 48-Plex SNPscan™ Kit (Genesky Bio-technologies Inc., Shanghai, China), which is based on high-throughput SNP genotyping using double ligation and multiplex fluorescence PCR. For quality control, a 5% random sample of cases and controls was genotyped twice to verify genotyping accuracy; the reproducibility was 100%.

Statistical analysis
The genotype and allele frequencies were calculated by direct counting after the genotypes of the cases and controls had been determined. The chi-square test was used to examine deviation from Hardy-Weinberg equilibrium (HWE) for all SNPs in the control group, and to test the association of genotype frequency with susceptibility to HIV-1 infection and with the clinical features of cases (CD4+ T-lymphocyte count and clinical stage). Odds ratios (ORs) and 95% confidence intervals (95% CI) were estimated as the relative risk associated with SNPs. The generalized multifactor dimensionality reduction (GMDR) software (http://www.ssg.uab.edu/gmdr/) was applied to assess SNP-SNP interactions. SPSS 23.0 (IBM-SPSS, Inc., Chicago, IL, USA) was used for all statistical analyses. The analyses of linkage disequilibrium (LD) and haplotype frequencies were performed using the HaploView software (ver. 4.2, http://sourceforge.net/projects/haploview/). Differences with a P value less than 0.05 were considered statistically significant.
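The odds ratio and 95% confidence interval for a single 2x2 table can be sketched as follows. The text does not state which interval method SPSS applied, so this example assumes the common Woolf (log odds ratio) approximation; the function name `odds_ratio_ci` and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases       b: unexposed cases
    c: exposed controls    d: unexposed controls
    z = 1.96 gives an approximate 95% interval (assumed method).
    Returns (odds_ratio, lower, upper).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

For hypothetical counts of 20 exposed and 10 unexposed cases against 10 exposed and 20 unexposed controls, the odds ratio is 4.0, and the association would be called significant at the 0.05 level because the interval excludes 1.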

Declarations
Ethics approval and consent to participate
The study protocol was approved by the local ethics review board (No.: HMUIRB20180019) and all experimental procedures complied with the Declaration of Helsinki. All participants gave written informed consent to take part in the present study.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Figure 1
Genetic association of NHEJ polymorphisms between cases and controls. The bar marked by the letters a and b corresponds to the ordinate of the minimum value of 0.2%. Codominant 1, the first column homozygote versus the third column homozygote; Codominant 2, heterozygote versus the third column homozygote. Bold italic indicates statistical significance.

Figure 2
Best four-locus SNP-SNP interaction model identified by the generalized multifactor dimensionality reduction method. High-risk cells are in dark, low-risk cells are in grey, and empty cells are indicated by no shading. In each cell, the left bar represents cases while the right bar represents controls. The heights of the bars are proportional to the sum of samples in each group. Note that the patterns of high-risk and low-risk cells differ across each of the different multilocus dimensions, presenting evidence of SNP-SNP interaction or epistasis.