Analysis of the genetic diversity in TLR4 3'-untranslated region among Asian populations and the biological effects of altofrequent SNPs

Background: Numerous case-control studies have demonstrated that single nucleotide polymorphism (SNP) loci in the TLR4 3'-untranslated region (UTR) are associated with multiple inflammatory disorders in various populations, particularly in Asians. However, the frequency distribution of these polymorphisms in different Asian populations and their effects on TLR4 expression remain unclear. Here, we extracted variants from the 1000 Genomes Project database, analyzed the genetic diversity of SNPs and haplotypes in the TLR4 3'-UTR in 10 Asian populations, and then assessed the biological effects of the high-frequency polymorphic loci on TLR4 expression by luciferase assay. Results: A total of six SNPs were identified as true polymorphic loci with minor allele frequencies (MAFs) ≥ 1% in Asian populations: rs41426344, rs7869402, rs11536889, rs7873784, rs11536891, and rs11536896. Using four tag SNPs, we inferred five haplotypes with a frequency higher than 1% in Asians. Notably, rs41426344 is unique to East Asian populations, and the H-5 haplotype frequency was reduced when analyzing pooled data from East and South Asian populations. The MAFs of rs7869402 and rs11536889 and the frequencies of H-2 and H-4 differed significantly between the populations (P < 0.001). We constructed a pGL3-3494-3UTR luciferase plasmid to simulate the TLR4 gene structure in vivo, and used PCR-mediated site-directed mutagenesis to construct a series of mutant luciferase constructs corresponding to the six SNPs. In addition, we found that TLR4 mRNA was selectively expressed in SiHa and THP-1 cell lines, but not in C33A, HeLa, and 293T cell lines. The luciferase activity of constructs containing the rs7869402 T allele and the rs11536889 C allele increased significantly upon LPS or IL-6 stimulation in THP-1 and SiHa cells. Conclusions: The distributions of SNPs and haplotypes in the TLR4 3'-UTR were significantly different among Asian populations.
The rs7869402 T allele and rs11536889 C allele, which contributed to up-regulated TLR4 expression upon LPS or IL-6 stimulation, might serve as potential functional mutations for future genetic studies. The biological effects of rs7869402 and rs11536889 on TLR4 are significant clues to its critical role in harmful TLR4-mediated responses. These results provide guidance for future investigations of susceptibility to TLR4-related inflammatory diseases.
TLR4: Toll-like receptor 4; SNPs: Single nucleotide polymorphisms; UTR: Untranslated region; 1KG: 1000 Genomes Project; MAFs: Minor allele frequencies; LPS: Lipopolysaccharides; HPV: Human papillomavirus; IL-6: Interleukin-6; HWp: Hardy-Weinberg equilibrium p value.

recognition receptor, which are the portal proteins of inflammation signal transduction [3].
Overexpression of TLR4 in the presence of LPS ligands results in the release of a variety of inflammatory cytokines, and effectively activates innate immune responses [4]. Inflammation-induced TLR4 usually plays a specific role in triggering various diseases.
The human TLR4 gene (Gene ID: 7099) is located at chromosome 9q33.1, has four exons and three introns, and spans a region of 20,332 nucleotides. Polymorphic loci located within the 3'-UTR may influence mRNA translation efficiency by altering microRNA binding, affecting mRNA stability, and modulating disease susceptibility [5]. Analysis of the dbSNP database revealed 669 SNPs in the TLR4 3'-UTR (http://www.ncbi.nlm.nih.gov/SNP/). Only six of these SNPs (rs41426344, rs7869402, rs11536889, rs7873784, rs11536891, and rs11536896) have been analyzed in specific populations using case-control studies [6][7][8][9][10][11][12]. Of these SNPs, rs11536889 is the most extensively studied. The rs11536889 C/C genotype is significantly associated with the risk of periodontitis in Japanese subjects, and the "C" allele frequency is significantly higher than the "G" allele frequency in patients with severe periodontitis [13]. A large study involving ~1400 patients and ~800 controls revealed that the "C" allele increased the risk of prostate cancer in a Swiss population [12].
Sun et al. reported that the "C" allele was significantly associated with ventilator-associated pneumonia in a Chinese population [14]. Other studies reported that the rs11536889 "C" allele increases the risk of coronary artery disease in ethnic Zhuang and Han Chinese populations [15,16]. Chang et al. found that the G/C and C/C genotypes were associated with head and neck cancer in a Taiwanese population [17]. This SNP was also associated with increased organ failure risk in patients with sepsis in a Caucasian population [18]. Additionally, a significant association between this SNP and psoriasis vulgaris was identified in a Southern Chinese population [19].
The 1000 Genomes (1KG) Project is an integrated survey of genetic variation from ~2,500 individual genomes from diverse populations worldwide [20]. This genetic catalogue of human genome information can facilitate improvement of the reference sequence, understanding of population traits and evolutionary histories, and exploration of the genetic basis of many complex diseases [21]. A significant association between genetic variation and population distribution has been previously demonstrated, and population-specific polymorphic loci can be used to further understand the mechanisms of population-related disease [22,23]. Genetic diversity is abundant in Asia, especially in India, as a result of a long and complex history of mass migration [22,23].
TLR4 is expressed on myeloid cells, epithelial cells, B cells, muscle cells, and some cancer cells [1,24,25,26]. THP-1 is a monocytic cell line derived from the blood of a patient with acute leukemia; it expresses endogenous TLR4 and has been used in multiple classical TLR4 studies [1,8]. Inflammation within the tumor microenvironment is a critical factor in tumor development [27]. Immune system evasion is the main cause of persistent human papillomavirus (HPV) infection and cervical cancer development. TLR4 expression plays a role in the oncogenic potential of HPV-positive cells [28], and the rs7873784 G allele in the TLR4 3'-UTR is significantly associated with increased cervical cancer risk [29]. Therefore, THP-1, HEK 293T, and the human cervical cancer cell lines C33A (HPV-negative), HeLa (HPV-18), and SiHa (HPV-16) were chosen as candidate host cells.
In this study, we extracted genetic variation data for the full-length TLR4 3'-UTR from 993 Asians in 10 Asian populations from the database of 1KG Project (Phase 3 release). We then investigated TLR4 3'-UTR variation and haplotype diversity in these populations individually or in combination. We also assessed the effects of SNPs (with MAFs of at least 1%) on TLR4 expression.

SNP extraction and haplotype construction
Data for 74 SNPs across the entire TLR4 3'-UTR, between positions 120,476,927 and 120,479,769 (GRCh37.p13) of chromosome 9, were downloaded from the 1KG Phase 3 pipeline using the Ferret version 2.1.1 Java tool [38]. The data used in this analysis came from 10 human populations and included 993 Asian individuals, as described in the 1KG Browser (https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/) (Additional file 1: Table S1). The two sequences defined by NCBI (DNA: NG_011475.1, and mRNA: NM_003266.3) were used as references in the study. Haploview version 4.2 software was used to filter the SNPs to retain those with MAF ≥ 1%, to select tag SNPs, to calculate the Hardy-Weinberg equilibrium p value (HWp), and to analyze haplotype patterns and linkage disequilibrium (LD) among the SNPs [39]. Six SNPs were identified in the TLR4 3'-UTR using this screening approach (Table 1). The Haploview Tagger option was used to predict r² and D' values for the remaining SNPs in the dataset. Block-based and block-free approaches were used to increase the accuracy and representativeness of tag SNPs [40]. Haplotypes were evaluated with tag SNPs in 10 populations using Arlequin version 3.5 software [41].
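The MAF filter and Hardy-Weinberg check described above can be illustrated with a minimal Python sketch. It is not Haploview's implementation: it assumes genotype counts for a single biallelic SNP are already available, and it uses the chi-square approximation with one degree of freedom, whereas Haploview's own test may differ. The function names and the example counts are hypothetical.

```python
import math


def maf_and_hwe(n_ref_hom, n_het, n_alt_hom):
    """Return (MAF, chi-square statistic, HWE p value) for one biallelic SNP.

    Assumption: a chi-square goodness-of-fit test with 1 degree of freedom,
    used here only as an illustration of the HWp concept.
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    maf = min(p, q)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return maf, chi2, p_value


def passes_maf_filter(maf, threshold=0.01):
    """Keep a SNP when its minor allele frequency is at least 1%."""
    return maf >= threshold


# Hypothetical genotype counts, not taken from the 1KG data.
maf, chi2, hwp = maf_and_hwe(81, 18, 1)
print(f"MAF={maf:.3f} chi2={chi2:.3f} HWp={hwp:.3f} keep={passes_maf_filter(maf)}")
```

With real data, the three genotype counts would be tallied per SNP and per population before calling the function.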

mRNA quantification
Total RNA was isolated from cells using TriPure Isolation Reagent (Roche). Reverse transcription was performed using the PrimeScript™ RT reagent Kit (TAKARA, Japan). Taq Plus DNA Polymerase (TianGen, China) was used for RT-PCR, and TB Green® Premix Ex Taq™ II (TAKARA) was used to perform the qRT-PCR in a CFX96 Real-Time thermocycler platform (Bio-Rad). Relative quantifications were determined by the comparative 2^−ΔΔCt method [42]. All primers used are listed in Additional file 2: Table S2.
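As an illustration of the comparative 2^−ΔΔCt calculation, the following Python sketch computes a fold change from four Ct values. The Ct values are hypothetical, and the reference (housekeeping) gene is an assumption, since the text does not name it here.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by the comparative 2^-ddCt method."""
    delta_ct_sample = ct_target_sample - ct_ref_sample      # dCt, treated sample
    delta_ct_control = ct_target_control - ct_ref_control   # dCt, control sample
    delta_delta_ct = delta_ct_sample - delta_ct_control     # ddCt
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in the
# treated sample, while the reference gene is unchanged.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(f"fold change = {fold_change:.2f}")
```

The method assumes roughly equal amplification efficiencies for the target and reference genes.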

Plasmid construction
The human TLR4 promoter (-3494 to +235) was amplified from normal human genomic DNA using Cobuddy Super Fidelity DNA Polymerase (CwBio, China) with primer pair TPprimer1/TPprimer2 (Additional file 2: Table S2). The thermal cycling conditions were 2 min at 98°C, followed by 35 cycles of 10 s at 98°C, 30 s at 67°C, and 2 min at 72°C, and a final extension of 5 min at 72°C. The full-length TLR4 3'-UTR sequence was amplified with Cobuddy Super Fidelity DNA Polymerase and the primer set TPprimer3/TPprimer4 (Additional file 2: Table S2). The thermal cycling conditions were 2 min at 98°C, followed by 35 cycles of 10 s at 98°C, 30 s at 62°C, and 1.5 min at 72°C, and a final extension of 5 min at 72°C. Amplicons were electrophoresed and the products were gel purified using a Gel Extraction Kit (Omega Bio-Tek, USA). The pGL3-Basic luciferase vector (Promega, USA) was digested with Xba I and Nco I (TAKARA, Japan), and the linearized vector fragments (1656 bp and 3162 bp) were purified from the electrophoresis gel. Gibson assembly mix was prepared following the one-step isothermal DNA assembly protocol [43]. Approximately 10 ng of the 1656 bp linearized vector fragment, ~15 ng of the TLR4 3'-UTR fragment, ~20 ng of the 3162 bp linearized vector fragment, and the TLR4 promoter (-3494 to +235) were mixed with 10 μL of Gibson assembly mix, and the final reaction volume was adjusted to 20 μL. The mixture was incubated at 50°C for 1 hour, and the entire volume was transformed into 200 μL of DH5α competent cells. Plasmid DNA was extracted using an Endo-free Plasmid Mini Kit (Omega Bio-Tek, USA) and validated by sequencing.

Site-directed mutagenesis
To simulate the physiological state of the SNPs, PCR-mediated site-directed mutagenesis of the circular plasmid was performed using pGL3-3494-3UTR as the template. For rs41426344, a mutagenic reverse primer (mut344_G/C_R) was designed to introduce the base substitution needed to convert guanine (G) to cytosine (C) (Additional file 2: Table S2, see the underlined base). The forward primer (mut344_G/C_F) was designed in the opposite direction from mut344_G/C_R, so that the two primers annealed back-to-back at adjacent positions. The primers were 5'-phosphorylated using T4 Polynucleotide Kinase (NEB, USA) to allow for subsequent ligation. PCR was performed with the pGL3-3494-3UTR template to introduce the mutation using the Cobuddy Super Fidelity DNA Polymerase system (CwBio, China). The amplification conditions were 2 min at 98°C, followed by 35 cycles of 10 s at 98°C, 30 s at 55°C, and 5 min at 72°C, and a final extension of 5 min at 72°C. The PCR products were recovered, purified using the Omega Gel Extraction Kit, and self-ligated following the T4 DNA Ligase protocol (NEB). The constructs were sequenced by the Beijing Genomics Institute for confirmation. After digestion with BglII, the constructs were religated to generate pGL3-684, pGL3-684-3UTR, and seven mutated constructs (-684 to +235).
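The back-to-back primer layout can be sketched in Python as follows. This is an illustration of the geometry only, not the authors' primer design: the template is a toy sequence, the flank and primer lengths are arbitrary assumptions, and the function name is hypothetical. The actual primers are those listed in Additional file 2: Table S2.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def back_to_back_primers(template, snp_index, ref_base, alt_base,
                         flank=10, fwd_len=20):
    """Return (forward_primer, mutagenic_reverse_primer), both written 5'->3'.

    The reverse primer covers the SNP plus `flank` bases on each side and
    carries the substituted base; the forward primer starts at the very next
    template base, so the two primers sit back-to-back without overlap.
    """
    if template[snp_index] != ref_base:
        raise ValueError("reference base does not match the template")
    start = snp_index - flank
    end = snp_index + flank + 1  # exclusive end of the reverse primer footprint
    if start < 0 or end + fwd_len > len(template):
        raise ValueError("SNP is too close to the end of the linear template")
    mutated = template[start:snp_index] + alt_base + template[snp_index + 1:end]
    reverse_primer = reverse_complement(mutated)
    forward_primer = template[end:end + fwd_len]
    return forward_primer, reverse_primer


# Toy 16-nt template (hypothetical), G-to-C change at 0-based index 4.
fwd, rev = back_to_back_primers("AACCGGTTAACCGGTT", 4, "G", "C", flank=2, fwd_len=4)
print(fwd, rev)
```

For a circular plasmid the bounds check would be replaced by wrap-around indexing; the linear toy case keeps the sketch short.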

Cell culture and transfection
The THP-1 human monocytic leukemia and human embryonic kidney (HEK) 293T cell lines were purchased from the Conservation Genetics CAS Kunming Cell Bank (Kunming, China). THP-1 cells were maintained in L-glutamine-containing RPMI 1640 (GIBCO, Carlsbad, CA, USA) supplemented with 15% fetal bovine serum (GIBCO), 1% non-essential amino acids, 1% sodium pyruvate, and penicillin/streptomycin as previously described [44]. The SiHa human cervical carcinoma cell line, C33A, and HeLa cells were obtained from the American Type Culture Collection (ATCC; Manassas, VA, USA) and cultured at 37°C in Dulbecco's modified Eagle's medium (DMEM; GIBCO) containing 10% fetal bovine serum in an incubator with 5% CO2. THP-1 and SiHa cells were transiently transfected with luciferase constructs.

Luciferase reporter assay
THP-1 cells were transiently co-transfected with 350 ng of each construct and 35 ng of the Renilla luciferase plasmid pRL-CMV (Promega, USA) as an internal control for normalizing luciferase activity. After 36 h, the cells were serum-starved for 12 h, and 1 ng/ml recombinant human interleukin-6 (IL-6; Sangon Biotech, China) or 1 µg/ml LPS (Sigma-Aldrich, USA) was added 6 h before harvesting.
SiHa cells were co-transfected with 350 ng of each construct and 17.5 ng of pRL-CMV, and stimulated with 50 µg/ml LPS or 50 ng/ml IL-6. Firefly and Renilla luciferase activities were measured following the Dual-Luciferase® Reporter Assay System (Promega, USA) protocol using a Molecular Devices SpectraMax iD5 multi-mode microplate reader. Experiments were performed at least in triplicate.
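The normalization step implied by the dual-luciferase design can be sketched in Python as below. The readings are hypothetical, and the choice to express each construct relative to the mean of a reference construct is an assumption about how relative activity is reported, not a statement of the exact procedure used here.

```python
from statistics import mean, stdev


def normalized_ratios(firefly, renilla):
    """Firefly/Renilla ratio per well, correcting for transfection efficiency."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]


def relative_activity(construct_ratios, reference_ratios):
    """Mean and SD of construct ratios scaled to the reference construct mean."""
    reference_mean = mean(reference_ratios)
    scaled = [ratio / reference_mean for ratio in construct_ratios]
    return mean(scaled), stdev(scaled)


# Hypothetical triplicate readings (arbitrary luminescence units).
mutant = normalized_ratios([200.0, 220.0, 180.0], [10.0, 10.0, 10.0])
reference = normalized_ratios([100.0, 100.0, 100.0], [10.0, 10.0, 10.0])
rel_mean, rel_sd = relative_activity(mutant, reference)
print(f"relative luciferase activity = {rel_mean:.2f} ± {rel_sd:.2f}")
```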

Statistical calculations
SPSS version 22 software (IBM Corp, USA) was used for statistical analyses. The unpaired Student's t test was used to compare relative luciferase activities between two groups. Frequency differences among haplotypes were assessed using the chi-square test. Data are expressed as mean ± SD of three or more independent experiments. P < 0.05 was considered statistically significant.
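A frequency comparison of this kind can be sketched in Python for the simplest case, a 2 × 2 table (one haplotype versus all others, in two population groups). This is an illustrative Pearson chi-square without continuity correction, not a reproduction of the SPSS output; the counts and the function name are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table

                   haplotype   other
        group 1        a         b
        group 2        c         d

    Returns (chi-square statistic, p value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical chromosome counts, not taken from Table 4.
chi2, p = chi_square_2x2(30, 70, 10, 90)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Comparisons across all ten populations would need a larger contingency table and the corresponding degrees of freedom.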

Variants and haplotypes in the TLR4 3'-UTR
The Asian individuals in the 1KG were classified as either East or South Asian, and 993 singletons from 816 families were included (Additional file 1: Table S1). In the TLR4 3'-UTR, 74 SNPs were reported in the 1KG (Phase 3), six of which had MAF ≥ 1% (Table 1). Four polymorphic loci were selected as tag SNPs (bold SNPs), and one LD block was identified in the 3'-UTR (Fig. 1).
The frequency of each SNP in the different Asian populations analyzed is listed in Table 2. The MAFs of rs41426344, rs7869402, and rs11536889 differed significantly among the ten populations (P < 0.001). rs41426344 is unique to East Asian populations and has a lower MAF in [...]. The rs11536889 frequency was significantly higher in the CDX population than in the other nine populations. Moreover, rs7869402 was very rare in the KHV population (Fig. 2, Table 2). Five haplotypes (H-1 to H-5) emerged with a frequency higher than 1% in Asians (Table 3). The distribution of haplotypes differed between East and South Asian populations. H-4 is more prevalent in South Asian populations (P < 0.001), and H-5 is absent from South Asian populations but has a frequency higher than 1% in East Asian populations (Fig. 3, Table 4). LD analysis revealed strong linkage among the rs7873784, rs11536891, and rs11536896 polymorphisms (rs7873784/rs11536891: D' = 0.995, LOD = 230.51, r² = 0.984; rs11536891/rs11536896: D' = 1.0, LOD = 239.44, r² = 1.0). rs11536891 and rs11536896 are in complete linkage disequilibrium (Fig. 1).
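To make the D' and r² values concrete, the following Python sketch computes both from phased two-locus haplotype counts. It assumes the phase is known, whereas Haploview estimates haplotype frequencies from genotype data, and it does not compute the LOD score. The counts and names are hypothetical.

```python
def ld_stats(hap_AB, hap_Ab, hap_aB, hap_ab):
    """Return (D', r^2) for two biallelic loci from phased haplotype counts."""
    n = hap_AB + hap_Ab + hap_aB + hap_ab
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_AB = hap_AB / n
    p_A = (hap_AB + hap_Ab) / n   # frequency of allele A at locus 1
    p_B = (hap_AB + hap_aB) / n   # frequency of allele B at locus 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B          # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r_squared = d * d / denom
    return d_prime, r_squared


# Hypothetical counts: the two minor alleles always travel together.
d_prime, r_squared = ld_stats(90, 0, 0, 10)
print(f"D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

D' reaches 1 whenever at least one of the four haplotypes is absent, while r² reaches 1 only when the two loci also share the same allele frequencies, which is why the two statistics can differ for the same pair of SNPs.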

Effects of polymorphic loci on TLR4 expression
Using RT-PCR, we determined that TLR4 mRNA was highly expressed in SiHa and THP-1 cells, making them more suitable for the analysis of TLR4 expression than the C33A, HeLa, and 293T cell lines (Fig. 6a). In SiHa cells, an LPS concentration of 50 μg/ml activated TLR4 mRNA and TLR4/MyD88/NF-κB pathway proteins to the highest level (Fig. 6b, c). In THP-1 cells, 10 μg/ml was the optimal concentration; it increased TLR4 mRNA expression ~2.6-fold and activated the TLR4/MyD88/NF-κB pathway to the highest expression level (Fig. 6d, e).
To determine whether the SNPs influence TLR4 gene expression, we performed transfection analysis.
Relative luciferase activities for individual cell lines were compared to those of the pGL3-3494-3UTR construct group and the control group. The relative luciferase activity of the pGL3-3494-3UTR construct decreased ~1.5-fold in SiHa cells and ~1.4-fold in THP-1 cells in the presence of the TLR4 3'-UTR (Fig. 7a, c). Both promoter regions, -3494 to +235 bp (pGL3-3494) and -684 to +235 bp (pGL3-684), increased luciferase activities following LPS or IL-6 stimulation (Fig. 7). None of the six mutated plasmids showed significant activity changes in the control group. However, the significantly increased activities of the pGL3-3494-mutrs402 and pGL3-3494-mutrs889 constructs indicate that the "T" allele of rs7869402 and the "C" allele of rs11536889 may increase the expression level of the TLR4 gene in vivo when stimulated with LPS or IL-6 (Fig. 7). In addition, to investigate whether simultaneous mutation of the linked rs7873784, rs11536891, and rs11536896 loci would alter TLR4 expression, we created the pGL3-3494-TRImut construct, which contains all three changes. LPS or IL-6 pretreatment had almost no effect on pGL3-3494-TRImut relative luciferase activity in either cell line (Fig. 7).

Discussion
We found that SNPs were common in the TLR4 3'-UTR. Analysis of 1KG Project data from 10 Asian populations revealed 74 SNPs in this region, six of which (8.1%) had a MAF ≥ 1%. The distribution of these SNPs differed significantly among these populations. Moreover, we constructed five haplotypes based on four tag SNPs, rs41426344, rs7869402, rs11536889, and [...]. 3'-UTR sequence length plays a pivotal role in translational regulation [30]. Moreover, the nucleotides within the 3'-UTR may post-transcriptionally regulate the observed stimulus-specific induction [31,32] and repress luciferase translation in vivo [33]. Therefore, we cloned the full-length TLR4 3'-UTR and inserted it seamlessly into the luciferase (LUC) gene construct to imitate the TLR4 gene structure in vivo.
TLR4 is a transmembrane receptor protein expressed on a small number of cell types, including myeloid cells, epithelial cells, B cells, muscle cells, and some cancer cells [1,24,25,26]. Upregulated TLR4 expression, via LPS or inflammatory cytokine stimulation, can cause inflammation [34]. In this study, we found that TLR4 mRNA is expressed at a higher level in SiHa cells, a cervical carcinoma cell line infected with human papillomavirus 16 (HPV-16), than in HeLa (HPV-18), C33A (HPV-negative), and 293T cells. TLR4 mRNA expression is lower in SiHa cells than in THP-1 cells. The TLR4 gene responds to LPS stimulation in both of these cell lines. Taken together, our results suggest that TLR4 signaling pathway responses to LPS may be influenced by HPV type in cervical cancer cell lines. Indeed, TLR4/MyD88/NF-κB pathway-related proteins were significantly up-regulated following LPS pretreatment. Therefore, we selected SiHa and THP-1 as the host cells for our luciferase assays.
rs11536889 is located in the center of the TLR4 3'-UTR. Sato et al. used truncated fragments and transient transfection of a series of luciferase reporter plasmids to demonstrate that, after LPS and IL-6 stimulation, the "G" and "C" alleles suppressed and upregulated luciferase activity, respectively, in THP-1 cells [8]. In this study, luciferase reporter experiments were designed to identify the effect of the "G" and "C" alleles on TLR4 expression using the full-length 3'-UTR. Consistent with the previous results, after stimulation with LPS or IL-6, the relative luciferase activities were significantly higher for the "C" allele than for the "G" allele in both SiHa (P < 0.01) and THP-1 cells (P < 0.01). To predict SNP effects on TLR4 mRNA secondary structure, we used the RNAfold WebServer (http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAfold.cgi) and the RNAsnp web server (http://rth.dk/resources/rnasnp/) [35]. The transversion had no obvious effect on mRNA folding: the minimum free energy of the optimal secondary structure was -1600.20 and -1600.30 kcal/mol for the "G" and "C" alleles, respectively, and the p-value of maximum structural change was 0.1690 (Additional file 3: Figure S1a, d). The difference in the effects of these alleles on gene expression may be due to this location being the target binding site for hsa-miR-1236 [8]. In a large case-control study of 1383 prostate cancer patients and 780 matched controls in a Swiss population, Zheng et al. demonstrated that the C/C and G/C genotypes of this polymorphism increase prostate cancer risk [12]. Taken together, these results suggest that rs11536889 may increase the risk of inflammatory disease and some cancers, and that HPV-16-positive illnesses such as condyloma acuminatum [36] and cervical and anal carcinoma [37] can be prioritized in studies of diseases associated with rs11536889.
rs7869402 is also located in the center of the 3'-UTR. After LPS or IL-6 stimulation, the luciferase activity of the T/T genotype was significantly up-regulated in both SiHa and THP-1 cells. Only one population-based genetic study has shown that genetic variation in rs7869402 is significantly associated with susceptibility to pulmonary tuberculosis, in a Sudanese population [11]. There is a marginal difference in minimum free energy between the C/C and T/T (-1600.60 kcal/mol) genotypes, no evident change in mRNA folding, and the p-value of maximum structural change is 0.2121 (Additional file 3: Figure S1a, c). Our results indicate that rs7869402 is a pivotal variant that affects TLR4 expression, and the altered expression may result from an interaction with microRNAs.
In addition, a limited number of population-based genetic studies have demonstrated that the polymorphic loci rs41426344, rs7873784, and rs11536891 are associated with the risk of rheumatoid arthritis and juvenile idiopathic arthritis [9,10], IgA nephropathy [6], and bipolar disorder [7]. In contrast, the constructs corresponding to these variations produced almost constant levels of luciferase activity. rs11536891 and rs11536896 both show significant structural changes between the two genotypes (P < 0.2) (Additional file 3: Figure S1a, f, g), but no significant differences in minimum free energy. Therefore, further studies are needed to identify the possible mechanisms of these SNPs. Future research should also focus on the associations between polymorphic sites and TLR4-related disease phenotypes.

Conclusions
In summary, our analyses showed that the distributions of TLR4 variants and haplotypes in the 3'-untranslated region were significantly different among 10 Asian populations. The rs7869402 T allele and rs11536889 C allele, which contributed to up-regulated TLR4 expression upon LPS or IL-6 stimulation, might serve as potential functional mutations for future genetic studies. The biological effects of rs7869402 and rs11536889 on TLR4 are significant clues to its critical role in harmful TLR4-mediated responses. These results provide guidance for future investigations of susceptibility to TLR4-related inflammatory diseases.

Availability of data and materials
The datasets supporting the results of this article are included within the article (and its additional files). 1000 Genomes project data is publicly available through its website.

Competing interests
The authors declare that they have no competing interests.