Association of polymorphisms in SLC gene family with risk of leprosy and malformation in Chinese familial and sporadic patients

Background: Leprosy is a chronic infectious disease caused by Mycobacterium leprae, and usually affects the skin surface and peripheral nerves, and may lead to deformities if not be healed. Genes in solute carrier (SLC) super-family encode a series of the largest transport-proteins, and are reported to be associated with several connective tissue diseases. Methods: Here we performed a multiple-stage study to investigate the polymorphisms of SLC gene family in familial leprosy patients using whole exome sequencing data and sporadic leprosy patients using TaqMan custom genotyping assays, in China. Results: We found that, rs202071873 in SLC35E1 is associated with leprosy, rs3731865 in SLC11A1 and rs117820608 in SLC35E2B are associated with the degree of malformation in leprosy. Conclusions: This report of the polymorphisms in SLC gene family supports the view of a important role of human genetics in the natural development of the disease. g,g,g,r,r,r,f,f,f,f,f,f,f,f,f,f,f,f --nastring NA --remove’. Variants located in simple

different SLC gene could cause different diseases. For instance, some Bartter-like syndromes result from defects of SLC12A1 [11] , mutations in SLC40A1 [17] can lead to hemochromatosis type 4(HFE4), and SNPs (rs26722 and rs16891982) in SLC24A5 [18] was found to be involved in reduced melanogenesis and thereby making the skin lighter. Specially, SLC11A1, a major candidate gene of leprosy susceptibility [6] , was found to highly express in mononuclear macrophages [7] and restrictively expresse in myeloid cells [7] , neuronal cells [8] and human exocrine tissues, including prepituitary gland, adrenal medulla and pancreatic island.
The 35 th member of SLC super family (SLC35) has 6 sub-types (A-F) [9,10] , and 31 SLC35 genes were found in human genome. SLC35 genes, which transport multiple types of monosaccharide nucleotide in the transporter cells, are essential genes in the process of glycosylation. In SLC35 defect research, SLC35 genes were found to participate in a variety of connective tissue diseases and genetic diseases [19,20,21] , and played an important role in the occurrence and metastasis of tumor in many pathological processes [22,23] .
In this study, we applied a two-tiered procedure (Figure 1) that first sequences the whole exome of 29 individuals (19 leprosy patients and 10 unaffected control) from 8 families, and then selects 17 mutations of SLC gene family observed in the sequencing data and 33 candidate mutations based on susceptibility and resistance in Mendel-model, respectively. To determine whether the selected mutations were causal or susceptible to leprosy, we examined these mutations in 191 additional leprosy patients and 100 unaffected healthy controls to estimate the association between such mutations and leprosy patients and their multiple phenotype. Meanwhile, considering the low frequency of some polymorphic sites, gene-based burden test was also carried out to estimate the storage effect of such mutations.

Familial data
Eight families of Han Chinese were recruited from Guangdong province for exome sequencing( Figure   2). Parents and siblings of the proband in family I are unaffected, and the proband has 5 children, 1 of father and his affected son. Proband of family II has 4 sons and 3 daughters, one of his sons and one of his daughters are affected, in addition to the 3 affected patients, an unaffected son was also selected for sequencing. In family III, blood of proband, her affected daughter and, her unaffected son and granddaughter were enrolled. For family IV and V, there are only two patients and are all selected. When comes to family VI, both parents of the proband are affected, and the proband also has an affected brother and an unaffected brother, an unaffected sister. All members of this family were enrolled. In contrast, in family VII, only one of his unaffected parents and all his siblings were sequenced. And for family VIII, blood was obtained from the proband and her affected sister and daughter, unaffected mother and brother. Totally, 19 leprosy patients and 10 unaffected samples were enrolled. In the replication stage, a cohort of 196 patients and 100 healthy controls were collected, and clinical information was recorded in the process of diagnosis and follow-up telephone visit (Supplementary Table 1). All these subjects were recruited by Guangdong Dermatology Hospital from 2008 to 2014, and written informed consent was also collected from all subjects.
Exome sequencing and data analysis Genomic DNA of each individuals was extracted from the peripheral blood with using the QIAamp DNA Blood Mini kit (QIAGEN GmbH, Hilden, Germany), under a strict quarantine measures to avoid the spread of leprosy. Pruned DNA was hybridized by the Agilent SureSelect Human All Exon V4 array to capture the coding sequencing of the DNA. All samples from 8 leprosy families were performed whole exome sequencing with 100bp pair-end reads using Illumina Hiseq 2000 platform (Illumina, San Diego, CA). The raw image files were transformed into compressed fastq files using the standard Illumina Pipeline (version 1.3.4) for base calling with default parameters.
After removing reads with adapter sequencing, or with abnormal high (>50%) proportion of low quality (<5) bases or missing bases, Fastqc (version 0.11.4) were utilized to measure the sequencing quality of each files. The raw reads were aligned to the human reference genome (hg19 version) by BWA (version 0.5.9-r16), and the duplicated reads produced during PCR were marked by Picard (MarkDuplicates.jar, http://broadinstitute.github.io/picard/command-line-overview.html). To improve the alignment effect, we use Genome Analysis Toolkit (GATK 3.3) to perform the indel re-alignment and recalibrate the base quality score. Under the best practice pipeline, SNVs and Indels were detected simultaneously by HaplotypeCaller in GATK, taking each family as a unit.

Family analysis
Putative candidate causal mutations were firstly extracted based on the pedigree and phenotypic info of each family (Figure 2). For family 1, the clinical characterization was considered to be caused by de novo mutations occurred in sample 141200406, and transmitted to his two kids as dominant inheritance. In family 2, 3 and 4, the inheritance model was suspected as dominant inheritance. In addition to recessive inheritance, analysis based on incomplete dominant inheritance was also conducted for family 5, 7 and 8. Considering both parents in family 6 were affected, we just excluded loci which presented to be mutational in either of the two unaffected family members (141200424, 141200425).
According to the pedigree of each family, no promiscuous family relationship or confused gender were found, based on the sequencing data. Candidate causal mutations were detected under the following quality criteria: 1) followed the putative inherited model of each family, 2) not in the known repeat region of the genome, 3) alter allele frequency < 1% in public database 1000g_eas, ExAC_eas and gnomAD, 4) missense mutations with SIFT score < 0.05 or loss of function(nonsense, splicing, or frameshift InDel) mutations.

Genotyping and quality control of selected mutations
To validate novel signals in the leprosy study, rare familial inheritance mutations, and mutations located in SLC genes found in exome sequencing data were selected. Genotyping of two validation stages was conducted using the Sequenom MassARRAY system (Sequenom, San Diego, CA) and Before genotyping in larger cohort, rare familial inheritance mutations that failed genotyping were discarded. Meanwhile, to avoid the disturbance of complexity genome sequence, mutations with homopolymer length larger than 6 were also removed. Besides, we removed mutations with a P-value < 1.0E-04 in Hardy-Weinberg equilibrium test in healthy control.

Association and QTL analysis, Mutation burden analysis
Three methods in PLINK, chi-square test, fisher's exact test and conditional logistic repression analysis, were utilized in the association analysis of single mutation, and the parameters of the three methods were '--assoc', '--fisher' and '--logistic', respectively. In addition to case-control model, QTL analysis (PLINK) based on diverse clinical information of leprosy patients was performed under the linear regression model, including malformation, nerve damage, response ability, and skin damage.
All clinical information were transformed into number to express the degree of phenotypes. For skin damage, 0, 1, 2 and 5 were the real amount of lesion, 3 reflects there was more than 2 and less than 5 lesions, while 8 shows the patients carried more than 5 lesions. For nerve damage and response ability, patients with larger scores have severer damage. Meanwhile, 1, 2 and 3 presented to be first-, second-and third-level disability, 0 means the patient has no malformation. Besides, missing phenotype info of any patients was marked as -9.
The overall burden of all variants was assessed across all 39 genes. To assess mutation burden, mutations in a gene were performed burden test using function

Immunohistochemistry
The cells were lysed with RIPA lysis buffer on ice and the supernatants were boiled in SDS loading buffer. Cell lysates and prestained molecular weight markers were subjected to SDS-PAGE and transferred to PVDF membrane to incubate with the appropriate antibodies. ECL reagent (Thermo Scientific) was applied for immunoblot analysis.
We zoomed the original dyeing pictures for 100 times, and selected multiple areas in each pictures to count the dyeing strength. And, the average of these several product of dyeing strength and the dyeing area was defined as a sample's immunoreactive score. Subsequently, we utilized wilcox rank sum test to estimate the protein expression difference in various types of tissues.

Results
Exome sequencing data analysis. To narrow down the variants set and include only potentially pathogenic variants (Methods), we focused only on nonsynonymous variants, splice acceptor and donor site mutations, and coding frameshift InDels. We also removed variants with >1% population frequency in east Asian in 1000 genomes project or ExAC, gnomAD. To reduce the effect of complicated genome sequences, variants located in segmental duplication region or simple repeat regions were also removed.

Variant validation and Replication
To deeply investigate the pathology of leprosy, in addition to 33 selected putative mutations above, 17 mutations (16 SNPs and 1 InDel, Supplementary Table 4b) that comes from SLC gene family were also genotyped by Sequenom MassARRAY System, in a larger Chinese leprosy cohort(including 15 exome sequencing samples that can be collected repeatedly, and 177 additional collected samples) and 100 healthy control. Among of these, 5 mutations that were not repeated in sequencing samples were discarded in following analysis.
We found a candidate susceptible SNP rs202071873 in SLC35E1, but no SNPs were found to satisfy the significant cutoff (5.0E-08) for single loci association analysis under various types of test, including chi-square (Supplementary Table 5a), fisher's test (Supplementary Table 5b) and logistic regression test (Supplementary Table 5c). To further our understanding, we used linear model (general allele model) to assess the effect of allele on the degree of malformation, nerve damage, response, and skin damage. All clinical information were transformed into quantity to simply the test (Method). And we found rs3731865(p=0.00822, beta=-0.2923) in SLC11A1, and rs117820608(p=0.033, beta=0.3254) in SLC35E2B to be suggestively associated with the degree of malformation. In addition, patients with genotype A/G at rs117820608 are tended to be more likely to cause disability(p=0.0268, OR=3.939), and no significant difference was found in the genotype of rs3731865. Meanwhile, in the gene-based burden test, SLC35E1 was showed to be associated with leprosy( Table 2).

Protein-protein interaction
For all 40 genes selected in our study, protein-protein interaction networks were predicted by DAPPLE (Disease Association Protein-Protein Link Evaluator). Direct protein-protein interaction was found between MAST2 and TEAD3, and SLC35E1-SLC35E2-SLC35E2B were found to be in an isolated network (Supplementary Figure 1).

Validation using Immunohistochemistry
To deeper investigate the role of SLC genes in the development of leprosy, we enrolled another 88 FFPE samples(including 28 healing samples, 12 healthy control, 16 multibacillary patients, 13 paucibacillary patients and 19 untreated patients) and examined the protein expression level of SLC35E1 and SLC35E2B in normal skin tissue of 12 healthy controls and skin lesions from different types of leprosy patients (Figure 3). And immunoreactive score (IRS, the product of dyeing strength and dyeing area) was used to estimate the protein expression level.
We also genotyped the polymorphic loci of SLC35E1 and SLC35E2B in these samples, but unfortunately only a few proportion of samples were succeed. Association analysis between the genotype and IRS was processed, but no significant result was found due to the limited genotype information(results not shown).
But for the IRS itself, we observed that, the protein expression of SLC35E2B varied prominently in different tissues of various samples (Figure 4). For SLC35E2B, all four types of tissues were observed to be obviously lower-expression in epidermis( Table 3, Supplementary Table 8). And in dermis tissues, there was no significant protein expression difference between patients and healthy samples, except for tissues from patients after treatment (P=0.005, wilcox rank sum test). Moreover, the protein expression level of SLC35E2B in dermis of un-treated patients were observed to be higher than that in healing patients (P=0.0445). Meanwhile, the difference of protein expression of SLC35E1 in diverse tissues in neither epidermis nor dermis were not obvious.

Discussion
Genetic factors have been recognized as an important contributor of leprosy after infected with leprosy bacillus. Genome wide association studies (GWAS) have identified several susceptibility common variants [24,25,26] and lncRNA [27] associated with leprosy. Meanwhile, immune response genes have been proved to be associated with pathogenesis of different forms of leprosy [28] .
However, these factors could only explain a small proportion of disease heritability.
In this study, we performed a two-stage study to investigate SLC gene polymorphisms in familial and sporadic leprosy patients. First, we performed exome sequencing in 8 leprosy families, and extracted rare inherited mutations based on the infection and immune response to the Mycobacterium leprae, respectively. And meanwhile, we found a large mount of polymorphisms in SLC genes in patients of these 8 families, did not find any known causal rare mutations. These results suggested that, infection and fall ill could be caused by numbers of genes, and were not limited to rare mutations.
Second, validated rare mutations and some specific common mutations of SLC gene that found in leprosy family, were replicated in larger cohort. Finally, 3 SLC genes (SLC11A1, SLC35E1, and SLC35E2B) were found to be suggestively associated with leprosy at different ways. Meanwhile, indirect protein-protein interactions were predicted between SLC35E1 and SLC35E2B. SLC11A1 which translates a multi-pass membrane protein, is a member of the solute carrier family 11 family. Mutations in this gene have been reported to be associated with many disease [32] , including tuberculosis [33,34] , leprosy [35] , rheumatoid arthritis [34,36] and Crohn's disease [37,38,39] . Association analysis result of our data also found SLC11A1 to be significant associated with leprosy.
MAST2 and TEAD3 are protein coding genes of human genome, and implicate direct protein-protein interaction, but no additional mutations were found in expanded leprosy samples, There might be some interesting findings under further studies with larger sample size. Conclusions SLC35E1 and SLC35E2B are homologous genes that predicted to be located on membrane, and both belong to the solute carrier family 35 family, and thus far, no direct evidence showed these two genes are specifically linked to any disease. For SLC35E1 and SLC35E2B, we performed Immunohistochemistry in the skin lesions of multibacillary and paucibacillary, before and after treated, and skin tissues of healthy control, respectively. Our data showed that, protein expression level of SLC35E1 in diverse skin tissues were no-difference neither in epidermis nor dermis. In contrast, protein expression level of SLC35E2B in skin tissues selected from various types leprosy patients were significantly lower than that of the healthy controls, especially in epidermis. Meanwhile, in dermis, protein expression level of SLC35E2B of tissues collected from healing leprosy patients were lower than that of both healthy controls and un-treated leprosy patients. But there was no evidence in our results that can link the gene polymorphisms to the protein expression level. This may need more samples or should be examined in further study.
Moreover, our data show that, the pathology and genetics factor of leprosy varies along with diverse types of leprosy. And we found that the protein expression of SLC35E2B in epidermis was correlated with bacterial infection, which suggested that, SLC35E2B might play an important role in the occurrence of leprosy. Meanwhile, the protein expression difference between healing patients, untreated and healthy controls suggested that, SLC35E2B might also play an important role in the treatment of leprosy, and can acted as an candidate marker to evaluate the curative effect of leprosy.

Ethical approval and consent to participate
Ethics approval was obtained from the Ethics Review Committee of Dermatology Hospital, Southern Medical University. All participants provided written informed consent in this research.

Consent for publication
Not Applicable.

Competing Interests
The authors declare no conflict of interest.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Declarations
Written informed consent was obtained from the patient for publication of this Case Report and any accompanying images. A copy of the written consent is available for review by the Editor of this   Table 3. Wilcox rank sum test based on IRS of SLC35E2B and SLC35E1, in various type of skin tissues collected from paucibacillary and multibacillary, before treated and healing patients. Figure 1 Data analysis workflow for estimating the association between SLC gene polymorphisms and leprosy.  Results of staining immunohistochemistry with magnification 100 times. a) healthy control b) paucibacillary patients. c) multibacillary patients d) un-treated patients e) healing patient.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. Supplementary