Role of Paired Box 5 (PAX5) Genetic Variant in the Development and Progression of Leukemia

Leukemia is a heterogeneous disorder characterized by excessive proliferation of white blood cells. Various genetic studies have tried to reveal the role of contributory SNPs in the development of leukemia. PAX5 is a transcription factor whose expression is involved in several aspects of B-cell differentiation, including immunoglobulin gene rearrangement, BCR signal transduction, and B-cell survival, so that deletion or inactivating mutation of PAX5 causes cell arrest at the pro-B-cell stage. The role of this gene has previously been studied in various population groups; however, the role of the variant rs3780135 in leukemia within Indian populations is unclear. In the present study, the role of the genetic variant rs3780135 of PAX5 was investigated in leukemia patients from Northern India. The variant was identified by exome sequencing, then genotyped using a high-throughput real-time TaqMan assay and validated by SSCP and Sanger sequencing. The association of the SNP with the disease was evaluated using logistic regression. The variant rs3780135 of PAX5 showed a significant association with leukemia in the North Indian population, with an OR of 1.94 (95% CI 1.51-2.48), p = 1.2 × 10^-6, in the real-time TaqMan assay and an allelic OR of 1.54 (95% CI 1.02-2.34), p = 0.002, for Sanger sequencing, when corrected for age, gender, BMI, smoking, and alcohol. The present study concludes that the variant rs3780135 of PAX5 acts as a risk factor in the development of leukemia in the Northern Indian population.


Introduction
Leukemia consists of a group of heterogeneous malignancies in which immature and dysfunctional hematopoietic progenitors proliferate and accumulate in the bone marrow [1]. In hematopoietic stem cells, disruption of cellular processes including proliferation, differentiation, and cell growth ultimately leads to leukemia [2]. According to the Population-Based Cancer Registry of India, males are more affected than females, in a ratio of 2:1 [3,4]. Most population groups of Jammu and Kashmir practice endogamy, thus preserving the gene pool. This factor has been increasingly documented as an inherited genetic factor that can contribute to the progression of leukemia [5,6,7].
PAX5 is a member of the paired box domain gene family, which encodes nuclear transcription factors that play a significant part in growth, differentiation, cell migration, proliferation, and hematopoiesis [5][6][7][8][9]. The transcriptional activity of PAX5 (chr9:36840510-36840768 in genome browser), which regulates its target genes, is determined by the interaction of distinct partner proteins with the central and C-terminal protein interaction motifs of PAX5 [8]. The partial homeodomain of PAX5 associates with the TATA-binding protein of the basal transcription machinery, and the C-terminal transactivation domain regulates gene transcription, most probably by interacting with histone acetyltransferases (HATs), such as the co-activator CBP or the SAGA complex [9], as shown in Figure 1.
The expression of PAX5 in normal adult tissue is restricted to the hematopoietic system; however, it is aberrantly expressed in a number of solid cancers as well as B-cell malignancies [10,11]. Loss of PAX5 expression arrests B-cell development at an early pro-B-cell stage and reverts committed B-cell precursors (BCPs) to progenitors [12,13]. PAX5 is frequently targeted by mutations and chromosomal translocations in childhood acute lymphoblastic leukemia (ALL) [14][15][16][17][18][19][20]. The genetic variant rs3780135 of PAX5 has been associated with acute lymphoblastic leukemia in various ethnic Chinese and Japanese populations [21,22]. The aim of the present study is to investigate the association of variant rs3780135 of PAX5 with the risk of leukemia in the Jammu & Kashmir region. This study contributes a detailed evaluation of variant rs3780135 in the studied population and will highlight the pathways associated with it.

Results
The total exonic data generated was 45.63 GB, with more than 100X sequencing coverage. A total of 62 shortlisted genes were screened and studied through this analysis. In cases, 78,878 homozygous variants were reported, against 31,873 in controls, while about 128,253 heterozygous variants were found in cases and 46,938 in controls. A total of 15,024 missense variants were observed across all samples, among which the PAX5 variant was observed in all samples at a high percentage frequency. Gene details are given in the accompanying table. The variant was further validated in a larger sample size using the TaqMan assay and Sanger sequencing. Furthermore, SSCP was also done to confirm the mutation.
The main aim of the study was to explore the association of variant rs3780135 of PAX5 with the risk of leukemia in the present population. To observe the maximum effect of allele A, the association of PAX5 was evaluated using the dominant model. The OR observed was 1.6 (95% CI 0.94-2.4) in leukemia, corrected for age, gender, and BMI. The variant rs3780135 of PAX5 was further evaluated under other genetic models as per the risk allele, and the results showed a positive association of the variant in all three models, as shown in Table 4. Further results, corrected for age, gender, and BMI, are shown in Table 5. Thus, the variant rs3780135 of PAX5 shows a significant association with the risk of leukemia in the North Indian population.
The Jpred secondary structure analysis showed that the region in question (aa 211-295), containing the mutation at T 264, did not have any prominent structural features. However, the region from aa 231 to 251 shows a short beta-sheet and alpha-helix, as shown in Figure 5. Green represents the beta-sheet while red denotes the alpha-helix. The mutation T 264 is marked by the box, in the vicinity of the unstructured region.
The same region (aa 211 to 295) was also analyzed in the Raptor X server [24] for the characterization of the secondary structure elements. Figure 6 (a) also shows the absence of a structured region around the mutation at T 264, which is marked by the box in the figure. A short alpha-helix is observed from aa 240 to 250. Raptor X also measures protein disorder propensity; this analysis showed that the stretch of 10 aa covering T 264, marked by the box in Figure 6 (b), has the maximum disorder in this region. The red block indicates maximum disorder, which continues through the rest of the sequence as well.
Domain Identification of the Pax5 uncharacterized region
The Pfam database at EMBL-EBI is a large collection of protein families; it analyses a protein sequence for Pfam matches and, based on these, determines the family of the protein [25]. The region 211-295 aa of PAX5, which is largely unstructured as shown in the analysis above, was provided as input to the Pfam search. Pfam assigned this region to the Homeodomain family (PF00046), belonging to the clan CL0123. Homeodomain proteins are responsible for regulating gene expression and cell differentiation during the early stages of embryonic development and have a characteristic protein fold that binds to DNA and regulates the expression of target genes [26]. The clan CL0123 has a diverse range of DNA-binding domains that predominantly contain a helix-turn-helix motif.

Discussion
PAX5 plays an important role at checkpoints in B-lymphoid maturation and in leukemogenesis [27]. Mutations and deletions of PAX5 have so far been considered secondary oncogenic events because they occur in several BCP-ALL subtypes and probably in minor subclones [28]. In the present study, an attempt was made to explore the association of the variant rs3780135 of PAX5 with leukemia among patients from North India. The same variant was found to be significantly associated with acute lymphoblastic leukemia in Turkish [29] and German [30] populations. However, the same variant shows a protective effect in B-cell acute lymphoid leukemia in the Pakistani population [31]. The present study indicates that the genetic variant rs3780135 of PAX5 acts as a risk factor for leukemia.
PAX5 is a transcription factor that is required for B-cell development and its maintenance. PML is a tumor suppressor and a pro-apoptotic factor [32]. PML has been found to be translocated to the PAX5 locus to generate a PAX5-PML fusion gene in childhood acute lymphoblastic leukemia [33], which disrupts PAX5 function. This indicates that genes such as PAX5 are regulators of cellular processes, differentiation, and haematopoiesis, and that slight modifications in these genes may increase leukemic risk. It is therefore highly desirable to determine the effect of these genetic variants on the molecular functioning of PAX5 and the linked downstream signalling pathways.

Conclusions
Our findings provide evidence that the variant rs3780135 of PAX5 is associated with predisposition to leukemia in the Jammu and Kashmir region. A study on a larger cohort may help in understanding the effect of this variant in different ethnic populations, and the variant may act as a predictive or prognostic biomarker for leukemia.

Ethics statement
Approval for the study, under notification number SMVDU/IERB/16/41, was obtained from the Institutional Ethics Review Board (IERB) of Shri Mata Vaishno Devi University (SMVDU). All details were recorded in a predesigned proforma, and written informed consent was obtained from both cases and controls. All experimental procedures were conducted according to the guidelines and regulations set by the IERB, SMVDU.

Sample Collection and DNA isolation
A total of 600 subjects were recruited for the study, of which 200 were cases (leukemia patients) and 400 were healthy controls. Genomic DNA was isolated from the blood samples using the FlexiGene® Qiagen DNA isolation kit (Catalogue No. 51206). The quality of the genomic DNA was checked by 0.8% agarose gel electrophoresis (Bio-Rad Gel Doc™ EZ imager), and quantification was done using the Bio-Spectrometer™ (Eppendorf India Pvt. Ltd.).

Whole Exome Sequencing
Exome sequencing of the samples was done on the Illumina NGS platform. Alignment against the hg19 (GRCh37) genome was carried out using the Burrows-Wheeler Aligner (BWA), and variant calling was done using the Genome Analysis Toolkit (GATK) pipeline. The variants identified with GATK were annotated using ANNOtate VARiation (ANNOVAR). The databases used for the study include the 1000 Genomes database, the database for nonsynonymous SNPs' functional predictions (dbNSFP), the Single Nucleotide Polymorphism Database (dbSNP), and Genome-Wide Association Studies (GWAS). The variants were then passed through various filters, using the targeted regions of the exome capture kits.

Genotyping
Genotyping of the variant rs3780135 of PAX5 was performed using the TaqMan allelic discrimination assay on the MX3005p, with probes labeled with VIC and FAM dyes (Thermo Fisher Scientific) and UNG Master Mix (Applied Biosystems, USA). The total PCR reaction volume was 10 µl, consisting of 2.5 µl of TaqMan UNG Master Mix, 0.25 µl of the probe, 3 µl of DNA (5 ng/µl), and 4.25 µl of nuclease-free water to make up the final volume. The thermal conditions were 10 minutes at 95 °C, followed by 40 cycles of 95 °C for 15 seconds and 60 °C for 1 min. All samples were run in a 96-well plate with three no-template controls (NTCs). The post-PCR detection system was used to measure allele-specific fluorescence. A total of 93 random samples from cases and controls were selected and re-genotyped for cross-validation of the genotyping calls, and the agreement rate was 100%.
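The reaction composition above can be checked and scaled to a full plate with a short calculation. The following is a minimal sketch and not part of the study's protocol: the function name `master_mix_volumes`, the 10% pipetting overage, and the choice of 96 reactions are illustrative assumptions; only the per-reaction volumes and the DNA concentration come from the text.

```python
# Per-reaction volumes in microlitres, as stated in the Genotyping section.
PER_REACTION_UL = {
    "TaqMan UNG Master Mix": 2.5,
    "probe": 0.25,
    "DNA template": 3.0,          # supplied at 5 ng per microlitre
    "nuclease-free water": 4.25,
}

DNA_CONC_NG_PER_UL = 5.0


def master_mix_volumes(n_reactions, overage=0.10):
    """Scale the shared reagents to n_reactions plus a pipetting overage.

    The DNA template is excluded because it is added to each well
    individually. The 10% default overage is an assumption, not a
    value taken from the study.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "DNA template"
    }


# Sanity checks on a single reaction.
total_per_reaction = sum(PER_REACTION_UL.values())
dna_ng_per_reaction = PER_REACTION_UL["DNA template"] * DNA_CONC_NG_PER_UL

if __name__ == "__main__":
    print("Total volume per reaction (ul):", total_per_reaction)
    print("DNA input per reaction (ng):", dna_ng_per_reaction)
    for name, volume in master_mix_volumes(96).items():
        print(name, "(ul):", volume)
```

Keeping the DNA out of the pooled mix mirrors common practice, where a shared master mix is dispensed first and each sample (or water, for the NTCs) is added per well.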

Single-Strand Conformational Polymorphism (SSCP):
SSCP is a PCR-based technique for the rapid detection of mutations in target gene fragments under non-denaturing conditions [35]. PCR-amplified DNA fragments are denatured at 95 °C and immediately snap-chilled. The samples are then loaded and run on a 10% native polyacrylamide gel (PAGE; Bio-Rad Mini-PROTEAN Tetra Cell) [36], after which the gel is stained with ethidium bromide (EtBr, 1 µg/ml) for 5-7 minutes and imaged (Bio-Rad Gel Doc™ EZ imager). If a mutation is present at a specific nucleotide position in the DNA fragment, the resulting conformational alteration can be identified.

Sanger Sequencing
Genotyping of PAX5 (rs3780135) was done by polymerase chain reaction (PCR) with the primers forward 5'-TCACCCTCAATAGGTGCCATC and reverse 3'-ACTGGGACAGAGATCTTGGTGA, which were designed with Primer3-based software. All samples were amplified using a PCR program of 95 °C for 1 minute, then 36 cycles of 95 °C for 1 min, 60 °C for 45 seconds, and 72 °C for 1 min, followed by 72 °C for 2 min and a hold at 4 °C. Sequencing was performed by Aggrigenome Pvt. Limited, using Sequence Scanner Software (ABI 3730 XL DNA Analyzer) and Chromas v2.6.6 (Technelysium Pty Ltd, South Brisbane, Australia).

Secondary structure analysis of PAX5
The unstructured and uncharacterized region of PAX5 (isoform 5) containing a mutation at T 264, from amino acid (aa) 211 to 295, was subjected to secondary structure analysis in Jpred 4, the Protein Secondary Structure Prediction Server [23]. The same region (aa 211 to 295) was also analyzed in the Raptor X server [24] for the characterization of the secondary structure elements and the protein disorder propensity.

Domain Characterisation in Pax5
The domains present in the same uncharacterized region (from aa 211 to 295) of Pax5 were determined from the Pfam database at EMBL-EBI [25].

Statistical analysis:
Statistical analysis of the data was done using the Statistical Package for the Social Sciences (SPSS) software (v.20; Chicago, IL). The chi-square (χ2) test was performed, and genotype frequencies were tested for Hardy-Weinberg equilibrium. Binary logistic regression was used to estimate ORs with 95% CIs and the corresponding p-values, adjusted for confounding factors such as age, gender, and BMI.
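The two calculations named above, the Hardy-Weinberg chi-square test and an odds ratio with its 95% CI, can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the SPSS analysis used in the study: the genotype counts are hypothetical placeholders (no genotype table is available in this text), the function names are invented for illustration, and the odds ratio is an unadjusted Wald estimate under a dominant model, whereas the study's estimates come from logistic regression adjusted for age, gender, and BMI.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def dominant_odds_ratio(cases, controls, z=1.96):
    """Unadjusted OR and Wald 95% CI for carriers of allele A vs. non-carriers.

    cases and controls are (n_AA, n_AB, n_BB) genotype counts,
    with A taken as the risk allele.
    """
    a = cases[0] + cases[1]        # case carriers (AA or AB)
    b = cases[2]                   # case non-carriers (BB)
    c = controls[0] + controls[1]  # control carriers
    d = controls[2]                # control non-carriers
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (AA, AB, BB); NOT data from the study.
CASES = (30, 90, 80)
CONTROLS = (40, 160, 200)

if __name__ == "__main__":
    stat = hwe_chi_square(*CONTROLS)
    print("HWE chi-square in controls:", round(stat, 3))
    print("HWE p-value:", round(chi_square_p_value_1df(stat), 3))
    estimate = dominant_odds_ratio(CASES, CONTROLS)
    print("Dominant-model OR, lower, upper:", [round(x, 2) for x in estimate])
```

A covariate-adjusted estimate, as reported in the study, would instead require fitting a logistic regression with the genotype coding and the confounders as predictors; the 2x2 calculation here only shows where an OR and its interval come from.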