Genomic Insight Into 21 Hypervariable Molecular Markers in The Population of Lower Himalayan Geographical Province Himachal Pradesh, India

The Himalayan Mountain range forms a natural geographical barrier to gene flow between the Tibetan plateau and the South Asian countries. This has led to vast genomic divergence among the populations residing in the Indian Himalayan region. This study was designed to decipher the genomic diversity and molecular characteristics of 21 hypervariable molecular markers in the population of the geographical province of Himachal Pradesh in the lower Himalayan region. A total of 401 randomly selected, unrelated individuals native to the lower Himalayan geographical region were included in the study. The 21 hypervariable molecular markers included in the PowerPlex® 21 system were amplified and genotyped. A total of 246 alleles, with a mean of 12.3 (SE 0.927) alleles per locus, were observed. Population differentiation analysis revealed that the studied population showed greater genetic affinity with the populations of North India, North-west India, Central India, and Uttar Pradesh than with the populations of East India, South India, East Asia, and West Asia. Heterozygosity at the studied loci ranged from 0.686 to 0.920. The combined power of discrimination (PD) and power of exclusion (PE) were 1 and 0.999999998073765, respectively. The combined matching probability and typical paternity index were 9.33 × 10^-26 and 5.05 × 10^8, respectively, for the studied population. All the tested loci conformed to Hardy-Weinberg equilibrium (HWE) expectations. Overall, the studied population exhibited a great extent of genomic diversity and had a greater genetic affinity with Indo-European speakers than with Dravidian and Tibeto-Burman speakers.


Introduction:
The Himalayan Mountain range, which runs from west-northwest (Pakistan) to east-southeast (Burma) in an arc of 2400 km [1], acts as a geographical barrier to gene flow between the Tibetan plateau and the South Asian subcontinent [2]. These geographical conditions of the Himalayan landscape have led to unique genetic characteristics in the region. The North and North-east geographical regions of India lie in the foothills of the Himalayan Mountains, and their populations harbor considerable genomic diversity [6]. Genomic [3,4] and linguistic [5] studies have revealed that modern humans first settled on the Tibetan plateau about 25,000-30,000 years ago [2]. It is well established that India is a highly diverse country, structured as a hierarchy of caste and tribal populations and various linguistic groups [7,8]. The Indian population is broadly divided into Australoid, Indo-Caucasoid, Indo-Mongoloid and Negrito ethnic groups, along with Indo-European, Dravidian, Austro-Asiatic and Sino-Tibetan speakers [9]. The eastern Himalayan belt has a majority of Tibeto-Burman speakers, whereas the west and north-west belt has a majority of speakers of the Indo-European language family [10]. The genetic structure of the Indian population is broadly categorized into Ancestral North Indian (ANI) and Ancestral South Indian (ASI) populations. ANI is genetically closer to Middle Eastern, Central Asian, and European populations [11]. Several migration waves into India over the course of history are reflected in the complex genetic structure of the Indian populations. Around 4000 years ago, the last major migration wave, that of Indo-European speakers, contributed to the high degree of genetic complexity [12]. Several genetic studies based on short tandem repeat (STR) molecular markers have been reported for the Indian population to understand its diversity in terms of ethnic, cultural, linguistic, and geographical affinity [9,8,13].
STRs are widely accepted as the gold standard markers in forensic applications owing to their highly polymorphic nature, high heterozygosity, short sequence length, and wide distribution throughout the human genome [14,15]. Although mutation rates of STR markers are higher than those of single nucleotide polymorphisms (SNPs) [16], STR markers remain very useful for genealogical studies as well as forensic applications [17].
To understand the genetic diversity in the lower Himalayan geographical province of Himachal Pradesh, 15 autosomal STR markers [18] and 17 Y-STR markers [19] have previously been used, but those studies were limited in terms of the number of molecular markers and the effective sample size. Therefore, the present study was undertaken to explore the genetic structure of the admixed population of the North Indian Himalayan region, Himachal Pradesh, using 21 hypervariable STRs.

Material And Methods:
Samples: A total of 401 unrelated, healthy volunteers residing in the geographical province of Himachal Pradesh (Fig. 1) were selected randomly for this study. About 1 ml of peripheral blood was collected from each volunteer in EDTA vials following the guidelines of the Declaration of Helsinki [20] and stored at -20 °C until further processing.
DNA Isolation: Genomic DNA was isolated from the selected samples using the Phenol Chloroform Isoamyl Alcohol (PCIA) organic extraction method as described by Sambrook et al., 1989 [21].
DNA Quantitation: Isolated DNA was quantified with the Quantifiler® Duo DNA Quantification Kit (Thermo Fisher Scientific, CA, USA) using the RT-PCR 7500 platform as per the recommendations of the manufacturer.
Amplification: 1 ng of genomic DNA was used to amplify the 21 STR markers included in the PowerPlex® 21 system (Promega, CA, USA) using a 9700 thermal cycler (Thermo Fisher Scientific, CA, USA) as per the manufacturer's protocol, except that a 10 µl reaction volume was used.
Genotyping: The amplified fragments were subjected to size-based separation through capillary electrophoresis using the Genetic Analyzer 3500XL (Thermo Fisher Scientific, CA, USA). 1 µl of amplicon was diluted in a 10 µl mixture of HiDi Formamide (Thermo Fisher Scientific, CA, USA) and WEN ILS 500 (provided along with the kit) and denatured as per the recommended protocol. POP-4™ polymer and a 36 cm capillary array were used while running the samples on the Genetic Analyzer. The allelic ladder (provided along with the kit) was used to assign allele numbers in the DNA profile. GeneMapper™ ID-X software v1.5 was used to analyze the genotyped data.
Quality control: To monitor quality parameters during the experiments, a positive control (with 2800M DNA template) and a negative control (without DNA template) were used. The authors also qualified the international DNA proficiency test conducted by GITAD, Spain (http://gitad.ugr.es/principal.htm).
Statistical analysis: Allele frequencies and parameters of forensic interest, viz., power of discrimination (PD), power of exclusion (PE), polymorphic information content (PIC), matching probability (Pm), typical paternity index (TPI), observed heterozygosity (Hobs), expected heterozygosity (Hexp), and gene diversity (GD), were calculated, and the Hardy-Weinberg equilibrium (HWE) test was performed, using GenAlEx 6 [22] and the STRAF statistical tool [23].
Genetic data of reported populations for comparison: The genetic dataset of the present study was compared with the datasets of reported indigenous populations, viz., Balmiki (Punjab) [29], Sakaldwipi Brahmin (Jharkhand) [29], Konkanastha Brahmin (Maharashtra) [29], Iyengar (Tamil Nadu) [29], Kurumans (Tamil Nadu) [29], Munda (Jharkhand) [29], Chenchu (Andhra Pradesh) [30], Lambadi (Andhra Pradesh) [30], Naikpod Gond (Andhra Pradesh) [30], Yerukula (Andhra Pradesh) [30], Munda (Chotanagpur) [31], Santal (Chotanagpur) [31], Oraon (Chotanagpur) [31], Lodha (Bengal) [32], Kora (Bengal) [32], Maheli (Bengal) [32], Central Indian Population [33], Gond (MP) [34], Gond_2 (MP) [35], Gond_1 (MP) [29], Oraon (Chhattisgarh) [36], Population of Jharkhand [37], Jat Sikh (Punjab) [38], Baniya (Punjab) [38], Khatri (Punjab) [38]

Results And Discussion:
Allelic frequencies and forensic parameters:
A total of 246 alleles were found in the studied population across all the tested loci. The highest number of alleles, 19, was observed at each of the loci D1S1656 and FGA. The lowest number of alleles, 6, was observed at the locus TH01. In the currently studied highly diverse population, the mean number of alleles observed was 12.3 (SE 0.927) and the mean number of effective alleles was 5.911 (SE 0.073) per locus. Allele frequencies ranged from 0.001 to 0.389, and allele 8 of locus TPOX was the most frequent allele among all the studied loci (Table 2). No locus deviated significantly from Hardy-Weinberg equilibrium (HWE) at the 0.05 significance level.
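The effective number of alleles and the expected heterozygosity reported for each locus both follow from that locus's allele frequencies. The minimal sketch below assumes the standard textbook definitions, Ne = 1/Σp² and Hexp = 1 − Σp², with an optional small-sample correction of 2N/(2N − 1); whether the software used in this study applied that correction is not stated here, and the frequencies in the example are hypothetical rather than the Table 2 values.

```python
def effective_alleles(freqs):
    """Effective number of alleles at one locus: Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def expected_heterozygosity(freqs, n_individuals=None):
    """Expected heterozygosity at one locus: Hexp = 1 - sum(p_i^2).

    If n_individuals is given, the small-sample correction
    2N / (2N - 1) is applied (an assumption, not a stated method).
    """
    he = 1.0 - sum(p * p for p in freqs)
    if n_individuals is not None:
        two_n = 2 * n_individuals
        he *= two_n / (two_n - 1)
    return he


# Hypothetical allele frequencies for a four-allele locus (must sum to 1).
hypothetical_freqs = [0.4, 0.3, 0.2, 0.1]
assert abs(sum(hypothetical_freqs) - 1.0) < 1e-9

print(effective_alleles(hypothetical_freqs))
print(expected_heterozygosity(hypothetical_freqs))
print(expected_heterozygosity(hypothetical_freqs, n_individuals=401))
```

Because Ne weights alleles by frequency, a locus with many rare alleles has an effective number well below its raw allele count, which is consistent with a mean of 12.3 observed alleles against 5.911 effective alleles per locus.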
The observed heterozygosity (Hobs) ranged from 0.686 (CSF1PO) to 0.920 (D1S1656) and gene diversity (GD) varied from 0.698 (TPOX) to 0.913 (Penta E). The power of discrimination (PD) ranged from 0.850 (TPOX) to 0.984 (Penta E), and the power of exclusion (PE) from 0.407 (CSF1PO) to 0.837 (D1S1656). The cumulative power of discrimination (CPD) and cumulative power of exclusion (CPE) were 1 and 0.999999998073765, respectively. The matching probability (Pm) ranged from 0.016 (Penta E) to 0.150 (TPOX), and the typical paternity index (TPI) from 1.591 (CSF1PO) to 6.266 (D1S1656). The combined matching probability (coPm) and combined typical paternity index (coTPI) were 9.33 × 10^-26 and 5.05 × 10^8, respectively, for all the tested loci. An autosomal marker is considered highly polymorphic when its power of discrimination is greater than 0.80 (PD>0.80), its power of exclusion is greater than 0.50 (PE>0.50) and its PIC value is greater than 0.5 (PIC>0.5) [54,55,56]. In the present study, all the loci were highly polymorphic for the studied population because PIC>0.5 at every locus. Notably, the locus Penta E was the most informative, with the highest GD, PD, and PIC and the lowest Pm. The locus TPOX was the least informative, with the lowest GD, PD, and PIC and the highest Pm among all the tested loci (Table 3). The findings of the present study are of prominent forensic importance, as are those of previously reported population studies of Indo-European speakers [43,57,40,39,58].
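The per-locus PE and TPI values, and the combined statistics, can be checked with a few lines of arithmetic. The sketch below assumes the commonly used heterozygosity-based formulas PE = h²(1 − 2hH²) and TPI = 1/(2H), where h is the observed heterozygote proportion and H = 1 − h, and assumes that loci are independent so that per-locus values combine by multiplication. These are assumptions about what the software computes, and the function names are illustrative only.

```python
from math import prod


def power_of_exclusion(h):
    """PE = h^2 * (1 - 2*h*H^2), with H = 1 - h the homozygote proportion."""
    hom = 1.0 - h
    return h * h * (1.0 - 2.0 * h * hom * hom)


def typical_paternity_index(h):
    """TPI = 1 / (2H), with H = 1 - h the homozygote proportion."""
    return 1.0 / (2.0 * (1.0 - h))


def combined_power(values):
    """Combined PD or PE over independent loci: 1 - prod(1 - v_i)."""
    return 1.0 - prod(1.0 - v for v in values)


def combined_product(values):
    """Combined Pm or TPI over independent loci: prod(v_i)."""
    return prod(values)


# Reported observed heterozygosity at CSF1PO.
h_csf1po = 0.686
print(round(power_of_exclusion(h_csf1po), 3))
print(round(typical_paternity_index(h_csf1po), 3))

# Two-locus illustration using the reported PD and Pm of TPOX and Penta E.
print(combined_power([0.850, 0.984]))
print(combined_product([0.150, 0.016]))
```

With the reported CSF1PO heterozygosity of 0.686, these formulas give PE ≈ 0.407 and TPI ≈ 1.59, in agreement with the per-locus values above. The two-locus illustration also shows why the combined power of discrimination over all loci rounds to 1: each additional locus multiplies the residual matching probability by a small factor.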

Interpopulation comparison:
Allele frequencies at the 15 STR markers common to the studied population and 47 reported populations were compared using Nei's Da distance to draw a phylogenetic tree. The population genetic distance Fst was calculated between the studied population and other population groups defined by geography, as shown in Table 4. The Fst values revealed higher genetic affinity between the population of Himachal Pradesh and the geographically closer population groups of Uttar Pradesh, Central India, and Northwest India than with those of East and South India. When compared with West and East Asian populations, the population of Himachal Pradesh had 2.5 times higher genetic affinity with the West Asian than with the East Asian population. The genetic distance between the studied population and the South Indian populations was 7 times higher than that between the studied population and the populations of Central and North India (Fig. 2). This finding corroborates an earlier study in which the Indian genetic structure was broadly divided into ANI and ASI [13]. Moreover, the present study also indicated the presence of a small genetic pool in East India which showed a degree of genetic distance from the South, North, North-west and Central Indian genetic pools [59,60,61].
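To illustrate how a pairwise Fst can be obtained from allele frequencies alone, the following sketch computes a heterozygosity-based estimate, Fst = (HT − HS)/HT, summed over shared loci with the two populations weighted equally. This is a simplified Nei-style estimator chosen for clarity; it is an assumption and not necessarily the estimator behind Table 4, and the input layout (a dictionary of locus to allele-frequency dictionary) and the example frequencies are hypothetical.

```python
def fst_two_populations(pop_a, pop_b):
    """Pairwise Fst = (HT - HS) / HT over the loci shared by both populations.

    pop_a, pop_b: dict mapping locus name -> dict of allele -> frequency.
    HS is the mean within-population expected heterozygosity and HT the
    expected heterozygosity of the equally weighted pooled frequencies.
    """
    shared_loci = pop_a.keys() & pop_b.keys()
    if not shared_loci:
        raise ValueError("populations share no loci")
    hs_sum = 0.0
    ht_sum = 0.0
    for locus in shared_loci:
        fa, fb = pop_a[locus], pop_b[locus]
        alleles = set(fa) | set(fb)
        hs_a = 1.0 - sum(fa.get(a, 0.0) ** 2 for a in alleles)
        hs_b = 1.0 - sum(fb.get(a, 0.0) ** 2 for a in alleles)
        pooled = [(fa.get(a, 0.0) + fb.get(a, 0.0)) / 2.0 for a in alleles]
        ht = 1.0 - sum(p * p for p in pooled)
        hs_sum += (hs_a + hs_b) / 2.0
        ht_sum += ht
    # With no variation at all in the pooled sample, define Fst as 0.
    return (ht_sum - hs_sum) / ht_sum if ht_sum > 0 else 0.0


# Hypothetical single-locus frequencies for two populations.
population_1 = {"TPOX": {"8": 0.5, "11": 0.5}}
population_2 = {"TPOX": {"8": 1.0}}
print(fst_two_populations(population_1, population_2))
```

Identical frequency profiles give an Fst of 0 and populations fixed for different alleles give 1, so smaller values between Himachal Pradesh and a comparison group correspond to the higher genetic affinity described above.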

Phylogenetic Analysis:
The allelic data of the studied population was compared with the 47 reported populations of India, West Asia and East Asia at the 15 common genetic markers. Nei's Da distances were calculated and a Neighbor-Joining tree was constructed using PopTree2 software. The tree was visualized using MEGA-6 software. The phylogenetic tree revealed that the population of Himachal Pradesh was most closely related to the Rajasthani and Punjabi population of Pakistan, followed by the Pashtun, Sindhi, Baloch, and Saraiki populations of Pakistan, which might be the result of shared ancestry (Fig. 3). In the phylogenetic tree, the East Indian, East Asian, and South Indian populations are placed on separate branches. This phylogenetic relationship indicates similar genetic constituents among the populations of North, North-west, and Central India and West Asia [62,53]. However, the geographically close North and Northeast populations, viz., Nepalese, Gorkhas, Kathmandu, Tibet, Newar, and Tibetan (Nepal), showed a higher genetic affinity with the East Asian populations, viz., the Manchu population of China, the Korean population, the population of the Royal Kingdom of Bhutan and the Han population of Southern China, than with the population of Himachal Pradesh. A possible reason for this genetic variation is the geographical barrier to gene flow created by the Himalayan range [2].
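For reference, Nei's Da distance between two populations is D_A = 1 − (1/r) Σ_j Σ_i √(x_ij · y_ij), where x_ij and y_ij are the frequencies of allele i at locus j in the two populations and r is the number of loci. The sketch below implements that formula directly; it is an illustration rather than the PopTree2 implementation, and the input layout and example frequencies are hypothetical.

```python
from math import sqrt


def nei_da_distance(pop_x, pop_y):
    """Nei's Da: 1 - (1/r) * sum over loci of sum over alleles sqrt(x * y).

    pop_x, pop_y: dict mapping locus name -> dict of allele -> frequency.
    Only loci present in both populations are used; r is their count.
    """
    shared_loci = pop_x.keys() & pop_y.keys()
    if not shared_loci:
        raise ValueError("populations share no loci")
    total = 0.0
    for locus in shared_loci:
        fx, fy = pop_x[locus], pop_y[locus]
        # Alleles absent from one population contribute sqrt(0) = 0.
        total += sum(
            sqrt(fx.get(a, 0.0) * fy.get(a, 0.0)) for a in set(fx) | set(fy)
        )
    return 1.0 - total / len(shared_loci)


# Hypothetical two-locus frequencies for two populations.
population_x = {"TH01": {"6": 0.5, "9": 0.5}, "TPOX": {"8": 0.4, "11": 0.6}}
population_y = {"TH01": {"6": 1.0}, "TPOX": {"8": 0.4, "11": 0.6}}
print(nei_da_distance(population_x, population_y))
```

The distance is 0 for identical allele frequency profiles and 1 when the two populations share no alleles at any locus. A matrix of such pairwise values over all populations is the input from which a Neighbor-Joining tree is built.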

Principal Coordinate Analysis (PCoA) and Heat map:
The PCoA plot was drawn using Nei's Da distance matrix based on the allele frequencies of the studied population and the 47 compared populations at the 15 common STR markers. The Euclidean similarity index was used to analyze the principal coordinates, in which Coordinates 1 and 2 accounted for 96.301% of the variation. Owing to a higher level of genetic variation, a few populations, viz., Chenchu (Andhra Pradesh) [30], Central Indian Population [33], Jat Sikh (Punjab) [38] and Kathmandu [60], appeared as outliers, which resulted in clumping of most of the populations in one part of the plot (Fig. S1). To visualize genetic distance among the clumped populations, the outlier populations were removed from the analysis, and the PCoA plot was redrawn (Fig. 4). Studied population clustered with the population of Himachal

Conclusion
This study provides genetic evidence for a unique gene pool among the North, North-west, and Central Indian populations that differs significantly from the East and South Indian gene pools. The findings of the population differentiation, phylogenetic relatedness, PCoA, heat map, and structure analyses revealed that the studied population showed a gradient of gene flow towards the West rather than towards the South and East Indian populations. The studied 21 STR markers were found to be highly useful for population genetics as well as population migration studies. The dataset of this study will be of wide use within the purview of the recently introduced "The DNA Technology (Use and Application) Regulation Bill, 2019". It will also enrich the autosomal STR DNA database for the Indian population.

Figure 1. Geographical location of the studied population. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.