Fetal Gene Variants Associated With Birth Weight Protection in a Native High Altitude Ladakhi Population

Pathological low birth weight due to 'fetal growth restriction' (FGR) is an important predictor of adverse obstetric outcomes, including stillbirth. FGR is more common amongst native lowlanders gestating in the hypoxic environment of high altitude, whilst populations who have resided at high altitude for many generations are relatively protected. Genetic study of pregnant populations at high altitude therefore allows exploration of the influence of hypoxia on FGR pathogenesis. In this study, pregnant women were recruited from Sonam Norboo Memorial Hospital, Ladakh, between February 2017 and January 2019. Principal component analysis, admixture analysis and a genome-wide association study (GWAS) were applied to umbilical cord blood DNA samples from 316 neonates to explore ancestry and the genetic influence on low birth weight. Our findings support Tibetan ancestry in the Ladakhi population, with subsequent admixture with neighboring Indo-Aryan populations. Fetal growth protection was evident in Ladakhi neonates. Seven variants from five different genomic regions (ZBTB38, ZFP36L2, HMGA2, CDKAL1, PLCG1) previously associated with birth weight were also nominally associated with birth weight here. In summary, the Ladakhi population shows evidence of enrichment of variants in genes that may help mitigate altitude-associated fetal growth restriction, suggesting novel biological pathways and therapeutic targets for FGR that are worthy of further investigation.


Introduction
Birth weight is a complex trait driven by metabolic, vascular and immune interactions between mother and fetus. Successful placental and fetal growth leads to appropriate birth weight, reduced neonatal morbidity and ultimately the success of a species [1-3]. Fetal growth restriction (FGR), defined as the failure of a fetus to reach its growth potential, leads to lower birth weight and compromised neonatal survival [1,4-6].
Studies support a genetic role in influencing birth weight [7]. Genome-wide association studies (GWAS) have identified approximately 70 single nucleotide polymorphisms (SNPs) with a robust influence [8], although these studies have been based primarily on cohorts of European ancestry with normal birth weight. Understanding the genetic contribution to in-utero fetal growth, particularly where significant genetic adaptation may have protected birth weight in populations at greater risk of FGR, might suggest new interventions to mitigate FGR in other settings.
Although FGR is more prevalent in those residing above 1500m (high altitude, where oxygen availability is reduced) [4,9,10], populations who have resided at high altitude for many generations appear to be protected [11,12]. For example, at altitudes above 3000m, babies born to Tibetan women are more than 500g heavier than those born to native lowland Han Chinese [13]. Similarly, in La Paz, Bolivia (3,600m), native Andean babies are born heavier than their European counterparts [14].
To date, genetic studies of birth weight at high altitude have primarily focused on maternal genotypes. The maternal PRKAA1 gene locus (encoding the α1 catalytic subunit of AMPK, a central regulator of cellular energy metabolism) has been associated with birth weight and maternal uterine artery diameter in high-altitude Andean residents [15]. Less is known about fetal genotype and its association with birth weight at high altitude in populations of Tibetan ancestry.
Ladakh, in the Jammu and Kashmir region of India, lies between the Karakoram and Himalayan mountain ranges. The Ladakhi population has been resident at ≥3,400 m altitude for many generations. The term "La-dvags" in Tibetan means "land of high passes", and "Ladakh" is the Persian spelling [16]. The highland Ladakh region connects South Asia with the Tibetan plateau via the ancient trade corridor of the Silk Road. Genetic adaptation to the hypoxia of high altitude is now well described [17]. Although the Ladakhi population shares similar linguistic, cultural and religious practices with Tibetans [16,18], it is very poorly studied at the level of population structure and genetic selection. Sonam Norboo Memorial Hospital in Leh (3,540 m) provides maternity care for the region of Ladakh. It has a very high institutional birth rate of more than 90% [19], making it a unique site to study a pregnant population at high altitude.
We hypothesized that the fetal genotype of native Ladakhis would be enriched for gene variants that protect in-utero growth despite the adverse effects of high-altitude gestation. We first performed a genomic structural survey of the population. Then, in those subjects identified as distinctly Ladakhi by the genomic survey, we investigated fetal genetic elements associated with birth weight.
Ethical approval was granted by the … Committee (3634/002). All methods were performed in accordance with the relevant guidelines and regulations, and informed consent was obtained from all participants and/or their legal guardians. The research was performed in accordance with the Declaration of Helsinki.

Methods
Pregnancies were included if both parents were aged over 18 years and unrelated (i.e., not related as first cousins or closer), the pregnancy was singleton, and the parents planned to have their baby at the hospital. Women were recruited if the estimated due date could be calculated from the last menstrual period or a dating ultrasound. Pregnancies in which the fetus had a clinically obvious structural or chromosomal abnormality were excluded. Women completed a questionnaire documenting their social (including nutritional), family, obstetric and medical histories (including smoking, alcohol, chronic medical problems and medications). These data have already been reported by our group [28]. Geographical ancestry was recorded for more than three generations for both parents.
Following birth and delivery of the placenta and umbilical cord, at least three 1 ml aliquots of residual whole umbilical cord blood, with a separate serum sample, were frozen immediately at -80 °C. Information on the birth process (mode of delivery) and on the neonatal characteristics routinely collected by the hospital (sex, weight, head circumference, crown-heel length and Apgar score) was recorded.
Cord blood from the studied offspring (and whole blood from their parents, for future testing) was then shipped to Delhi, where genomic DNA was extracted at the Institute of Genomics and Integrative Biology using Qiagen kits and validated, standardized protocols. Common bi-allelic SNPs were genotyped using Illumina Global Screening Array SNP-microarray technology. Genotypes were called from the microarray fluorescence data using Illumina's GenomeStudio software.
We combined the genotypes of the 316 Ladakhi individuals with those of reference individuals from surrounding populations living at high and low altitude. Human Genome Diversity Project [35] datasets were included for the following populations: Burusho, Han, Hazara, Indo-Aryan, Japanese and Yakut Siberian. Other population datasets included in this study were Munda [36], Sherpa [21], Tibetan [37] and Tibeto-Burman [38]. These cohorts were merged using the autosomal SNPs common to all datasets. The merged dataset was then processed through several quality control steps using PLINK 1.9 [39,40]. We retained only individuals and SNPs with <5% missing genotypes, SNPs with a minor allele frequency (MAF) >1%, and SNPs with a Hardy-Weinberg equilibrium (HWE) test p-value >1 × 10⁻⁶. Furthermore, we identified closely related individuals using KING [41], removing one individual from each pair related at the 2nd degree or closer. This left a final dataset of 1,413 individuals. Owing to the variety of genotyping platforms used across the reference populations, the final set of common SNPs contained 60,280 markers.
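As an illustration only, the per-individual and per-SNP filters described above can be sketched in Python on a genotype matrix coded as 0/1/2 copies of one allele (None for a missing call). This is not PLINK's code: the function names are hypothetical, the filters are applied in one fixed order that may differ from PLINK's, and the Hardy-Weinberg p-value uses a 1-df chi-square approximation, whereas PLINK 1.9 uses an exact test.

```python
import math


def missing_fraction(values):
    """Fraction of missing (None) entries in a sequence of genotype calls."""
    return sum(v is None for v in values) / len(values)


def minor_allele_frequency(column):
    """MAF from additive genotype codes (0/1/2), ignoring missing calls."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def hwe_chisq_pvalue(column):
    """Hardy-Weinberg p-value from the 1-df chi-square approximation.

    PLINK 1.9 uses an exact test; this approximation is for illustration only.
    """
    called = [g for g in column if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chisq / 2))


def qc_filter(genotypes, max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Return (kept individual indices, kept SNP indices).

    genotypes[i][j] is the 0/1/2 call for individual i at SNP j, or None.
    """
    keep_ind = [i for i, row in enumerate(genotypes)
                if missing_fraction(row) < max_missing]
    if not keep_ind:
        return [], []
    keep_snp = []
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in keep_ind]
        if missing_fraction(column) >= max_missing:
            continue  # too many missing calls
        if minor_allele_frequency(column) <= min_maf:
            continue  # too rare (or monomorphic)
        if hwe_chisq_pvalue(column) <= min_hwe_p:
            continue  # strong departure from Hardy-Weinberg equilibrium
        keep_snp.append(j)
    return keep_ind, keep_snp
```

With the thresholds above, a SNP carried by a single heterozygote among 100 individuals (MAF 0.005) would be removed, as would a SNP at which all 100 individuals are heterozygous (a gross HWE violation).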
We calculated principal components using PLINK [39], first pruning SNPs in linkage disequilibrium (LD) with PLINK's --indep-pairwise command, using a window size of 1000, a step of 50 SNPs and an r² threshold of 0.2, leaving 41,718 SNPs in approximate linkage equilibrium.
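The pruning step can be sketched as a sliding-window scan that computes the squared correlation (r²) between allele dosages and drops one SNP of every pair above the threshold. This is a simplified, hypothetical re-implementation assuming complete genotypes and a window counted in SNPs; PLINK's --indep-pairwise applies its own rules for which SNP of a correlated pair to remove, so the retained set would not match PLINK's exactly.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two allele-dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_prune(dosages, window=1000, step=50, r2_max=0.2):
    """Return indices of SNPs kept after greedy pairwise LD pruning.

    dosages: one list of 0/1/2 values per SNP, in genomic order, no missing data.
    Within each window, the later SNP of any pair with r^2 > r2_max is dropped.
    """
    n = len(dosages)
    removed = set()
    for start in range(0, n, step):
        in_window = [j for j in range(start, min(start + window, n))
                     if j not in removed]
        for pos, a in enumerate(in_window):
            if a in removed:
                continue
            for b in in_window[pos + 1:]:
                if b not in removed and r_squared(dosages[a], dosages[b]) > r2_max:
                    removed.add(b)
    return [j for j in range(n) if j not in removed]
```

Note that perfectly negatively correlated SNPs (r = -1) are also pruned, because the threshold applies to r².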
We conducted a genetic clustering analysis of the Ladakhi population and its neighboring populations using ADMIXTURE v1.2 [42], estimating individual ancestry proportions in an unsupervised analysis. The same cohort assembled for PCA was used for the ADMIXTURE analysis, i.e., non-missing autosomal SNPs, individuals filtered for relatedness, and SNPs pruned for linkage disequilibrium. Unsupervised ADMIXTURE analysis was carried out for K = 2 to 4 ancestral populations.
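ADMIXTURE maximizes a binomial likelihood in which the expected allele frequency of individual i at SNP j is the ancestry-weighted mix Σ_k q_ik f_kj. The sketch below evaluates that likelihood and fits Q and F with the simple expectation-maximization (EM) updates used by the earlier program frappe. ADMIXTURE itself uses a faster block-relaxation optimizer, so this illustrates the model rather than ADMIXTURE's algorithm; all function names are hypothetical.

```python
import math
import random


def log_likelihood(G, Q, F):
    """Binomial log-likelihood of the admixture model (constant term dropped).

    G[i][j]: copies (0/1/2) of the counted allele in individual i at SNP j.
    Q[i][k]: ancestry proportion of individual i from population k.
    F[k][j]: frequency of the counted allele in population k at SNP j.
    """
    K = len(F)
    ll = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll


def em_step(G, Q, F, eps=1e-6):
    """One frappe-style EM update of Q and F (F clamped away from 0 and 1)."""
    n, m, K = len(G), len(G[0]), len(F)
    q_num = [[0.0] * K for _ in range(n)]
    f_num = [[0.0] * m for _ in range(K)]  # expected counted alleles from k
    f_den = [[0.0] * m for _ in range(K)]  # expected allele copies from k
    for i in range(n):
        for j in range(m):
            g = G[i][j]
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            for k in range(K):
                a = g * Q[i][k] * F[k][j] / p
                b = (2 - g) * Q[i][k] * (1 - F[k][j]) / (1 - p)
                q_num[i][k] += a + b
                f_num[k][j] += a
                f_den[k][j] += a + b
    new_Q = [[x / (2 * m) for x in row] for row in q_num]
    new_F = [[min(max(f_num[k][j] / f_den[k][j], eps), 1 - eps)
              for j in range(m)] for k in range(K)]
    return new_Q, new_F


def fit_admixture_em(G, K, n_iter=200, seed=0):
    """Unsupervised fit from a random start; returns (Q, F)."""
    rng = random.Random(seed)
    m = len(G[0])
    Q = []
    for _ in G:
        w = [rng.random() + 0.1 for _ in range(K)]
        Q.append([x / sum(w) for x in w])
    F = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(K)]
    for _ in range(n_iter):
        Q, F = em_step(G, Q, F)
    return Q, F
```

Each row of Q stays on the simplex after every update, because the expected allele copies attributed to the K components at each SNP always sum to two.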
We further investigated evidence of admixture in the history of the Ladakhi population by applying f-statistics [20], testing the strength of evidence that the Ladakhi population is admixed between pairs of neighboring populations using the f3 implementation within ADMIXTOOLS 2 (https://uqrmaie1.github.io/admixtools/index.html; a manuscript describing ADMIXTOOLS 2 is currently in preparation by its authors). The same individuals and SNPs used for the PCA and ADMIXTURE analyses were used for the f3-statistic analysis.
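For a tested population C and sources A and B, the admixture f3-statistic averages (c − a)(c − b) over SNPs, where a, b and c are allele frequencies; a per-SNP term is negative only when C's frequency lies between those of A and B. The sketch below computes that mean and a Z score from a delete-one-block jackknife. It is deliberately simplified and hypothetical: it omits the finite-sample bias correction applied by ADMIXTOOLS, and it uses roughly equal blocks of consecutive SNPs rather than blocks defined by genomic distance.

```python
import math


def f3_terms(freq_c, freq_a, freq_b):
    """Per-SNP admixture f3 terms (c - a)(c - b), without bias correction."""
    return [(c - a) * (c - b) for c, a, b in zip(freq_c, freq_a, freq_b)]


def f3_jackknife(freq_c, freq_a, freq_b, n_blocks=10):
    """Mean f3 over SNPs, with a delete-one-block jackknife SE and Z score.

    SNPs are assumed to be in genomic order and are split into n_blocks
    consecutive blocks of (roughly) equal size.
    """
    terms = f3_terms(freq_c, freq_a, freq_b)
    m = len(terms)
    size = m // n_blocks
    if size == 0:
        raise ValueError("need at least one SNP per block")
    estimate = sum(terms) / m
    loo = []  # leave-one-block-out estimates
    for k in range(n_blocks):
        lo = k * size
        hi = m if k == n_blocks - 1 else lo + size
        rest = terms[:lo] + terms[hi:]
        loo.append(sum(rest) / len(rest))
    mean_loo = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(var)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z
```

A result would then be reported, as in the Results, only when |Z| > 3; a significantly negative estimate is evidence that C is a mixture of populations related to A and B.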
We detected runs of homozygosity (ROH) [43] in Ladakhi individuals and compared population averages with those of individuals from neighboring lowland and highland populations using PLINK's --homozyg command with the following parameters: --homozyg --homozyg-window-snp 50 --homozyg-snp 50 --homozyg-kb 1500 --homozyg-gap 1000 --homozyg-density 50 --homozyg-window-missing 5 --homozyg-window-het 1. This analysis was carried out on a subset of the individuals included in the principal component, ADMIXTURE and f3 analyses, selecting individuals genotyped on SNP microarrays sharing more than 100,000 common SNPs so that ROH could be detected reliably. This subset included individuals with the Indian Indo-Aryan, Ladakhi, Sherpa, Tibeto-Burman Bhutanese, Tibeto-Burman Nepalese or Tibetan population labels. Common SNPs among these individuals were filtered using the same parameters as for the PCA, leaving 117,044 common SNPs.
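The parameters above describe a sliding-window scan: a 50-SNP window counts as homozygous if it contains at most one heterozygous and at most five missing calls, and a SNP is treated as part of a run when enough of the windows covering it are homozygous. The sketch below is a simplified, hypothetical re-implementation of that idea for one individual on one chromosome. The 0.05 window threshold is an assumption (PLINK's --homozyg-window-threshold default), and PLINK applies further rules not reproduced here.

```python
def find_roh(genotypes, positions, window_snp=50, max_window_het=1,
             max_window_missing=5, window_threshold=0.05, min_snp=50,
             min_kb=1500, max_gap_kb=1000, max_kb_per_snp=50):
    """Return (start_bp, end_bp, n_snps) for each run of homozygosity.

    genotypes: 0/1/2 calls (1 = heterozygous) or None, in positional order.
    positions: base-pair coordinates on a single chromosome.
    """
    n = len(genotypes)
    if n < window_snp:
        return []
    hom_windows = [0] * n  # homozygous windows covering each SNP
    all_windows = [0] * n  # all windows covering each SNP
    for start in range(n - window_snp + 1):
        window = genotypes[start:start + window_snp]
        hets = sum(g == 1 for g in window)
        missing = sum(g is None for g in window)
        homozygous = hets <= max_window_het and missing <= max_window_missing
        for j in range(start, start + window_snp):
            all_windows[j] += 1
            hom_windows[j] += homozygous
    # A SNP is a candidate if enough of the windows covering it are homozygous.
    candidate = [hom_windows[j] / all_windows[j] >= window_threshold
                 for j in range(n)]

    # Join consecutive candidate SNPs into runs, breaking runs at large gaps.
    runs, current = [], []
    for j in range(n):
        if not candidate[j]:
            if current:
                runs.append(current)
            current = []
        elif current and positions[j] - positions[current[-1]] > max_gap_kb * 1000:
            runs.append(current)
            current = [j]
        else:
            current.append(j)
    if current:
        runs.append(current)

    # Keep runs that are long enough, contain enough SNPs and are dense enough.
    segments = []
    for run in runs:
        length_kb = (positions[run[-1]] - positions[run[0]]) / 1000
        if (len(run) >= min_snp and length_kb >= min_kb
                and length_kb / len(run) <= max_kb_per_snp):
            segments.append((positions[run[0]], positions[run[-1]], len(run)))
    return segments
```

For example, 200 consecutive homozygous SNPs spaced 10 kb apart and flanked by heterozygous SNPs are reported as a single run of about 2 Mb, slightly trimmed at each edge by the window threshold.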
Genome-wide association tests were conducted using 601,887 genotyped SNPs. Linear regression was implemented in PLINK v1.9 with the --linear command, using an additive genetic model adjusted for sex and the first four principal components obtained from genome-wide SNP data. Variants associated with birth weight in the Ladakhi were compared with variants in the Global Biobank Engine [23] to confirm their association with birth weight in other populations.
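Under the additive model, each SNP is tested by regressing the phenotype on allele dosage (0, 1 or 2) together with the covariates (here sex and PC1 to PC4), and the dosage coefficient is reported with its standard error and t statistic. The sketch below does this with ordinary least squares in plain Python. The function names are hypothetical, and the p-value uses a normal approximation rather than the t distribution used by PLINK, which is adequate only for reasonably large samples.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def additive_snp_test(phenotype, dosage, covariates):
    """OLS test of one SNP under an additive model.

    phenotype: outcome per individual (e.g. birth weight centile).
    dosage: 0/1/2 allele counts per individual.
    covariates: per-individual lists, e.g. [sex, PC1, PC2, PC3, PC4].
    Returns (beta, se, t, approximate two-sided p-value).
    """
    X = [[1.0, float(g)] + [float(v) for v in cov]
         for g, cov in zip(dosage, covariates)]
    n, p = len(X), len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * y for row, y in zip(X, phenotype)) for a in range(p)]
    beta = _solve(XtX, Xty)
    rss = sum((y - sum(x * bc for x, bc in zip(row, beta))) ** 2
              for row, y in zip(X, phenotype))
    sigma2 = rss / (n - p)
    unit = [0.0] * p
    unit[1] = 1.0  # selects the (X'X)^-1 column for the dosage coefficient
    se = math.sqrt(sigma2 * _solve(XtX, unit)[1])
    t = beta[1] / se
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # normal approximation
    return beta[1], se, t, p_approx
```

In a GWAS this test is repeated for every SNP with the same covariates, and only SNPs with p < 5 × 10⁻⁸ are declared genome-wide significant.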
Sample size calculations using the algorithms incorporated in GWApower [44] were based on mean birth weights obtained from audit data collected in Leh before the study commenced. These calculations indicated that the study had 80% power to detect a variant explaining 11.4% or more of the variance in birth weight, given a discovery sample size of 300 neonates and an alpha level of 5 × 10⁻⁸.
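The order of magnitude of this power estimate can be checked with a standard approximation, sketched below under assumptions that may differ from GWApower's: a 1-df test whose non-centrality parameter is N·R²/(1 − R²) for a variant explaining a fraction R² of trait variance. A 1-df non-central chi-square is the square of a shifted standard normal, so power follows directly from the normal distribution. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def gwas_power(n, r2, alpha=5e-8):
    """Power of a 1-df association test for a variant explaining a fraction
    r2 of phenotypic variance in n individuals.

    Uses the non-centrality parameter n * r2 / (1 - r2); a 1-df non-central
    chi-square is the square of a normal variable shifted by sqrt(ncp).
    """
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical |z|
    shift = math.sqrt(n * r2 / (1 - r2))
    return std_normal.cdf(shift - crit) + std_normal.cdf(-shift - crit)


def min_r2_for_power(n, target=0.8, alpha=5e-8):
    """Smallest variance explained that reaches the target power (bisection)."""
    lo, hi = 0.0, 0.99
    for _ in range(100):
        mid = (lo + hi) / 2
        if gwas_power(n, mid, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

For 300 neonates this approximation gives roughly 78% power at 11.4% variance explained, or about 11.7% variance explained for 80% power: the same range as the GWApower figure quoted above, with the small difference presumably reflecting GWApower's own modelling choices.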

Results
Over a two-year period (February 2017 to January 2019), all women who presented for antenatal care at Sonam Norboo Memorial Hospital in Leh were approached to take part in the study. In total, 316 families were recruited. Maternal and fetal baseline characteristics are summarized in Table 1. Mean birth weight was 3.18 kg (mean birth weight centile 44.5th) in infants born at term (≥37 weeks' gestation). According to the INTERGROWTH-21st charts (https://intergrowth21.tghn.org), 14% of infants were born below the 10th centile (which defines a small for gestational age [SGA] newborn), and only 7.8% of all infants were born with a low birth weight (LBW, defined as less than 2.5 kg). The average age of the pregnant women was 28.9 years, and 37.9% (120/316) were primigravida. The majority of subjects (306/316; 96.8%) reported >3 generations of Ladakhi ancestry. The ten women who reported Tibetan ancestry all gave birth at term, with a mean birth weight of 3.6 kg. Preterm birth was infrequent (15/316 [4.7%] infants were born before 37 weeks). Amongst these, birth weight was, as expected, lower (mean 2.4 kg) than in those born at term (mean 3.2 kg; P = 0.0001), although birth weight centile was similar (52nd vs 45th; P = 0.324), suggesting appropriate growth for gestational age. Because preterm birth would confound birth weight if used in isolation, for further analysis we used birth weight centile instead, as it adjusts for gestational age and is therefore a more useful measure of growth potential.
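The distinction drawn above between low birth weight, preterm birth and SGA can be illustrated with a short sketch. The centile is assumed to have been computed already (for example from the INTERGROWTH-21st standards, whose equations are not reproduced here), and the two infants below are hypothetical, with illustrative values rather than study data or centiles calculated from the standard.

```python
from dataclasses import dataclass


@dataclass
class Neonate:
    weight_kg: float
    gestation_weeks: float
    centile: float  # birth weight centile for gestational age and sex


def classify(baby):
    """Flags used in the text: preterm (<37 weeks), LBW (<2.5 kg), SGA (<10th centile)."""
    return {
        "preterm": baby.gestation_weeks < 37,
        "low_birth_weight": baby.weight_kg < 2.5,
        "small_for_gestational_age": baby.centile < 10,
    }


def proportion(babies, flag):
    """Fraction of infants carrying a given flag."""
    return sum(classify(b)[flag] for b in babies) / len(babies)


# Hypothetical infants: a preterm baby growing appropriately, and a small term baby.
preterm_baby = Neonate(weight_kg=2.4, gestation_weeks=34, centile=52)
small_term_baby = Neonate(weight_kg=2.7, gestation_weeks=39, centile=6)
```

The first infant is LBW but not SGA, and the second is SGA but not LBW, which is why raw weight alone would misclassify growth when preterm births are included.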

Reconstruction Of Ladakhi Population History
To optimize our GWAS, we first sought to understand the population history of the Ladakhi population. We compared the Ladakhi with surrounding populations representing major language groups, including Tibeto-Burman (Tibetan, Sherpa, Ladakhi), Indo-European (Indo-Aryan, Hazara), Austro-Asiatic (Munda) and language isolates (Burushaski) (Fig. 1). We further contextualized our population structure analysis using maximum-likelihood estimation of individual ancestries with ADMIXTURE (Fig. 2). At K = 2, the two ancestral components are maximized in the Himalayan and East Asian Han populations (red) and in Indo-Aryan Indians (green), respectively. Ladakhi individuals are modelled with a slightly higher proportion of the East Asian ancestral component than of the South Asian component, in agreement with their average location on principal component 1. At K = 3, one ancestry component (represented by blue in Fig. 2) … (Fig. 1). In summary, the data support the Ladakhi as a genetically distinct Himalayan population, closer to Tibeto-Burman and Tibetan populations than to Indo-Aryan populations.
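The position of individuals on principal component 1 can be illustrated with a minimal sketch: genotypes are standardized per SNP as (g − 2p)/√(2p(1 − p)), an individual-by-individual covariance matrix is formed, and its leading eigenvector is found by power iteration. This follows the standardization commonly used for genotype PCA and is a hypothetical illustration; PLINK's --pca uses its own implementation and scaling, so its scores would differ in scale and possibly in sign.

```python
import math


def pc1_scores(G, n_iter=200):
    """Leading principal component scores (unit length) for each individual.

    G[i][j]: 0/1/2 allele count for individual i at SNP j (no missing data).
    """
    n, m = len(G), len(G[0])
    Z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(G[i][j] for i in range(n)) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue  # monomorphic SNPs carry no information
        sd = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            Z[i][j] = (G[i][j] - 2 * p) / sd
    # Individual-by-individual covariance of standardized genotypes.
    C = [[sum(Z[a][k] * Z[b][k] for k in range(m)) / m for b in range(n)]
         for a in range(n)]
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-degenerate start
    for _ in range(n_iter):
        w = [sum(C[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no informative SNPs")
        v = [x / norm for x in w]
    return v
```

Individuals from two differentiated source populations fall at opposite ends of PC1, so an admixed individual with ancestry from both would be expected to lie between them, as described above for the Ladakhi.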
We next performed two additional analyses to further confirm that the Ladakhi are admixed between lowland Indo-Aryan and highland Tibeto-Burman sources. First, we used f-statistics [20] in the form f3(X; A, B), where X is tested for evidence of admixture between sources A and B, and a negative f3-statistic indicates admixture. We placed the Ladakhi as X and tested combinations of lowland/Indo-Aryan populations (Burusho, Indo_Aryan, Hazara, Munda) as A and highland/East or North Asian populations (Han, Japanese, Sherpa, Tibetan, TB_Bhutanese, TB_Nepalese, Yakut_Siberian) as B, reporting f3 results with an absolute Z score >3 (Fig. 3). The strongest evidence of admixture (the most negative f3-statistic) is between Tibetan or Sherpa sources and Indo-Aryan or Burusho sources, supporting the PCA results. Second, we detected runs of homozygosity (ROH) using PLINK, comparing the Ladakhi with a subset of neighbouring lowland and highland references. We detected elevated ROH (Supplementary Fig. S1) in the high-altitude Tibeto-Burman and Sherpa populations, in agreement with previous estimates [21], but only modest levels of ROH in the Ladakhi, more comparable to the general Tibetan or Indo-Aryan labelled individuals. These modest ROH levels are consistent with admixture between different ancestries.

Birth Weight Analysis On Ladakhi Subjects
To determine whether there were any genome-wide significant predictors of birth weight in the Ladakhi population, we performed a GWAS of birth weight in the 176 individuals who were genetically identified as Ladakhi in the population study (see Supplementary Fig. S2). We did not find any signals that were genome-wide significant after correction for multiple testing (i.e., p < 5 × 10⁻⁸). However, in the tail of the association statistics from our birth weight GWAS (uncorrected p between 1 × 10⁻⁷ and 1 × 10⁻⁴), we noted multiple variants previously associated with traits relevant to high-altitude adaptation, including body mass index (rs10968576 in LINGO2) and blood-related traits (rs16893892 in RP5-874C20.3, rs2298839 in AFP, rs9261425 in TRIM31 and rs362043 in TUBA8) [23].
Next, to determine whether the genetic architecture of birth weight in the Ladakhi population overlaps with that observed in lowland populations, we tested the association with birth weight in Ladakhis of the 70 genetic signals previously associated with birth weight in lowland populations [8]. Of these 70 signals, 32 were either directly genotyped or captured through linkage disequilibrium (r² > 0.8) in our Ladakhi dataset. Seven of these 32 signals were nominally associated (P < 0.05, uncorrected) with birth weight in the Ladakhi GWAS (see Table 2). These variants mapped to five different birth weight-associated genes (ZBTB38, ZFP36L2, HMGA2, CDKAL1 and PLCG1), with the direction of effect consistent with the original discovery report [8] in all seven cases.
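A rough sense of how unexpected seven nominal replications out of 32 testable signals would be by chance can be obtained from the binomial distribution. The sketch below assumes, as a simplification, that the 32 signals are independent and that under the null each has a 5% chance of reaching P < 0.05 and a 50% chance of matching the original direction of effect. These are assumptions of the sketch, not analyses reported in this study.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumes the 32 testable signals are independent (a simplification).
p_nominal_hits = binom_sf(7, 32, 0.05)  # >= 7 of 32 reach P < 0.05 by chance
p_all_concordant = binom_sf(7, 7, 0.5)  # all 7 match the original direction
```

Under these assumptions, the chance of seven or more nominal hits is roughly 9 × 10⁻⁴, and the chance that all seven agree in direction is 0.5⁷ ≈ 0.0078.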

Discussion
This study is the first to undertake a comprehensive fetal genotypic exploration of a high-altitude population in relation to a birth weight phenotype. We showed that the Ladakhi are more closely related to Tibeto-Burman-speaking populations than to the Indo-Aryan groups of South Asia, and provide clear evidence that Tibeto-Burman expansion occurred in North East India, crossing the Himalayan range [24]. Most Ladakhi individuals form a common cluster in the PCA, indicating a Ladakhi-specific genetic ancestry component that is intermediate between Tibeto-Burman and Indo-Aryan ancestries. Analyses using f-statistics and ROH further support a demographic history of admixture between these sources, suggesting a distinct demographic history in the Ladakh region, plausibly owing to its close proximity to the Tibetan plateau. The limited overlap between the Ladakhi genotype data and existing references restricted haplotype-based analyses of identity-by-descent segments or "ChromoPainter"-based methods [25], which in turn limited our power to analyse the demographic history of the population in more detail. Future work overcoming these technical limitations may provide further insights.
Individuals in Leh are born at a significantly higher birth weight than expected from the existing literature on average birth weights in India (2.8-3 kg) [26,27], as also seen in our previously published work [28]. One possible mechanism for this protection of birth weight is genetic selection, as evidenced for other phenotypic traits of high-altitude populations [17].
Birth weight is a complex trait with potentially conflicting parental inheritance patterns. As such, it is perhaps unsurprising that no clear signal reached genome-wide significance, especially given the small numbers in the final study. Study of parental DNA, and of genetic signals in genes responsive to the hypoxia-inducible transcription factors (HIFs), in relation to offspring birth weight is an obvious follow-up study to further explore HIF function and high-altitude adaptation.
Among SNPs previously associated with birth weight by GWAS, we found that the association of rs1351394 in the HMGA2 gene (encoding the high mobility group A2 protein) was replicated in Ladakhi offspring. High-mobility group (HMG) proteins are ubiquitous nuclear proteins that bind DNA and induce structural changes in chromatin, thereby regulating gene expression [15]. HMGA2 has been associated with human height and birth weight in lowland populations [29,30] and with variation in adipose mass in pigs [31], making it a biologically plausible functional candidate.
ZBTB38 encodes a zinc finger transcription factor that binds methylated DNA to enhance or repress transcription in a complex and probably cell-dependent manner [32]. Its expression appears to play a role in skeletal development [30]. Recent GWAS have reported SNPs in this gene to be associated with human height, perhaps through upregulation of insulin-like growth factor 2, a potent fetal growth factor [33]. The derived allele frequency of the associated ZBTB38 variant (rs6440006) is higher (>15%) in Ladakhis and more differentiated from that of low-altitude resident Han Chinese (FST > 0.01), supporting enrichment of this allele in Ladakhis residing at high altitude.
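For a single SNP, Hudson's FST estimator compares the squared allele-frequency difference between two populations with the expected between-population heterozygosity, after subtracting a correction for sampling variance. The text does not state which FST estimator was used, so the sketch below is an assumption about how such a comparison could be computed; the function names are hypothetical and the values in the usage note are illustrative, not study data.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson F_ST for one biallelic SNP.

    p1, p2: allele frequencies in the two populations.
    n1, n2: numbers of sampled chromosomes (twice the number of diploid people).
    """
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


def hudson_fst_multi(freqs1, freqs2, n1, n2):
    """Genome-wide Hudson F_ST as a ratio of averages across SNPs."""
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den
```

For example, hypothetical allele frequencies of 0.3 and 0.1 with 176 individuals (352 chromosomes) sampled in each population give FST of about 0.115. Combining SNPs as a ratio of averages, rather than averaging per-SNP ratios, keeps rare variants from dominating the genome-wide estimate.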
One strength of this study is the accurate dating of gestation from last menstrual periods and early dating ultrasound. This allowed calculation of birth weight centiles, which adjust neonatal weight for gestational age and sex at birth and therefore correlate better with pathological poor growth.
This study focused on fetal genotype, but maternal [34] and paternal genotypes are also recognized to have an effect. We have collected parental DNA, and future analysis of the inheritance patterns of these fetal gene variants of interest would be beneficial.
A limitation of the study was the number of subjects in the birth weight analysis. Although most respondents described themselves as 'many generation' Ladakhi, the cohort was more genetically diverse than anticipated. As a result, our birth weight analysis included only 176 of the original cohort of 316 infants (55.7%). Our interpretations would be strengthened by the addition of more subjects, stratified according to their genomic ancestry.
Although birth weight is a complex multifactorial trait whose relationship with high-altitude ancestry is difficult to establish, this study provides evidence that variants in genes associated with height, weight, and skeletal growth and development are enriched in Ladakhi infants who attain an optimal birth weight. Further studies are warranted to confirm this influence and to better understand the functional implications of the identified genetic variants in relation to the hypoxic environment. This detailed knowledge may help inform new treatments for growth restriction that act on the developmental pathways intrinsic to optimal in-utero growth.

Conclusion
We found Tibetan ancestry in the Ladakhi population and evidence of recent, limited admixture with neighboring Indo-Aryan populations. We replicated in the Ladakhi population the effects of a subset of variants known to predict birth weight in European-descent populations. The enriched variants were concentrated in genes linked to anthropometric traits and hematological parameters. Further genomic study of high-altitude infants is needed to validate these findings, together with mechanistic studies to gain insight into potential mechanisms of action that may be amenable to therapeutic intervention.
Declarations
HM, DK, SH and DJW conceived the study. PD led the study in Ladakh with assistance from SH and DK. VJ and VD completed the study in Delhi and liaised with IGIB for sample processing. MM, BP and AB processed blood samples and ran the analyses. SB led the combined analysis and population reconstruction, with EG and GLC supervising. All authors have approved the submitted version and have agreed both to be personally accountable for their own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.

Additional Information
The author(s) declare no competing interests.

Data Availability
The datasets supporting the current study have not yet been deposited in a public repository but are available on request.

Fig. 3: Evidence of admixture in Ladakhi between lowland or Indo-Aryan populations and highland or North/East Asian populations, measured using admixture f3-statistics. The x axis shows the f3-statistic; the more negative the statistic, the greater the evidence of admixture. Error bars correspond to three standard errors of the estimated f3-statistic. The y axis lists pairs of putative source populations for an admixture event that created the modern Ladakhi population.

Supplementary Files
This is a list of supplementary files associated with this preprint.