Selection of Species Specic Panel of Reference Genes Across Native Livestock Species Adapted to Trans-himalayan Region of Leh-ladakh

The identication of appropriate references genes is an integral component of any gene expression-based study for getting accuracy and reliability in data interpretation. In this study, we evaluated the expression stability of 10 candidate reference genes (GAPDH, RPL4, EEF1A1, RPS9, HPRT1, UXT, RPS23, B2M, RPS15, ACTB) in peripheral blood mononuclear cells of livestock species that are adapted to high altitude hypoxia conditions of Leh-Ladakh. A total of 37 PBMCs samples from six native livestock species of leh-Ladakh region such as Ladakhi cattle (LAC), Ladakhi yak (LAY), Ladakhi donkey (LAD), Chanthangi goat (CHG), Double hump cattle (DHC) and Zanskar ponies (ZAP) were included in this study. The commonly used statistical algorithms such as geNorm, NormFinder, BestKeeper and RefFinder were employed to assess the stability of these RGs in all the livestock species. Our study has identied different panel of reference genes in each species; for example, EEF1A1, RPL4 in Ladakhi cattle; GAPDH, RPS9, ACTB in Ladakhi yak; HPRT1, B2M, ACTB in Ladakhi donkey; HPRT1, B2M, ACTB in Double hump camel, RPS9, HPRT1 in Changthangi goat, HPRT1 and ACTB in Zanskar ponies. To the best of our knowledge, this is the rst systematic attempt to identify panel of RGs across different livestock species types adapted to high altitude hypoxia conditions. In future, the ndings of the present study would be quite helpful in conducting any transcriptional studies to understand the molecular basis of high altitude adaptation of native livestock population of Leh-Ladakh.


Introduction
In recent years, high-throughput techniques such as serial analysis of gene expression (SAGE), expressed sequence tag (EST), microarray and RNA-seq have been widely employed to study the gene functions and understand the transcriptional regulations in humans, animals as well as plants [1][2][3][4][5] . However, the high throughput expression data requires validation using real-time quantitative polymerase chain reaction (qPCR). The qPCR technique because of its dynamic range, scalability, sensitivity, and reproducibility has always been considered as precise technique to estimate the relative abundance of mRNA transcripts in any cell types [6][7][8][9] . However, in order to perform appropriate gene expression analysis, it has become mandatory to select stable reference genes (RGs) that can normalize provide accurate and reliable qPCR results for each and every experimental condition 10 . This has become the most popular approach to normalize the qPCR-based gene expression data as evident from numerous publications across mouse [11][12][13] , human 5,14 plants 15,16 and livestock species [17][18][19][20][21] . Lack of appropriate RGs can greatly compromise the reliability of qPCR due to technical variations or errors arises during sample preparation, like quality and starting amount of RNA, e ciency of reverse transcription, e ciency of PCR and errors during pipetting 22 . All these technical variations will affect both the target genes as well as selected panel of RGs. Therefore, it's important to normalize the gene expression data by identifying suitable RGs or internal control genes (ICGs) in order to obtain an accurate and reliable gene expression data. Identi cation and validation of appropriate RGs has thus become an essential component in any gene expression studies wherein RGs are exposed to the same experimental conditions as target genes 23 .
Ladakh, the newly formed union territory in the northern most region of India bordering China and Pakistan is one of the world's highest inhabited region (3,500-5,500 m above sea level) surrounded by snow-capped Himalayan, Zanskar and Karakoram ranges. The cold-arid desert of Ladakh is characterized by harsh climatic conditions such as extreme temperature variations, ranging from -40 o C in winter and 35 o C in summer; low humidity (25-40%), low precipitation (80-300 mm) and low oxygen level (nearly 60-70% of the oxygen concentration at sea level); high UV radiations and wind erosion.
In such a di cult terrain of Ladakh, where land resources are meager, animal wealth plays an important role in the life of the local people. Ladakh is blessed with several unique native animal genetic resources such as yak, cattle, dzomo, dzo, goat, sheep, donkeys, horses, and double hump camel. Each of these species living has a unique ability to adapt themselves to chronic hypoxia and low ambient temperature. The economy of local people is mainly dependent on these livestock species.
The native cattle known as "Ladakhi cattle" (Bos indicus) is a unique germplasm having excellent adaptation potential to high altitude hypobaric stress. In spite of extreme climatic conditions, subsistence on poor quality feed and low availability of water, it provides around 2.5-4.5 kg of milk and thus serves as an important source of animal protein for the local people, especially during lean winter period. Similar to Ladakhi cattle, local yak population (Bos grunniens) are also the major resources of milk and milk products for the local people. The locally made butter and churpi from yak and local cow are always in high demand in the local market. The Ladakhi goat (Capra hircus) commonly known as Changthangi goat, or world famous pashmina goat is mainly reared for meat, milk & ber (Pashmina and Mohair), hide and skin. Ladakhi donkey (Equus asinus) and Zanskari ponies (Equus caballus) is yet another important animal genetic resources in the region that serves as an important pack animal for the local people and Indian army. The Zanskari ponies are medium size mountain horse and very well adapted to work at higher altitudes. Another unique species; double hump camel (Camelus bactrianus) is quite popular amongst tourists especially for safari in world famous cold desert stretch of Nubra valley region of Ladakh. In last ve years, our group at ICAR-NBAGR, Karnal in collaboration with DRDO-DIHAR, Leh and Animal Husbandry Department, Leh has initiated the efforts to characterize the livestock breeds of Ladakh. Under this programme, the native cattle, yak and donkey populations of Ladakh have been studied for both phenotypic and genotypic characterization. We have also made efforts to identify genes and pathways responsible for high altitude adaptation in Ladakhi cattle 24,25 . Each of these species has developed effective mechanism to survive at high altitude and low oxygen condition. Under such adverse climatic conditions, the survival and performance of exotic breeds is not a viable option in hypoxia condition. It only allows the well adapted animal genetic resources to thrive and perform. Therefore, understanding transcriptome signatures and identifying genes highly abundant across all these species will provide strong clue on molecular mechanism operating at transcriptional level in response to abiotic hypoxia stress across these species. By making such advancements, not only these resources will be characterized and documented but will also help to understand these unique animals production attribute in a better way for future exploitation and overall improvement. As a step forward, the present study was designed to identify and select panel of stably expressed RGs for future transcriptional studies in each of the six livestock species of Ladakh.  25,19,17,18,27,33,34,32 . It is now evident that set of RGs that perform well in one particular condition or species may not work well in other experimental conditions or other species. Therefore, in the present study, an effort was made to evaluate and identify panel of appropriate RGs in peripheral blood mononuclear cells (PBMCs) of six livestock species well adapted to high altitude region of Leh-Ladakh viz., Ladakhi cattle, Ladakhi yak, Ladakhi donkey, Changthangi goat, Zanskar ponies and double hump camel. All these livestock species are native of Leh and Ladakh and have been naturally selected not only to sustain but perform and reproduce well under high altitude hypoxia stressful conditions. The 10 candidate RGs that were evaluated in the present study were; glyceraldehyde 3phosphate dehydrogenase (GAPDH), beta-Actin (ACTB), ubiquitously expressed transcript (UXT), ribosomal protein S15A (RPS15A), beta 2-microglobulin (B2M), ribosomal protein L-4 (RPL4), ribosomal proteinS18 (RPS18), ribosomal protein S9 (RPS9), ribosomal protein S23 (RPS23), hydroxymethylbilane synthase (HMBS), hypoxanthine phosphoribosyl transferase (HPRT1).

Speci city, expression abundance and coe cient of variation of individual RGs
In the present study, an effort was made to identify the appropriate RGs in all the major livestock species that are native of Leh-ladakh region viz., Ladakhi cattle (LAC), Ladakhi yak (LAY), Ladakhi donkey (LAD), Double hump camel (DHC), Changthangi goat (CHG), Zanskar ponies (ZAP). The speci city of each primer pair was con rmed by the speci c ampli cation checked in agarose gel and presence of single peak in melt curve analysis. The correlation coe cient (R 2 ) and ampli cation e ciency (E) for individual primer pair in each of the six livestock species are given in Table 1. The expression abundance of individual RGs in each species is shown in Box Whisker plot (Fig. 1a-  The geNorm analysis ranked candidate reference genes as per their mean expression stability value (M value) which was below the threshold value of 1.5 for all the 10 RGs. The ranking order based on M value were EEF1A1=RPL4> RPS23> RPS9> UXT> B2M> GAPDH> RPS15> HPRT1> ACTB (Fig. 2a). The M value ranged from 0.147 (EEF1A1) to 0.689 (ACTB). The lower M value indicates higher expression stability while higher M value indicates lower expression stability. On the basis of M value, EEF1A1=RPL4 RG pair was most stable expressed while ACTB was least stable. Another parameter that was evaluated by geNorm was the pairwise variation Vn/n + 1 in order to calculate the optimal number of RGs to be required for normalization. The pairwise variation (V) score of all the RGs were below 0.15 (Fig. 2b) which is an ideal pairwise recommended score 35 . Therefore, as per V value, combination of two RGs could be suggested to normalize the qPCR data in PBMCs of Ladakhi cattle.
In NormFinder analysis as well, the ranking stability of individual RGs were decided by the lower values indicating higher stability. In LAC, Norm nder analysis resulted in same panel of stable RGs (EEF1A1, RPL4, UXT, RPS23,) as identi ed in geNorm analysis. On the other hand, ACTB, HPRT1, RPS15 RGs were identi ed as least stable. The ranking order from most to least stable RGs was as follows: EEF1A1>RPL4> UXT> RPS23> RPS9> B2M> GAPDH> RPS15> HPRT1> ACT ( Fig. 2c & Table  3).    The M value for all the 10 genes in geNorm analysis were found to be within acceptable range in LAY. The ranking order of RGs was GAPDH=RPS9> ACTB> RPS23> HPRT1> UXT> B2M> EEF1A1> RPL4> RPS15 (Fig. 2d). GAPDH and RPS9 showed higher gene expression stability with M value of 0.223 followed by ACTB, RPS23 and HPRT1 with M value of 0.386, 0.507, 0.595 respectively (  Table 7). The high correlation values for these genes indicated their reliability as RGs, The GAPDH, RPS9 and ACTB were termed as best RGs on the basis of highest correlation value and less SD.   In Ladakhi donkey as well, the geNorm analysis showed mean expression stability values of 10 RGs within the acceptable range and varied from 0.250 (HPRT1= B2M) to 1.405 (RPS9) ( Table 3). The stability ranking of RGs was: HPRT1=B2M> RPS23> ACTB> EEF1A1> GAPDH> UXT> RPL4> RPS15> RPS9 (Fig. 2g). The B2M and HPRT1 RGs showed highest expression stability with lowest M value while RPS9 and RPS15 RGs showed least expression stability with highest M value. Based on pair-wise variation analysis (V value), V3/4 combination (B2M HPRT1 and RPS23) with V value of 0.142 was found to provide the most accurate normalization in Ladakhi donkey (Fig. 2h). In Norm nder analysis as well; HPRT1 (0.123), B2M (0.324) and ACTB (0.518) were most stable with lowest values (Fig. 2i). On the other hand, the RPS9 (1.912), RPS15 (1.377), and RPL4 (1.366) RGs on the other hand were least stable.
The BestKeeper analysis showed ACTB gene to be most stable with the lowest crossing point SD value of 0.295. This was followed by HPRT1, RPS23 and B2M RGs with SD value of 0.311, 0.318, and 0.472, respectively. On the other hand, RPS9 with highest crossing point SD value of 1.635 was found to be the least stable (Table 8). In addition, the inter-gene relation for 10 RGs pairs was also estimated. B2M/GAPDH (r=1.0), HPRT1/B2M (r=0.985), HPRT1/GAPDH (r=0.985), B2M/EEF1A1 (r=0.855) and EEF1A1/GAPDH (r=0.854) showed the strong correlation coe cients ( Table 9). The highly correlated RGs were combined into BestKeeper index, and the correlation between each candidate RGs and BestKeeper was estimated. The relationship between RG and BestKeeper was described in terms of Pearson correlation coe cient (r), coe cient of determination correlation between BestKeeper and RGs was observed for HPRT1 (r=0.942) and GAPDH (r=0.941) followed by B2M (0.940) and EEF1A1 (0.927) genes. The statistically signi cant correlation shown by RGs (HPRT1, B2M) with the BestKeeper index appeared to be consistent with their evaluation as assessed by geNorm and Norm nder. RefFinder was another tool, were evaluating and identi ed RGs from comprehensive data set. HPRT1, B2M and ACTB were most stable and RPS9, RPS15 and GAPDH were least stable genes identi ed by RefFinder in LAD.   The geNorm analysis of all the 10 candidate RGs in Changthangi goat exhibited mean expression stability (M) values well below 1.5 ( Table 3). The stability ranking RGs were in the following order; RPS9=HPRT>ACTB>RPS23>EEF1A1>UXT >GAPDH>RPL4>RPS15>B2M (Fig. 2j) Further, the pair-wise variation analysis provided within the acceptable limit on sequential addition of another gene to the two most stably expressed genes, viz., B2M and HPRT1, the pair-wise combination V2/3 gave the acceptable V value of 0.143 (<0.15) suggesting that the geometric mean between RPS9, HPRT1 and ACTB is optimal for data normalization in Changthangi goat (Fig. 2k). Similar to geNorm, Norm nder also identi ed RPS9 (0.310), RPS23 (0.477), ACTB (0.486) and HPRT1 (0.740) as most stable and B2M (2.517) and RPS15 (2.015) as least stably expressed genes (Fig. 2l & Table 3). There was good agreement between geNorm and Norm nder outcome, albeit slight variation was observed in the ranking of RGs. The BestKeeper algorithm showed consistent expression levels for all the RGs. RPS9 (0.176), exhibited low SD and 0.422 correlation coe cients in BestKeeper analysis, pointing towards their expression stability (Table 10). Additionally, RPS9/ACTB (r=0.974), B2M/RPS23 (r=0.801), B2M/EEF1A1 (r=0.739), and RPS23/GAPDH (r=0.712) showed the strong correlation coe cients (Table 11). B2M (0.847) showed the high correlation value but they showed the high fold change thus their reliability as a RGs is not applicable. RefFinder were identi ed the overall ranking of the gene.    for all the RGs were within the acceptable limit of <1.5. On the basis of relative expression stability and stepwise exclusion, the ranking order of RGs was: B2M=RPS9 > HPRT1> ACTB> RPS23> EEF1A1> UXT> GAPDH> RPL4> RPS15 (Fig. 2m). The expression of RPS9 and B2M RGs with lowest M values of 0.600 were found to be most stable while RPL4 and RPS15 RGs with highest M values of 1.180 and 1.352, respectively were found to be least stable RGs in DHC. Based on pair-wise combination, the V values for V3/4, V5/6 and V6/7 and were close to the threshold value of 0.15. Therefore, the combination of V3/4 with ACTB, HPRT1 and B2M RGs should provide the accurate normalization of qPCR data in DHC (Fig. 2n).
In BestKeeper analysis, HPRT1 gene with the lowest crossing point SD value of 0.203 was found to be most stable. This was followed by EEF1A1, ACTB and RPS23 genes with SD values of 0.294, 0.377, and 0.462, respectively (Table12). On the other hand, RPS15, RPL4 and GAPDH RGs with high crossing point SD values of 1.61, 1.54, 0.86 respectively were found to be least stable. Strong correlation was observed in inter gene relationship of the RGs RPL4/UXT (r=0.908), HPRT1/UXT (r=0.884) and HPRT1/B2M (r=0.755) ( Table 13). The relationship between RGs and BestKeeper was described in terms of Pearson correlation coe cient (r), coe cient of determination correlation between BestKeeper and RGs was observed for HPRT1 (r=0.797) and B2M (r=0.809) followed by UXT, RPS9 and ACTB gene.
In Norm nder analysis ranking of genes in high altitude ZAP from most stable to least stable was as follows:  (Table 14). The best correlation between RGs and BestKeeper was observed for B2M (r=0.947) and ACTB (r=0.662) ( Table 15). The high correlation values for these genes indicated their reliability as RGs.    hours of blood sample collection. The density gradient centrifugation procedure adopted for puri cation of PBMCs has been described in one of our previous publication 25 . The entire work ow of the experiment was showed in Fig. 3. Puri cation of total RNA and cDNA synthesis For isolation of total RNA, the puri ed PBMCs were suspended in 1.0 ml Trizol reagent (Thermo Fisher Scienti c, USA). After homogenization, the standard protocol based on chloroform and isopropanol extraction was followed to isolate the total RNA.
The total RNA was further puri ed by employing silica-membrane RNeasy spin columns (Qiagen, Germany) along with on column digestion by DNase enzyme (Qiagen, Germany). The concentration and purity of extracted was measured using Nano view plus (Biohrome Spectros, USA). The integrity of each RNA sample was also con rmed by presence of 28S and 18S ribosomal bands on 1.5% agarose gel.
cDNA synthesis and real time quantitative PCR (qPCR) The rst strand cDNA synthesis was carried out using RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scienti c, CA, USA). First strand cDNA was synthesized using 200 ng of puri ed RNA, oligo-dT (18) primer, dNTP mix, random primers, RiboLock TM RNase inhibitor, M-MuLV reverse transcriptase supplied with RevertAid First Strand cDNA Synthesis kit (Thermo Scienti c, CA, USA). The reaction for cDNA synthesis was set up using the program: 25˚C for 5 min, 50˚C for 60 min, and 70˚C for 15 min. The cDNA sample was diluted 1:4 (v:v) with DNase⁄RNase-free water. Before subjecting for qPCR reactions, each of the cDNA samples was ampli ed using GAPDH in a semi-quantitative PCR. This step was done to ensure the quality of all the 37 rst strand cDNA synthesized from PBMCs of 6 livestock species. The ampli ed products were checked on 2.5% agarose gel to ensure speci c ampli cation. A total of 10 potential candidate RGs viz., GAPDH, ACTB, RPS9, RPS15, RPS23, B2M, EEF1A1, RPL4, UXT and HPRT1 were evaluated in this study. The purpose of evaluating the stability ranking of these 10 RGs was to provide most appropriate panel of RGs in each of six livestock species of Leh-Ladakh so that any future transcriptional data could be normalized accurately. All relevant details like gene name, primer sequences, melting temperature etc. are tabulated in Table 1.
The qPCR reactions were performed in a nal volume of 10 µL containing 4 µL diluted cDNA combined with 6 µL of master mix composed of 5 µL Maxima SYBR Green/ROX qPCR master mix (2X) (Fermentas Thermo, USA), 0.4 µL each of 10 µM forward and reverse primers, and 0.2 µL DNase/RNase free water. All the reactions were performed in duplicate along with sixpoint standard curve along with non-template control with following ampli cation conditions; 2 min at 50ºC, 10 min at 95 ºC, 40 cycles of 15 s at 95 ºC (denaturation) and 1 min at 60 ºC (annealing+extension) in a Step one plus real time PCR instrument (ABI, California). For standard curve of each primer pair, vefold serial dilution was made using pooled cDNA samples. The qPCR expression data for each gene was extracted in the form of crossing points and data was subjected for subsequent analysis.

Identi cation of reference genes and statistical analysis
In order to evaluate the expression stability of RGs in individual species, 10 candidate genes viz., GAPDH, ACTB, RPS9, RPS15, RPS23, B2M, EEF1A1, RPL4, UXT and HPRT1 from different functional categories were selected. four independent statistical approaches viz. geNorm 35 , Norm nder 36 , BestKeeper 37 and RefFinder were used to identify most stable RGs.
The geNorm software measure the expression stability as M value which is based on overall pairwise comparison among the reference genes. The M value is inversely correlated to gene expression stability and ranks the RGs accordingly. In addition, pair wise variation analysis (V values) was also carried out using geNorm software to select optimal number of RGs to be used for normalization of target gene data. NormFinder algorithm determined the optimal RGs and the combination of two genes for a two-gene normalization factor with its corresponding stability value. The BestKeeper analysis is based on pairwise comparisons of raw cycle threshold (Ct), values of each gene. The result of BestKeeper analysis is displayed as standard deviation (SD) and coe cient of variance (CV). BestKeeper software calculated the descriptive statistics of every candidate gene and excludes the genes having standard deviation (SD) greater than 1, lower the standard deviation more is the stability of genes.
The data was analysed by direct comparing the Ct values in geNorm and NormFinder. The relative Ct values based on comparative Ct-method were the input data for geNorm and Norm nder 35,38 wherein, the average Ct value of each duplicate reaction was converted to relative quantity data [transformed using comparative Ct method as E ciency ( minimum Ct − sample