Primer designing strategy for amplification and sequencing of the complete mitochondrial genome of Semnopithecus hypoleucos


 The mitochondrial genome is highly informative for evolutionary analysis of organism lineages and for phylogenetic studies. The availability of robust primers for amplifying complete mitochondrial genomes is a crucial step in current mitogenome studies. However, organism-specific characteristics, such as the variable transition-to-transversion substitution ratios seen in some groups, pose challenges for the development of universal, or at least broadly applicable, primer pairs for this purpose. This study reports a strategy of primer design and optimization (PDO) in which regions of known mtDNA genes are used to choose primers for amplification, sequencing and assembly of entire mitochondrial genomes of several closely related species. In brief, taking advantage of the circular organization of mtDNA, primers are first designed to amplify “long” products using the 5’ region of one conserved gene and the 3’ region of another conserved gene. Additional primers are then used to amplify “short” regions that fill in gaps and allow complete assembly of the genome. We show how we used this approach to successfully amplify the entire mitochondrial genome of a non-human primate species (Semnopithecus hypoleucos), and how it also provided data useful for annotating the assembled genome.


Background
A thorough understanding of genetic diversity is an important step in developing appropriate conservation plans for any group of organisms (Oakenfull et al. 2000). Mitochondrial (mt) DNA has become popular for such studies because its maternal inheritance and relatively high mutation rate provide rich information relevant to evolutionary biology, population genetics and phylogenetics (Avise and Saunders 1984; Avise 1986; Dasmahapatra et al. 2010; Nabholz et al. 2019). Moreover, the high copy number and circular structure of mtDNA tend to make it less prone to degradation than, for example, nuclear sequences, so it may more often provide material suitable for complete analysis. These qualities have made mtDNA an important genetic tool in large-scale comparative studies of evolutionary relationships among individuals, populations and species.
Construction of phylogenetic trees is a useful tool for analyzing evolutionary relationships of genes between species. Many of these studies rely exclusively on a small part of the mtDNA, such as cytochrome c oxidase subunit I (COI) (Webb and Moore 2000; Kerr et al. 2009; Khedkar et al. 2016a), cytochrome b (cyt b) (Khedkar et al. 2016b), or other genes. Such approaches are known, however, to underestimate the influence on evolutionary processes of the variation seen across the complete mitochondrial genome (Springer et al. 2012; …). For example, comparative studies of protein-coding genes tend to show high levels of similarity, whereas non-coding regions can be much more variable. It is also well known that certain parts of the mitochondrial genome, such as the D-loop region, evolve faster than the highly conserved 16S and 12S rRNA genes (Gerber et al. 2001). This implies that phylogenetic relationships among species are better inferred from complete mitochondrial genome sequences.
Although several complete mitochondrial genome sequences have been published (Matsui et al. 2009; Li et al. 2009; Kim et al. 2009; Ma et al. 2010; Kurabayashi et al. 2010; Finstermeier et al. 2013; Zhang et al. 2017), data for several species and species groups remain incomplete because of technical problems related to the availability of robust primers (Ramos et al. 2011; de Freitas et al. 2018). This is especially true for closely related species, such as those in some primate clades. In humans, for example, high mtDNA mutation rates can lead to a high degree of variability between individuals (Howell et al. 1996; Wilson et al. 1985). In other primates, mtDNA has been found to have a high transition-to-transversion substitution ratio (Brown et al. 1982).
Generally, three strategies are in use for obtaining complete mitochondrial genome sequences, but each still involves procedural challenges (Rizzi et al. 2012):
i. Direct isolation of mitochondria from tissue, followed by nucleic acid purification and direct sequencing on an NGS platform. This method requires large quantities of tissue, and even the available commercial kits may not be adequate for non-invasively collected samples or for old museum specimens.
ii. Whole-genome sequencing of total genomic DNA, followed by extraction of the mitochondrial genome sequence through bioinformatic procedures. A challenge here is that the bioinformatic analysis demands infrastructure and expertise that may be hard to come by.
iii. Isolation of genomic DNA from specific tissues, followed by enrichment of mitochondrial DNA through PCR and primer walking. This approach requires robust primers capable of covering the entire mitochondrial genome, or at least fragments of it that can be combined to cover the entire region of interest. However, for some primate species, many primers do not cross-amplify mitochondrial DNA from related species. This may be one reason why very few primate mitochondrial genomes have been published to date compared with other organisms (Roos et al. 2011).
Our study reports a method for designing primers that can be effectively applied to amplify the entire mitochondrial genome of S. hypoleucos, an endangered primate species in India; the strategy may also be applicable to closely related primate species. Primer pairs are specifically designed to cover both large and small segments of the mitochondrial genome, including segments that are difficult to amplify.

Ethical Statement
We did not perform experimentation directly on any animals; therefore, ethical approval was not required for this study. The authors have no conflict of interest to declare.

Experimental outline
The flowchart of the primer design and optimization (PDO) protocol is shown in Fig. 1. Some of the important steps of the PDO method are discussed in the following sections.

Downloaded reference mitochondrial sequences
For the initial design of robust primer pairs, 25 complete mitochondrial genome sequences were downloaded from NCBI GenBank and other reference sequence databases (Table 1). Of these 25 species, 16 belong to the subfamily Colobinae, two each to the Ponginae, Homininae and Cercopithecinae, and one each to the Cebinae, Gorillinae and Hylobatidae.

Alignment of Sequences
The mtDNA sequences of these primate species were aligned using CodonCode Aligner. Conserved regions of the alignment longer than a typical primer were selected, and forward and reverse primers were designed from them.
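The selection of conserved regions can be illustrated with a minimal Python sketch. It assumes the alignment has been exported (for example as FASTA from CodonCode Aligner) and loaded as equal-length, gapped strings; the function name `conserved_windows`, the strict 100% identity rule and the default minimum length of 20 columns are illustrative assumptions, not parameters reported in this study.

```python
from typing import List, Tuple


def conserved_windows(alignment: List[str], min_len: int = 20) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges in which every sequence carries
    the same unambiguous base (A, C, G or T), keeping runs of >= min_len columns."""
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    windows: List[Tuple[int, int]] = []
    run_start = None
    for col in range(n_cols):
        bases = {seq[col].upper() for seq in alignment}
        # A column is conserved only if all sequences agree and it is not a gap/ambiguity
        identical = len(bases) == 1 and bases.pop() in "ACGT"
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start >= min_len:
                windows.append((run_start, col))
            run_start = None
    # Close a run that extends to the last column
    if run_start is not None and n_cols - run_start >= min_len:
        windows.append((run_start, n_cols))
    return windows


if __name__ == "__main__":
    toy = ["ACGTACGTAA", "ACGTACGTTA", "ACGTACGTAA"]
    print(conserved_windows(toy, min_len=5))
```

For the toy alignment the call returns `[(0, 8)]`. In practice, tolerating one or two mismatches away from the primer 3' end may be preferable to strict identity.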

Primer design and testing its applicability in primate clade
Primer design is a critical part of any PCR-based study. Considerations for primer design include (i) primer melting temperature, (ii) primer length and GC content, (iii) length of the resulting PCR product, (iv) formation of hairpin loops or other secondary structures, and (v) primer specificity. In this study, twenty-four primers (12 pairs) were designed using Primer3 ver. 0.4.0 (Untergasser et al. 2012), and their quality against the criteria above was confirmed using the online tool OligoCalc (Kibbe 2007) (Table 2).
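Two of the listed criteria, GC content and melting temperature, together with a crude check for 3' self-complementarity, can be sketched as follows. The Tm function uses the basic, non-salt-adjusted formulas offered by OligoCalc (Wallace rule below 14 nt, otherwise Tm = 64.9 + 41(G + C - 16.4)/N); Primer3 itself uses nearest-neighbor thermodynamic models, so its values will differ. The function names and the 4-base 3' window are hypothetical choices for illustration, and the sketch assumes sequences contain only A, C, G and T.

```python
# Illustrative primer checks; assumes unambiguous A/C/G/T sequences.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(primer: str) -> float:
    """Percentage of G + C bases in the primer."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(primer: str) -> float:
    """Basic (non-salt-adjusted) Tm in degrees C, following OligoCalc's simple formulas."""
    seq = primer.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    n = a + t + g + c
    if n < 14:
        # Wallace rule for short oligos
        return 2.0 * (a + t) + 4.0 * (g + c)
    return 64.9 + 41.0 * (g + c - 16.4) / n


def has_3prime_self_complement(primer: str, k: int = 4) -> bool:
    """Crude flag: True if the reverse complement of the 3'-terminal k bases occurs
    anywhere in the primer, which could allow a 3'-extendable hairpin or self-dimer."""
    seq = primer.upper()
    return reverse_complement(seq[-k:]) in seq


if __name__ == "__main__":
    p = "ACGTACGTACGTACGTACGT"
    print(f"GC = {gc_content(p):.1f}%  Tm = {basic_tm(p):.1f} C  "
          f"3' self-complement: {has_3prime_self_complement(p)}")
```

For the 20-mer shown, GC content is 50.0% and the basic Tm is about 51.8 °C; the 3' flag is raised because its terminal `ACGT` is palindromic.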

Test data
This study used previously collected and catalogued S. hypoleucos material from the DNA repository of the Paul Hebert Centre for DNA Barcoding and Biodiversity Studies, Aurangabad, to test the efficiency of the newly designed primers.

Results
Using the multiple sequence alignment, primers were designed to amplify mitochondrial DNA from the primate species studied here (Fig. 1; Table 2). As shown in Table 2, these primers have similar lengths, GC contents and annealing temperature requirements.
Primers for longer regions of the mitochondrial genome
Two primer pairs, designated PHCDBS 1F/1R and PHCDBS 2F/2R, were designed to amplify larger portions of the mitochondrial DNA. Combinations of other primers, such as PHCDBS 3F with PHCDBS 14R, were also used to cover a region of 10 kb (Table 2). Another primer combination (PHCDBS 14F + PHCDBS 3R) was used to cover the remaining mitochondrial region of 7 kb (… Primer each. The PCR thermal cycling program was as follows: 98 °C for 30 s; 35 cycles of 98 °C for 10 s, 50 °C for 30 s and 72 °C for 6 min; and a final extension at 72 °C for 2 min. As shown in Fig. 3, the designed primer sets successfully amplified their target regions.
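The cycling program reported above can be written down as data, which makes it easy to compare variants during optimization and to estimate run time. This is a minimal sketch: the class names are hypothetical, and the estimate counts only the reported hold times, not instrument ramp times.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    temp_c: float
    hold_s: int


@dataclass
class Stage:
    steps: List[Step]
    cycles: int = 1


# Long-amplicon program as reported in the text; ramp times are not reported.
LONG_PCR_PROGRAM = [
    Stage([Step("initial denaturation", 98, 30)]),
    Stage([Step("denaturation", 98, 10),
           Step("annealing", 50, 30),
           Step("extension", 72, 6 * 60)], cycles=35),
    Stage([Step("final extension", 72, 2 * 60)]),
]


def total_hold_seconds(program: List[Stage]) -> int:
    """Sum of all hold times across cycles (excludes heating/cooling ramps)."""
    return sum(stage.cycles * sum(step.hold_s for step in stage.steps)
               for stage in program)


if __name__ == "__main__":
    secs = total_hold_seconds(LONG_PCR_PROGRAM)
    print(f"{secs} s of hold time (~{secs / 60:.0f} min)")
```

Here the hold time is 30 + 35 × (10 + 30 + 360) + 120 = 14,150 s, i.e. roughly 3 h 56 min before ramping is added.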

Tests of sequence coverage
To test the sequence coverage of the different primer pairs, the larger fragments (I and II) were sequenced on a next-generation sequencer (Illumina HiSeq 2500). Small fragments were sequenced bidirectionally on a Sanger sequencing platform (ABI 3730xl Genetic Analyzer) using standard operating protocols. The resulting sequences were curated bioinformatically and assembled into mitochondrial genomes. The actual primer positions and the regions they cover are depicted in Fig. 4.
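Checking that a set of amplicons leaves no gaps is a simple interval computation on a circular sequence. The sketch below computes per-base depth with a difference array and reports uncovered ranges. Amplicon coordinates are assumed to be 0-based, end-exclusive positions on the reference, with start > end meaning the product spans the numbering origin; the function names and toy coordinates are hypothetical.

```python
from typing import List, Tuple


def circular_coverage(genome_len: int, amplicons: List[Tuple[int, int]]) -> List[int]:
    """Per-base depth on a circular genome from (start, end) amplicons,
    0-based and end-exclusive; start > end wraps past position 0."""
    diff = [0] * (genome_len + 1)
    for start, end in amplicons:
        if start <= end:
            diff[start] += 1
            diff[end] -= 1
        else:
            # Wrapping amplicon = [start, genome_len) plus [0, end)
            diff[start] += 1
            diff[genome_len] -= 1
            diff[0] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def uncovered_ranges(depth: List[int]) -> List[Tuple[int, int]]:
    """Half-open ranges of zero depth (linear; a gap spanning the origin
    is reported as two pieces)."""
    gaps, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d > 0 and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(depth)))
    return gaps


if __name__ == "__main__":
    depth = circular_coverage(20, [(0, 8), (6, 14), (16, 3)])
    print(uncovered_ranges(depth))
```

With this 20-bp toy genome the only gap reported is `(14, 16)`, the kind of region that would call for an additional short amplicon.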
The genome assembly was also compared with reference genomes and was found to align fully with respect to gene order and genome coverage.

Discussion
Studies have shown that datasets derived from complete mitochondrial genome sequences appear to offer more consistent information about evolutionary relationships among species of higher taxa such as primates, and that these can be used effectively to establish the timescale of their evolution (Finstermeier et al. 2013;Kurabayashi and Sumida 2013). In contrast, studies using single or small numbers of genes to analyze evolutionary relationships have often reported rapid radiations or unresolved relationships, largely because the conclusions are based on the use of relatively small numbers of informative sites (Matsui et al. 2009). Phylogenies generated using complete mitochondrial genomes have also been shown to have considerably higher levels of statistical support when compared to analyses based on single genes (Liedigk et al. 2014). Therefore, the use of these larger datasets also has the potential to raise even a weak phylogenetic signal to a level above that of random noise (Hillis and Bull 1993).
However, owing to factors such as differing transition-to-transversion substitution ratios even between closely related species, it is often challenging to find primers suitable for comparative studies of complete mitochondrial genomes. More specifically, for many primate species, attempts to use the same primer pair for cross-species amplification often fail, even among closely related species.
The present study was planned to address evolutionary questions in primate phylogeny that remain unresolved for several species, and in particular to resolve relationships among several primate species found in India. To this end, a new approach was developed to obtain complete mitochondrial genome sequences from a collection of closely related primate species. This approach is novel compared with the methods used and proposed by others (Wu et al. 2004; Chuang et al. 2006; Chen et al. 2009; Yang et al. 2009; Yang et al. 2010).
The protocol shown in Fig. 1 relies first on conserved regions identified from alignments of published primate mitochondrial genomes. These alignments reveal several conserved regions, to which primer design algorithms are then applied to identify primer pairs with the forward primer at the 5' end of one region (such as PHCDBS 3F) and the reverse primer at the 3' end of another region (such as PHCDBS 14R). This single primer pair can amplify approximately half (6726 bp) of the entire mitochondrial genome. Similarly, another primer pair, with PHCDBS 14F as the forward (5') primer and PHCDBS 3R as the reverse (3') primer, was used to cover another large segment (9837 bp) of the mitochondrial genome (Fig. 5).
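Because mtDNA is circular, a forward primer placed downstream of a reverse primer in reference coordinates still yields a product, one that runs through the numbering origin. The helper below is a sketch with hypothetical coordinates (not the actual PHCDBS primer positions) that computes the expected product length in either case, using 1-based inclusive positions: `fwd_start` is the 5' end of the forward primer and `rev_end` is the reference position matching the 5' end of the reverse primer.

```python
def circular_amplicon_length(genome_len: int, fwd_start: int, rev_end: int) -> int:
    """Expected PCR product length (bp) on a circular genome.

    Positions are 1-based and inclusive on forward-strand reference coordinates.
    If rev_end < fwd_start, the product runs through the numbering origin.
    """
    if not (1 <= fwd_start <= genome_len and 1 <= rev_end <= genome_len):
        raise ValueError("primer positions must lie within the genome")
    if rev_end >= fwd_start:
        return rev_end - fwd_start + 1
    # Wrap-around: from fwd_start to the end, then from position 1 to rev_end
    return (genome_len - fwd_start + 1) + rev_end


if __name__ == "__main__":
    # Hypothetical 16,500-bp genome and primer positions, for illustration only
    print(circular_amplicon_length(16500, 1000, 7000))
    print(circular_amplicon_length(16500, 15000, 8000))
```

The first hypothetical pair gives 6001 bp; the second, with the forward primer at 15,000 and the reverse primer at 8000, gives 1501 + 8000 = 9501 bp across the origin.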
One potential challenge of this method is poor coverage in certain regions (Fig. 4; Table 2). This may be due to uncertain substitution rates or to pseudogene copies inserted into the nuclear genome, as suggested by various authors (Thalmann et al. 2004; Raaum et al. 2005; Finstermeier et al. 2013). To address this, in addition to the primers used above to amplify large portions of the mitochondrial genome, twelve other primer pairs were designed to amplify smaller segments of the genome. Most of these smaller amplification products represent conserved regions of individual genes. These smaller products can also be used to detect amplification of any pseudogene copies of mitochondrial genes that may have been inserted into the nuclear genome (Chiou et al. 2011). These primers were also optimized for annealing temperature to minimize the possibility of non-specific amplification (Figs. 1 and 2). Even at annealing temperatures 6 °C lower, non-specific amplification was not observed (Schoenbrunner et al. 2017). The primers used to amplify the mitochondrial genome of S. hypoleucos, together with the resulting sequence analysis, are provided in the supporting data (Supplementary Fig. S1; Supplementary Table S1).
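One common screen for amplified nuclear pseudogenes (NUMTs), offered here as an illustration rather than as the procedure used in this study, is to look for in-frame stop codons within a protein-coding fragment. In the vertebrate mitochondrial genetic code (NCBI translation table 2), TAA, TAG, AGA and AGG are stops while TGA encodes tryptophan, so a standard-code check would give false positives. The sketch assumes the fragment is given on its coding strand and that its reading frame is known; the function name is hypothetical.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
# TGA is NOT a stop in this code: it encodes tryptophan.
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def in_frame_stops(fragment: str, frame: int = 0) -> list:
    """0-based offsets of in-frame stop codons in a coding-strand fragment.

    `frame` (0, 1 or 2) is the offset of the first complete codon;
    a trailing partial codon is ignored.
    """
    seq = fragment.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in VERT_MITO_STOPS]


if __name__ == "__main__":
    print(in_frame_stops("ATGAGATGG"))  # AGA is a stop in the mitochondrial code
    print(in_frame_stops("ATGTGATGG"))  # TGA is read as Trp
```

An internal stop in a fragment taken from the middle of a gene suggests a pseudogene copy, or a sequencing error worth re-checking.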
Overall, this strategy may help minimize sequencing costs on Sanger sequencing platforms (Ughade et al. 2019) and help validate NGS-based genome assemblies. The primer design also ensures sufficient overlap between the amplified fragments to obtain the complete genome sequence, including the primer sites and flanking nucleotides (Fig. 3).
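Joining overlapping fragments can be illustrated with an exact suffix-prefix overlap merge. This is a deliberately simple sketch: real assemblies of Sanger and NGS reads must tolerate mismatches and base-quality differences, and the function name and the 20-base minimum overlap are hypothetical defaults.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two fragments using the longest exact suffix/prefix overlap.

    Raises ValueError if no overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for olap in range(min(len(left), len(right)), min_overlap - 1, -1):
        # Suffix of the upstream fragment must equal the prefix of the downstream one
        if left[-olap:] == right[:olap]:
            return left + right[olap:]
    raise ValueError("no exact overlap of the required length")


if __name__ == "__main__":
    print(merge_overlapping("AAACCCGGGTTT", "GGGTTTAAAC", min_overlap=4))
```

Here the 6-base overlap `GGGTTT` is used and the merged sequence is `AAACCCGGGTTTAAAC`. Requiring a longer minimum overlap reduces the chance of joining fragments at a spurious short match.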
The strategy outlined in Fig. 1, designing primers to amplify both long and short segments of the mitochondrial genome, can be applied to characterize the entire mitochondrial genome of many species closely related to S. hypoleucos. Beginning with a download of complete mitochondrial genome sequences of species within a given family (from GenBank or other sources), our primer design algorithm (Fig. 4) can easily be implemented. The designed primer sets are then used to confirm successful PCR amplification and to build a genome assembly representing the entire mitochondrial genome of species whose mitochondrial genomes have not yet been adequately characterized and analyzed.

Conclusion
Mitochondrial DNA represents one of the most informative molecules for evolutionary studies.
Amplification of the entire mitochondrial genome requires robust primers. This study proposes a method of primer design and optimization (PDO) in which long amplification products are first generated using 5' primers from the conserved region of one gene and 3' primers from the conserved region of another gene. Additional primer sets covering shorter segments of the genome are then used to fill in gaps and complete the mitogenome sequence. Using this strategy, the mitochondrial genome of S. hypoleucos was successfully amplified and sequenced. Designing primers from conserved regions of known mtDNA sequences in this way may allow amplification and characterization of complete mitochondrial genomes from many other species that belong to groups of closely related species.

Declarations
Acknowledgement
The authors are thankful to the University Grants Commission, New Delhi, India, for providing a Junior Research Fellowship to Vipin Hiremath. We gratefully acknowledge the Director, Pilikula Biological Park, Mangalore, for providing non-invasive samples. We also thank Dr. Bharathi Prakash for her assistance in sample collection. We sincerely thank all staff members and students at the Paul Hebert Centre for DNA Barcoding and Biodiversity Studies, Aurangabad, for their assistance in completing this work.

Figure 1
Flow chart of primer design and PDO for mitochondrial genome studies

Supplementary Files
This is a list of supplementary files associated with this preprint.