Reappraisal of the sub species hypothesis for Gangetic dolphins based on mitochondrial genomics

The Gangetic dolphin, P. gangetica, is one of the oldest known creatures, and its taxonomic status has been debated for over two centuries with no agreement. The clarication of the systematic relationships of Platanista taxa has been listed as among the highest priority for all cetaceans. Of the several taxonomic schemes proposed, all have been very limited and do not provide sucient weight of evidence to defend any taxonomic classication. Consequently, to resolve this case, a new mitochondrial genome record for P. gangetica gangetica was created for quantitative evaluation of the Ganges and Indus River dolphin populations to estimate species boundaries. This has aided in the resolution of the taxonomic status of the Gangetic dolphin, P. gangetica by showing that the subspecies P. gangetica gangetica and P. gangetica minor are indistinguishable based on the molecular differences and do not follow the standards for molecular species boundaries. Also geographic historical reviews show that these two populations were one during the Miocene era, and that segregation occurred during the geosyncline effect of the Bay of Bengal. However, with the present set of evidences, we propose to abandon the subspecies hypothesis for these P. gangetica taxa.


Introduction
In addressing the changing taxonomic status of these dolphins, Rice [11] pointed out that most of the past studies were challenged by severe methodological aws in terms of sampling, lack of statistical support and poor evidence to defend any particular taxonomic classi cation scheme. He concluded, however, that these taxa comprised a single species consisting of two subspecies, and this taxonomic classi cation has been used by the Society for Marine Mammalogy [12]. However, some authors have argued that the comparisons made for these taxa on various levels of morphology alone do not provide su cient weight of evidence for a de nitive classi cation scheme [13][14][15][16][17][18][19][20][21][22][23].
At the same that the arguments over the sub-speci c status for the Ganges and Indus dolphins have continued, these issues have also been brought to a higher priority by the need for a conservation policy [24]. As of now, both dolphins have seriously suffered from drastic declines in their populations across their entire ranges in both the rivers. In 1996 the IUCN changed their status from 'Vulnerable' to 'Endangered', in part because of events across the distribution of their ranges that severely affected their habitats and brought them to the verge of extinction. To resolve such controversies, especially for cases such as the Gangetic river dolphins, the use of alpha level taxonomy, combined with recognition of evolutionarily signi cant units within species, can be important components of conservation strategies directed towards the preservation of genetic diversity and evolutionary potential of these species.
The need to address challenges and limitations of resolving issues relating to the status of these P. gangetica species have prompted others to use molecular approaches [25][26][27][28][29]. Insights potentially gained from these approaches may also aid in reconstructing phylogenetic histories to delineate systematic relationships. In this study, we present the rst quantitative evaluation of both the Ganges and Indus River dolphin and estimated species boundaries using a new set of complete mitochondrial genome data. Furthermore, our study also includes an analysis of the phylogenetic relationships among river dolphin clades as well and their relationships to other marine dolphins. The new data set generated through this mitochondrial genomics approach in combination with historical insights has advanced the understanding of the status of subspecies for P. gangetica.

Ethical statement
Tissue sample were obtained under permissions letter no. 4488, Dated 16.12.14 from the Principal Chief Conservator of Forests, Patna, Bihar (India). All methods were performed in accordance with the CPCSEA guidelines and regulations (http://cpcsea.nic.in/Content/55_1_GUIDELINES.aspx). We did not perform experimentation directly on live animals.
Sample collection, DNA extraction and ampli cation Tissue samples from dead animals provided to us were washed and preserved in ethanol (95%), transported to the laboratory and stored at -50 0 C until further processing. Genomic DNA was puri ed using the Promega wizard genomic DNA kit following manufacturer's instructions. Puri ed DNA was stored at -20 0 C for downstream applications. For PCR ampli cation of the mitochondrial genome, two sets of primer were designed using Primer3 and published reference genome sequence information (Accession No. NC_005275.1) [30]. The complete mitochondrial genome was ampli ed using GoTaq Long PCR Master mix (Promega, Madison, USA) in two amplicons using newly designed primers (Supporting table S1). The reaction components were added following manufacturers' protocols, and thermal cycling conditions as follows: initial denaturation at 95 0 C for 3 min, followed by 30 cycles at 94 0 C for 30 s, annealing at 52-56 0 C for 30 s, extension at 65 0 C for 13 min, and a nal extension at 72 0 C for 10 min.

Cluster Generation and Sequencing
After quality checks, the libraries were loaded for cluster generation and sequencing on a NextSeq. Illumina platform (Illumina Inc. USA). Paired-end sequencing was used to have the template fragments sequenced in both the forward and reverse directions following manufacturer protocols.

De novo assembly and Annotation
High quality paired end data was generated denovo and assembled using CLC genomics workbench 6.0. Similarity searches were carried out against NCBI's Nucleotide (NT) database using BLASTN algorithm to identify scaffolds [31]. Scaffold #4 was used for annotation by the MITOS2 webserver because it showe signi cant similarity against the P. minor (AJ554058) reference mitochondrial genome [32]. Intergenic regions between the trnP and trnF genes was found to be 857 bp, and thus we suspected it as a control region. This was annotated using pairwise similarity searches (BLAST) against the control regions of reference mitochondrial genomes. Annotation of t-RNAs was con rmed using the MITOS 2 webserver and tRNAscan-SE v 2.0 [32][33]. The intergenic spacers and overlapping regions between genes were counted manually. The CG view webserver was used for preparing a circular map [34]. The relative synonymous codon usage (RSCU) of the protein coding genes was obtained using CodonW (Peden, n.d.).

Selection of taxa, Sequence alignment and Data partition
We selected data from 32 species of cetaceans as an in group and three species of Artiodactyla as an out group for rooting the phylogenetic tree (Supporting Table S2). For multiple sequence alignments, we used MAFFT version 7.273, which implements consistency based algorithms [35]. The L-INS-I algorithm was used for alignment. Partition Finder version 2.1.1 was used for data partition and choosing best substitution model [36]. The model search for DNA sequences were set to "MrBayes". The greedy algorithm was used, with branch lengths estimated, to search for the best-t partitioning model. The branch lengths were set as linked.

Comparative mitogenomics
We compared the annotated mitochondrial genome of P. gangetica gangetica with mitochondrial genomes of 31 species (Supporting Table S2). Mitochondrial genomes were downloaded from NCBI refseq database [37]. The base composition of the mitogenomes in all taxa under study was calculated

Phylogenetic analysis
Phylogenetic relationships were constructed using the Bayesian inference method (BI) implemented through MrBayes v3.2.557 [44]. Here we used the GTR+ I+ G as an evolutionary model for 2 partitions and the HKY+ I+ G model for 1 partition. Four simultaneous Markov chains were run for 10 million generations, with tree sampling occurring every 1000 generations and a burn-in of 25% trees.
The Maximum Likelihood (ML) phylogeny was obtained using RAxML-HPC2 on the XSEDE (8.2.10) tool of the CIPRES Science gateway web portal based on the nucleotide sequences of the 13 protein-coding genes [45]. In the ML analyses, we used the GTRGAMMA for nucleotide sequences. For each analysis, 200 runs were conducted to nd the highest-likelihood tree, followed by analysis of 1000 bootstrap replicates. NJ relationships were deduced with 1000 bootstrap in MEGA 7.0.

Mitochondrial genome characteristics
Overview of the mitochondrial genome This study reports on rst annotated sequence of the mitochondrial genome of the Ganges river dolphin, P. gangetica gangetica. The sequence data obtained here have been deposited in GenBank under accession MF 990206. The annotated genome is double stranded molecule of 16563 bp (Fig. 1). It contains 37 genes which includes 13 protein coding genes (atp6, atp8, cytb, cox1 to cox3, and nad1 to nad6), two rRNA genes (rrnL, rrnS), 22 tRNAs and a D loop region (Table 1). Here we noticed 28 genes located on positive strand and nine genes on negative strand ( Fig. 1) as well as two origins of replication region, OL (origin of "L" strand) which is of 31 bp and OH (origin of "H" strand) of 303 bp (Table 1). tRNA Structure Twenty two tRNA genes were observed which ranged from 66-77 bp in size. Of these, 14 tRNAs were located on positive strand while eight were on the negative strand (Table 1 and Fig. 1). Also, all 22 of these tRNAs used unique antisense codons (Table 1).

Intergenic nucleotide arrangement
A total of 73 bp of non-coding intergenic nucleotides, ranging from 1 to 39 bp in size, were observed at 14 locations. The longest AT rich (61.53%) intergenic region of 39 bp was found between the cox1 and trnS2 genes. A total of 13 genes and one control region were found to be adjacent with no intergenic nucleotides. In addition, 10 genes were found to have overlaps ranging from 1 bp to 40 bp in length ( Table 1). The largest overlap of intergenic nucleotides (40 bp) was observed between atp8 and atp6 while, the smallest overlaps (one bp) were noticed at four locations between OL-trnC; atp6-cox3; cox3-trnG and trnT-trnP respectively. Similarly, other gene overlaps were observed between trnI and trnQ (3 bp); nad2 and trnW (2 bp); nad4l and nad4 (7 bp) ( Table 1).

Nucleotide composition
The nucleotide composition of the P. gangetica gangetica mitochondrial genome revealed that it is AT rich with strand asymmetry as re ected by AT and GC skew values. The AT skew value was found to be zero while the GC skew was -0.40. The nucleotide composition and GC percentage values of four river dolphins and 27 other marine cetacean species are depicted in Table 2.

Initiation and termination Codons
The initiation and termination codon bias for protein coding genes (PCG) was analysed and compared for four river dolphins and 27 other marine cetacean species (Table 3).

Codon usage
Among all protein coding genes, the Leu, Thr, Ile, Ser1, Pro, Ser2 amino acids were the most used whereas CUA, AUA, ACA and AUC were the most frequently used codons.

Gene order comparisons
The gene order predicted for P. gangetica gangetica was compared to other cetaceans. No signi cant changes in gene order were seen, but there were slight differences in some gene lengths (Table 1; Fig. 1).

Evolutionary rate of protein coding genes
To examine the evolutionary rate among the mitochondrial PCGs in cetaceans, values of nonsynonymous substitutions (Ka), synonymous substitutions (Ks), and the ratio of Ka/Ks, (ω) were calculated for each PCG in the river dolphin species.
i. Evolutionary rate of PCGs of river dolphins and other cetaceans The evolutionary rate was lowest (0.02) for the COX1 gene in the four river dolphin species. This can be compared the rate for the ND6 gene (0.32), which was highest (Fig. 2). Other cetacean species compared here showed the lowest evolutionary rate for the ND 5 gene (Fig. 2). Overall, all cetaceans have shown a relatively low evolutionary rate (<1), possibly suggesting that greater selection pressure, leading to the purifying selection, is acting on all PCGs.

Comparative accounts
To deduce the phylogenetic relationships of P. gangetica gangetica in the river dolphins by clades, a comparative study was conducted by including previously published mitochondrial genomes of four river dolphin species and of 27 marine cetacean species (Supplementary Table S2).
Data partitioning resulted in three partitions as best suited for the evolutionary model. The rst partition, GTR+I+G, included nine genes (NAD2, ATP8, ATP6, NAD4l, COB, NAD1, NAD4, NAD5 and NAD3). The second partition, GTR+I+G, included three genes (COX1, COX3 and COX2) while the third partition, HKY+I+G, included only a single gene ND6. Results of tests for tree topology are depicted in Table 4. Phylogenetic relationships of the river dolphins with other marine dolphin species was inferred using concatenated protein coding gene datasets, and this revealed that river dolphins are paraphyletic (Fig. 3). Also, the relationships of P. gangetica gangetica and four river dolphins revealed through analysis of the COI and CytB gene, is depicted in Fig. 4. This clearly resolves other river dolphin species but shows a mixed clade for P. gangetica gangetica and P. gangetica minor.
Comparison of genetic distance for species boundary Genetic distances between P. gangetica gangetica and other four river dolphin species, as well as marine dolphins, were compared using various markers and genomic regions ( Table 5). These values clearly differentiate all dolphin species (except P. gangetica minor) where intraspeci c values were < 3%. Overall, this supports the applicability of mitochondrial genome analysis in detecting molecular differentiation between species. To further clarify species boundaries, a pairwise interspecies comparison was conducted for the four river dolphins (Table 6).

Discussion
Taxonomic assignments are of great importance for many reasons, yet their delimitation is often fraught with both conceptual and methodological di culties [46]. The taxonomy of the Gangetic dolphins are one such example. Taxonomy issues for these dolphins have been debated for almost two centuries, largely because usable morphological characters as well as ambiguous fossil records were found to be largely confusing [14]. Resolution of such issues requires a comprehensive hypothesis that can be used to examine taxonomy through molecular and biogeographical explorations.
Over the past two decades, several evolutionary biologists have given considerable attention to studying the evolution of complete mitochondrial genome as a phylogenetic tool for systematic and comparative analysis and for resolving evolutionary relationships of a broad range of mammals and other organisms [47][48][49]. However, the necessary molecular data (nuclear and mitochondrial DNA sequences) to apply this approach to the Gangetic dolphins from India was lacking. The availability of such data on this species can serve as important genetic resource for advancing our knowledgebase about these and other taxa [13,27].
To move in this direction, a new genetic resource of mitochondrial genome has been created here for P. gangetica gangetica from India. This species is similar to other cetaceans, particularly P. gangetica minor, in terms of several genomic features [50] (Table 1, 2 and 3). Using this genomic resource, the relationship of the river dolphin to the marine dolphin has been clari ed based on BI and ML analysis (Fig. 3). This revealed that river dolphins are paraphyletic in their origin, consistent to ndings by others [13][14][15][51][52] including past morphological studies noting that the Platanista are not closely related to the river dolphins [53,16]. Also, as compared to other river dolphins, speci cally the Ziphidae, the Platanista show a basal position in the phylogeny (Fig. 3), indicating that P. gangetica has more ancestral characters than other river dolphins [54].
The evolutionary rate (ω) estimated here for the river dolphin species, in general, is substantially higher than that seen for marine dolphin species (Fig. 5). This is consistent with the suggestion that marine organisms are less diverged as compared to other aquatic and terrestrial taxa at the level of mitochondrial DNA, for ex. Bloom [55]. In the river dolphin, the COI gene showed the lowest evolutionary rate (0.03). In marine dolphins, the lowest rate was seen in the ND5 gene (0.01). This suggests that both genes can be potentially be used for in molecular taxonomic studies of differentiation of the species (Fig.  2) [56][57]. However, these ndings also raised concern about the inferences made in previous studies about the river dolphin/marine dolphin relationships using genes like Cyt B that show higher evolutionary rates (>0.1) [58,[26][27][28][29] or other segments such as hyper variable D-loop regions or non-coding regions [58,27].
In such cases, phylogenetic inferences may re ect soft polytomy as suggested for the case of the Gangetic dolphin [27]. Here, based on the lowest evolutionary rates (Fig. 2, Table 5 and 6), it can hypothesized that the river dolphin species can be best discriminated using both intraspeci c and interspeci c distances based on COI gene sequences. Marine dolphin species, however, can be discriminated using these same distances derived from the ND5 gene [57,[59][60]. These observations emphasize the importance of sequencing complete mitochondrial genomes to explore alternative hypotheses for molecular phylogenetics and species differentiation [61].
At the intraspeci c level, the data in tables 5 & 6 show that taxonomic assignments for all river dolphin species could be resolved based on the criteria of a 3% threshold for species discrimination using molecular data as proposed by [42]. Here, these genes do not differentiate between P. gangetica gangetica and P. gangetica minor because genetic difference values of only 0.3% were observed, well within the species boundary suggested by [42]. The intraspeci c distances at four genetic locus viz. the complete mitochondrial genome (1.4%), COI gene (0.34%), Cyt B (0.6%) and 13 protein coding genes (2.43%) were also all below this threshold value (Table 6). This clearly supports the idea that P. gangetica gangetica and P. gangetica minor are not two separate biological species, but represent instead a single biological species. These ndings are highly relevant to suggestions by Anderson [5] and Kasuya [62]. Furthermore, the phylogenetic analysis supports the same conclusion with high statistical support (Fig. 4) con rming that P. gangetica gangetica and P. gangetica minor as also being indistinguishable. These results should help bring an end to the lengthy debate on species status of P. gangetica which has been ongoing for the past two centuries [1][2][3][4][5][6][7][8][9][10][11][12][64][65]. Similar low intraspeci c values have also been observed by others for these two taxa [14,27], supporting the observations made here. In summary, it can be concluded that two populations of P. gangetica analysed here in fact belong to one biological species, and the concept of subspecies proposed as by Kasuya [8] is refuted because the mitochondrial genomics do not do not support the sub species concept for this case [66].
The conclusion of the Commitee on Taxonomy [12], accepting that the Gangetic dolphin P. gangetica is comprised of two subspecies, P. gangetica gangetica and P. gangetica minor, a valid parameter that might have been taken into consideration was the presence of two different river habitats (the Ganges and the Indus). The term subspecies was introduced by Eugenius Esper to represent the lowest possible taxonomic unit to enhance understanding of geographic varieties and speciation within animals [66][67][68].
However, the two P. gangetica populations (the Gengetic and Indus) studied here also do not show any morphological difference that may correlate with their evolutionary histories [69][70] or genetic differences to justify them being recognized as geographic varieties [67], except perhaps their unique habitats [68]. Thus our observations can add important insights into the debate on concept of subspecies [68,[71][72][73][74] since Linnaean taxonomy, including the identi cation of subspecies, has been predominantly based on morphological data and has operated under the frameworks of the morphological and biological species concepts [68,75].
For these dolphins, although several studies have been undertaken, none can establish evidence consistent with either species concept [1--12, 63, 65-66]. It is well known that dependence on morphology to determine subspecies status can be challenging because morphological differences could simply be due to phenotypic plasticity and not unique evolutionary histories [76]. Additionally, as in the case of cryptic species, a lack of morphological difference does not necessarily indicate a lack of unique evolutionary histories [76][77]. Finally, since the use of the subspecies designation is rather subjective nature [76, 78-79], it does not seem to be appropriate to describe the two populations of P. gangetica as sub species simply based on geographical isolation [68], even though others have emphasized the prominence of geography in making this distinction [80][81][82].
For any further discussion of this issue, it will be necessary to consider the history of the origin of the Indus, Brahmaputra and the Ganges rivers. According to Pascoe [83] (1919), during the Eocene, within the Indian subcontinent, a gulf extended from Sind northwards to Afghanistan, then turned eastwards and south-eastwards through the Kohat and the Punjab extending a path to a great river, the head-waters of which consisted of the portion of the Brahmaputra owing through Assam (Fig. 6). This river streamed west and north-westwards along the foot of the Himalayas as far as the North-West Punjab, where it turned southwards along a line of the modern Indus, and emptied to the Arabian Sea. This shows that the Assam Brahmaputra was once the origin of the river Indus, and it also means that the current two populations of P. ganagtica both originated from one source-the Brahmaputra (Fig. 6). This is clear evidence that the Ganges and Indus populations became segregated during the geosyncline effect of the Bay of Bengal when these three river basins took their shape [83]. At the present, the two populations of the Gangetic dolphins still do not show any evidence of remarkable evolutionary adaptations in their morphologies [6][7][8][9]. This may be because, as an endothermic organism, the rate of evolution must be prohibitively low [84]. Furthermore, both the Ganges and Indus rivers are con ned to a narrow latitudinal gradient (24° 45' 19.1282" to 25° 32′ 15" North) and do not differ much in terms of the climatic conditions, their origin, ow rate, basin width and depths, water availability, food and competition, etc. This might have allowed for similar selection pressures on both dolphin populations so that organisms inhabiting these two river systems would show similar evolutionary rates as has clearly been evidenced from mitochondrial genome data presented here (Tables 1,2,. Accordingly we are proposing a hypothesis as depicted in Fig. 6, to apprise the sub species concept for the Gangetic dolphin. Ball and Avise [91] have suggested that it is appropriate to de ne subspecies based on populations with major divisions in genetic diversity or that are phylogenetically distinct [92]. In our analysis, we rejected the molecular support to this subspecies concept when we conducted COI and Cytb gene based phylogenetic analysis for river dolphins, and we found that the two subspecies are phylogenetically undistinguishable and tightly cluster in a single clade with high boot strap support (Fig. 4 a & b). These nding are highly relevant to the arguments to de ne subspecies based on congruence amid data such as morphology, genetics, and ecology [75,[93][94]. Similarly, several studies have reported taxonomic nonconsensus based on morphological versus molecular data [72-73, 76, 95]  Based on this, we are proposing to amend the taxonomic status of P. gangetica. Finally, although we are of opinion that the two P. gangetica populations are genetically indistinguishable and should not be described as subspecies, we do suggest that further scienti c efforts be directed toward corroborating the biological species concept through breeding experimentations involving these two geo types to con rm the evolution of any reproductive isolation among these two populations. However, for the present, based on the evidence described here, we propose to abandon subspecies designation for P. gangetica.

Conclusion
We propose here a resolution of the debate occurring over two centuries on the taxonomic status of the Gangetic dolphin, P. gangetica, through the use of mitochondrial genomics. The study has provided a molecular basis to con rm that the subspecies P. gangetica gangetica and P. gangetica minor, recognized as geographic varieties, do not follow the criteria for species boundary and are in fact indistinguishable based on the molecular differences. Also, historical reviews show that these two populations were part of a single population during the Miocene era, and that they became segregated during the geosyncline effect of the Bay of Bengal when the river basins separated along with these dolphin populations. Further, through phylogenetic analysis it was established that river dolphins are paraphyletic and no relationship prevails between P. gangetica and other river dolphin species. Based on various comparisons and this historical review, we propose to abandon the subspecies designations used for P. gangetica. We sincerely thank the PCCF, Patna providing tissue samples of the Gangetic dolphin. We are also thankful to Professor David Haymer for manuscript revision that has substantially improvised the content. We sincerely thank all staff member and students at Paul Hebert Centre for DNA Barcoding and Biodiversity Studies, Aurangabad for their assistance in completing this work.

Declarations
Authors Contribution GDK, CDK, NP conceiving research idea, CJ collected samples and conducted laboratory experiments, GDK, CDK analysed data and GDK, CDK, BP wrote manuscript.
Competing interests: The authors do not have con ict of interest to declare. Circular map of Platanista gangetica gangetica mitochondrial genome. Map shown in gure represents the location of genes and other mitochondrial genome features. Outer circle represents genes in positive (+)/Heavy strand, second inner circle part represents genes in negative (-)/Light strand, third inner circle represents GC content and fourth inner circle shows GC skewness. Transfer RNAs are shown in red; Ribosomal RNAs in pink; protein coding genes in blue, control region in gray, GC content in black, GC skew(+) in green and GC skew(-) in purple. OH represents origin of "H" strand replication and OL represents origin of "L" strand replication.