2.1. Duplicated gene order identified in mitogenomes of analysed Palaeognathae taxa
Using an appropriate PCR strategy (Fig. 2), the diagnostic CR1/CR2 fragments were obtained for Struthio camelus (Fig. S1a in Additional file 1), Rhea pennata (Fig. S1b in Additional file 1), Rhea americana (Fig. S1c in Additional file 1) and Crypturellus tataupa (Fig. S1d in Additional file 1). Only two out of 16 or 48 reactions failed in taxa for which species-specific primers were designed based on the previously published sequences of complete mitogenomes (Struthio camelus and Rhea species) (Table S1 in Additional file 2). In the case of Crypturellus tataupa, amplicons were obtained only for six out of 12 tested reactions. This was caused by the fact that primers dedicated for this species were designed based on the Eudromia elegans mitogenome [26]. As in the published Crypturellus tataupa genomic sequence [62], the control region and adjacent genes were missing. Sequencing and annotation of the obtained amplicons revealed the presence of tRNA-Pro/ND6/tRNA-Glu fragments between two control regions for Struthio camelus, Rhea pennata, Rhea americana and Crypturellus tataupa (Fig. 1). The duplicated fragment obtained for Struthio camelus differed only in one nucleotide from the homologous region present in the previously published mitogenome (Fig. S2a in Additional file 1). These fragments in rheas showed 100% identity with corresponding homologous regions (Fig. S2b and Fig. S2c in Additional file 1).
Although the high identity strongly indicates a mitochondrial origin of the amplified CR1/CR2 fragments, additional diagnostic reactions were designed to exclude a possibility of nuclear mitochondrial DNA inserts (NUMTs) amplification. Based on the obtained sequences of ND6 genes, appropriate primers were created to amplify ND6-1/ND6-2 regions. Sequencing of the obtained PCR products revealed the ND6/tRNA-Glu/CR/tRNA-Pro/ND6 gene order for all analyzed species. The corresponding CR/tRNA-Pro/ND6 regions overlapped the appropriate CR1/CR2 diagnostic fragments and showed 100% identity. Additional PCR reactions (see Materials and methods and Fig. 2) were run to complete the missing parts of CRs and to reveal the order of genes preceding the first control region. Finally, the complete mitogenomic fragments containing the duplicated regions were obtained by assembling four overlapping fragments (Fig. 2). Their length was: 8,554 bp for Struthio camelus, 8,254 for Rhea Americana, 8,360 bp for Rhea paennata and 7,044 bp for Crypturellus tataupa (Table 1; Fig. S3 in Additional file 1). In all cases the same gene order was found (GO-I; Fig. 1e, Table 1, Fig. S3 in Additional file 1), which was previously annotated only for two Passeriformes species, Notiomystis cincta and Turdus philomelos [41]. This gene order differs from the most complete known avian duplication (GO-FD; Fig. 1d) in the lack of the second copies of cytb and tRNA-Thr genes, expected between CR1 and tRNA-Pro2 gene. The presence of identical copies of tRNA-Glu gene (Fig. S2a-d in Additional file 1) enabled us to position precisely the 5' ends of both control regions. The 3’ ends of the CR2s precede tRNA-Phe genes as in all other avian gene orders including two potentially functional control regions. The number of nucleotides between the tRNA-Glu copies and appropriate poly-C sequences located at the 5' ends of CRs vary from 2 bp (Rhea americana, Rhea pennata and Crypturellus tataupa) to 26 bp for Struthio camelus (Table S2 in Additional file 2). The CR2 in Rhea pennata and Crypturellus tataupa is longer than CR1, which obey the rule observed in 13 crane species [12]. The tandem duplications found in the mitogenomes of Struthio camelus, Rhea americana, Rhea pennata and Crypturellus tataupa make them longer compared with their previous genomic versions assuming the typical avian gene order.
Table 1
Palaeognathae species analyzed in this study in terms of duplicated regions as well as gene orders found within their mitogenomic fragments, which were amplified and sequenced. The sequences are presented in Figure S3.
Sources of samples: ZOO WRO – Zoological Garden in Wrocław, Poland; ZOO WAW – Zoological Garden in Warsaw, Poland; ZOO KAT - Zoological Garden in Katowice. L – tRNA for leucine, ND5 - NADH dehydrogenase subunit 5, cytb – cytochrome b, T – tRNA for threonine, P – tRNA for proline, ND6 – NADH dehydrogenase subunit 6, E – tRNA for glutamic acid, CR – control region, F – tRNA for phenylalanine, 12S – 12S rRNA, V – tRNA for valine, 16S – 16S rRNA.
Species
|
Sample type
|
Source
|
Accession number
|
Length (bp)
|
Fragment
|
Struthio camelus
|
Blood
|
ZOO WRO
|
MH264503
|
8 554
|
L/ND5/cytb/T/P1/ND6-1/E1/CR1/P2/ND6-2/E2/CR2/F/12S/V/16S
|
Rhea americana
|
Blood
|
ZOO KAT
|
MK696563
|
8 254
|
ND5/cytb/T/P1/ND6-1/E1/CR1/P2/ND6-2/E2/CR2/F/12S/V/16S
|
Rhea pennata
|
Blood
|
ZOO WAW
|
MK696564
|
8 306
|
ND5/cytb/T/P1/ND6-1/E1/CR1/P2/ND6-2/E2/CR2/F/12S/V/16S
|
Casuarius casuarius
|
Blood
|
ZOO WAW
|
-
|
-
|
-
|
Dromaius novaehollandiae
|
Blood
|
ZOO WAW
|
-
|
-
|
-
|
Crypturellus tataupa
|
Blood
|
ZOO WAW
|
MK696562
|
7 044
|
ND5/cytb/T/P1/ND6-1/E1/CR1/P2/ND6-2/E2/CR2/F/12S
|
2.2. Probable presence of mitochondrial CR1/CR2 fragments in Casuarius casuarius and Dromaius novaehollandiae nuclear genomes
In the case of two other Palaeognathae species, Casuarius casuarius and Dromaius novaehollandiae, an attempt to amplify the CR1/CR2 fragment was also made. Similar to other taxa, species-specific D-F and D-R primers (Fig. 2; Table S1 in Additional file 2) were designed using the sequences of previously published complete mitogenomes (AF338713.2 and AF338711.1). In contrast to the results obtained for the other Palaeognathae species, most PCR reactions failed to amplify the expected fragments. In Dromaius novaehollandiae, amplicons were obtained only for 3 out of 25 tested reactions (Fig. S4a in Additional file 1, Table S1 in Additional file 2). Analogously, PCR products were obtained only for 4 out of 56 reactions for Casuarius casuarius (Fig. S4b in Additional file 1, Table S1 in Additional file 2). Moreover, single DNA fragments were not obtained for any of these seven reactions, although different annealing temperatures were applied (Fig. S4 in Additional file 1). Taking into account the heterogeneity of the obtained DNA fragments as well as the fact that most of the tested reactions failed, we can conclude that the PCR products presented in Figure S4 in Additional file 1 were not amplified on the mitochondrial genome template. The D-F and D-R primers as well as the applied PCR are highly specificity and diagnostic for the presence of control region duplication in parrots [23], cranes [12] as well as black-browed albatross, ivory-billed aracari and osprey [4]. Therefore, the seven positive amplicons most likely represent mitochondrial DNA fragments located in the nuclear genomes, i.e. NUMTs. It means that Casuarius casuarius and Dromaius novaehollandiae or their ancestors had mitogenomes comprising two control regions, which were then transferred into the nucleus during evolution.
2.3. Reannotation of Eudromia elegans mitochondrial gene order
The GO-I gene order found in this study for four Palaeognathae taxa differs from that of the published mitogenomic sequence of Eudromia elegans [26]. This rearrangement appears to be a degenerated form of GO-I gene order as it lacks the first copy of ND6 and tRNA-Glu genes as well as the second copy of tRNA-Pro gene. This fact prompted us to search for a potential tRNA-Pro pseudogene hidden within the last 122 nucleotides of the first control region of Eudromia elegans mitogenome. In fact, the comparison of CR1 sequence with the functional tRNA-Pro sequence of this species revealed a very significant similarity (E-value = 1.2⋅10− 6; 81% identity) between these sequences along the 84-bp alignment (Fig. S5a in Additional file 1), which supports the presence of the tRNA-Pro pseudogene in the Eudromia mitogenome in the position between 16,272 bp and 16,349 bp. After reannotation of this pseudogene, the length of CR1 reduced to 1,352 bp. The newly annotated Eudromia gene order was defined as GO-P1 in Fig. 1e.
2.4 Reannotation of mitochondrial gene order in the mitogenomes of Anomalopteryx didiformis, Emeus crassus and Dinornis giganteus
Our analysis of 5’ spacers, i.e. fragments of control regions located between the tRNA-Glu gene and poly-C motif, revealed that they are much longer in annotated Anomalopteryx didiformis, Emeus crassus and Dinornis giganteus mitogenomes than in other Palaeognathae species. These spacers of the most Palaeognathae taxa are from 2 bp to 33 bp in length (Table S2 in Additional file 2). Surprisingly, the lengths of Anomalopteryx didiformis, Emeus crassus and Dinornis giganteus spacers are 133 bp, 157 bp and 150 bp respectively. Additionally, all three fragments contain a purine-rich insertion (Fig. S5b in Additional file 1) analogous to that in parrot ND6 pseudogenes (Fig. S5c in Additional file 1) [23]. In the Psittaciformes mitogenomes (Probosciger aterrimus goliath, Eolophus roseicapilla and Cacatua moluccensis), this insertion is preceded by a fragment (433–450 bp) almost identical with the first ND6 copy followed by a highly degenerated region. This similar sequence pattern prompted us to search for potential ND6 pseudogenes within the 5’ spacers of control regions in Anomalopteryx didiformis, Emeus crassus and Dinornis giganteus. The comparison of 5’ CR sequences with appropriate ND6 genes of these species revealed a significant similarity between the aligned sequences (Table S3 in Additional file 2). Those from Anomalopteryx didiformis were identical in 71% with E-value = 0.13 (Fig. S5d in Additional file 1) and from Emeus crassus in 73% with E-value = 0.0015 (Fig. S5e in Additional file 1). The alignment of Dinornis giganteus sequences was much more significant with E-value = 5.8⋅10− 106 and the sequences showed 83% identity (Fig. S5f in Additional file 1). The obtained identity and E values are in the range of those obtained for ND6 pseudogenes and their functional copies annotated in other avian species, i.e. 65% − 96% and 0–0.23, respectively (Table S3 in Additional file 2).
Assuming the presence of ND6 pseudogenes in Anomalopteryx didiformis, Dinornis giganteus and Emeus crassus mitogenomes, the length of their CR is reduced to 1,347 bp, 1,360 bp and 1,346 bp, respectively. The CR sequences show 71–81% identity at 5’ spacers on the length 165 bp (Fig. S5b in Additional file 1). The new avian gene order present in these reannotated mitogenomes is indicated as GO-P2 in Fig. 1e.
2.5. Comparison of the duplicated regions of Palaeognathae mitogenomes
The GO-I gene order found in four Palaeognathae species (Fig. 1, Table 2) is characterized generally by a high similarity between paralogous sequences, i.e. copies found within the same mitogenome. The second copies of tRNA-Pro, ND6 and tRNA-Glu are identical with the first ones in the case of Struthio camelus, Rhea americana and Rhea pennata (Table 3). The second copy of tRNA-Glu is also identical with the first one in Crypturellus tataupa mitogenome. However, the first copies of tRNA-Pro and ND6 genes of this species differ from their paralogous sequences in three nucleotides (Table 3). Two control regions of analyzed species show a slightly greater variation in identity, from 94.4% (Rhea pennata) to 97.8% (Crypturellus tataupa). The difference is mainly located at their 3' ends, except for Rhea taxa, whose control regions differ also at their 5' ends (Fig. S2 in Additional file 1).
Table 2
Avian mitochondrial genomes analyzed in this study. GO-I, GO-P1 and GO-P2 indicate gene orders with the duplicated region. GO-TA means the typical avian gene order without duplication. Incomplete mitogenomes are marked with an asterisk and their unknown gene order is marked with question mark.
Order
|
Species
|
Accession
|
Length [bp]
|
Gene Order
|
Struthioniformes
|
Struthio camelus
|
AF338715.1
|
16,595
|
GO-I
|
Rheiformes
|
Rhea americana
|
AF090339.1
|
16,704
|
GO-I
|
Rheiformes
|
Rhea pennata
|
AF338709.2
|
16,749
|
GO-I
|
Casuariiformes
|
Casuarius casuarius
|
AF338713.2
|
16,756
|
GO-TA
|
Casuariiformes
|
Casuarius bennetti
|
AY016011.1*
|
12,348
|
?
|
Casuariiformes
|
Dromaius novaehollandiae
|
AF338711.1
|
16,711
|
GO-TA
|
Aepyornithiformes
|
Aepyornis sp.
|
KY412176.1
|
16,688
|
GO-TA
|
Aepyornithiformes
|
Aepyornis hildebrandti
|
KJ749824.1*
|
15,547
|
?
|
Aepyornithiformes
|
Mullerornis agilis
|
KJ749825.1*
|
15,731
|
?
|
Apterygiformes
|
Apteryx mantelli
|
KU695537.1
|
16,694
|
GO-TA
|
Apterygiformes
|
Apteryx owenii
|
GU071052.1
|
17,020
|
GO-TA
|
Apterygiformes
|
Apteryx haastii
|
AF338708.2
|
16,980
|
GO-TA
|
Tinamiformes
|
Crypturellus tataupa
|
AY016012.1*
|
12,205
|
GO-I
|
Tinamiformes
|
Eudromia elegans
|
AF338710.2
|
18,305
|
GO-P1
|
Tinamiformes
|
Tinamus guttatus
|
KR149454.1
|
16,750
|
GO-TA
|
Tinamiformes
|
Tinamus major
|
AF338707.3
|
16,701
|
GO-TA
|
Dinornithiformes
|
Anomalopteryx didiformis
|
AF338714.1*
|
16,716
|
?
|
Dinornithiformes
|
Anomalopteryx didiformis
|
MK778441.1
|
17,043
|
GO-P2
|
Dinornithiformes
|
Emeus crassus
|
AF338712.1*
|
16,662
|
?
|
Dinornithiformes
|
Emeus crassus
|
AY016015.1
|
17,061
|
GO-P2
|
Dinornithiformes
|
Dinornis giganteus
|
AY016013.1
|
17,070
|
GO-P2
|
Table 3
Comparison of two copies of selected genes as well as control regions (CRs) in mitogenomes from five Palaeognathae taxa.
Species
|
Copy
|
Length (bp)
|
Percent of residues identical between two copies and number of aligned residues (in parentheses)
|
tRNA-Pro
|
ND6
|
tRNA-Glu
|
CR
|
tRNA-Pro
|
ND6
|
tRNA-Glu
|
CR
|
Struthio camelus
|
1st
|
70
|
522
|
68
|
1035
|
100 (70)
|
100 (522)
|
100 (68)
|
96.9 (1023)
|
2nd
|
70
|
522
|
68
|
1036
|
Rhea americana
|
1st
|
70
|
525
|
69
|
1118
|
100 (70)
|
100 (525)
|
100 (69)
|
94.4 (1076)
|
2nd
|
70
|
525
|
69
|
1118
|
Rhea pennata
|
1st
|
70
|
525
|
69
|
1103
|
100 (70)
|
100 (525)
|
100 (69)
|
94.1 (1076)
|
2nd
|
70
|
525
|
69
|
1183
|
Crypturellus tataupa
|
1st
|
70
|
522
|
69
|
1059
|
95.7 (70)
|
99.4 (522)
|
100 (69)
|
97.8 (1059)
|
2nd
|
70
|
522
|
69
|
1196
|
Eudromia elegans
|
1st
|
73
|
−
|
−
|
1352
|
80.6 (84)
|
−
|
−
|
98.2 (1252)
|
2nd
|
78 (ψ)
|
522
|
70
|
1350
|
In contrast to GO-I gene order, the newly defined Eudromia elegans GO-P1 rearrangement is characterized by the presence of only one ND6 and tRNA-Glu gene each (Fig. 1). Moreover, the second copy of tRNA-Pro is a pseudogene, which has substantially diverged from its functional version (Fig. S5a in Additional file 1). Therefore, it seems that the GO-P1 rearrangement is a degenerated form of the GO-I gene order, in which two genes were removed and one gene was pseudogenized. Surprisingly, despite the high degree of degeneration in comparison with other analysed Palaeognathae species, two control regions of Eudromia elegans maintain the highest sequence identity (Table 3). However, the alignment of these regions clearly shows the presence of several deletions/insertions (Fig. S6 in Additional file 1).
The comparison of paralogous control regions in Palaeognathae taxa revealed that the second control regions (CR2s) are much longer only in two species, i.e. Rhea pennata and Crypturellus tataupa (Table 3). Such a difference in the length of CRs seems to be a rule in most avian mitogenomes with a duplicated region [23]. Interestingly, CRs in Rhea americana are identical in length, while those in Struthio camelus and Eudromia elegans differ only in one and two nucleotides, respectively (Table 3).
2.6. Phylogenetic relationships within Palaeognathae based on mitogenomes
The three phylogenetic methods resulted in a consistent topology (Fig. 3). The earliest diverging lineage of Palaeognathae was Struthio camelus (representing Struthioniformes) and the next Rheiformes (Rheidae). Dinornithiformes (Dinornithidae + Emeidae) is grouped with Tinamiformes (Tinamidae), whereas Casuariiformes (Dromaiidae + Casuariidae) is sister to Aepyornithiformes (Aepyornithidae) + Apterygiformes (Apterygidae). Almost all nodes are very well supported. The least significant are two nodes: one clustering Casuariiformes, Aepyornithiformes and Apterygiformes, and the other encompassing the palaeognath lineages separated after the divergence of Struthio and Rhea. Nevertheless, these two nodes obtained the highest posterior probability in MrBayes analysis, i.e. 1.0 and support in the Shimodara-Hasegawa-like approximate likelihood ratio test (SH-aLRT) equal to 93 and 78, respectively.
In order to eliminate a potential artefact related with the compositional heterogeneity in the third codon positions of protein-coding genes, we created phylogenetic trees based on the RY-coding alignment (Fig. 4). The tree topology produced by the three methods was the same as that for the uncoded alignment. The posterior probability of the two controversial nodes was still very high in MrBayes tree, i.e. 0.99 and the SH-aLRT support was 89 and 82, respectively.
Moreover, we performed phylogenetic analyses based on ten alignments, from which we sequentially excluded partitions characterized by the highest substitution rate (Table S4 in Additional file 2). The calculations provided in total 16 topologies out of which 5 are worthy of mention because they were produced by many independent approaches or were obtained also in other studies (Fig. 5). The topology t1 was identical with that based on the alignments including all sites and demonstrated rheas as the sister to all other non-ostrich palaeognaths. Such a tree was produced by MrBayes, PhyloBayes and IQ-TREE using the alignment without sites characterized by the highest substitution rate, as well as by MrBayes and IQ-TREE using the alignment after removing sites with two highest rate categories. The posterior probabilities for the clade including palaeognaths other than ostrich and rheas were very high in MrBayes, i.e. 1 and 0.98, respectively, or moderate, i.e. 0.87 in PhyloBayes. In the topology t2, the Rhea clade was grouped with Casuariiformes + Apterygiformes. However, the support of this grouping was very weak and occurred only in MrBayes tree and IQ-TREE consensus bootstrap tree based on the alignments without seven and eight highest rate categories, respectively. A greater Bayesian support (0.95–0.97) was obtained by the node encompassing rheas with Casuariiformes in the topology t3 based on the alignments after removing three, four and five highest rate categories. This topology was also produced in MrBayes using the alignment without eight highest rate categories and in IQ-TREE for the alignments without four, five and six highest rate categories. However, the node support was generally weak. The topology t4 was produced only by PhyloBayes for the alignments without two, three, four, five, seven and eight highest rate categories. As in the topology t1, the Rhea clade was also sister to all other palaeognaths excluding Struthio, but Casuariiformes were clustered with the rest non-ostrich palaeognaths, not directly with Aepyornithiformes and Apterygiformes. The posterior probability values of the clade including palaeognaths sister to rheas did not exceed 0.8. The topology t5 differed from the others because Struthio camelus was placed within other Palaeognathae and the external position was occupied by Dinornithiformes + Tinamiformes, whereas Rhea was grouped with Casuariiformes. This topology was obtained for the alignments without three (IQ-TREE) and six highest rate categories (MrBayes and IQ-TREE). Nevertheless, the controversial nodes were poorly supported.
Removing the sites with the highest substitution rate eliminated the alignment positions that were saturated with substitutions, but the number of parsimony informative sites decreased, too (Fig. 6a). Therefore, the stochastic error could increase for the short alignments and the inferred phylogenetic relationships could be unreliable. After elimination of sites with two highest rate categories, the mean phylogenetic distance in the MrBayes tree decreased abruptly from 0.94 to 0.33 substitutions per site and the maximum distance dropped from 1.99 to 0.69 substitutions per site (Fig. 6a). The sharp decrease was also visible in the number of informative sites, which constituted 56% of those in the original alignment. However, the sisterhood of rheas to other non-ostrich palaeognaths was still present in the trees based on the purged alignments and the latter group was relatively highly supported (Fig. 6b). After removing sites with at least three highest rate categories, the alignment was deprived of more than half of informative sites and alternative topologies were favored, though with smaller support values (Fig. 6b).
Among the applied topology tests, the BIC approximation produced all Bayesian posterior probabilities for the alternative topologies much smaller than 0.05 indicating a strong rejection of the tested alternatives (Table S5 in Additional file 2). Moreover, the topology t4 performed significantly worse than t1 in two bootstrap tests, whereas the bootstrap probabilities for the topology t2 were 0.063, i.e. very close to the 0.05 threshold. Other tests did not reject the alternative topologies. However, Bayes factor was greater than 9 indicating an overwhelming support for the topology t1 because the commonly assumed threshold for such interpretation is 5 [63].
2.7. Comparison of Palaeognathae tree topologies
All the phylogenetic analyses imply that the relationships presented in the topology t1 describe the most probable evolutionary history between the mitochondrial genomes of palaeognaths. Such relationships, but not always on the full taxa set, were also obtained in other studies based on mitochondrial genes [55, 56], selected nuclear genes [48, 54, 57], the joined set of nuclear and mitochondrial genes [46, 52, 58] as well as the concatenated alignments of many nuclear markers [45, 59, 60]. However, the application of a coalescent species tree approach on these markers and the analysis of retroelement distribution indicated a closer relationship between rheas and the clade of Casuariiformes + Apterygiformes [45, 59, 60]. This phylogeny was also generated for selected nuclear genes [53] and in a supertree approach [47]. These relationships are presented in the topology t2 but are, however, insignificant for the mitochondrial gene set. An alternative, poorly supported topology, in which Rhea clustered with Tinamiformes or Dinornithiformes + Tinamiformes, was obtained for some nuclear genes [54, 57].
Two topologies, t1 and t2, aspire to be the real species tree but it is not easy to evaluate which one is true. Although t1 was found in many studies based on concatenated alignments of many markers, it has been criticized as a true species tree in favor of t2, which was obtained in coalescent-based approaches also based on huge data sets [59, 60]. Moreover, the most common gene tree topologies differed from the coalescent species tree. The topology t2 was supported by 229 loci and was only the 7th in the ranking of the most common gene tree topologies, whereas the topology t1 was the 2nd, supported by 280 loci [59]. The largest number of markers, i.e. 357, indicated another topology, in which Rheiformes was grouped with the clade of Dinornithiformes + Tinamiformes. This discrepancy was explained by the existence of an empirical anomaly zone resulting from incomplete lineage sorting (ILS) across short internal branches leading to the last common ancestor of Rheiformes, Apterygiformes and Casuariiformes [59, 60].
Although coalescent species trees account for ILS, simulations showed that species tree methods based on gene tree summation may not provide significantly better performance over concatenation alignment methods, which can perform even better [64]. Prum, et al. [45], who applied extensive avian taxon sampling and loci with slow substitution rates, found no single locus that was able to fully resolve the tree topology. They concluded that this lack of phylogenetic information can challenge the accuracy of a coalescent-based summary approach relative to concatenation. The multispecies coalescent models work under some conditions [65]. For example, they assume that incongruent gene trees are independently generated from a coalescence process occurring along the lineages of the species tree and there is no selection on the studied markers. Although Cloutier, et al. [60] and Sackton, et al. [59] analyzed noncoding nuclear elements, they were conserved or ultraconserved. Therefore, we cannot exclude that they were involved in essential regulatory functions and subjected to selection [66–68], which could influence the model assumptions.
Generally, nuclear markers are more prone to ILS and hidden gene paralogy [69–76] than mitochondrial genes, which are present in a haploid genome and maternally inherited [77]. Thus, the time needed to completely sort the ancestral polymorphism of mitochondrial DNA is on average four times smaller than for nuclear genes [78]. Introgression of mtDNA is another reason for the discrepancy between the gene and species trees. However, this process concerning maternally inherited mtDNA is restricted between heterogametic avian species because female hybrids are characterized by reduced viability [79–85]. In agreement with that, a survey of causes of mtDNA gene tree paraphyly in birds found that 8% of studied species had paraphyly attributable to incorrect taxonomy, and only 2% on account of incomplete lineage sorting, 1% due to introgressive hybridization, and 3% because of incomplete lineage sorting or hybridization [86]. Moreover, the mitochondrial genes are located on one molecule and are inherited together [74, 87], so they should bear a consistent phylogenetic signal. Accordingly, analyses based on complete mitogenomes have provided well-resolved phylogenies of various avian groups [9, 14, 40–42, 88–91]. Nevertheless, it is not inconceivable that the ILS effect can influence mtDNA in rapidly radiating taxa, in which on-going speciation occurs before genetic sorting [92].
2.8. Distribution of mitogenomic rearrangements in phylogenetic trees of Palaeognathae
We considered both t1 and t2 topologies to analyze the presence and absence of the mitogenomic duplication in the phylogenetic context. Using these relationships, we mapped the mitogenomic features onto these topologies and inferred ancestral states for the individual lineages using maximum parsimony (MP) and maximum likelihood (ML) methods (Fig. 7).
Two methods applied to the topology t1 clearly indicate that the last common ancestor of palaeognaths contained a duplicated region in its mitogenome (Fig. 7a and b). This state was inherited by the ostrich and rhea lineages. The ML method provided the probability P > 0.982 of this state for the last common ancestors of all palaeognaths and non-ostrich palaeognaths. The last common ancestor of the remaining groups, i.e. Dinornithiformes, Tinamiformes, Casuariiformes, Aepyornithiformes and Apterygiformes, could also contain a duplication with P = 0.899. However, the last common ancestor of Casuariiformes, Aepyornithiformes and Apterygiformes lost this duplication (P = 0.914). In turn, the last common ancestor of Dinornithiformes and Tinamiformes still had the duplicated region (P = 0.983), which was probably lost in Tinamus, whereas Anomalopteryx, Dinornis and Emeus maintained only a pseudogenized ND6.
According to the topology t2, the last common ancestors of all palaeognaths and non-ostrich palaeognaths also had the duplication in their mitogenomes with the probability of at least 0.982 (Fig. 7c and d). The duplication was preserved in the last common ancestor of Dinornithiformes and Tinamiformes (P = 0.996) as well as the last common ancestor of Rheiformes, Casuariiformes, Aepyornithiformes and Apterygiformes (P = 0.906). Among four latter orders, only Rheiformes maintained the duplication, whereas the last common ancestor of the other orders had a mitogenome without the duplicated region (P = 0.914).
One could argue that the traces of ND6 pseudogene in Anomalopteryx, Dinornis and Emeus are equivocal. Therefore, we conducted analyses in which we assumed that mitogenomes of these genera had already lost the duplication (Fig. S7 in Additional file 1). However, the general conclusion that the last common ancestor of all Palaeognathae had a mitogenome with a duplicated region did not change independently of the assumed topology. The probability of this state was 0.916 and 0.829 for topology t1 and t2, respectively. The last common ancestor of non-ostrich palaeognaths also had this feature with P = 0.889 and 0.697 for these topologies, respectively.
As discussed in the previous section, the topology t1 is highly supported by mitochondrial data, whereas t2 is supported by nuclear markers in coalescent-based approaches and is regarded as the true species tree by some authors [59, 60]. Assuming that the t1 presents the real mitogenomic history, we superimposed the mitogenome phylogeny onto the potential species tree (Fig. 8). In order to reconcile the alternative positions of rheas in these topologies, we should assume that there existed a heteroplasmy, i.e. at least two types of mitochondrial genomes, in the last common ancestor of non-ostrich palaeognaths. Both mitogenomes probably initially contained a duplication, as indicated by the inferred ancestral states. The lineages of these mitogenomes are marked in Fig. 8 as 1 and 2. Dinornithiformes and Tinamiformes inherited only mitogenome 2, whose duplicated regions had begun to fade and likely disappeared in Tinamus. However, the common ancestor of Rheiformes, Casuariiformes, Aepyornithiformes and Apterygiformes preserved two mitogenomes, but genome 2 lost the duplicated region during the course of evolution. Then, the mitochondrial lineages were segregated: mitogenome 1 with the duplication was left in rheas, whereas mitogenome 2 without the duplication was passed to Casuariiformes, Aepyornithiformes and Apterygiformes.
The assumption about the presence of heteroplasmy in some period of Palaeognathae evolution seems probable because such genomic diversity has also been reported in various avian groups: Accipitriformes [5], Bucerotiformes [6], Charadriiformes [7, 93, 94], Ciconiiformes [95], Columbiformes [96], Gruiformes [97], Passeriformes [98, 99], Pelecaniformes [16, 100], Piciformes [4], Procellariiformes [28], Psittaciformes [101], Sphenisciformes [102], Strigiformes [24] and Suliformes [25]. This period was likely very short. According to the results of molecular dating performed by Kimball, et al. [47], the time elapsed since the divergence of the clade Dinornithiformes + Tinamiformes and that including Rheiformes, Casuariiformes, Aepyornithiformes and Apterygiformes to the separation of Rheiformes from the three latter orders was only 3.6 million years.
2.9. Implications for mitogenome evolution in all Aves
The finding that a palaeognath ancestor contained a mitogenomic duplication challenges the common assumption that this feature evolved independently in individual lineages of Neognathae, i.e. the sister group of Palaeognathae [13, 22, 29–31]. Increasing data indicate that mitogenomes with a duplicated region are present in all or a vast majority of representatives of diverse Neognathae lineages, i.e. Accipitriformes [4, 5, 36–38], Falconiformes [4, 39, 103], Gruiformes [12], Pelecaniformes [4, 15, 16], Psittaciformes [23] and Suliformes [25]. It suggests that ancestors of these groups could have also had a mitogenomic duplication. The presence of this state was recently inferred for the last common ancestor of three closely related orders, Falconiformes, Passeriformes and Psittaciformes [14]. Therefore, it is interesting to consider if this feature was present much earlier in the evolution of birds or even in the last common ancestor of all known Aves.
The current data indicate that mitogenomes with duplications are distributed in 25 out of 43 avian orders (Table S6 in Additional file 2). Among 852 species with known mitogenomes, 262 have a duplication, 532 do not contain this feature and 58 are too incomplete to classify them into one of these categories. However, the number of the former group is likely underestimated because of difficulties with the amplification and sequencing of repeated regions [4]. Accordingly, reanalysis of 13 crane mitogenomes, previously annotated without the duplication, showed that all of them contain in fact the duplicated region [12]. Similarly, 15 mitogenomes of parrots from Cacatuidae and Nestoridae also revealed this character after re-examination using appropriate PCR and sequencing methods [23].
Another important example is the order Chardriiformes. So far, only two representatives have been shown to have duplication in mitochondrial genomes: Calidris pugnax (GQ255993.1) and Turnix velox (MK453380.1). However, annotation of another mitogenome from Alca torda (CM018102.1) revealed the presence of the most fully duplicated avian region (GO-FD) (Fig. 1 and Fig. S8 in Additional file 1). Therefore, we surveyed mitogenomes of 20 additional Charadriiformes annotated without duplication to identify potentially unrecognized duplication of the mitogenomes. Using diagnostic PCR reactions amplifying a fragment between two control regions, we found GO-FD gene order in 17 examined species (data not shown; paper in preparation). The obtained results indicate that underestimation of the mitogenomes with the duplication ranges from 85–100% in the case of Charadriiformes, Gruidae and Psittaciformes. It should be emphasized that the omission of GO-FD gene order can be common in the amplification and sequencing of avian mitogenomes using standard procedures due to the presence of two nearly identical copies. Applying other methods, we obtained previously omitted sequences of the GO-FD rearrangement in mitogenomes of five additional avian orders: Cathartiformes (Cathartes aura), Ciconiiformes (Ciconia nigra), Gaviiformes (Gavia arctica), Podicipediformes (Podiceps cristatus) and Sphenisciformes (Spheniscus demersus) (Fig. S9 in Additional file 1). Furthermore, ab initio annotation of Calypte anna (Apodiformes) and Puffinus lherminieri (Procellariiformes) mitogenomes deposited in GenBank database revealed the presence of GO-FD (Fig. S8 in Additional file 1). The same gene order has been identified in mitogenomes of Morus serrator (Suliformes) as well as Ketupa blackistoni and Ketupa flavipes (Strigiformes) after re-annotation of their duplicated gene rearrangements (Fig. S8 in Additional file 1).
The gathered data (Table S6 in Additional file 2) were used to reconstruct the evolution of mitogenomic duplications in all birds. We mapped the data onto the phylogenetic tree of Aves obtained recently in a supertree approach by Kimball, et al. [47]. If we assume that underestimation of mitogenomes with the duplication is only 48% (much smaller than that above-mentioned), the total number of such mitogenomes in the avian orders already containing at least one mitogenome of this type would exceed the number of mitogenomes without the duplication. It would strongly suggest that the last common ancestor of these orders also contained the mitogenomic duplication in the past. For the orders without this feature, we assumed that they were ancestrally deprived of it. Using these assumptions, we reconstructed ancestral states in the Aves tree (Fig. 9). Both MP and ML methods indicated that the last common ancestor of all Aves contained a mitogenome with the duplication with the probability of 0.776. This state was passed to the ancestors of the main lineages leading to modern orders. The probability of this feature was still above 0.5 for the last common ancestors of Palaeognathae (0.951), Neornithes (0.621), Neoaves (0.760) and later diverged clades (> 0.935). Within Neornithes, the duplication was lost five times independently in the ancestors of Galloanserae, Columbiformes, Otidiformes, Eurypygiformes + Phaethontiformes and Trogoniformes. A similar conclusion can be drawn for the tree topology obtained by Prum, et al. [45] (Fig. S10 in Additional file 1), under which the last common ancestor of all Aves could have had a mitogenome with duplication (P = 0.685), which was also inherited by ancestors of deeply diverged lineages. This state was lost at least four times in Neornithes: Galloanserae, Columbiformes + Otidiformes, Eurypygiformes + Phaethontiformes and Trogoniformes. Cuculiformes would recover the duplication again under this scenario.
This view about the ancestry of the mitogenomic duplication can be further supported by the distribution of the most fully duplicated avian region (GO-FD). It includes the repetition of Cytb/tRNA-Thr/tRNA-Pro/ND6/tRNA-Glu/CR in which only the cytochrome b gene is pseudogenized (Fig. 1). This rearrangement type occurs in 55 mitogenomes distributed among 14 bird orders (Fig. 9, Table S7 in Additional file 2). The length and complexity of this duplication suggests that it is unlikely that it occurred independently at least 8 times (see Fig. 9) because it would require the same recombination pattern and replication errors [12, 16, 25, 104]. Thus, it seems more probable that this state was inherited from shared ancestors by the avian lineages that contain this rearrangement type. In other bird groups, the duplicated regions were subjected to degenerations and lost selected elements.