3.1 D. Edwardsii mitogenome organization
The Illumina HiSeq runs resulted in 59,106,998 paired-end reads from the D. Edwardsii library with an insert size of approximately 450 bp. Selective-assembly analysis showed that 185,891 reads were assembled into a 19,858 bp circular molecule, representing the complete mitogenome of D. Edwardsii (Fig. 1 and Table 2). This length is longer than the mitogenome length of other reported Anomura species, which ranges from 15,324 bp in Neopetrolisthes maculatus to 17,910 bp in Munidaisos (Table 1). The genome encodes 37 genes, including 13 PCGs, 2 rRNA genes, and 22 tRNA genes. 28 genes are encoded on the heavy (H) strand, and nine genes are encoded on the light (L) strand (Table 2). The intergenic regions are distributed in 28 locations, and the longest region is 1,971 bp (between trnS2-tca and nad1). Meanwhile, 31 overlapping nucleotides are located in four pairs of neighboring genes for the mitogenome. These overlapping nucleotides vary in length from 1 to 18 bp, and the longest overlap is located between rrnL and trnV-gta. The D. Edwardsii mitogenome has a nucleotide composition of 32.11% A, 10.68% C, 17.16% G, and 40.05% T and an overall AT content of 72.16%. For the D. Edwardsii mitogenome, the AT skew is -0.110, and the GC skew is 0.233, which indicates bias toward T and G. This asymmetry often occurs on Anomura species[11, 33–35]. In mammals, the duration of single-stranded state of the "heavy-stranded" genes during mitochondrial DNA replication can explain this asymmetry [36]. Whether or not the same explanation works on our results remains difficult to predict at this time due to the scarcity of information regarding DNA replication of invertebrate mitochondria.
Table 2
Mitogenome organization of Diogenes Edwardsii.
Gene | Strand | Position | Length | Intergenic region | Start coden | Stop coden | Anticodon |
cox1 | H | 37-1539 | 1503 | - | ATT | TAA | |
trnL1-cta | H | 1535–1601 | 67 | -5 | - | - | CUA |
cox2 | H | 1613–2302 | 690 | 11 | ATG | TAA | |
trnK-aaa | H | 2305–2374 | 70 | 2 | - | - | AAA |
nad2 | H | 2639–3613 | 975 | 264 | ATA | TAG | |
trnD-gac | H | 3617–3685 | 69 | 3 | - | - | GAC |
atp8 | H | 3686–3844 | 159 | 0 | ATT | TAG | |
atp6 | H | 3838–4512 | 675 | -7 | ATG | TAA | |
cox3 | H | 4512–5303 | 792 | -1 | ATG | TAA | |
trnR-cga | H | 5318–5381 | 64 | 14 | - | - | CGA |
trnN-aac | H | 5391–5459 | 69 | 9 | - | - | AAC |
trnE-gaa | H | 5486–5554 | 69 | 26 | - | - | GAA |
trnT-aca | H | 5820–5884 | 65 | 265 | - | - | ACA |
nad6 | H | 5923->6418 | 496 | 38 | ATT | A | |
cob | H | 6419–7549 | 1131 | 0 | ATA | TAA | |
trnS2-tca | H | 7553–7619 | 67 | 3 | - | - | UCA |
nad1 | H | 9591–10517 | 927 | 1971 | ATT | TAA | |
trnP-cca | H | 10521–10588 | 68 | 3 | - | - | CCA |
nad4l | H | 10911–11183 | 273 | 322 | GTG | TAA | |
nad4 | H | 11300–12514 | 1215 | 116 | ATA | TAG | |
trnH-cac | H | 12521–12586 | 66 | 6 | - | - | CAC |
nad5 | H | 12620–14326 | 1707 | 33 | ATA | TAG | |
trnF-ttc | H | 14337–14407 | 71 | 10 | - | - | UUC |
trnI-atc | L | 14711–14776 | 66 | 303 | - | - | AUC |
trnM-atg | L | 14941–15009 | 69 | 164 | - | - | AUG |
trnC-tgc | H | 15395–15465 | 71 | 385 | - | - | UGC |
trnL2-tta | H | 15678–15748 | 71 | 212 | - | - | UUA |
trnG-gga | H | 15750–15815 | 66 | 1 | - | - | GGA |
nad3 | H | 15825–16169 | 345 | 9 | GTG | TAG | |
trnS1-aga | H | 16230–16295 | 66 | 60 | - | - | AGA |
trnY-tac | L | 16322–16387 | 66 | 26 | - | - | UAC |
rrnL | L | 16423–17792 | 1370 | 35 | - | - | |
trnV-gta | L | 17775–17843 | 69 | -18 | - | - | GUA |
rrnS | L | 17844–18639 | 796 | 0 | - | - | |
trnA-gca | L | 19241–19306 | 66 | 601 | - | - | GCA |
trnW-tga | L | 19649–19717 | 69 | 342 | - | - | UGA |
trnQ-caa | L | 19718–19784 | 67 | 0 | - | - | CAA |
For most families of the order Decapoda, congeners belonging to the same family possess identical gene arrangement generally [11], which has particularly been confirmed based on the reported crab mitogenomes [37–39]. However, comparing with the reported mitogenomes of the species Clibanarius infraspinatus and Dardanus arrosor belonging to the same family Diogenidae, the D. Edwardsii mitogenome shows distinctive gene arrangement[11, 40]. For instance, nad5 located between trnH-cac and trnF-ttc in the mitogenome of D. Edwardsii, instead of the common location between trnF-phe and trnH-his in the mitogenomes of Clibanarius infraspinatus and Dardanus arrosor. As a result, the subtle differences in gene arrangement between these species will present new challenges to this conclusion. The gene rearrangement found in D. Edwardsii mitogenome deepens our understanding of this genomic feature about the hermit crabs.
3.2 Protein-coding genes
The total length of all 13 PCGs of D. Edwardsii is 10,888 bp, accounting for 54.83% of the complete length of the mitogenome (Table 2). The 13 PCGs ranged in size from 159 bp (atp8) to 1,707 bp (nad5), and all 13 PCGs are encoded on the heavy strand (Table 2). Furthermore, there are two reading-frame overlaps on the same strand: atp6 and atp8, and atp6 and cox3 each share seven and one nucleotides (Table 2). The PCGs encode 3629 amino acids, among which Ser (16.19%) and Leu (9.84%) are the most and frequently used, and Met (1.64%) and Trp (1.64%) are the least frequently used, respectively. The A + T contents of PCGs and third codon positions in D. Edwardsii mitogenome are 69.08% and 72.30%, respectively. The strong AT-bias in the third codon positions is similar to many species in the infraorder Anomura, i.e., Dardanus arrosor, Shinkaia crosnieri and Pagurus longicarpus [11, 33, 34].
Codon usage bias is a phenomenon in which specific codons are used more frequently than other synonymous codons by certain organisms during the translation of genes to proteins [11]. As shown in Fig. 2, 61 available codons are used in D. Edwardsii mitogenome in total. Start and stop codons varied among the PCGs. Four of 13 PCGs use ATT (cox1, atp8, nad6 and nad1) and ATA (nad2, cob, nad4 and nad5) as start codons, and three PCGs use ATG (cox2, atp6 and cox3) and nad3 uses GTG. Seven of 13 PCGs use TAA (cox1, cox2, atp6, cox3, cob, nad1 and nad4l) as a stop codon, and five PCGs use TAG (nad2, atp8, nad4, nad5 and nad3). Finally, nad6 exhibit an incomplete (A) stop codon. The presence of an incomplete stop codon is a common phenomenon in both invertebrate and vertebrate mitochondrial genes [41–43]. It is assumed that truncated stop codons are completed via post-transcriptional polyadenylation [44]. Previous studies have shown that truncated stop codons were common in the metazoan mitogenomes and may be corrected by post-transcriptional polyadenylation [45, 46]. The relative synonymous codon usage (RSCU) values indicated that the six most commonly used codons are TTA(Leu), TCT(Ser), CCT(Pro), ACT(Thr), GCT(Ala), AGA(Ser) (Fig. 2). Based on RSCU and amino acid composition in the PCGs, comparative analyses showed that the acid composition kept similar with most hermit crabs, whereas the codon usage pattern of D. Edwardsii is unidentified with some Paguroidea species. The codon TCT(Ser) instead of TTA(Leu) was often found to be used most frequently in Paguroidea species mitogenomes, such as Dardanus arrosor, Birgus latro, Pagurus longicarpus and Coenobita rugpsus [9, 11, 34, 47], while not in D. Edwardsii mitogenome.
3.3 Ribosomal RNA and transfer RNA genes
The rrnL and rrnS genes of D. Edwardsii are 1,370 bp (AT% = 67.2) and 796 bp (AT% = 67.7) in length, respectively. The total length of rRNA genes accounts for 10.09% of the D. Edwardsii mitogenome. Both of the two genes are encoded on the light strand. The AT and GC content of rRNAs are 77.42% and 22.58%, and the AT skew and GC skew are − 0.008 and 0.010, respectively, suggesting a slight bias toward the use of T and G.
22 tRNA genes were identified in the mitogenome of D. Edwardsii (Table 2). The total length of tRNA genes is 1,491 bp, accounting for 7.51% of the complete length of the mitogenome. 15 genes (trnL1-cta, trnK-aaa, trnD-gac, trnR-cga, trnN-aac, trnE-gaa, trnT-aca, trnS2-tca, trnP-cca, trnH-cac, trnF-ttc, trnC-tgc, trnL2-tta, trnG-gga, trnS1-aga) are encoded on the heavy strand and the rest four genes (trnL-atc, trnM-atg, trnY-tac, trnV-gta) are encoded on the light strand. The AT skew of tRNA genes is 0.013, and the GC skew is 0.108, indicating bias toward A and G. In addition, the codons with A and T in the third position are used more frequently than other synonymous codons. These characteristics reflect codon usage with A and T bias at the third codon position, similar to the bias found in many metazoans [48–50].
3.4 Phylogenetic relationships
In the present study, the phylogenetic relationship of D. Edwardsii was constructed on the sequences of the concentrated 13 PCGs from the mitochondrial genomes of 16 Anomura species and two outgroups (Fig. 3). Phylogenetic analyses were carried out based on nucleotide sequences of 13 PCGs by the ML method. Anomura was divided into two main branches in our phylogenetic tree, which was identified with the previous researches [35, 51]. The first clade comprised five families (Porcellanidae, Munididae, Paguridae, Lithodidae and Kiwaidae) and the second clade contained two families (Coenobitidae and Diogenidae). It is not difficult to find that the two families of the superfamily Paguroidea belong to two branches of the phylogenetic tree, which presents challenges for the current classification based on the morphology. Even if inter-superfamiliar relationships are not fully resolved, clear trends become visible. The phylogenetic relationships of these seven families exhibit as ((((Porcellanidae + Munididae) + Paguridae) + Lithodidae) + Kiwaidae) + (Coenobitidae + Diogenidae).
Consistent with the traditional morphological classification, the phylogenetic analyses based on molecular methods showed that D. Edwardsii was clustered firstly clustered with the Diogenidae species Clibanarius infraspinatus and then clustered with another Diogenidae species Dardanus arrosor. As reported, Clibanarius infraspinatus possesses a favor of codon TAT(Ser) and owns a highest Ser content in amino acid composition[40], which showed the same pattern with D. Edwardsii. This supports the phylogenetic analysis result that these two species possess close relationship. The ML bootstrap values supporting D. Edwardsii and Clibanarius infraspinatus as well as Dardanus arrosor were not high (< 80), which may be explained by the different gene arrangements between D. Edwardsii and these two species, as discussed in part 3.1. Thus, more species mitogenomes in the family Diogenidae still need to be investigated to fulfill the systematic phylogeny within the infraorder Anomura. Our newly acquired mitogenome data and phylogenetic results can also be better used to provide a basis for studies of the mitochondrial evolution of Anomura.
3.5 Positive Selection Analysis
Comparing the nonsynonymous/synonymous substitution ratios (ω = dN/dS) has become a useful approach for quantifying the impact of natural selection on molecular evolution [26]. ω > 1, = 1 and < 1 indicate positive selection, neutrality and purifying selection, respectively [52]. We examined potential positive selection in the infraorder Anomura using CodeML from the PAML package. The ω(dN/dS) ratio calculated in branching model analysis D. Edwardsii's 13 mitochondrial PCGs was 0.02684 under the one-ratio model (M0) (Table 3), indicating that these genes undergo constrained selection pressure to maintain primary function. Then, in the comparison of the one-ratio model (M0) and the free-ratio model (M1), the LRTs indicated that the free-ratio model fit the data better than the one-ratio model (Table 3). The ω ratio of the D. Edwardsii branch was higher than that of another branch (ω1 = 0.02703 and ω0 = 0.02683), indicating divergence in selection pressure between Anomura species and other species. The higher ω ratio of the foreground branch indicated a stronger positive selection of D. Edwardsii by the infraorder Anomura.
Table 3
CodeML analyses of selection pressure on mitochondrial genes in Diogenes Edwardsii.
Trees | Models | lnL | Parameter Estimates | Model Compared | 2ΔL | LRT p-Value |
Branch models | | | | | | |
ML tree | M0 | -120734.213411 | ω = 0.02684 | | | |
| Two-ratio | -119787.8256 | | Free-ratio vs. M0 | 1892.77562 | 0.00000 |
| Free-tatio | -120734.211446 | ω0 = 0.02683 ω1 = 0.02703 | Two-ratio vs. M0 | 0.00393 | 0.95001 |
Branch-sits models | | | p0 = 0.90771; p1 = 0.03533; p2a = 0.05483; p2b = 0.00213 | | | |
ML tree | Null model | -119878.534338 | ω0 = 0.02575; ω1 = 1.00000; ω2a = 1.00000; ω2b = 1.00000 | | | |
| Model A | -119874.014382 | p0 = 0.91806; p1 = 0.03578; p2a = 0.04443; p2b = 0.00173 ω0 = 0.02593; ω1 = 1.00000; ω2a = 4.09193; ω2b = 4.09193 | Model A vs. null model | 9.03991200000746 | 0.002641484 |
Table 4
Possible sites under positive selection in Diogenes Edwardsii.
ML tree |
Gene | Codon | Amino acid | BEB values |
cox1 | 1839 | M | 0.981* |
cox2 | 2810 | S | 0.958* |
* 0.95 < BEB < 0.99 |
Moreover, a number of studies have shown that positive selection often occurs over a short evolutionary time and performed on only a few sites [53, 54]. Thus, the signals of positive selection are often overwhelmed by those for successive purifying selection that occur at most sites in the gene sequence [53–55]. In the current study, branch-site models were used to determine possible positively selected sites in D. Edwardsii (Table 4). Two residues, located in cox1 and cox2, were identified as positive selected sites in D. Edwardsii (> 95%), respectively, indicating that these two genes are under positive selection pressure. It is known that mitochondrial PCGs are crucial in the oxidative phosphorylation pathway. Cytochrome c oxidase catalyzes the oxygen terminal reduction, and the catalytic core is encoded by three mitochondrial PCGs (cox1, cox2 and cox3) [55, 56]. The positive selected sites found in cox1 and cox2 may be associated with the adaptive evolution of D. Edwardsii living in specific marine environment within the infraorder Anomura. Our study provides a basic foundation for understanding the evolution mechanisms of D. Edwardsii to obtain adequate energy in marine ecosystem at the mitochondrial level.