Characteristics of Complete Mitochondrial Genomes Reveals a Potential Division of Genus Modiolus into Three Genera

Backgrounds Mussels have been one of the most abundant and highly evolved marine biological groups with high level of genetic differentiation between and within species. The yellow-banded horse mussel, Modiolus modulaides (Röding, 1798), is an important economic species in Philippines and an endangered species in Japan, with no whole mitogenome characterized. Methods and Results In this study, we sequenced the whole mitochondrial genome of Modiolus modulaides (Röding, 1798), a medium-sized mussel collected from Guangxi, China. Result shows a 15,422 bp closed-circular DNA molecule, including 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes. The genome composition is obviously A+T biased (62.5%), with 23.1% A, 39.4% T, 25.4% G and 12.0% C, making AT-skew and GC-skew being -0.26 and 0.36, respectively. Phylogenetic analysis of Modiolus modulaides and other 36 mussel species based on 12 protein-coding genes and two ribosomal RNAs of mitogenomes shows that Modiolus modulaides is most closely related with Modiolus phillippinum (Hanley, 1843). Conclusions A hypothesis that the current genus Modiolus should be separated three distinctive genera was raised based on morphological and phylogenetic characteristics, as well as gene patterns of mitogenomes. These results put forward new insight of phylogenetic relationship of Modiolus species and will help to further study the taxonomy and phylogeny of Mytilidae.


Introduction
Mitochondrial DNA (mtDNA) exhibits special characteristics that differ from the nuclear genome, such as rapid evolution, maternal inheritance, and lack of recombination [1], and has been an e cient tool and widely used in molluscan evolutionary and phylogenetic analyses [2,3]. Molluscan mtDNA is nearly a closed circular molecule, always containing 13 protein-coding genes (NADH dehydrogenase subunits 1-6 and 4L, ATPase subunits six and eight, cytochrome oxidase subunits 1-3, and cytochrome b), 22 transfer RNA genes and two ribosomal RNAs (rrnS and rrnL) [4,5], with missing ATPase subunits eight, variable tRNAs and extra rrnS in some species [6][7][8]. Different mtDNA composition would result in functional diversity, as mtDNA carries genetic information essential to mitochondrial function [9], thus it's interesting to explore the difference of mtDNA related to adaptive evolution. Nonetheless, the gene organization and arrangement of mitochondrial protein-coding genes in closely related species are relatively conservative [10] and can be used to analyze phylogenetic relationships between and above genera [11]. Though development of sequencing technology and the establishment of mitochondrial genome database provides convenience and comprehensive information for molluscan phylogeny studies [12], quantity of current mitogenomes seems far from adequate in comparison with enormous amounts of Mollusca species.
As one of the most abundant and highly evolved groups in Mollusca, mussels (Bivalvia, Pteriomorphia, Mytilida) exhibit a highly variable morphological characteristics and adapt to multiple habitats from fresh water, intertidal zone, to deep sea, resulting in diversity of mitogenomes [13,14]. To date, mitochondrial genomes of only 42 mussel species belonging to 18 genera of 9 subfamilies are available from NCBI.
The yellow-banded horse mussel, Modiolus modulaides (Röding, 1798), known as an important economic species in Philippines [15] and an endangered species in Japan [16], is widely distributed in tropical, subtropical and temperate regions of Indo-West Paci c [17] Sequence assembly, annotation and analyses A5-miseq v20150522 and SPAdesv3.9.0 were used for De Novo assembly of high-quality data. The complete mitochondrial genome sequence was uploaded to the MITOS2 web server (http://mitos2.bioinf.uni-leipzig.de/index.py) for functional annotation. The genetic code selection was set to 05, and the other settings were set according to the default parameters. The boundaries of proteincoding genes (PCGs) were determined by ORF Finder (https://www.ncbi.nlm.nih.gov/or nder). The transfer RNA genes (tRNA) were identi ed using tRNAscan-SE with the default search mode and the invertebrate mitochondrial genetic code source, and the two ribosomal RNA genes (rRNA) were identi ed by sequence comparison with previously reported Modiolus rRNA genes. The mitochondrial genome circle map was drawn by CGView Server.
Nucleotide base composition was calculated using MEGA 7.0. The following formula were used to calculate AT-skew and GC-skew: AT-skew = (A -T)/ (A + T) and GC-skew= (G -C)/ (G + C), in which A, T, G and C represented the frequencies of the four bases. Codon usage for each PCG was estimated using the invertebrate mitochondrial code in the web server Codon Usage (http://www.bioinformatics.org/sms2/codon_usage.html). Values of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site by pairwise comparison between Modiolus modulaides and other species of Modiolus were calculated by DnaSP v5 .

Phylogenetic analysis
The complete mitogenome of Modiolus modulaides and other 46 Mytilidae mitochondrial genome sequences available from GenBank (Table 1) were included to estimate phylogenetic relationships, with Crassostrea gigas (Ostreidae) and Anadara sativa (Arcidae) being outgroups. The nucleotide sequences of 12 PCGs and two rRNA genes of all mitogenomes were aligned using MAFFT v7.313. The nucleotide sequences of 12 PCGs was then imported into MACSE v2.03 for optimization. Gblocks v.0.91b was used with default settings to obtain conserved regions of the aligned sequences. The best-t substitution model for the data set were detected using ModelFinder. Phylogenetic relationships were reported using maximum likelihood (ML) with IQ-TREE v.3.2.6 and Bayesian inference (BI) by MrBayes v.3.2.6. The ML tree consisted of 5,000 bootstrap replicates and automatic algorithm. The BI analyses were performed using 4 parallel Markov chain Monte Carlo (MCMC) chains for 2,000,000 generations, sampling every 100 generations and discarding the rst 25% generations as burn-in. The phylogenetic trees were viewed and edited by iTOL (https://itol.embl.de/).

Mitochondrial genome characterization
The mtDNA of Modiolus modulaies is a closed circular DNA molecule with a length of 15,422 bp (GenBank: OL853493). It contains 36 genes, including 12 PCGs, 2 rRNA genes and 22 tRNA genes ( Table  2). With the exception of trnT, all of these genes are encoded on the heavy strand (H strand). The nucleotide formation of the complete mitogenome is equal to 23.1% A, 39.4% T, 25.4% G and 12.0% C. The AT content and GC content are 62.5% and 37.4%, respectively. A signi cant AT bias was detected. ATskew of mitochondrial genome of Modiolus modulaies is negative and GC-skew is positive, yielding -0.26 and 0.36, respectively. There are ve base overlaps between adjacent genes, ranging in size from 1 bp to 8 bp, summing to a full 16 bp gene overlap. And the longest base overlap locates between rrnL and trnS2, whereas the smallest one is between cob and nad4l. Additionally, the mtDNA of Modiolus modulaides contains 1209 bp intergenic spacer regions at 27 locations ranging from 1 bp to 478 bp. The longest intergenic spacer lies between trnI and nad2.

PCGs and codon usage
The overall length of the PCGs is 10,873 bp, and it occupies 74.9% of the whole mitochondrial genome of Modiolus modulaides. The base composition of PCGs was 21.3% A, 40.52% T, 25.75% G and 12.43% C, yielding an AT content of 61.82%. The 12 PCGs are NADH dehydrogenase subunit 1-6 and 4L (nad1-6 and nad4l), cytochrome c oxidase subunit -(cox 1-3), cytochrome b (Cytb) and one ATPase subunits six (atp6), which are all encoded on heavy strand. The size of 12 PCGs ranged from 276 bp (nad4l) to 1695 bp (nad5). Seven PCGs start with ATG, except for gene nad6 (TTG), nad4 and cox3 (ATT) and atp6 (GTG), while six PCGs (cox1, nad3, nad6, cob, nad5, cox2) stop with TAA and ve PCGs (atp6, nad4, nad2, nad4l, nad1) stop with TAG. The cox3 of Modiolus modulaides has incomplete stop codon, which is T for TAA. Each amino acid is encoded by at least two codons in the PCGs of Modiolus modulaides. The most frequently used codons are TTT (Phe, N = 339 times used, 6.598%). CGC (Arg, N = 10 times used, 0.195%) and ACC (Thr, N = 10 times used, 0.195%) are the least frequently used codons. The most frequent amino acids in the PCGs of Modiolus modulaides is Leucine and the least used amino acid is Histidine, which is consistent with other ve Modiolus species. The predominant amino acids (with frequency above 6%) are Leucine, Serine, Valine, Phenylalanine (Fig. 1a). The amino acid codons ended with A/T are more frequent than those terminated with G/C, re ecting that a high A+T content occurred. According to the ratio of nonsynonymous (Ka) to synonymous (Ks) substitutions, we con rmed that average values of all the protein-coding genes in the mitogenomes of Modiolus species are less than 1, indicating puri cation selection pressures, with only Ka/Ks values of atp6 and nad6 are higher than 0.25 (Fig. 1b).

Ribosomal RNAs and transfer RNAs
The rrnS and rrnL of Modiolus modulaides are 771 bp and 1144 bp, respectively. The rrnL genes of Modiolus species are uniformly arranged between trnF and trnS, but the rrnS genes are arranged between different genes (Fig. 1c). The rrnS of both Modiolus modulaides and Modiolus philipinarum is located between trnS and trnM, while between trnK and atp6 in Modiolus nipponicus, and between trnK and trnQ in other three species (Fig. 1c). The rrnS and rrnL exhibit a AT-skew -0.087 and -0.088, respectively. The mitogenome of Modiolus modulaides has 22 tRNAs of 1441 bp, ranging from 60 bp (trnS2) to 70 bp (trnI and trnQ) ( Table 2), of which trnL and trnS have one replication. The number of tRNAs in Modiolus modulaides is exactly identical to other Modiolus species except Modiolus comptus since it has two copies of trnQ. All tRNAs have a typical clover structure and are encoded in H-chain, except that trnS (TGA) lacks the entire dihydrouridine (DHU) arm and trnT is encoded in L-chain. Difference of quantity and arrangement of tRNA occurs clockwise between nad1 and nad4 within Modiolus, along with the extra atp8 in Modiolus modiolus (Fig. 1c).

Phylogenetic analysis
The topological structure of ML and BI phylogenetic tree constructed by 14 mt-genes (12 PCGs and 2 rRNAs) is almost consistent, except for the position of Bathymodiolus japonicus and Bathymodiolus childressi, with most branches being highly supported, i.e., bootstrap support=100, posterior probability=1 (Fig. 2). Our result of the phylogenetic trees support that all Mytilidae species are divided into two clades, of which clade1 includes the subfamilies of Mytilinae, Musculinae, Arcuatulinae, Septiferinae, Brachidontinae, and Mytiliseptinae and clade 2 includes the subfamilies of Bathymodiolinae, Modiolinae and Arcuatulinae (Limnoperna fortunei). However, the phylogenetic relationships are not in accordance with the existing subfamily-level classi cation that several species that were commonly supposed to belong to Mytilinae actually should occupy different oppositions. Genera Perna and Mytella are closely related with Arcuatula, and genera both Semimytilus and Perna perna are sister to genera from subfamily Brachidontinae. Limnoperna fortunei is more closely related with Modiolinae and Bathymodiolinae species than other Arcuatulinae species.
Within genus Modiolus, the phylogenetic trees shows that three groupings are highly supported and can be easily recognized. Modiolus modulaides is the most closely related with Modiolus philippinarum, making up to grouping 1, which is then gathers with grouping 2 (Modiolus comptus + Modiolus nipponicus). Grouping 3 (Modiolus kurilensis + Modiolus modiolus) has further genetic distance with the other two groupings.

Characteristics of mitogenomes of Mytilidae
The constitution of protein-coding genes of Modiolus modulaies is in accordance with most known Mytilidae mitogenome, except for additional gene atp8 in Mytilus, Bathymodiolus, Brachidontes exustus [18,19], and even Modiolus modiolus from the same genus [20]. Song et al. [21] hypothesized that the absence of gene atp8 resulted from the adaptive evolution of environment, but it was detected in both fresh water and seawater bivalves. An anthropogenic factor might be the false annotation of the proteincoding gene due to the structural properties (such as short length) and extreme variability of atp8 among bivalves [22]. Boore & Brown [23] proposed that gene rearrangements are infrequent and generally remain unchanged during many years of evolutionlong periods of evolutionary time, thus retaining the signal of ancient common ancestor. Nonetheless, the constitution and arrangement of tRNAs is more variable within Modiolus, which presents questions about the previous taxonomy of Modiolus. Here, based on the considerable translocation of tRNA, we think new genus and subgenus should be raised from the previous Modiolus. In addition, we nd the replication and translocation of tRNA genes occurs clockwise between nad1 and nad4 within Modiolus, which makes it a highly active region. Duplication of tRNA is common in mollusks, for example, Mytilus edulis, Muilus galloprovincialis and Mutilus trossulus have a second trnM [24,25], and Mutilus trossulus has an additional copy of trnQ [25]. The secondary structure of trnS2 lacks a DHU arm, forming a simple loop, which is conserved across metazoans [26]. The missing of the DHU arm in trnS2 can be functional in the structural compensation mechanism between other arms [27]. There are ve base overlaps between adjacent genes in the mitogenome of Modiolus modulaides, possibly promoting the miniaturization of mitochondrial genome in response to the natural selection [28].
The mitogenome of Modiolus modulaies exhibits an evident AT bias, identical to all known Mytilidae species [29,30]. This is also re ected in the differential codon usage patterns and amino acid compositions. The initiation codon ATG is a typical start codon in metazoan [31] as well as in Modiolus modulaides. The incomplete stop codon of cox3 here in Modiolus modulaides is presumed to be completed by post-translational polyadenylation [32]. We found that the frequency of amino acid usage is conserved in Modiolus species, strongly correlates with GC content and perhaps, expression level of genes [33]. As mitochondria are energy producing organelles that play a central role in amino acid homeostasis [34], the homogeneous amino acid usage may be a sign of congener similarities in environmental suitability. The Ka/Ks (nonsynonymous/synonymous substitution rates) ratio was commonly used to estimate the effect of selection pressure on the diversity molecular evolution of related species [35]. Result indicates that gene atp6 of Modiolus modulaides is under the least puri cation selection pressure and suffered a rapid, recent evolution [36]. It can provide valuable information into various adaptive mechanisms [37]. Gene atp6 has the highest Ka/Ks value and receives the least puri cation selection. Meanwhile, continuous puri cation selection of cox1 genes means the genes are strongly constrained by evolution to maintain their functional stability and thus often used in species identi cation [16].

Phylogenetic relationships in Mytilidae
The recognition of subfamilies in the family Mytilidae or genera in the subfamily remains inconsistent in different taxonomic systems [11]. The two-clades model of subfamilies in Mytilidae based on mitochondrial characters was also proposed by Lee et al. [3], though detailed division was different from our study. The phylogenetic result in this study strongly recommends that genus Perna (including Perna canaliculus and Perna viridis) should be rearranged into subfamily Arcuatulinae, as it is closest related with Mytella strigata from Arcuatulinae, which can be supported by the similar shell morphology as well [38,39]. Additionally, the sister relationship between Perna and Arcuatula species were proposed by recent phylogenetic studies based on not only anatomical but also molecular features [3,11,14]. The genus Semimytilus under subfamily Mytilinae was described for the rst time by Soot-Ryen [40] to include Mytilus algosus and only this one Semimytilus species (synonymised as Semimytilus patagonicus) has been identi ed till now [41]. And no one doubted the subfamily status of this species before. Here in our study and research of Lubosny et al. [42], Semimytilus is clustered with Perumytilus from subfamily Brachidontinae and then sister to Mytilisepta from subfamily Mytiliseptinae. Therefore, the genus Semimytilus is more likely to belong to subfamily Brachidontinae than Mytilinae based on the phylogenetic relationship in this study, though more morphological, anatomical data and molecular evidence should be considered. Subfamily Mytiliseptinae was rstly proposed by Morton et al. [43] to contain Mytilisepta species. But we insist that Mytilisepta should be retained within the Brachidontes as Mytilisepta keenae is placed among representatives of the Brachidontinae. As for genus Perna, Lee. et al. [3] argued the same manifestation in this study that Perna perna was not clustered with two congeneric species, Perna canaliculus and Perna viridis, but with Brachidontes species from the subfamily Brachiodontinae instead, with no unanimous conclusion. We agree with a polyphyletic model for Perna genus and that the current taxonomy of this genus may be questionable [44].
Within clade 2, Modiolinae is clustered with Bathymodiolinae, implying their close evolutionary relationships, and then sister to a freshwater mussel Limnoperna fortunei, which agrees with previous studies [11,45], but contradicts phylogenetic models based on 18S and anatomical characteristics [14,46]. The gene arrangements also shows that the Modiolinae and Bathymodiolinae are closely related [3]. Modiolus is a large and diverse genus and no recent review is available [47]. As previous studies proposed, subfamily Modiolinae was polyphyletic [11,46]. Interestingly, our phylogenetic tree shows more complicated relationships among three Modiolus species that three groupings could be easily recognized. Although the same protein-coding gene pattern is shared by all six species, the differences in tRNA arrangement cannot be neglected, as gene order is rather highly conserved at lower taxonomic levels in Bivalvia [1]. Consequently, based on the phylogenetic relationships and corresponding to the considerable translocation of tRNA, we harbor the idea that the current genus Modiolus should be separated into three genera, including species of (Modiolus modulaides + Modiolus philippinarum), (Modiolus comptus + Modiolus nipponicus) and (Modiolus kurilensis + Modiolus modiolus), respectively. Morphological characteristics summarized by Wang [48] provided evidences for this hypothesis. Both Modiolus comptus and Modiolus nipponicus are small or medium sized, while the other four species are bigger, with adult shells up to 80 mm or more. The dorsal horn of Modiolus modulaides and Modiolus kurilensis is obvious and high, and the hair is very thin, making them easily confused. Besides, both Modiolus philippinarum and Modiolus modiolus have low dorsal angle and thick hair. The common features within groupings are easily observed and concluded, supporting the three genera hypothesis.

Con ict of interest
The authors have no con icts of interest to declare that are relevant to the content of this article.

Ethical approval
Not applicable Consent to participate