Genome assemblies and data matrix
We sequenced genomic DNA libraries from eleven species of heterobranch gastropods and extracted mitochondrial sequences (Table 1, Supplementary Table 1). Despite high coverage, a single contiguous mitochondrial genome was recovered for only two of the eleven taxa. All of the newly sequenced mt genomes were incomplete to some degree, possibly due to difficulties in sequencing through secondary structures associated with the 16S rDNA (which was absent from or incomplete in several of our assemblies) and the control region, but the vast majority of the mitochondrial protein-coding genes was obtained for all species. Alignments of amino acid sequences were produced for the thirteen protein-coding genes obtained from the newly sequenced taxa and publicly available heterobranch mt genomes on NCBI (Supplementary Table 1). After trimming each alignment to remove ambiguously aligned positions, the concatenated data matrix totaled 4,735 amino acid sites with 31.3% gaps across 104 taxa (17 outgroups and 87 heterobranchs).
Table 1
Mitochondrial genomes sequenced in the present study and associated sources of samples
Taxon | mt contig length (bp) | Protein coding genes (13) | tRNAs (22) | rRNAs (2) | Complete | Missing genes | Sample accession | Providence |
Acochlidium fijense | 14098 | 13 | 20 | 2 | Nearly | trnS2, trnK | ZSM Mol-20130988 | Zoologische Staatssammlung Münchhausenstr |
Clione limacina | 14599 | 12 | 17 | 1 | Nearly | trnl, atp8, trnS1, trnD, rrnL, trnY, trnR | ZSM Mol-20081086 | Zoologische Staatssammlung Münchhausenstr |
Illbia ilbi | 13944 | 12 | 21 | 2 | Nearly | atp8 | MRG10019-03 | Shoreham Beach 18March 2017, collected by A Falconer |
Microdiscula charopa | 12965 | 13 | 21 | 2 | Yes | | MRG828-06 | Dutton Way, Portland 2Mar 2017, approx 2 m deep, collected by A Falconer |
Omalogyra atomus | 12413 | 10 | 20 | 1 | No | atp8, rrnS, trnY, nad4l, trnV | ZSM Mol-20142011 | Zoologische Staatssammlung Münchhausenstr |
Psilaxis radiatus | 12154 | 12 | 15 | 1 | No | atp8, trnN, trnS1, tncC, trnF, trnY, rrnL, trnQ, trnV | ZMBN 94175 | Museum of Zoology at the University of Bergen |
Ringicula conformis | 14017 | 12 | 22 | 2 | Nearly | nad4l | Donation; none | Museum of Zoology at the University of Bergen |
Rissoella morrocayensis | 11085 | 12 | 5 | 1 | No | trnS1, rrnL, trnI, trnN, trnC, trnF, trnY, trnQ, | ZMBN 99933 | Museum of Zoology at the University of Bergen |
Runcina ornata | 13862 | 13 | 22 | 2 | Yes | | ZMBN 87949 | Museum of Zoology at the University of Bergen |
Tylodina cf. corticalis | 14614 | 13 | 21 | 1 | Yes | trnK, rrnL | USNM 1442311 | Smithsonian Institution National Museum of Natural History |
Valvata cf. cristata | 14495 | 13 | 21 | 1 | Nearly | trnR, rrnL | ZSM Mol-20170210 | Zoologische Staatssammlung Münchhausenstr |
*Yes = genome circularizes via overlapping ends, missing genes likely missed by annotator *Nearly = all genes listed on a single scaffold, but scaffold does not contain a full mitochondrion *No = mitochondrion either exists on two discontinuous scaffolds or is missing multiple protein coding genes |
Maximum likelihood analyses
Maximum likelihood (ML) analysis of this data matrix (Supplementary Fig. 1) resulted in a tree with Valvata cf. cristata (Valvatoidea) recovered as sister to a clade containing all other heterobranchs with successive branching of Microdiscula charopa (Orbitestelloidea), a clade composed of Clione limacina (Gymnosomata), Psilaxis radiatus (Architectonicoidea), Omalogyra sp. (Omalogyroidea), and Rissoella morrocayensis (Rissoelloidae), and then Rhopalocaulis grandidieri (Veronicelloidea), which was sister to all remaining Heterobranchia. All members of the clade containing Clione, Psilaxis, Omalogyra, and Rissoella exhibited extremely long branches relative to the other heterobranchs and it is well-established that Gymnosomata is nested within Euopisthobranchia. Thus, we strongly suspected that this clade was the result of long-branch attraction. This, combined with very low backbone support values, led us to re-run the analysis with the following unstable (see Supplementary Table 3) and long-branched taxa removed: Clione limacina, Psilaxis radiatus, Omalogyra sp., Rissoella morrocayensis.
The matrix with unstable and long-branched taxa removed totaled 4447 amino acid sites with 28.7% gaps across 99 taxa. In the resulting ML tree (Fig. 1), Valvata was again recovered as the sister taxon to a clade composed of all other heterobranchs (Euthyneura; bootstrap support, bs = 100) followed by the successive branching of Microdiscula (bs = 62) and Rhopalocaulis (bs = 100). Most major clades of Heterobranchia were recovered with high support: Acteonoidea, Nudipleura, Cephalaspidea, Runcinida, Aplysiida, Siphonariida, Sacoglossa, and Stylommatophora were all recovered with 100% bootstrap support, and Systellommatophora with 99% bootstrap support. Of the family- and order-level taxa, Ellobioidea is the only one that was recovered as non-monophyletic, with Pedipes pedipes and Myosotella myosotis falling well outside of the clade containing the rest of Ellobioidea. Support for relationships among most higher-level heterobranch clades was generally weak and a number of higher-level groupings within Heterobranchia including Tectipleura, Euopisthobranchia, Panpulmonata, Eupulmonata, Systellommatophora, and Amphipulmonata (sensu [19]) were not recovered monophyletic. However, Nudipleura (Nudibranchia + Pleurobranchomorpha) and a clade composed of Aplysiida + Umbraculoidea were recovered monophyletic with maximal support.
Bayesian inference analysis
Because of poor support for most nodes of interest in our maximum likelihood analyses, we also performed a Bayesian inference analysis with the CAT + GTR + G4 model on the same datasets, but only the analysis of the dataset with unstable and long-branched taxa removed reached convergence. Of the six chains that were run for over 60,000 generations, three converged (bpcomp maxdiff value = 0.29), yielding a tree with a topology that is much more consistent with the current understanding of heterobranch relationships (Fig. 2). Valvata and Microdiscula were recovered in a polytomy with a clade that comprised all other heterobranchs, which received maximal support. This clade consisted of a polytomy of Nudipleura + Acteonoidea, Ringicula, and the remaining heterobranchs. Nudipleura + Acteonoidea was weakly supported but Nudipleura again received maximal support.
The largest recovered subclade within Heterobranchia, Tectipleura, consisted of Euopisthobranchia (Cephalaspidea, Runcinida, Aplysiida, and Umbraculoidea) and Panpulmonata (Siphonariida, Sacoglossa, Hygrophila, Ellobioidea, Systellommatophora, and Stylommatophora), which were recovered reciprocally monophyletic and both clades received maximal support. Within Euopisthobranchia, Cephalaspidea and Runcinida formed a (weakly supported) clade sister to a clade of Aplysiida + Umbraculoidea, the latter of which was strongly supported (posterior probability, pp = 0.98).
Within Panpulmonata, Siphonariida was recovered as the sister taxon to the rest of the clade with Sacoglossa sister to all other taxa within that clade. The remaining panpulmonates formed two clades. One consisted of Stylommatophora, Systellommatophora, and Ellobioidea, although neither Systellommatophora nor Ellobioidea were recovered monophyletic. Rhopalocaulis (Systellommatophora) was recovered as the sister taxon of Stylommatophora (pp = 0.71) and this clade was recovered in a polytomy with the ellobioids Pedipes and Myosotella that was maximally supported. Sister to this polytomy was a moderately well-supported clade (pp = 0.98) in which the remainder of Systellommatophora (Onchidiidae) was sister to the remaining Ellobioidea. Sister to the Stylommatophora-Systellommatophora- Ellobioidea clade was a clade comprising Hygrophila (pp = 1.00), Pyramidella dolabrata (Pyramidelloidea), Salinator rhamphidia (Amphiboloidea), and Acochlidium fijiensis (Acochlidia). Salinator and Pyramidella formed a well-supported clade (pp = 0.99) but otherwise, higher-level relationships in the Hygrophila-Pyramidelloidea-Amphiboloidea-Acochlidia clade were weakly supported.
Mitochondrial gene order evolution
A somewhat diagnostic gene arrangement exists for heterobranchs relative to other gastropod clades, but many heterobranch taxa and subclades have differences in both gene organization and orientation in their mitochondrial genomes (Fig. 3). Caenogastropods encode all mitochondrial genes in the same orientation, while all members of the clade comprised by Neritimorpha and Vetigastropoda share a single inversion of [12S rRNA, 16S rRNA, nad1, nad6, cytB, nad4L, nad4, nad5]. Across the diverse taxa used as outgroups in this study, no individual deviations in gene arrangement were found.
In contrast to this consistency, the taxa at the base of our heterobranch tree all have remarkably different mitochondrial gene arrangements from one another. Mitochondrial gene order within most of the “lower Heterobranchia” is variable: Psilaxis radiatus (Architectonicoidea), Omalogyra sp. (Omalogyridae), Rissoella morrocayensis (Rissoellidae), and Valvata cf. cristata (Valvatidae) all have distinct gene orders from one another and from remaining heterobranchs, including changes in both order and orientation. Microdiscula charopa also has an entirely unique gene order.
In the remaining heterobranchs, the clade spanning Acteonoidea, Nudipleura, and the subclade including Runcinida, Cephalaspidea, Umbraculoidea, Aplysiida, Siphonariida, Sacoglossa, Amphiboloidea, Pyramidellidae, Hygrophila, Acochlidia, Systellommatophora, Ellobioidea, and Stylommatophora, a relatively mitochondrial gene order and orientation exists, referred to here as the common heterobranch gene order [cox1-16SrRNA-nad6-nad5-nad1-nad4L-ctyB-cox2-atp8-atp6-12SrRNA-nad3-nad4-cox3-nad2, with atp8-nad3 and cox3 both reversed in direction from cox1]. All members of Nudipleura examined adhere to this common gene order except Hypselodoris festiva, in which a single gene (nad4) changed position, and all members of Acteonoidea adhere to the same order as well. Variation exists within the Cephalaspidea, with a shared rearrangement of cytB, nad1, nad4L, and cox2 shared among ¾ members, and Sagaminopteron nigropunctatus containing further rearrangements. Aplysiida adheres to the stable arrangement with the exception of Aplysia kurodai, in which the orientation of the 12S rRNA gene is reversed though its position remains the same.
Both representatives of Siphonariida have different internal mitochondrial gene rearrangements: Siphonaria gigas reversed the positions of nad4 and nad3, while Siphonaria pectinata inserts cox2 between nad4L and cytB. All sacoglossans shared a common gene order, as does Pyramidella dolabrata and Achochlidium fijensis. The majority of Ellobioidea are consistent, excepting Myostella myosotis and Pedipes pedipes, which have different single gene transpositions than one another. Additionally, the mt genome of Pedipes pedipes is expanded, with more intergenic space than other closely related taxa. Interestingly, these two taxa are those that fall together in a different part of the phylogeny than the remaining members of Ellobioidea, making this clade paraphyletic. The clade comprising the Hygrophila has high support, and all members within it share the common heterobranch gene order except Physella acuta, which has a completely unique gene arrangement.
Within Stylommatophora, both members of genus Achatinella shared a single gene (cox2) moved to a different position. Likewise, the taxa Cylindrus obtusus, Cepaea nemoralis, and Helix aspersa all share a single gene (nad4) inserted at a different location in the mitochondrial genome. Arion rufus has the 12S rRNA placed prior to atp8-atp6 instead of after it, but all other members of Stylommatophora shared the common heterobranch gene order.