Genome assemblies and data matrix
We sequenced genomic DNA libraries from eleven species of heterobranch gastropods and extracted mitochondrial sequences (Table 1, Supplementary Table 1). Despite high sequencing coverage, a single contiguous mitochondrial genome was recovered for only two of the eleven taxa. All of the newly sequenced mt genomes were incomplete to some degree, possibly due to difficulties in sequencing through secondary structures associated with the 16S rDNA (which was absent from or incomplete in several of our assemblies) and the control region, but most of the mitochondrial protein-coding genes were obtained for all species. Alignments of amino acid sequences were produced for the thirteen protein-coding genes obtained from the newly sequenced taxa and publicly available heterobranch mt genomes on NCBI (Supplementary Table 1). After trimming each alignment to remove ambiguously aligned positions, the concatenated data matrix totaled 4,735 amino acid sites with 31.3% gaps across 104 taxa (17 outgroups and 87 heterobranchs).
Maximum likelihood analyses
A partitioned maximum likelihood (ML) analysis of this data matrix using the best-fitting model for each gene (Supplementary Figure 1; See supplementary data on FigShare for more information) resulted in a tree with Valvata cristata (Valvatoidea) recovered as the sister taxon to a clade containing all other heterobranchs with successive branching of Microdiscula charopa (Orbitestelloidea), a clade composed of Clione limacina (Gymnosomata), Psilaxis radiatus (Architectonicoidea), Omalogyra atomus (Omalogyroidea), and Rissoella morrocayensis (Rissoelloidae), and then Rhopalocaulis grandidieri (Veronicelloidea), which was the sister taxon of all remaining Heterobranchia. All members of the clade containing C. limacina, P. radiatus, O. atomus, and R. morrocayensis exhibited extremely long branches relative to the other heterobranchs and it is well-established that Gymnosomata is nested within Euopisthobranchia. Thus, we strongly suspected that this clade was the result of long-branch attraction. This, combined with very low backbone support values, led us to re-run the analysis with the following unstable and long-branched taxa removed: C. limacina, P. radiatus, O. atomus, and R. morrocayensis (see Supplementary Table 3).
The matrix with unstable and long-branched taxa removed totaled 4,447 amino acid sites with 28.7% gaps across 99 taxa. In the resulting partitioned ML analysis using the best-fitting model for each gene (Figure 1), Valvata was again recovered as the sister taxon to a clade composed of all other heterobranchs (bootstrap support, bs = 100) followed by the successive branching of Microdiscula (bs = 62) and Rhopalocaulis (bs = 100). Most major clades of Heterobranchia (Euthyneura sensu lato) were recovered with high support: Acteonoidea, Nudipleura, Cephalaspidea, Runcinida, Aplysiida, Siphonariida, Sacoglossa, and Stylommatophora were all recovered with 100% bootstrap support, and Systellommatophora with 99% bootstrap support. Of the family- and order-level taxa, Ellobioidea is the only one that was recovered as non-monophyletic, with Pedipes pedipes and Myosotella myosotis falling well outside of the clade containing the rest of Ellobioidea, albeit with very low support. Support for relationships among most higher-level heterobranch clades was generally weak and a number of higher-level groupings within Heterobranchia including Tectipleura, Euopisthobranchia, Panpulmonata, Eupulmonata, Systellommatophora, and Amphipulmonata (sensu [20]) were not monophyletic. However, Nudipleura (Nudibranchia + Pleurobranchomorpha) and a clade composed of Aplysiida + Umbraculoidea were recovered monophyletic with maximal support.
To explore the impact of different partitioning schemes on tree topology, and to determine whether other models better mitigated the long-branch attraction of C. limacina, P. radiatus, O. atomus, and R. morrocayensis, we ran several additional analyses. A partitioned analysis with a mixed model (LG+C60+G+F) yielded a tree (Supplementary Figure 2) with the same clade of long-branched taxa as sister to all remaining heterobranchs except Valvata and Microdiscula and did not vary significantly in any other way from the original ML tree (Supplementary Figure 1). A ML analysis with Lanfear clustering (Supplementary Figure 3) and a fully partitioned ML analysis with resampling within partitions (Supplementary Figure 4) both produced similar trees to the initial partitioned ML analysis (Supplementary Figure 1), but with the two members of Ellobioidea that did not form a clade with the rest of Ellobioidea (Myosotella and Pedipes) falling outside Hygrophila and thus even farther from the remaining ellobioids. An analysis with an edge-unlinked model altered the relationships within the long-branched clade, with O. atomus and P. radiatus as sister to R. morrocayensis and C. limacina, and, while in previous analyses R. morrocayensis had a much longer branch than the other taxa in this clade, in the edge-unlinked tree, all four taxa had similarly elongated branches (Supplementary Figure 5). In this edge-unlinked analysis, the positions of Hygrophila and the pair of ellobiods were recovered with relationships similar to those of the initial partitioned ML tree (Supplementary Figure 1). We also analyzed the set of all taxa except C. limacina to assess whether removal of this single rapidly-evolving taxon would ‘release’ the other long-branched taxa, which are traditionally considered to be lower heterobranchs, from this putative LBA artifact. The other three long-branched taxa remained in the same location with long branches (Supplementary Figure 6).
Bayesian inference
Because of poor support for most nodes of interest in our maximum likelihood analyses, we also performed a Bayesian inference with the CAT+GTR+G4 model on the same datasets, but only the analysis of the dataset with unstable and long-branched taxa removed reached convergence. Of the six chains that were run for over 60,000 generations, four converged according to the PhyloBayes bpcomp maxdiff criterion (maxdiff value = 0.29), yielding a tree with a topology that is much more consistent with the current understanding of heterobranch relationships (Figure 2). Valvata and Microdiscula were recovered in a polytomy with a clade that comprised all other heterobranchs, which received maximal support. This clade consisted of a polytomy of Nudipleura + Acteonoidea, Ringicula, and the remaining heterobranchs. Nudipleura + Acteonoidea was weakly supported but Nudipleura again received maximal support.
The largest recovered subclade within Heterobranchia, Tectipleura, consisted of Euopisthobranchia (Cephalaspidea, Runcinida, Aplysiida, and Umbraculoidea) and Panpulmonata (Siphonariida, Sacoglossa, Hygrophila, Ellobioidea, Amphiboloidea, Systellommatophora, and Stylommatophora), which were recovered reciprocally monophyletic and both clades received maximal support. Within Euopisthobranchia, Cephalaspidea and Runcinida formed a (weakly supported) clade sister to a clade of Aplysiida + Umbraculoidea, the latter of which was strongly supported (posterior probability, pp = 0.98).
Within Panpulmonata, Siphonariida was recovered as the sister taxon to the rest of the clade (pp = 1) with Sacoglossa sister to all other taxa within that clade (pp = 0.96). The remaining panpulmonates formed two clades. One consisted of Stylommatophora, Systellommatophora, and Ellobioidea, although neither Systellommatophora nor Ellobioidea were recovered monophyletic. Rhopalocaulis (Systellommatophora) was recovered as the sister taxon of Stylommatophora (pp = 0.71) and this clade was recovered in a polytomy with the ellobioids Pedipes and Myosotella that was maximally supported. Sister to this polytomy was a moderately well-supported clade (pp = 0.98) in which the remainder of Systellommatophora (Onchidiidae) was sister to the remaining Ellobioidea. Sister to the Stylommatophora-Systellommatophora- Ellobioidea clade was a clade comprising Hygrophila (pp = 1.00), Pyramidella dolabrata (Pyramidelloidea), Salinator rhamphidia (Amphiboloidea), and Acochlidium fijiensis (Acochlidia). Salinator and Pyramidella formed a well-supported clade (pp = 0.99) but otherwise, higher-level relationships in the Hygrophila-Pyramidelloidolyea-Amphiboloidea-Acochlidia clade were weakly supported.
Mitochondrial gene order evolution
A somewhat diagnostic gene arrangement exists for heterobranchs relative to other gastropod clades, but many heterobranch taxa and subclades have differences in both gene organization and orientation in their mitochondrial genomes (Figure 3). Caenogastropods encode all mitochondrial genes in the same orientation, while all members of the clade comprising Neritimorpha and Vetigastropoda share a single inversion of [12S rRNA, 16S rRNA, nad1, nad6, cytB, nad4L, nad4, nad5]. Across the diverse taxa used as outgroups in this study, no individual deviations in gene arrangement were found.
In contrast to this consistency, the taxa at the base of our heterobranch tree all have remarkably different mitochondrial gene arrangements from one another. Mitochondrial gene order within most of the “lower Heterobranchia” is variable: Psilaxis radiatus (Architectonicoidea), Omalogyra atomus (Omalogyridae), Rissoella morrocayensis (Rissoellidae), and Valvata cf. cristata (Valvatidae) all have distinct gene orders from one another and from remaining heterobranchs, including changes in both order and orientation. Microdiscula charopa also has an entirely unique gene order.
In the remaining heterobranchs, the clade spanning Acteonoidea, Nudipleura, and the subclade including Runcinida, Cephalaspidea, Umbraculoidea, Aplysiida, Siphonariida, Sacoglossa, Amphiboloidea, Pyramidellidae, Hygrophila, Acochlidia, Systellommatophora, Ellobioidea, and Stylommatophora, a relatively stable mitochondrial gene order and orientation exists, referred to here as the common heterobranch gene order [cox1-16SrRNA-nad6-nad5-nad1-nad4L-ctyB-cox2-atp8-atp6-12SrRNA-nad3-nad4-cox3-nad2, with atp8-nad3 and cox3 both reversed in direction from cox1]. All members of Nudipleura examined adhere to this common gene order except Hypselodoris festiva, in which a single gene (nad4) changed position, and all members of Acteonoidea adhere to the same order as well. Variation exists within the Cephalaspidea, with a shared rearrangement of cytB, nad1, nad4L, and cox2 shared among three-fourths of its members, and Sagaminopteron nigropunctatum containing further rearrangements. Aplysiida adheres to the stable arrangement with the exception of Aplysia kurodai, in which the orientation of the 12S rRNA gene is reversed though its position remains the same.
Both representatives of Siphonariida have different internal mitochondrial gene rearrangements: Siphonaria gigas reversed the positions of nad4 and nad3, while Siphonaria pectinata inserts cox2 between nad4L and cytB. All sacoglossans shared a common gene order, as do Pyramidella dolabrata and Acochlidium fijiensis. The majority of Ellobioidea are consistent, excepting Myosotella myosotis and Pedipes pedipes, which have different single-gene transpositions than one another. Additionally, the mt genome of P. pedipes is expanded, with more intergenic space than other closely related taxa. Interestingly, these two taxa are those that fall together in a different part of the phylogeny than the remaining members of Ellobioidea, making this group paraphyletic. The clade comprising Hygrophila was strongly supported, and all members within it share the common heterobranch gene order except Physella acuta, which has a completely unique gene arrangement.
Within Stylommatophora, both members of genus Achatinella shared a single gene (cox2) moved to a different position relative to other members of the clade. Likewise, the taxa Cylindrus obtusus, Cepaea nemoralis, and Cornu aspersum (syn. Helix aspersa) all share a single gene (nad4) inserted at a different location in the mitochondrial genome. Arion rufus has the 12S rRNA placed prior to atp8-atp6 instead of after it, but all other members of Stylommatophora shared the common heterobranch gene order.