Mitogenome features
The complete mitogenome sequences of nine Ctenidae species range from 13,035 bp in Ctenus hibernalis to 14,724 bp in Phoneutria boliviensis. This variation in the size of base pairs of these mitogenomes was due to the process of extraction and assembly from transcriptomes available in databases performed in this work. For example, in Ctenidae species, our results show the nucleotide sequence absence of several tRNAs, identifying between 9 and 20 genes (out of a total of 22 tRNAs). In addition, the absence of the sequence corresponding to the nad4 L gene (in Ctenus hibernalis, Ctenus exlineae and Cupiennius salei) and the nad3 gene (Anahita punctulata) was also detected.
According to Forni et al. 30, although the procedure to reconstruct complete mitochondrial genomes from RNA-Seq experiments is efficient, certain mitochondrial regions, such as tRNAs, D-loops and a fragment of the atp6 gene, have low read representatively and therefore often generate small gaps during the assembly process. In this same work, the authors indicate that the percentage of reads reached during the process of extraction and assembly of mitogenomes from transcriptomes varies from 16 to 57% using both intrafamilial and congeneric starting references, filtering for possible contaminants and nuclear gene matches 30.
Previous studies showed that, in addition to the low coverage (˂4x) in the aforementioned mitochondrial regions, other characteristics of the RNA-Seq experiments (such as depth of coverage or platform type) or the use of reference mitogenomes (closest evolutionary) are determining factors in obtaining the most complete mitogenome possible 30,31. These characteristics would explain why of the 51 transcriptomes analyzed in this work, only 18 mitogenomes satisfied our inclusion criteria; in some of them, small gaps in the previously mentioned mitochondrial regions were found. Because of this, the mitochondrial genomes of Ctenus hibernalis and Cupiennius salei were not taken into account in the phylogenetic analysis, as they did not have more than 80% of the coding region. The map of the nine mitochondrial genomes of Ctenidae family species is summarized in Figure 1. Therefore, only the genetic order of P. boliviensis and Leptoctenus byrrhus were considered for CREX analysis.
The other nine mitogenomes of Araneae spieces belonging to three different taxonomic families were obtained in this work, ranging from 12,991 bp (Aphonopelma hentzi, Family Theraphosidae) to 14,749 bp (Phidippus johnsoni, Family Salticidae) in size. The difference in size in base pairs between these mitogenomes is mainly due to the absence of the D-loop region sequence not only in those obtained in this work but also in those deposited in the databases. Despite this, the relatively small mitogenome sizes (<15 kb) are well within the observed range of other Araneae mitogenomes available from the GenBank database 4,11,13−15 compared to those of other Arthropod species 7. In addition, truncated tRNA genes may explain why spider mitogenomes are smaller than those of other arthropods 32
The GC content of nucleotides was between 21.6% (Cupiennius salei, family Ctenidae) and 34.7% (Calileptoneta cokendolpheri, family Leptonetidae). The highest GC content was observed in the PCG (22.3-36.3%), followed by tRNAs (21.8-30.6%), rRNAs (18.9-30.5%), and D-loops (17.2-34.5%). All eighteen Araneae mitochondrial genomes presented the structure of 37 genes (13 PCGs, 22 tRNAs and two rRNAs) and an A+T-rich D-loop region, where a total of 573 genes were annotated. However, some of these species did not have this complete gene set. The features of the eighteen spider mitochondrial genomes obtained in this work are summarized in Table 1 and Supplementary Table S2.
As in most Araneae mitogenomes, the nucleotide composition shows a low G+C, with contents of less than 35% and small variations between species 13,14, similar to those observed in Metazoan mitogenomes 33. In addition, the majority strand contains 22 genes and a minority contains 15 genes, similar features to those presented in other spider mitogenomes 4,14.
Variation in the D-loop region in spiders
The D-loop region was detected and annotated in 11 of the 18 mitochondrial genomes obtained in this work. Similarly, this region was observed in 66 of the 76 spiders, located between trnM and trnQ in all Opisthothelae spiders and between rrnS and trnI in Mesothelae spiders (Supplementary Table S2). The position of the D-loop region is consistent with that observed in previous work in several spider species 4,14. In addition, previous studies indicate that the D-loop region remained totally or partially assembled, mainly because it is composed of tandem repeats, which are very difficult to sequence and de novo assemble properly; likewise, their expression in RNA-Seq experiments is low or completely lacking 30. Moreover, we obtained complete or partial D-loop regions in 61% of them when mitochondrial genomes of congeneric species were used as references.
Our results show that the D-loop of the 66 spider region is characterized by an average size of 766.2 bp, an average AT % of 75.1% and a low nucleotide identity of 18.7%, showing high variation in this species. For example, the two spiders of the suborder Mesothelae (family Liphistiidae) and the five species of the suborder Mygalomorphae (families Nemesiidae (1), Theraphosidae (3), Dipluridae (1)) have a D-loop region with an average of 388.5 bp, while the species of the suborder Araneomorphae have a D-loop region with an average of 800.3 bp. The largest D-loop region was observed in Argyroneta aquatica (family Cybaeidae) at 2047 bp, where both 5' and 3' AT tandem repeat expansions were detected. Likewise, the AT content was variable in all species, ranging from 66.3–82.6%. However, most of the D-loop regions belonging to the Liphistiidae, Theraphosidae, Dipluridae, Dysderidae, Hypochilidae, Sicariidae, Pholcidae, Oecobiidae, Araneidae, Nephilidae, Eutichuridae and Selenopidae families showed positive AT/GC skews, indicating an obvious bias toward the use of A and G, similar to those reported in Lyrognathus crotalus of the Theraphosidae family 4 and other spiders 32. In contrast, our analyses show that mainly Entelegynae spiders have a negative AT/GC skew, indicating a bias toward the use of T and G.
Regarding the presence of repetitive regions, our results show three categories of D-loop regions: a) sequences with a size in pb of ≤500 bp (n= 14) that do not have tandem repeat elements; b) D-loop regions between 500 and 740 bp (n= 25), that do not have tandem repeats or that are of low complexity (AT-rich regions with less than 20 nucleotides and less than 2 copies); and c) D-loop regions above 740 bp (n= 27), that have tandem repeats of higher complexity (AT-rich regions with more than 20 nucleotides and more than 3 copies). The molecular characteristics of the D-loop region in the spider species analyzed are summarized in Supplementary Table S3.
Although these molecular features (AT/GC skew and presence of tandem repeat elements could be discriminant between Haplogynae and Entelegynae spiders), it is necessary to analyze a larger number of complete D-loop regions to confirm this postulate.
Gene annotation errors and rearrangement level in the Araneae mitochondrial genome
According to the annotation of the 58 Araneae mitogenome mitochondrial genomes found in the NCBI database, there were a total of 203 cases of 2,166 genes annotated to be in arrangements differing from the ancestral gene order of the scorpion mitochondrial genome Buthus occitanus. Of all these reorganizations, 17 were identified as gene annotation errors, which means an error rate of 8.4% (17/203), indicating that 21% (12/57) of mitochondrial genomes have at least one gene badly annotated (Supplementary Table S2). For example, the mitochondrial genome with the highest number of annotation errors (5) was that of Amaurobius fenestralis (family Amaurobiidae), in which the order trnS, trnR, trnE, -trnL and -trnF, without annotation of trnA and trnN genes, was reported. Our analysis indicates that the -trnL2, trnN, trnA, trnS1, trnR, trnE and -trnF genes are found in this region, confirming the annotation error. In addition to false inversions, our analyses indicate a high number (40) of unannotated genes, but our results confirm their presence position in the genome. Most of these unannotated genes were tRNAs (21), followed by the D-loop region (19) (Supplementary Table S2). Examples of false inversions and deletions in the spider mitochondrial genome are shown in Supplementary Figure S1.
Previously, it has been reported that thousands of mitogenomes deposited in the database have shown a high rate of annotation errors, most of which are easily detectable 34,35. Earlier studies indicate that such errors are frequent in the Hexapoda mitogenome, with an annotation error rate of 5.5% concentrated mainly in tRNA genes 36. Although our results confirm that annotation errors in the spider mitogenome are concentrated in tRNAs, the error rate is almost double that observed in Hexapoda, generating false gene orders in this taxonomic group.
Phylogenetic analysis
A phylogenetic tree from 74 Araneae species and an outgroup species based on the nucleotide sequence of 13 mitochondrial protein-coding genes using maximum likelihood (ML) was obtained (Figure 2). Our findings provide strong support for the monophyly of two suborders (Opisthothelae and Mesothelae) in Araneae with high supporting values, demonstrating that the two species of the Liphistiidae family (Mesothelae) are ancestral sequences with respect to the Opisthothelae species. In addition, the subdivision of Opisthothelae into two monophyletic groups, Mygalomorphae and Araneomorphae infraorders, was supported in the phylogenetic tree; however, Araneomorphae was recovered paraphyletically in this analysis. The estimated tree in this study also supports the monophyly of Haplogynae and Entelygyne taxes. Likewise, species of the Lycosidae, Oxyopidae and Ctnenidae families formed a phylogenetically closely related clade (Figure 2). Although most of the data support a classical taxonomic classification, our phylogenetic analysis separates two species of the Pisauridae family (Dolomedes angustivirgatus and Pisaura mirabilis) into distinct monophyletic groups, showing that the phylogenetic position of the family Pisauridae is unstable and needs further study. The estimated ML tree from the mitochondrial genome is consistent with those observed in previous analyses of 13 PCG mitochondrial genes 4,32 and multilocus phylogeny 37. However, more mitochondrial genome data of Mesothelae, Mygalomorphae and haplogyne taxa would cover the consistent sign of deep phylogenetic relationship between all Araneae species.
Spider mitochondrial genome architectures and CREX analysis
By comparing the mitochondrial gene order rearrangement scenario of the Aranae order with the putative ancestral scorpions (Buthus occitanus) inferred by CREx analysis, three tRNAs (trnY, trnL2, and trnI) and a Dloop region were found to transposite and one TDRL event in block trnN-trnA-trnS1-trnR (Figure 2).
Compared to the putative ancestral scorpion architecture of Buthus occitanus (A), spiders belonging to the infraorders Mesothelae (primitive spiders) present only one translocation of the trnY gene in the trnW-trnY-trnC region (architecture B). Mygalomorphae and Haplogynae spiders differ from Mesothelae by a tandem duplication and random loss (TDRL) event in the trnN-trnA-trnS1-trnR region, a trnL2 gene translocation event in the same region, a trnT gene translocation event in the cob-trnS2 region and a translocation of the D-loop region (architecture C). Only the three species of the Dysderidae family vary in Mesothelae spiders by translocation of the trnI gene in the trnM-nad2 region (architecture D). In addition, all Entelegynae differ from most Haplogynae spiders (architecture C) by translocation of the trnI gene in the nad6-cob region (architecture E and F). However, two of the three species of the Tetragnathidae family have a different architecture. Tetragnatha maxillosa varies in architecture F by translocation of the trnG gene in the trnQ-D loop region (architecture E1). In addition, Tetragnatha nitens (architecture E2) has a new translocation of the trnW gene in the trnQ-Dloop region, evidencing a generic order trnQ-trnW-trnG-Dloop. Our analysis indicates that Leucauge celebesiana has the gene order of architecture F (Supplementary table S2 and Figure 2).
Mitogenomes have been recently applied, particularly to studies regarding the phylogeny and evolution of spiders 14,38. The gene rearrangements and mechanisms inducing the different gene orders or architectures of the spider mitogenomes are available in public databases, displaying an accelerated gene rearrangement in Araneomorphae compared to Mygalomorphae 4,13,32,38. However, despite the low number of species analyzed, these gene rearrangements are overrepresented due to the large number of annotation errors, generating up to nineteen mitochondrial gene rearrangement types and 12 architectures among 44 spiders 38, while in our analysis of 76 spider species, nine mitochondrial gene rearrangement types (8 translocation and one TDRL event) and six architectures were confirmed (see Supplementary Table S2 and Figure 2).