Genomic richness enables worldwide invasive success

doi:10.21203/rs.3.rs-3902873/v1

Biological Sciences - Article

This work is licensed under a CC BY 4.0 License

Version 1

Biological invasions are a major threat to biodiversity. Therefore, monitoring genomic features of invasive species is crucial to understand their population structure and adaptive processes. However, genomic resources of invasive species are scarce, compromising the study of their invasive success. Here, we present the reference genome of Styela plicata, one of the most widespread marine invasive species, combined with genomic data of 24 individuals from 6 populations distributed worldwide. We characterized large inversions in four chromosomes, accounting for ~15% of the genome size. These inversions are polymorphic throughout the species’ distribution area, and are enriched with genes enhancing fitness in estuary and harbor environments. Nonetheless, inversions mask detection of S. plicata population structure. When these structural variants are removed, we successfully identify the main oceanographic barriers and accurately characterize population differentiation between and within ocean basins. Several genes located in chromosome 3 are showcased as the main adaptive drivers between biogeographic regions. Moreover, we recover three major mitogenomic clades, involving structural rearrangements leading to cyto-nuclear coevolution likely involved in mitochondrion distribution during cell division. Our results suggest that genomic and structural variants contribute to S. plicata population structuring and adaptation processes, potentially enhancing the species’ success when colonizing new habitats.

Biological sciences/Genetics/Genome

Biological sciences/Ecology/Invasive species

Biological sciences/Evolution/Population genetics/Genetic variation/Structural variation

Biological sciences/Molecular biology/Chromosomes

Biological sciences/Genetics/Functional genomics

We are facing a global biodiversity crisis due to climate change ^1,2. Species worldwide become extinct, adapt to the changing environments or migrate to suitable areas. These contemporary processes provide unique opportunities to observe evolution in action over relatively short periods of time in a plethora of different organisms ³. Consequently, the study of current biodiversity shifts is of the utmost interest, as it opens up the possibility of assessing adaptation to rapidly changing environments, since genomes carry the legacy of the evolutionary signals that bear witness to selective constraints ⁴. Recent advances in genomics have boosted research for wildlife management in non-model organisms ⁵, and initiatives have emerged to provide reference genomes for all species worldwide ⁶. Nevertheless, a single reference genome is insufficient to capture the species-specific diversity in terms of structural and sequence variants ⁷. Including multiple genomes of the same species from across its distribution range is needed to capture the species’ genome diversity and to fully understand the evolutionary forces in action ^8,9. Thus, genome-wide intraspecific studies offer an unprecedented opportunity to address, with improved accuracy, evolutionary questions that are critical for wildlife management efforts ¹⁰. Additionally, structural variants (SVs) ^11–13 have been shown to play a key role in adaptive evolution ¹⁴ and in cryptic population structure, among other evolutionary processes ¹⁵. Consequently, an approach based on intraspecific genome-wide comparative genomics is of particular interest in globally distributed species, as dispersal events have determined their current distribution and they may have evolved through a variety of strategies to cope with a wide range of different environments.

In the context of the ongoing global biodiversity crisis, invasive species are one of the most threatening factors worldwide, considered to be one of the main causes of species extinction ^1,2. When settled in a new habitat, invasive species outcompete autochthonous species, displacing them to suboptimal habitats where they eventually perish ¹⁶. Invasive port-dwelling species are ideal model organisms to study evolutionary processes following colonization, as most have repeatedly invaded multiple harbors worldwide over the last few millennia ¹⁷. In the last decade, molecular studies have played an important role in endangered wildlife conservation ^18–20 and will become a game changer in invasion biology research, providing invaluable information for managing invasive species while improving environmental welfare ²¹. The solitary tunicate Styela plicata is considered one of the most successful invasive marine species worldwide ^22,23. It inhabits tropical and subtropical regions, thriving on artificial structures such as ports, aquaculture facilities, and ship hulls. It can reproduce throughout the year ²⁴, and its larvae are fast settlers that outcompete other species in early fouling stages ²⁵. However, little is known about the genetics of this invasive species, and no population structuring was found with the mitochondrial and nuclear markers COX1 and ANT ²³. The distribution and biological characteristics of Styela plicata make it an excellent candidate to test the potential of genomics to assess population structuring, phylogeography and adaptation in order to gain a deep understanding of the evolutionary mechanisms underlying invasive processes ²³.

Here, we generated a high quality reference genome assembly for Styela plicata from an individual from the port of Barcelona (Spain), and combined it with whole genome sequencing data of 24 individuals of 6 key populations across its distribution range ²³ (Extended Data Table 1, Fig. 1a). We identified major genomic variants in both mitochondrial and nuclear genomes and performed population structure analyses at different levels. Furthermore, we assessed the biological functions of genomic regions potentially driving species adaptation and resilience (Fig. 1b). With this study, we aim to take a step forward in our understanding of invasive species’ evolution and adaptation.

We generated a high-quality reference genome assembly for Styela plicata (Supplementary information 1). After SNP calling and filtering of the whole genome sequencing data of the 24 individuals from 6 localities, we kept a total of 2,676,716 SNPs (Extended Data Fig. 1a), showing fairly stable nucleotide diversity (π) along the genome (Extended Data Fig. 1b). Interestingly, Tajima’s D values were generally positive and high along the genome (range −0.412 to 4.501, mean 2.151, Extended Data Fig. 1c). High positive Tajima’s D values indicate a scarcity of rare alleles, which can result from balancing selection or changes in population size ²⁶. Considering the human-driven invasion of Styela plicata, these high values are likely to reflect recent independent founder events, producing bottlenecks and subsequent expansion in each colonized area ²⁷.
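As an illustration of the statistic discussed above, the following minimal sketch computes Tajima's D for one genomic window using the standard formula. The input layout (alternate-allele counts per site among n sampled chromosomes) and the function name are assumptions made for this example; the software actually used for the reported values is not described in this section.

```python
import math

def tajimas_d(alt_counts, n):
    """Tajima's D from per-site alternate-allele counts among n chromosomes.

    Hypothetical helper for illustration; returns None when D is undefined.
    """
    if n < 4:
        return None  # the variance term is zero for fewer than 4 chromosomes
    # Keep only segregating sites (neither allele fixed in the sample).
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return None
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two diversity estimators, scaled by its variance.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

In this sketch, a window dominated by intermediate-frequency alleles (few rare variants) yields a positive value, while a window dominated by singletons yields a negative one.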

Chromosomal inversions, spanning 14.7% of the genome assembly, were identified as genomic blocks with divergent genotypes in chromosomes 2, 4, 11, and 16 (Fig. 2, Extended Data Fig. 2). The inversion boundaries were assigned to the positions where the mean genotype shifted from heterozygous to homozygous, or vice versa. Two peaks of homozygosity were observed in chromosomes 11 and 16 (Fig. 2, Extended Data Table 2), with a plateau characterized by homozygous genotypes in between, and with a slightly different pattern for individuals of North Carolina (Fig. 2). In contrast, chromosomes 2 and 4 showed from 4 to 6 peaks of homozygosity, indicating inversion breaks consistent with overlapping chromosomal inversions or adaptive loci within inversions maintained despite gene flux ²⁸ (Fig. 2). We mapped Illumina short reads of the reference individual against the assembled genome, and the resulting genotypes indicated that this individual was homokaryotypic for all chromosomes (Extended Data Fig. 3). The four chromosomal inversions showed significantly higher Tajima’s D values than non-inverted genomic regions (Wilcoxon test: W = 100,322,610, p < 2.2e-16) (Extended Data Fig. 4). In this case, higher Tajima’s D values confirm that inversions are maintained by balancing selection.

A correct assessment of population genomic analyses is a crucial element for management and conservation. The population structure of Styela plicata obtained by including all genomic information was inconsistent with the geographic distribution of the samples (e.g. samples from distant locations clustered together and samples from the same locations appeared separated, Fig. 2), although the first axis separated North Carolina from all other localities. Interestingly, when excluding the chromosomes containing inversions from the analysis, we recovered a clear geographic pattern of genomic structure along the first five axes of the MDS, clustering individuals from the same locality (Fig. 2). The first axis separated North Carolina from the rest, the second axis separated Pacific and Atlantic localities, the third axis separated West from East Atlantic localities, the fourth separated North from South East-Atlantic localities, and finally the fifth axis split the East and West Pacific localities. Chromosomal arrangements have been shown to have an adaptive role by maintaining co-adaptive gene complexes ^14,29. The alleles found within each inversion variant seem to remain mostly unchanged across large geographic regions ³⁰, thus masking the population structure of S. plicata at the individual whole genome level. Finally, when analyzing each chromosome containing inverted regions independently (Extended Data Fig. 5), individuals from distant localities cluster into three groups along the first axis. This pattern reflects the three possible genotypes and is typically found when single inversions are present (homokaryotypes for the reference inversion, heterokaryotypes, and homokaryotypes for the alternative inversion) ³¹. Previous chromosome inversion detection studies were based on segregating morphological, phenotypic, or ecological traits, which allowed straightforward pairwise comparisons for F_ST analyses to detect inversion boundaries ³². In our case, we were able to simultaneously identify four polymorphic inversions, their boundaries, and each individual karyotype without previous knowledge, based on sliding windows of average genotype information (Fig. 2). Chromosomal inversions are known to reduce recombination between chromosomes in heterokaryotypes, therefore promoting the differentiation of the allele sets in the inverted region ²⁸. Consequently, it is not surprising that genomic differentiation driven by the inversions yields a stronger signal than that produced by geographical distribution (Fig. 2). Our data demonstrate that the genome is not a uniform entity, and that different regions may be subject to different constraints and thus reflect different evolutionary processes. Consequently, it is crucial to uncover in advance sequence and structural variants, such as chromosomal inversions, for reliable studies of population genomics. We encourage future genomic studies to initially test for potential inversions indicated by divergent genotypic blocks with the help of reference genomes, or indicated by high Tajima’s D values. In addition, our work evidences the need to assess genome diversity by analyzing multiple genomes to uncover hidden genomic features and their importance for proper and reliable interpretations of the results in molecular studies ^13,33.
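The sliding-window idea described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded 0/1/2 (homozygous reference, heterozygous, homozygous alternative) at SNPs that are diagnostic for the arrangement, and the window size, step, tolerance and function names are all hypothetical.

```python
def window_means(genotypes, size, step):
    """Mean genotype (0/1/2 coding) in consecutive windows of `size` SNPs."""
    means = []
    for start in range(0, len(genotypes) - size + 1, step):
        window = genotypes[start:start + size]
        means.append(sum(window) / size)
    return means

def classify(mean, tol=0.25):
    """Assign a window to a karyotype-like state (tolerance is an assumption)."""
    if mean <= tol:
        return "hom_ref"
    if mean >= 2 - tol:
        return "hom_alt"
    if abs(mean - 1) <= tol:
        return "het"
    return "mixed"

def state_shifts(genotypes, size=4, step=4):
    """Return (snp_index, previous_state, new_state) wherever the state changes."""
    states = [classify(m) for m in window_means(genotypes, size, step)]
    shifts = []
    for i in range(1, len(states)):
        if states[i] != states[i - 1]:
            # i * step is the index of the first SNP in the window that changed.
            shifts.append((i * step, states[i - 1], states[i]))
    return shifts

# Toy individual: homozygous flanks around a heterozygous block.
toy = [0] * 8 + [1] * 8 + [0] * 8
print(state_shifts(toy))
```

In this toy case the two reported shifts bracket the heterozygous block, which is the kind of signal that would be read as candidate inversion boundaries for a heterokaryotypic individual.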

The inclusion of both mitochondrial and nuclear data is of utmost interest for population genomics studies, as analyses of both molecular markers can provide a more complete picture of the species’ evolution due to the lack of recombination in mtDNA and differences in mutation rates between the two genomic compartments ³⁴. All mitochondrial genomes were successfully circularized with the exception of NC18. All orthologous genes were fully recovered, and therefore phylogenies could be conducted appropriately. Three main mitochondrial clades were phylogenetically inferred, using protein coding gene sequences, with individual NC18 as sister clade of mitogroup C (Fig. 3a). The clade mitogroup A coincided with those individuals identified as ‘haplogroup 1’ in previous studies ²³, while former ‘haplogroup 2’ was split in mitogroups B and C. Mitogroup B was only found in individuals from North Carolina, and mitogroup C included all individuals from the other populations of the former ‘haplogroup 2’ (Fig. 3a). These three mitogenome clades had structural and sequence differences (Fig. 3b). Mitogroup A is differentiated from mitogroups B and C by a mean genetic distance of 2.4% and 2.6%, respectively, while B and C had a distance of 1.3% (Fig. 3b). In terms of structural variation, mitogroup C is differentiated from mitogroups A and B by a large insertion of approximately 1,000 bp which includes two partial copies of the gene cytochrome oxidase b, one partial copy of the gene cytochrome oxidase 1 and 3 full tRNA copies (Fig. 3b). The insertion is confirmed when the reads of each mitogenome are mapped against the three assembled sequences of the three mitogroups (Extended Data Fig. 6). Interestingly, NC reads almost cover the insertion of Mitogroup C with some additional duplications (Extended Data Fig. 6). As found in previous studies ²³, the differentiation of mitochondrial DNA of S. plicata does not match its population geographical distribution, as individuals from two of the more differentiated clades (A and C) are globally sympatric. Thus, individuals that are highly divergent in terms of the mitochondrial genome are similar in terms of nuclear DNA within the same populations, since our results with the nuclear genome excluding chromosomes with inversions indicate a clear geographic pattern of population structure (Fig. 2). Discrepancies between mitochondrial and nuclear markers are not uncommon and are attributed to ancient splits that gave rise to different mitogenomic clades, with a later secondary contact homogenizing the nuclear content ³⁵. The fact that two mitogenomic clades are present in almost all populations suggests that the secondary contact occurred before the worldwide colonization of the ascidian. Mitogroup B, on the other hand, is present only in individuals from NC that are highly genetically divergent at the nuclear level, potentially indicating that NC is a relict population that never experienced admixture with other genetic lineages after the origin of the clade.
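To make the notion of a mean genetic distance between mitogroups concrete, the sketch below computes an uncorrected pairwise distance (proportion of differing sites) between aligned sequences and averages it across groups. The choice of the uncorrected p-distance, the handling of gaps and the function names are assumptions for illustration; the distance measure behind the reported percentages is not specified in this section.

```python
def p_distance(seq_a, seq_b, skip_chars="-N"):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap or N in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else None

def mean_between_group_distance(group_a, group_b):
    """Average p-distance over all pairs with one sequence from each group."""
    dists = [p_distance(a, b) for a in group_a for b in group_b]
    return sum(dists) / len(dists)
```

Multiplying the result by 100 gives a percentage comparable in form to the values quoted above.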

Interactions between the nuclear and the mitochondrial genomes are largely unknown, although their correct functioning may be essential for the survival and adaptation of species. We found a few nuclear markers associated with each mitogroup (A and C) according to the F_ST outlier values, mostly scattered along the genome but with significant aggregations in chromosomes 12 and 14 (Fig. 3c). High values of nuclear divergence associated with the mitochondrial group provide evidence of interactions between these two genetic compartments ³⁶ that drive the segregation of nuclear alleles within the same populations depending on the mitogenome of the individuals. These interactions are not surprising, since the mitochondrial and nuclear genomes are known to coevolve to maintain overall functionality and, consequently, the viability of individuals ^37,38.

The analyses performed on the different genomic compartments revealed several candidate nuclear regions as drivers of Styela plicata resilience and adaptation, which can be key in explaining its invasive success.

1. Inversions confer resilience

We found that S. plicata inversions were polymorphic in all populations (except for North Carolina in chromosomes 2, 4 and 16) (Fig. 2, Fig. 4). The maintenance of different inversion arrangements within populations is known to be beneficial, as it has been reported that chromosomal inversion polymorphism plays a role in survival to harsh environmental changes ^29,39. The rich inversion polymorphism and the high level of heterokaryotypic individuals found in S. plicata could be adaptive and have implications for its invasive capabilities ³⁹. In North Carolina, the fixation of the inversions (except the one in chromosome 11) in most individuals may indicate different evolutionary processes in this population. Possibilities include NC being a relict ancient population, having been recently colonized by a few homokaryotypic individuals, or being under different selective pressures that favored homokaryotypes. Furthermore, NC genomic uniqueness might also point to a population that never became invasive. In other invasive tunicates, it has been described that some genotypes have invasive potential while others do not ^27,40, which could also be the case for S. plicata. In any case, the presence of polymorphic inversions in all populations but NC suggests that invasive populations of S. plicata benefit from these inversions to spread worldwide (Fig. 4) ⁴¹.

Based on our genotype method of inversion detection, we compared the two different inversion homokaryotypes with F_ST. The results corroborated the presence of the inversions highlighted by our method, although the identified shifts were more internal (Fig. 4, Extended Data Table 2). This result indicates that inversions prevent recombination in heterokaryotypes, although low recombination rates can extend beyond inversion breakpoints as found in the average genotype analysis, making it difficult to identify inversion haplotype boundaries ²⁸. A comparison between genes found inside each inversion relative to the rest of the genome, using both boundary values, resulted in significantly enriched Gene Ontology (GO) terms in all four inversions (Fig. 4, Supplementary Spreadsheet 1). Interestingly, on each chromosome inversion, only five main functions encompassed more than 75% of the enriched functions (Fig. 4, Supplementary Spreadsheet 1). It is worth noting that in S. plicata each chromosomal inversion encompasses different enriched processes, suggesting that each inversion may play a specific role in the species’ success. Most of these functions can be associated with fitness enhancers in estuary and harbor environments, such as the immune system (chromosomes 11 and 16) ⁴², reproduction (chromosomes 2 and 16) ⁴³, response to pollutants and other stressors (chromosomes 2, 4 and 16) ^44,45, or cell division and cellular organization processes (chromosome 11), among others. The concentration of enriched functions in inversions suggests that inversions may play an important role in adaptation, facilitating the establishment and invasion of S. plicata in colonized areas. Consequently, inversions could have been maintained worldwide as polymorphic because they were adaptive for coping with environmental shifts when colonizing new habitats. For example, the reference arrangement (the one found in the reference genome) can be adaptive under given environmental conditions, whereas the alternative arrangement can be favorable under different external pressures. By studying the functional enrichment within inversions and its adaptive value, the invasive potential of any species can be assessed, which might catalyze the activation of early surveillance action plans for early monitoring of these species to prevent future colonization events.
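The F_ST comparison between the two homokaryotype groups mentioned at the start of this subsection can be sketched with Hudson's estimator, combined across SNPs as a ratio of averages. The choice of estimator is an assumption for illustration (the estimator used for the reported values is not stated here), as are the function name and the input layout: alternate-allele frequencies per SNP in each group, plus the number of sampled chromosomes per group.

```python
def hudson_fst(freqs_1, freqs_2, n1, n2):
    """Hudson's F_ST over a set of SNPs, combined as a ratio of averages.

    freqs_1, freqs_2: alternate-allele frequencies per SNP in each group.
    n1, n2: number of sampled chromosomes in each group (must exceed 1).
    Returns None when no SNP is informative.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Between-group difference corrected for within-group sampling noise.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else None
```

In this sketch, SNPs fixed for different alleles in the two homokaryotype groups give a value of 1, whereas windows outside the inverted region, where the groups share allele frequencies, give values at or below 0.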

2. Local adaptation across oceans

Given the apparent pattern of population structuring of S. plicata excluding chromosomes with inversions, we assessed local adaptation signals considering the main groups found in the MDS with the first two axes (Fig. 2). When comparing North Carolina with the remaining localities, F_ST values were very high along the genome (Extended Data Fig. 7). Potential signals of local adaptation were found between the Atlantic and Pacific populations, although F_ST values were overall lower (Extended Data Fig. 8a). The top 1% F_ST values were found along the genome, although a region in chromosome 3 provided the strongest signals of potential local adaptation from position 26,182,501 to position 26,490,000 (Extended Data Fig. 8a). Genes included in the region of high F_ST values in chromosome 3 differentiating the two biogeographic regions (Extended Data Fig. 8a) yielded 49 GO terms. Metabolic processes and response to stimulus comprised 33% of the functions that could drive biogeographic adaptation (Extended Data Fig. 8b). Interestingly, one of the main functions implicated in geographic adaptation was associated with ion transport. Given the fact that the Atlantic Ocean salinity is slightly higher than the Pacific, individuals from both areas may need to osmoregulate appropriately in each ocean, driving local adaptation, as similarly reported in fish ⁴⁶.
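Selecting the top 1% of windowed F_ST values, as described above, can be sketched as follows. The tuple layout (chromosome, start, end, F_ST), the function name and the tie-handling rule are assumptions made for this example.

```python
def top_fraction_outliers(windows, fraction=0.01):
    """Return windows whose F_ST falls in the top `fraction` of all windows.

    windows: list of (chromosome, start, end, fst) tuples.
    Ties at the cutoff are kept, so slightly more windows may be returned.
    """
    if not windows:
        return []
    ranked = sorted(windows, key=lambda w: w[3], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    cutoff = ranked[k - 1][3]
    # Report outliers in genomic order rather than by rank.
    return sorted(w for w in windows if w[3] >= cutoff)
```

Adjacent outlier windows on the same chromosome could then be merged to delimit a candidate region such as the one reported on chromosome 3.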

3. Cyto-nuclear coevolution signals

We found two F_ST outlier clusters in chromosomes 12 (21,830,001–21,840,000 bp) and 14 (8,772,501–8,782,500 bp) linked with the mitogenome clusters A and C, which could be signaling nuclear genes coevolving with mitogenomic structures. Interestingly, no genes involved in cellular respiration were found (Extended Data Fig. 9, Supplementary Spreadsheet 1).

One of the genes found in the highest F_ST peak in chromosome 14 matched the gene ‘UDP-galactose/UDP-glucose transporter 7-like’ in a BLAST search. This gene is involved in the correct transport of galactose and glucose to the Golgi vesicles ⁴⁷, which promotes glycosylation of several proteins to enhance their performance ⁴⁸. Some studies have indicated that glycosylated proteins might play important roles in mitochondria by modifying and regulating the functions of the non-glycosylated original isoforms ⁴⁹. The surrounding genes affected by the cyto-nuclear interaction include many other functions involved in development and cellular organization, related to the cytoplasmic membrane and other organelles (Extended Data Fig. 9, Supplementary Spreadsheet 1).

No BLAST match was found for the genes of the highest F_ST peak on chromosome 12, suggesting that the divergent genetic signal is driven by an unknown gene, is related to gene regulation, or is in linkage with nearby regions under selection. Genes in the coevolving region of chromosome 12 were involved in cellular organelle organization and Sm-like proteins (Extended Data Fig. 9, Supplementary Spreadsheet 1), the latter playing an important role in alternative splicing ⁵⁰. Interestingly, functions found in the coevolving region of chromosome 12 could be highly dependent on those found in chromosome 14, as Sm-like protein activity occurs inside the Golgi vesicles. Golgi elements are tightly associated with mitochondria in many animals during oocyte development, generating a structure named the Balbiani body. This structure is associated with ribonucleoproteins with a core domain ruled by Sm-proteins ^51,52. Thus, in the context of potential coevolution between mitochondrial and nuclear genes of S. plicata, organelle organization and the synthesis of protein isoforms might enhance the performance of this invasive tunicate. Surprisingly, the cyto-nuclear coevolution found in S. plicata was not related to respiration, but to processes potentially involved in cell division during meiosis and reproduction (Extended Data Fig. 9). We propose cyto-nuclear interactions as a relevant mechanism for S. plicata, although future studies are needed to confirm the interplay between the two genetic compartments and its impact on mitochondrial distribution during meiosis.

The generation of reference genomes has become a milestone in genomics ⁶. The more we advance in the knowledge of genomes of living organisms, the more we understand their complexity and the importance of analyzing multiple genomes within species to capture the species-specific diversity ³³. Our approach, combining a high quality reference genome assembly with genome re-sequencing of several individuals from the worldwide distribution of the global invader S. plicata, has revealed that the genome of this species cannot be considered as a single entity. On the contrary, different genome regions (from both the nucleus and the mitochondrion) have been shaped by different evolutionary forces, with some degree of coevolution between different genomic compartments. Consequently, by dissecting the different components of the genome and analyzing them independently, we have been able to disentangle multiple molecular traits of this invasive species.

Not considering genome architecture diversity may yield partial or spurious results, by either focusing on specific elements of the genome or mixing the confounding signals from different genomic regions. By detecting inversions beforehand using a method based on genotypes, we were able to disentangle different nuclear entities before starting population structure analyses, which were followed by the identification of local adaptation and cyto-nuclear interactions. Consequently, we endorse this approach as a game changer in the field of genomics, being the logical next step following the sequencing and annotation of reference genomes for a thorough understanding of the processes underlying molecular evolution ³³.

Reference genome sampling and sequencing

Two Styela plicata individuals were collected in the harbor of Barcelona in September 2020 (Extended Data Table 1, Fig. 1). One of the individuals was kept alive until DNA extraction to best preserve DNA integrity, while the other was immediately preserved in RNAlater for RNA extractions. Once in the laboratory, two gill folds were excised for DNA extraction and genome size estimation, while 25 mm² each of gill, mantle, and gonad tissues were selected for RNA extractions.

DNA extraction followed a protocol based on that of Mayjonade et al. (2016) ⁵³. RNA extractions were performed according to the protocol of Ghangal et al. (2009) ⁵⁴. The quality and concentration of both extractions were assessed using the TapeStation 2200 (Agilent Technologies) and the Qubit Fluorometer with the appropriate Qubit dsDNA/RNA BR Reagents Assay Kit (Thermo Fisher Scientific, Waltham, MA).

A SMRTbell library was constructed following the instructions of the SMRTbell Express Prep kit v2.0 with Low DNA Input Protocol (Pacific Biosciences, Menlo Park, CA). One SMRT cell sequencing run was performed in CLR mode on a Sequel System II with the Sequel II Sequencing Kit 2.0. Additionally, a DNA extract of the same specimen was shipped to Novogene (UK) for Illumina Short Reads (SR) Whole Genome Sequencing (WGS). One genomic library (insert size: 350 bp) was prepared and 150 bp paired-end reads were sequenced on an Illumina NovaSeq 6000 platform (San Diego, CA) targeting 30 Gb output. Omni-C short reads were obtained by building the corresponding libraries following the Dovetail® Omni-C kit manufacturer's instructions with an insert size of 350 bp. The library was sequenced on the NovaSeq 6000 platform using a 150 bp paired-end sequencing strategy at Novogene (UK). Finally, the RNA extractions from gill, mantle and gonad tissues were pooled at equal concentrations and sent to Novogene (UK) for Illumina paired-end 150 bp RNA-seq of a cDNA library (insert size: 350 bp) with an expected output of 30 Gb (Extended Data Table 3).

Genome size estimations

Genome size was estimated following a flow cytometry protocol with propidium iodide-stained nuclei described in Chueca et al. 2021 ⁵⁵. Fresh mantle and gill tissues of S. plicata were separately chopped with a razor blade in independent Petri dishes containing 2 ml of ice-cold Galbraith and Otto buffer, respectively ^56,57. As internal reference standards we used Acheta domesticus (female, 1C = 2 Gb) and chicken nuclei (Gallus gallus, 1C = 1 Gb) (Biosure). The suspension was filtered through a 42 μm nylon mesh and stained with the intercalating fluorochrome propidium iodide (PI, Thermo Fisher Scientific) and treated with RNase II A (Sigma-Aldrich), each with a final concentration of 25 μg/ml. The mean red PI fluorescence signal of stained nuclei was quantified using a Beckman-Coulter CytoFLEX flow cytometer with a solid-state laser emitting at 488 nm. Fluorescence intensities of 5,000 nuclei per sample were recorded. The software CytExpert 2.3 was used for histogram analyses. The total quantity of DNA in the sample was calculated as the ratio of the mean red fluorescence signal of the 2C peak of the stained nuclei of the S. plicata sample divided by the mean fluorescence signal of the 2C peak of the reference standard. Three replicates for each treatment were measured on 3 different days to minimize instrument error.
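The calculation described above can be sketched as follows. The sketch assumes the usual convention that the fluorescence ratio between sample and standard 2C peaks is multiplied by the known 1C size of the standard to obtain the sample's 1C size; the function names are hypothetical and any fluorescence values passed in are the caller's own measurements, not data from this study.

```python
from statistics import mean

def genome_size_gb(sample_peak, standard_peak, standard_1c_gb):
    """1C genome size of the sample, in Gb.

    sample_peak / standard_peak: mean PI fluorescence of the 2C peaks.
    standard_1c_gb: known 1C size of the internal reference standard.
    """
    return sample_peak / standard_peak * standard_1c_gb

def mean_genome_size_gb(replicates, standard_1c_gb):
    """Average estimate over (sample_peak, standard_peak) replicate pairs."""
    estimates = [genome_size_gb(s, r, standard_1c_gb) for s, r in replicates]
    return mean(estimates)
```

Averaging over replicates measured on different days, as in the protocol above, dampens day-to-day instrument variation.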

Reference genome assembly

PacBio CLR subreads raw sequence files (BioProject = PRJEB67507) were transformed from bam to fastq format using samtools v.1.10 ⁵⁸ and their quality assessed with Nanoplot v.1.28.1 ⁵⁹. Nanofilt v.2.6 ⁵⁹ was used to apply a length threshold filter of 25 kb to remove potential bacterial contamination and mitogenomic sequences shown as secondary GC content peaks in fastQC plots (Extended Data Fig. 10). Both Omni-C and WGS-SR data (BioProject = PRJEB67507) were trimmed and filtered using Trimmomatic v.0.39 ⁶⁰ and their quality measured with fastQC v.0.11.9 ⁶¹.
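The effect of a read-length filter like the one described above can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not the tool used in the study: it assumes that the 25 kb threshold is a minimum length (reads shorter than the threshold are discarded), that the FASTQ is in the simple four-line-per-record layout, and the function names are hypothetical.

```python
def parse_fastq(lines):
    """Yield (header, sequence, separator, quality) from 4-line FASTQ records."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(rows) - 3, 4):
        yield rows[i], rows[i + 1], rows[i + 2], rows[i + 3]

def filter_by_length(records, min_length=25_000):
    """Keep only reads at least `min_length` bases long (assumed direction)."""
    for record in records:
        if len(record[1]) >= min_length:
            yield record

# Toy usage with a lowered threshold so the example stays small.
toy = ["@read1", "A" * 30, "+", "I" * 30, "@read2", "C" * 10, "+", "I" * 10]
kept = list(filter_by_length(parse_fastq(toy), min_length=20))
print([header for header, *_ in kept])
```

With the toy threshold of 20 bases, only the longer of the two reads is retained.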

We used Flye v.2.8 ⁶² to assemble the filtered PacBio CLR reads with three iterations of self-polishing, followed by three polishing rounds with the WGS-SR data using Pilon v.1.23 ⁶³. Finally, the assembly draft was deduplicated using purge_dups.py v.1.2.6 ⁶⁴. For genome scaffolding we used Omni-C data and uploaded the polished assembly to the program Juicer v.1.6 ⁶⁵, defining ‘none’ as the restriction enzyme since Omni-C libraries use an endonuclease with random cleavage sites. The function run-asm-pipeline.sh of the program 3d-dna v.201008 ⁶⁶ was called to obtain the contact map, which was manually curated using the interface of Juicebox v.1.5 ⁶⁷. The final chromosome-level assembly was recovered by running the script juicebox_assembly_converter.py of the same program.

Coverage along the genome assembly and mean coverage were tested by mapping the initial CLR and WGS-SR back to the polished assembly using the programs Minimap2 v.2.24 and BWA-mem v.0.7.17, respectively, and visualized with Qualimap v.2.2.1 and multiQC v.1.8 ^68–71 implemented in backmap.pl v.0.4 ⁷² (https://github.com/schellt/backmap). Contiguity statistics were obtained using QUAST, base-level accuracy (QV) was assessed with Merqury v.1.3 ⁷³, and genome completeness was corroborated by detecting universal single-copy orthologs of Metazoa using BUSCO v.5.2.2 ⁷⁴. We checked for DNA contamination by comparing our sequences to those in NCBI using BLAST v.2.12 ⁷⁵. The resulting quality metrics were graphically represented using the software BlobTools v.2.0 ⁷⁶. As a result, we generated a high-quality chromosome-level reference genome assembly for subsequent analyses, submitted to ENA (BioProject = PRJEB67507).

Reference genome annotation

We downloaded from repbase ⁷⁷ previously reported transposable elements (TE) of the model species Ciona intestinalis type A (Ciona robusta), as it was the closest related species to Styela plicata with available TE data. Furthermore, we also downloaded the reference genome of the congener Styela clava ⁷⁸ in order to aid in our TE annotation. We used RepeatModeler v.2.0 ⁷⁹ in both S. plicata and S. clava to generate de novo predictive TE models for the genus Styela. The resulting models obtained for both species were combined with the TE of C. robusta. The model file including the TEs of all three species was used to soft- and hard-mask the chromosome-level assembly with RepeatMasker v.4.1.2 ⁸⁰. We plotted each TE family abundance and Kimura2 substitution levels profile using the program RepeatLandscape.pl, included in RepeatMasker.

To annotate genes, a protein set of Stolidobranchia was downloaded from UniProt ⁸¹. In addition, RNA-seq data from gill, mantle, and gonads (BioProject = PRJEB67507), previously filtered with Trimmomatic v.0.39, were assembled into transcripts with Trinity v.2.11 ⁸² and subsequently merged and clustered at 99% similarity using CD-HIT v.4.8 ⁸³ to reduce redundancy. Both the Stolidobranchia protein set and our RNA-seq data were used to annotate the hard-masked genome using BLAST and Exonerate, both implemented in MAKER v.2.31.10 ⁸⁴. Ab initio gene predictions were conducted with AUGUSTUS v.3.4.0, GeneMark-EP v.4.71 and SNAP v.2013-11-29, as implemented in MAKER ^85–87. Three rounds of protein modeling were carried out in order to refine the genome annotation. For the first modeling round, BUSCO genes were used to generate a gene model with SNAP, whereas AUGUSTUS and GeneMark relied on the RNA-seq data. After the first modeling round, an annotation draft was obtained with MAKER. For AUGUSTUS and SNAP, second and third modeling rounds were carried out using as a training set the genes from the most recent annotation draft. The new training models were used to reannotate the genome assembly. Additionally, in the last annotation round, tRNAscan-SE v.2.0 ⁸⁸ and snoscan v.0.9.1 ⁸⁹ were activated in MAKER to annotate tRNA and small nucleolar RNA (snoRNA). As the last step, we predicted long non-coding RNA (lncRNA) with FEELnc v.0.0.1 ⁹⁰ and CPC2 ⁹¹. lncRNAs predicted by both programs that did not overlap any protein-coding gene were selected as reliable lncRNA candidates and ultimately included in the S. plicata annotation. Finally, genes were compared against Pfam databases using eggNOG-mapper v.2 ⁹² with default thresholds in order to assign Gene Ontology (GO) terms, and the best match was recorded in the final annotation file.
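The lncRNA selection rule described above (predicted by both programs, no overlap with a protein-coding gene) can be sketched as follows. The function names, identifiers and coordinates are hypothetical, and the sketch ignores strand, which is an assumption not stated in the text.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base.

    Coordinates are treated as 1-based and inclusive.
    """
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def reliable_lncrnas(feelnc_hits, cpc2_hits, coords, coding_genes):
    """Keep transcripts called non-coding by both tools that do not
    overlap any protein-coding gene (strand is ignored here)."""
    shared = set(feelnc_hits) & set(cpc2_hits)
    return sorted(
        tid for tid in shared
        if not any(overlaps(coords[tid], gene) for gene in coding_genes)
    )


# Hypothetical transcripts and gene coordinates, for illustration only.
coords = {
    "t1": ("chr1", 100, 500),
    "t2": ("chr1", 1000, 1500),
    "t3": ("chr2", 100, 300),
}
feelnc = {"t1", "t2", "t3"}
cpc2 = {"t1", "t2"}
coding = [("chr1", 400, 900)]
print(reliable_lncrnas(feelnc, cpc2, coords, coding))
```

In this toy case t3 is dropped because only one program reports it, and t1 is dropped because it overlaps the coding gene, leaving t2 as the only candidate.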

Worldwide genome sequencing and genotyping

To assess genome variation worldwide, we sequenced 24 additional Styela plicata individuals from different previously studied harbors ²³: California, USA (n=4), Santa Catarina, Brazil (n=4), North Carolina, USA (n=4), Ferrol, Spain (n=4), Port Elizabeth, South Africa (n=4), and Misaki, Japan (n=4) (Extended Data Table 1, Fig. 1a). Since the cytochrome oxidase I (COXI) haplotypes of these samples were already known ²³, we chose, whenever possible, a balanced proportion of individuals from the two previously described haplogroups in every population (Extended Data Table 1). DNA extractions were sent to Novogene for WGS-SR library preparation and sequencing, aiming for 5 Gb of 150 bp paired-end reads per individual.

Global nuclear genome diversity and population differentiation

Illumina WGS-SR data of the 24 individuals sampled worldwide (Extended Data Table 4) were filtered with Trimmomatic v.0.39 and deposited in ENA (BioProject = PRJEB67519). Filtered reads were mapped to the newly generated chromosome-level reference genome using BWA-mem, and genotypes were called using the mpileup function in BCFtools v.1.10.2 ⁵⁸. VCFtools v.4.2 was used for filtering variants ⁹³. We retained biallelic SNPs genotyped in 100% of the individuals, with a minor allele frequency of at least 10% and a minimum coverage of 5 reads. Finally, we removed loci with a mean coverage across samples above 25 reads, the upper whisker value of the distribution of per-locus mean coverage, to avoid putatively duplicated loci.
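The filtering criteria above can be summarized in a short sketch. This is not the VCFtools command used in the study: the function name and data layout are hypothetical, and applying the 5-read minimum to each individual genotype is an assumption, since the text does not specify how the depth filter was applied.

```python
from statistics import mean


def passes_filters(alleles, genotypes, depths,
                   min_maf=0.10, min_depth=5, max_mean_depth=25):
    """Apply the site filters described in the text to one SNP.

    alleles:   tuple of observed alleles at the site
    genotypes: per-individual alternative-allele counts (0, 1, 2),
               or None when the genotype is missing
    depths:    per-individual read depth at the site
    """
    if len(alleles) != 2:                      # biallelic sites only
        return False
    if any(g is None for g in genotypes):      # genotyped in 100% of individuals
        return False
    if any(d < min_depth for d in depths):     # assumed per-individual minimum depth
        return False
    if mean(depths) > max_mean_depth:          # excess coverage, putative duplicates
        return False
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq) >= min_maf


# Hypothetical sites, for illustration only.
print(passes_filters(("A", "G"), [0, 1, 2, 1], [10, 12, 8, 9]))       # kept
print(passes_filters(("A", "G", "T"), [0, 1, 2, 1], [10, 12, 8, 9]))  # not biallelic
```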

Sliding windows of 10,000 bp were used to calculate global nucleotide diversity (π) and global Tajima’s D using VCFtools, with a step of 2,500 bp for π. Manhattan plots of Tajima’s D and π were drawn with the R package ‘CMplot’ v.3.1.3 ⁹⁴. We recoded all genotypes using the option --012 of VCFtools v.4.2, where 0 represents genotypes homozygous for the reference allele, 1 heterozygous genotypes, and 2 genotypes homozygous for the alternative allele. We used sliding windows of 10,000 bp with a 2,500 bp step in R to calculate the mean value of the recoded genotypes along each chromosome for each individual, in order to detect linkage blocks and identify putative chromosomal inversions. Results from all individuals were plotted using the R package ‘ggplot2’ v.3.4.1 ⁹⁵, and chromosomal inversions were identified as linked blocks with constant genotypes. Thus, mean genotype values close to 0 indicated individuals homozygous for the reference allele (the same allele as the reference genome), values close to 2 indicated individuals homozygous for the alternative allele, and intermediate values indicated heterozygotes bearing both alleles. We defined as putative inversion boundaries the points at which genotypes shifted from heterozygous to homozygous in all individuals in the sliding-window analysis. We tested whether Tajima’s D values within the inversions differed significantly from those in the rest of the genome using a Wilcoxon test with the function ‘wilcox.test’ in R, setting the paired parameter to false.
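The sliding-window summary of recoded genotypes can be sketched as below. The original analysis was run in R with custom scripts; this Python version is only an illustration under stated assumptions. The function names, the toy SNPs and the tolerance used to label a window as close to 0 or close to 2 are all hypothetical, as the text gives no numeric cutoff.

```python
def window_means(positions, genotypes, chrom_length, window=10_000, step=2_500):
    """Mean 0/1/2 genotype of one individual in each sliding window.

    positions: SNP positions on one chromosome (1-based)
    genotypes: matching 0/1/2 codes for one individual
    Returns a list of (window_start, window_end, mean), where the mean
    is None when the window contains no SNP.
    """
    results = []
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        values = [g for p, g in zip(positions, genotypes) if start <= p <= end]
        results.append((start, end, sum(values) / len(values) if values else None))
        start += step
    return results


def call_arrangement(mean_genotype, tol=0.25):
    """Label a window mean; `tol` is a hypothetical tolerance."""
    if mean_genotype is None:
        return "no data"
    if mean_genotype <= tol:
        return "homozygous reference"
    if mean_genotype >= 2 - tol:
        return "homozygous alternative"
    return "heterozygous"


# Hypothetical SNPs on a 15 kb chromosome, for illustration only.
for start, end, value in window_means([1000, 3000, 6000, 12000], [0, 2, 2, 1], 15_000):
    print(start, end, value, call_arrangement(value))
```

Running the same function per individual and plotting the window means along each chromosome gives the kind of profile in which long runs of constant values stand out as candidate linked blocks.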

To obtain an alternative definition of the inversion boundaries, we selected individuals from both homokaryotypes according to the genotypes plotted in sliding windows and performed an F_ST analysis comparing individuals with the two chromosomal arrangements using VCFtools. F_ST was calculated using a 10,000 bp sliding window and a step of 2,500 bp. Values were visualized as a Manhattan plot using CMplot. F_ST boundaries suggesting inversion breakpoints were defined as the windows in which F_ST values shifted from close to zero to close to one.
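One way to make the F_ST boundary definition concrete is to look for the longest run of consecutive windows with F_ST close to one. The sketch below does this under assumptions: the text gives no numeric cutoff, so the 0.9 threshold, the function name and the window values are hypothetical, and the procedure is not necessarily the one the boundaries were drawn with.

```python
def high_fst_block(windows, high=0.9):
    """Return (start, end) of the longest run of consecutive windows
    with F_ST >= `high`, or None if no window reaches the threshold.

    windows: list of (window_start, window_end, fst) in genomic order;
             fst may be None for windows without data.
    """
    best = None
    run_start = run_end = None
    for start, end, fst in windows:
        if fst is not None and fst >= high:
            if run_start is None:
                run_start = start          # a new high-F_ST run begins
            run_end = end
            if best is None or run_end - run_start > best[1] - best[0]:
                best = (run_start, run_end)
        else:
            run_start = None               # the run is interrupted
    return best


# Hypothetical window values, for illustration only.
toy_windows = [
    (1, 10000, 0.02), (2501, 12500, 0.05), (5001, 15000, 0.95),
    (7501, 17500, 1.00), (10001, 20000, 0.97), (12501, 22500, 0.03),
]
print(high_fst_block(toy_windows))
```

Because windows overlap, windows spanning a breakpoint can show intermediate values, so any threshold-based boundary of this kind is approximate to within one window length.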

A Multi-Dimensional Scaling (MDS) analysis was performed using the ‘cluster’ function of PLINK v.2.0 ⁹⁶, which is based on Identity By State (IBS) pairwise distances. The initial dataset included all the SNPs obtained after genotyping, and different subsets were also used for subsequent analyses: a dataset excluding all the chromosomes where we detected putative inversions, and four additional datasets, one per chromosome with inversions. For each dataset, the first five axes of the MDS were plotted in consecutive pairs using the ‘ggplot2’ package.
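The IBS distances underlying the MDS can be illustrated with the standard definition: at each SNP two individuals share 0, 1 or 2 alleles identical by state, and the distance is one minus the average shared proportion. The sketch below follows that definition and may differ in detail from PLINK's implementation; the MDS step itself is not shown, and the genotypes are hypothetical.

```python
def ibs_distance(g1, g2):
    """One minus the proportion of alleles shared identical by state.

    g1, g2: 0/1/2 genotype codes for two individuals at the same SNPs;
            None marks a missing genotype, and such SNPs are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)   # alleles shared at this SNP (0, 1 or 2)
        compared += 2
    if compared == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return 1 - shared / compared


def ibs_matrix(genotypes):
    """Pairwise IBS distances for a dict of individual -> genotype list."""
    names = sorted(genotypes)
    return {(a, b): ibs_distance(genotypes[a], genotypes[b])
            for a in names for b in names}


# Hypothetical genotypes, for illustration only.
toy = {"ind1": [0, 0, 2, 1], "ind2": [0, 2, 2, 1], "ind3": [2, 2, 0, 1]}
for pair, dist in ibs_matrix(toy).items():
    print(pair, dist)
```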

In order to detect signals of regional adaptation, F_ST values were calculated between the Atlantic and Pacific biogeographic regions (see results). We kept as outliers those F_ST values falling in the top 1%. Manhattan plots were obtained for the F_ST values across the genome with the package ‘CMplot’ and outlier values were highlighted.
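A minimal sketch of the top 1% outlier rule is given below. It ranks windows by F_ST and keeps those at or above the value of the last window inside the top fraction; keeping ties at the cutoff is a choice made here, not something stated in the text, and the function name and window values are hypothetical.

```python
def top_fraction_outliers(windows, fraction=0.01):
    """Return the windows whose F_ST falls in the top `fraction`.

    windows: list of (chrom, start, fst); windows with fst None are ignored.
    Windows tied with the cutoff value are all kept.
    """
    scored = [w for w in windows if w[2] is not None]
    if not scored:
        return []
    n_keep = max(1, round(len(scored) * fraction))
    ranked = sorted(scored, key=lambda w: w[2], reverse=True)
    cutoff = ranked[n_keep - 1][2]
    return [w for w in scored if w[2] >= cutoff]


# Hypothetical genome-wide scan of 200 windows, for illustration only.
toy_scan = [("chr3", 1 + 2500 * i, i / 1000) for i in range(200)]
for window in top_fraction_outliers(toy_scan):
    print(window)
```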

Global mitochondrial genome diversity

We used the filtered WGS-SR data of all individuals (the individual used for the reference genome and the 24 individuals sampled worldwide) to assemble individual mitogenomes with NOVOPlasty v.4.2 ⁹⁷, using a publicly available Styela plicata mitogenome as reference (NC_013565.1), a COI sequence as seed (HQ916426.1), and setting k=33. Mitogenomes were annotated using MITOS2 ⁹⁸, as implemented in the Galaxy portal ⁹⁹, with default parameters and the ascidian mitochondrial genetic code, followed by manual curation in Geneious Prime (https://www.geneious.com/). The mitochondrial genomes of the ascidian species Styela clava (NC_037072), Botryllus schlosseri (NC_021463), and Ciona robusta (NC_034372) were downloaded for comparison. All protein-coding genes were extracted from the mitogenomes and independently aligned with MAFFT ¹⁰⁰. To identify phylogenetic relationships among individual mitogenomes, independent gene trees were obtained by maximum likelihood using IQ-TREE2 v.2.2.0 ¹⁰¹, performing 50,000 iterations of ultra-fast bootstrap (-b), without codon position partitioning, and using the GTR+I+G evolutionary model for all genes. The consensus gene tree (supertree) based on the independent gene trees and its node support values were obtained using ASTRAL v.5.5.7 ¹⁰². A second approach consisted of concatenating all genes in a single matrix (supermatrix) and running IQ-TREE2 with the same parameters. After phylogenetic reconstruction, different mitogroups were identified and compared. Whole mitochondrial sequences were aligned using the L-INS-i option of MAFFT in order to identify structural rearrangements. MEGA11 ¹⁰³ was used to calculate whole-mitogenome genetic distances among the main clades by pairwise comparison, with pairwise deletion of indel regions.
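Pairwise deletion means that, for each pair of sequences, only the alignment columns with a gap in one of the two are skipped, rather than discarding every gapped column from the whole alignment. The sketch below illustrates this with an uncorrected p-distance. The distance model is an assumption, since the text does not state which one was used in MEGA11, and the sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap_chars="-"):
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap (pairwise deletion).

    Ambiguity codes such as N are compared like any other character.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue                 # column ignored for this pair only
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two sequences")
    return diffs / compared


# Hypothetical aligned fragments, for illustration only.
print(p_distance("ACGT-ACGT", "ACGA-AC-T"))
```

In this toy alignment two of the nine columns contain a gap, so seven columns are compared and one of them differs, giving a distance of 1/7.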
To identify cyto-nuclear interactions, F_ST values between the two main mitogenomic groups were calculated with nuclear SNPs for all chromosomes (see results). In the analysis we included only individuals of the localities with an equal number of representatives of the two mitogenomic groups in order to reduce the population effect. Manhattan plots were obtained for the F_ST values across the genome with the package ‘CMplot’.

Functional analyses of candidate adaptive genome regions

All chromosomal regions potentially driving Styela plicata’s adaptive success (inversions, regional adaptation and cyto-nuclear interactions) were selected to identify their gene functions. As a first step, we ran eggNOG-mapper ⁹² on our annotated genes with default parameters to obtain their corresponding Gene Ontology (GO) terms.

1. Inversions: We screened for functional enrichment of genes in the inverted regions. For each inversion we generated two gene lists: one contained the gene IDs found inside the inversion, and the other included the gene IDs found in all chromosomes, excluding the IDs inside that inversion. Boundaries detected by genotypes and by F_ST values were considered independently for gene list construction. To detect functional enrichment, FatiGO ¹⁰⁴ was used to compare the GO terms in the two gene lists (inside and outside each inversion) based on genotype and F_ST boundaries. Enriched GO terms were processed with the REVIGO website ¹⁰⁵. A table with the resulting clusters of biological functions was provided. The absolute number of the most abundant biological functions was graphically represented with the ‘ggplot2’ package. Finally, the locations of the genes with enriched GO terms were visualized within the F_ST Manhattan plot.

2. Regional adaptation: We focused on all genes within the region with the highest and most abundant top 1% F_ST values differentiating Atlantic and Pacific populations, in chromosome 3 (see results). We selected the GO terms associated with the genes included in the region and processed them with REVIGO. A ‘TreeMap’ of the biological functions was plotted with the R package ‘treemap’ v.2.4-4 (https://cran.r-project.org/web/packages/treemap/index.html).

3. Cyto-nuclear interaction: We selected all the GO terms associated with the genes included in the region with the highest and most abundant top 1% F_ST values for chromosomes 12 and 14 (see results), and processed them with REVIGO. In this case, we selected not only the biological function but also the cellular component GO terms, to assess whether the genes encoded in the nucleus were active in the mitochondria. TreeMaps of both classes were plotted with the ‘treemap’ package. Additionally, we extracted the sequences of the genes falling in the highest F_ST windows with GffRead v.0.12.8 ¹⁰⁶ and identified them using BLAST against the NCBI database.
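For the inversion screening in point 1, the inside/outside gene lists and a test of GO term over-representation can be sketched as follows. This is not FatiGO's code. It assumes a one-sided Fisher's exact test on a 2×2 table and counts a gene as inside only when it lies fully within the boundaries. Both choices, the function names and the toy genes are hypothetical, and no multiple-testing correction is shown.

```python
from math import comb


def split_genes(genes, chrom, start, end):
    """Split gene IDs into those inside one inversion and all the others.

    genes: dict of gene_id -> (chrom, start, end)
    A gene counts as inside only if it lies fully within the boundaries.
    """
    inside = {g for g, (c, s, e) in genes.items()
              if c == chrom and s >= start and e <= end}
    return inside, set(genes) - inside


def fisher_enrichment(inside, outside, annotated):
    """One-sided Fisher's exact p-value for over-representation of a
    GO term among the genes inside the inversion."""
    a = len(inside & annotated)    # inside, with the term
    c = len(outside & annotated)   # outside, with the term
    n = len(inside) + len(outside)
    k = a + c                      # all genes with the term
    m = len(inside)                # all genes inside
    # Hypergeometric tail: probability of seeing `a` or more annotated genes inside.
    tail = sum(comb(k, x) * comb(n - k, m - x) for x in range(a, min(k, m) + 1))
    return tail / comb(n, m)


# Hypothetical genes and annotations, for illustration only.
genes = {f"g{i}": ("chr1", 1000 * i, 1000 * i + 500) for i in range(1, 11)}
inside, outside = split_genes(genes, "chr1", 900, 4600)
with_term = {"g1", "g2", "g3", "g9"}
print(sorted(inside), fisher_enrichment(inside, outside, with_term))
```

In the toy data four of ten genes fall inside the inversion and three of the four annotated genes are among them, which gives a p-value of 25/210.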

Competing interests

The authors declare no competing interests.

Author Contributions

All authors contributed to the conceptualization of the study. CGC, DB, AB, and CG conducted labwork. CGC, TS, and CP carried out genome assembly and annotation. CGC and CP performed genomic analyses with the individuals sampled worldwide. CGC curated the data, carried out graphic design and generated the first draft of the manuscript. XT, MP, CG, and CC secured funding, supervised the work and substantially revised the manuscript.

Acknowledgments

We thank the Genome Technology Center (RGTC) at Radboudumc for the use of the Sequencing Core Facility (Nijmegen, The Netherlands), which provided the PacBio SMRT sequencing service on the Sequel II platform. The samples used for the pangenome study came from a previous study, and we thank Mari Carmen Pineda and Susanna López-Legentil for lending them. We thank the Reial Club Nàutic de Barcelona for allowing access to their facilities to sample the individual used for the reference genome assembly. We thank Fabio Vitale for providing a picture of Styela plicata displayed in Fig. 1a. This research was funded by the project MarGeCh [PID2020-118550RB] from the Spanish Government and by the Centre for Translational Biodiversity Genomics (LOEWE-TBG) through the program LOEWE–Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz of Hesse's Ministry of Higher Education, Research, and the Arts (HMWK). CGC held a predoctoral contract [PRE-2018-085227] funded by the Spanish Ministry of Science, Innovation and Universities, and by ERDF “A way of making Europe”. The authors CGC, CP, MP, and CC are members of the research group SGR2021-01271, and XT is a member of the research group 2021 SGR 00405, both funded by the Generalitat de Catalunya (AGAUR).

Data availability

All raw data and both the reference genome and the mitochondrial genome assemblies have been uploaded to ENA (BioProjects PRJEB67507 and PRJEB67519). All custom scripts, all data used in this study, and the nuclear genome and mitogenome annotations have been made available at GitHub (https://github.com/CGaliaCamps/Splicata_genomes/). All tools and their corresponding versions used for this study are available in the “Methods” section.

References

IPBES. Summary for policymakers of the global assessment report on biodiversity and ecosystem services. Preprint at https://doi.org/10.5281/ZENODO.3553579 (2019).
Roy, H. E. et al. IPBES Invasive Alien Species assessment: Summary for Policymakers. Preprint at https://doi.org/10.5281/ZENODO.7430692 (2023).
Hoberg, E. P. & Brooks, D. R. Evolution in action: climate change, biodiversity dynamics and emerging infectious disease. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370, (2015).
North, H. L., McGaughran, A. & Jiggins, C. D. Insights into invasive species from whole-genome resequencing. Mol. Ecol. 30, 6289–6308 (2021).
Theissinger, K. et al. How genomics can help biodiversity conservation. Trends Genet. (2023) doi:10.1016/j.tig.2023.01.005.
Formenti, G. et al. The era of reference genomes in conservation genomics. Trends Ecol. Evol. 37, 197–202 (2022).
Valiente-Mullor, C. et al. One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads. PLoS Comput. Biol. 17, e1008678 (2021).
Eisenstein, M. Every base everywhere all at once: pangenomics comes of age. Nature 616, 618–620 (2023).
Pegueroles, C., Pascual, M. & Carreras, C. Going beyond a reference genome in conservation genomics. Trends Ecol. Evol. (2023) doi:10.1016/j.tree.2023.11.009.
Hohenlohe, P. A., Funk, W. C. & Rajora, O. P. Population genomics for wildlife conservation and management. Mol. Ecol. 30, 62–82 (2021).
Sirén, J. et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 374, abg8871 (2021).
Ebler, J. et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat. Genet. 54, 518–525 (2022).
Wang, T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022).
Jones, F. C. et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61 (2012).
Jin, S. et al. Structural variation (SV)-based pan-genome and GWAS reveal the impacts of SVs on the speciation and diversification of allotetraploid cottons. Mol. Plant (2023) doi:10.1016/j.molp.2023.02.004.
Hohnen, R. et al. Abundance and detection of feral cats decreases after severe fire on Kangaroo Island, Australia. Austral Ecol. (2023) doi:10.1111/aec.13294.
Touchard, F., Simon, A., Bierne, N. & Viard, F. Urban rendezvous along the seashore: Ports as Darwinian field labs for studying marine evolution in the Anthropocene. Evol. Appl. 16, 560–579 (2023).
Chow, J. C., Anderson, P. E. & Shedlock, A. M. Sea Turtle Population Genomic Discovery: Global and Locus-Specific Signatures of Polymorphism, Selection, and Adaptive Potential. Genome Biol. Evol. 11, 2797–2806 (2019).
Wright, B. R. et al. A demonstration of conservation genomics for threatened species management. Mol. Ecol. Resour. 20, 1526–1541 (2020).
Wolf, M., de Jong, M., Halldórsson, S. D., Árnason, Ú. & Janke, A. Genomic Impact of Whaling in North Atlantic Fin Whales. Mol. Biol. Evol. 39, (2022).
Rius, M. & Turon, X. Phylogeography and the description of geographic patterns in invasion genomics. Front. Ecol. Evol. 8, (2020).
Barros, R. Human-mediated global dispersion of Styela plicata (Tunicata, Ascidiacea). Aquat. Invasions 4, 45–57 (2009).
Pineda, M. C., López-Legentil, S. & Turon, X. The whereabouts of an ancient wanderer: global phylogeography of the solitary ascidian Styela plicata. PLoS One 6, e25495 (2011).
Pineda, M. C., López-Legentil, S. & Turon, X. Year-round reproduction in a seasonal sea: biological cycle of the introduced ascidian Styela plicata in the Western Mediterranean. Mar. Biol. 160, 221–230 (2013).
Casso, M. et al. Seasonal patterns of settlement and growth of introduced and native ascidians in bivalve cultures in the Ebro Delta (NE Iberian Peninsula). Reg. Stud. Mar. Sci. 23, 12–22 (2018).
Kloch, A. et al. High genetic diversity of immunity genes in an expanding population of a highly mobile carnivore, the grey wolf Canis lupus, in Central Europe. Divers. Distrib. 27, 1680–1695 (2021).
Casso, M., Turon, X. & Pascual, M. Single zooids, multiple loci: independent colonisations revealed by population genomics of a global invader. Biol. Invasions 21, 3575–3592 (2019).
Berdan, E. L. et al. How chromosomal inversions reorient the evolutionary process. J. Evol. Biol. (2023) doi:10.1111/jeb.14242.
Faria, R. et al. Multiple chromosomal rearrangements in a hybrid zone between Littorina saxatilis ecotypes. Mol. Ecol. 28, 1375–1393 (2019).
Simões, P., Calabria, G., Picão-Osório, J., Balanyà, J. & Pascual, M. The genetic content of chromosomal inversions across a wide latitudinal gradient. PLoS One 7, e51625 (2012).
Hollenbeck, C. M. et al. Temperature-associated selection linked to putative chromosomal inversions in king scallop (). Proc. Biol. Sci. 289, 20221573 (2022).
Huang, K., Andrew, R. L., Owens, G. L., Ostevik, K. L. & Rieseberg, L. H. Multiple chromosomal inversions contribute to adaptive divergence of a dune sunflower ecotype. Mol. Ecol. 29, 2535–2549 (2020).
Brockhurst, M. A. et al. The Ecology and Evolution of Pangenomes. Curr. Biol. 29, R1094–R1103 (2019).
Vawter, L. & Brown, W. M. Nuclear and mitochondrial DNA comparisons reveal extreme rate variation in the molecular clock. Science 234, 194–196 (1986).
Petrou, E. L. et al. Secondary contact and changes in coastal habitat availability influence the nonequilibrium population structure of a salmonid (Oncorhynchus keta). Mol. Ecol. 22, 5848–5860 (2013).
Piccinini, G. et al. Mitonuclear Coevolution, but not Nuclear Compensation, Drives Evolution of OXPHOS Complexes in Bivalves. Mol. Biol. Evol. 38, 2597–2614 (2021).
Hill, G. E. Mitonuclear Compensatory Coevolution. Trends Genet. 36, 403–414 (2020).
Nguyen, T. H. M., Sondhi, S., Ziesel, A., Paliwal, S. & Fiumera, H. L. Mitochondrial-nuclear coadaptation revealed through mtDNA replacements in Saccharomyces cerevisiae. BMC Evol. Biol. 20, 128 (2020).
Tepolt, C. K., Grosholz, E. D., de Rivera, C. E. & Ruiz, G. M. Balanced polymorphism fuels rapid selection in an invasive crab despite high gene flow and low genetic diversity. Mol. Ecol. 31, 55–69 (2022).
Hudson, J. et al. Genomics-informed models reveal extensive stretches of coastline under threat by an ecologically dominant invasive species. Proc. Natl. Acad. Sci. U. S. A. 118, (2021).
Battlay, P. et al. Large haploblocks underlie rapid adaptation in the invasive weed Ambrosia artemisiifolia. Nat. Commun. 14, 1717 (2023).
Bernheim, A. & Sorek, R. The pan-immune system of bacteria: antiviral defence as a community resource. Nat. Rev. Microbiol. 18, 113–119 (2020).
Shlesinger, T. & Loya, Y. Sexual reproduction of scleractinian corals in mesophotic coral ecosystems vs. Shallow reefs. in Coral Reefs of the World 653–666 (Springer International Publishing, 2019).
Hu, H. et al. Amborella gene presence/absence variation is associated with abiotic stress responses that may contribute to environmental adaptation. New Phytol. 233, 1548–1555 (2022).
Coffin, J. L., Kelley, J. L., Jeyasingh, P. D. & Tobler, M. Impacts of heavy metal pollution on the ionomes and transcriptomes of Western mosquitofish (Gambusia affinis). Mol. Ecol. 31, 1527–1542 (2022).
Dalongeville, A., Benestan, L., Mouillot, D., Lobreaux, S. & Manel, S. Combining six genome scan methods to detect candidate genes to salinity in the Mediterranean striped red mullet (Mullus surmuletus). BMC Genomics 19, 217 (2018).
Maszczak-Seneczko, D., Wiktor, M., Skurska, E., Wiertelak, W. & Olczak, M. Delivery of Nucleotide Sugars to the Mammalian Golgi: A Very Well (un)Explained Story. Int. J. Mol. Sci. 23, (2022).
Hadley, B. et al. Structure and function of nucleotide sugar transporters: Current progress. Comput. Struct. Biotechnol. J. 10, 23–32 (2014).
Burnham-Marusich, A. R. & Berninsone, P. M. Multiple proteins with essential mitochondrial functions have glycosylated isoforms. Mitochondrion 12, 423–427 (2012).
Scofield, D. G. & Lynch, M. Evolutionary diversification of the Sm family of RNA-associated proteins. Mol. Biol. Evol. 25, 2255–2267 (2008).
Pepling, M. E., Wilhelm, J. E., O’Hara, A. L., Gephardt, G. W. & Spradling, A. C. Mouse oocytes within germ cell cysts and primordial follicles contain a Balbiani body. Proc. Natl. Acad. Sci. U. S. A. 104, 187–192 (2007).
Jamieson-Lucy, A. & Mullins, M. C. The vertebrate Balbiani body, germ plasm, and oocyte polarity. Curr. Top. Dev. Biol. 135, 1–34 (2019).
Mayjonade, B. et al. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. Biotechniques 61, 203–205 (2016).
Ghangal, R., Raghuvanshi, S. & Chand Sharma, P. Isolation of good quality RNA from a medicinal plant seabuckthorn, rich in secondary metabolites. Plant Physiol. Biochem. 47, 1113–1115 (2009).
Chueca, L. J. et al. Genome Assembly of the Raccoon Dog (). Front. Genet. 12, 658256 (2021).
Galbraith, D. W. et al. Rapid flow cytometric analysis of the cell cycle in intact plant tissues. Science 220, 1049–1051 (1983).
Otto, F. DAPI staining of fixed cells for high-resolution flow cytometry of nuclear DNA. Methods Cell Biol. 33, 105–110 (1990).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, (2021).
De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669 (2018).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics vol. 30 2114–2120 Preprint at https://doi.org/10.1093/bioinformatics/btu170 (2014).
Andrews, S. FastQC: a quality control tool for high throughput sequence data. Preprint at https://github.com/s-andrews/FastQC (2010).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3, 99–101 (2016).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics vol. 32 3047–3048 Preprint at https://doi.org/10.1093/bioinformatics/btw354 (2016).
Schell, T. et al. An annotated draft genome for Radix auricularia (Gastropoda, Mollusca). Genome Biol. Evol. (2017) doi:10.1093/gbe/evx032.
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Wright, S. L. Burrows-Wheeler Aligner: A Parallel Approach. (2012).
Dumontier, M. & Hogue, C. W. V. NBLAST: a cluster variant of BLAST for NxN comparisons. BMC Bioinformatics 3, 13 (2002).
Laetsch, D. R. & Blaxter, M. L. BlobTools: Interrogation of genome assemblies. F1000Res. 6, (2017).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Wei, J. et al. Genomic basis of environmental adaptation in the leathery sea squirt (Styela clava). Mol. Ecol. Resour. 20, 1414–1431 (2020).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. U. S. A. 117, 9451–9457 (2020).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. 0–8 (2021).
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology vol. 29 644–652 Preprint at https://doi.org/10.1038/nbt.1883 (2011).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–9 (2006).
Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform 2, lqaa026 (2020).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).
Lowe, T. M. & Eddy, S. R. A computational screen for methylation guide snoRNAs in yeast. Science 283, 1168–1171 (1999).
Wucher, V. et al. FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res. 45, e57 (2017).
Kang, Y.-J. et al. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 45, W12–W16 (2017).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Yin, L. CMplot: circle manhattan plot. https://github.com/YinLiLin/CMplot (2017).
Wickham, H. ggplot2. Wiley Interdisciplinary Reviews: Computational Statistics vol. 3 180–185 Preprint at https://doi.org/10.1002/wics.147 (2011).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45, e18 (2017).
Bernt, M. et al. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol. Phylogenet. Evol. 69, 313–319 (2013).
Jalili, V. et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Res. 48, W395–W402 (2020).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19, 153 (2018).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Al-Shahrour, F., Díaz-Uriarte, R. & Dopazo, J. FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20, 578–580 (2004).
Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6, e21800 (2011).
Pertea, G. & Pertea, M. GFF utilities: GffRead and GffCompare. F1000Res. 9, 304 (2020).


Supplementary Files

Supplementary information 1 (3SupplementaryInformation1.docx)
Supplementary information 2 (4SupplementaryInformation2.xlsx)
Extended data (2ExtendedData.docx)
