Candidate gene selection
To identify candidate genes with a sequence divergence pattern related to differential temperature adaptation, previously published transcriptome data (Paraskevopoulou et al., 2020; GenBank accessions number SRA: SRR10426055-76) was analysed using the following customized bioinformatic pipeline (Supplementary figure 1): De novo transcriptome assemblies were used to generate a super transcript for each species using Trinity v. 2.11.0 gene splicer modeler (Grabherr et al., 2011). This super transcript step was integrated to minimize redundancy in the data by inferring original genes and alternative splice forms (Davidson et al., 2017). A reciprocal best blast hit process was performed, using the blastn program of BLAST 2.9.0+ (Altschul et al., 1990) to detect orthologous genes shared by B. calyciflorus s.s. and B. fernandoi. To determine the best blast hit, results were filtered using a combination of E-value, which is a size corrected measure of statistical significance, and the bit score, which is a measure of matches and mismatches between two sequences (Altschul et al., 1990). When E-values of the first two or more hits were the same, the best hit was chosen using the bit score. For each search, a list was created that contained one best hit per query. Orthologous gene pairs were extracted, and open reading frames were predicted using TransDecoder v 5.5.0 (Haas et al., 2013). To calculate ratios of the rates of non-synonymous and synonymous substitutions (dN/dS) from protein-coding regions, sequences were then translated into amino acid sequences using Biopython (Cock et al., 2009) and locally aligned within the Biopython integrated package pairwise2 using a Smith-Waterman algorithm. Locally aligned coding sequences and the corresponding unaligned DNA sequences were used to create codon alignments using pal2nal (Suyama et al., 2006). Selection tests were conducted using all available seven (M0, M1a, M2, M3, M7, M8 and M8a) codeML models (Supplementary table 1), as implemented in PAML v. 4.9 (Yang, 2007). These models where then compared in four (M0 vs. M3, M1a vs. M2, M7 vs. M8, M8 vs. M8a) predefined likelihood-ratio tests (LRT) to decide whether the different null models (i.e., model that does not allow for any codons ω > 1, corresponding to absence of positive selection) can be rejected (Yang, 2007).
Orthologous genes that were significant (< 0.05) in all four LRT comparisons were annotated using Gene Ontology (GO) terms (Ashburner et al., 2000). These terms are hierarchically ordered descriptions of genes or proteins molecular functions, biological processes, and cellular components. The GO term annotation was done on the online Server Argot2.5 and its batch processing function (Lavezzo et al., 2016). To use this function, the longest available amino acid sequence of each protein was used in a blast search against the Uniprot Swiss-Prot database (The UniProt Consortium, 2021) and a hmmer search (Finn et al., 2011) against the Pfam-A database (Finn et al., 2014). In addition, we compared the genes exhibiting positive selection in the selection tests with those found to be differentially expressed relative to species and temperature in Paraskevopoulou et al. (2020). Among those genes functionally related to temperature tolerance (GO:00009408; “response to heat”; GO:00009409; “response to cold”), only one orthologous gene, the HSP 40kDa, exhibited signs of positive selection and was differentially expressed in more than one temperature category, hence chosen for in-depth analysis.
DNA extraction
To extract DNA, single rotifer specimens were collected from laboratory cultures, washed in 96% EtOH and stored in a 1.5mL reaction tube in10µL HPLC-H20. In total 56 rotifer individuals (B. calyciflorus s.s., n = 25; B. fernandoi, n = 19; B. rubens, n = 4; B. diversicornis, n = 3; B. angularis, n = 5) were collected from different locations in Germany. For B. calyciflorus s.s., additional samples from USA, Italy and Austria were available (Supplementary figure 2, Supplementary table 2). To prevent that the DNA from single individuals being lost during the extraction process, 3µL carrier RNA [1µg/µL] (QIAGEN, Germany) was added to the sample during the lysis. DNA was extracted using the NucleoSpin®Tissue extraction kit (Macherey-Nagel, Germany) following the manufacturer’s protocol for animal tissue (page 30-33).
Primer design HSP 40kDa
To amplify the HSP 40kDa from all five different Brachionus species, primers were designed using the candidate gene sequences (B. calyciflorus s.s. and B. fernandoi) as well as the published HSP 40kDa sequence of B. calyciflorus (unknown species assignment regarding current taxonomy; Genbank: KC176712.1; Yang et al. 2014). Sequences were aligned in Geneious v. 8.1.9 (Kearse et al., 2012) using ClustalW (Thompson et al., 1994) and the implemented primer3 algorithm (Untergasser et al., 2012) was used to design primer pairs.
Amplification and Sequencing of the ITS1, COI and HSP 40kDa
To determine the species and clonal diversity, both the nuclear internal transcribed spacer (ITS1) and the mitochondrial Cytochrome Oxidase Subunit I (COI) were amplified, using universal invertebrate primers (for ITS1: Palumbi, 1996; for COI: Folmer et al., 1994). PCR conditions, primer sequences, and used PCR chemicals mixes can be found in the Supplementary table 3, 4 and 5. The HSP 40kDa gene of all five Brachionus species was amplified using newly designed primer pairs (Supplementary table 3, 6 and 7). Successfully amplified HSP 40kDa, COI and ITS1 products were purified using an enzymatic (ExoAP) procedure and sequenced on a Sanger sequencing platform (Applied Biosystems™ 3500 Genetic Analyzer). Sequences from all specimens used in this study (Supplementary table 2) were visually inspected using Geneious v. 8.1.9 and heterozygous positions (for ITS1 and HSP 40kDa) were encoded with the standard IUPAC code for ambiguity (Comnish-Bowden, 1985). To reconstruct both alleles of the diploid nuclear DNA, the PHASE algorithm version 2.1 (Stephens et al., 2001; Stephens & Donnelly, 2003) was used, as implemented in DNAsp v. 6 (Rozas et al., 2017).
HSP 40kDa Sequence Diversity
To compare structural variability and the heterozygosity level of the HSP 40kDa gene among different species, consensus sequences per species were generated. In order to avoid any bias which may arise from uneven geographic sampling across species, we only used specimens originating from the same region “Uckermark” for this analysis: B. calyciflorus s.s. (n = 11), B. fernandoi (n = 8), B. rubens (n = 4), B. diversicornis (n = 3) and B. angularis (n = 5). Sequence ambiguities in the consensus sequence were coded as N’s. Pairwise nucleotide diversities of the 1,184bp long HSP 40kDa gene sequence and the 435bp long COI sequences between the five species were calculated and non-synonymous resp. synonymous substitutions in the HSP 40kDa gene were identified using DNAsp v. 6 (Rozas et al., 2017).
Expressed and Structural Variation in the HSP 40kDa gene between B. calyciflorus s.s. and B. fernandoi
Sequences originating from multiple individuals (B. calyciflorus s.s., n = 50; B. fernandoi, n = 38) were translated using the published translation reading frame of the HSP 40kDa gene of B. calyciflorus (Genbank: KC176712.1). As we did not yield the complete sequence for all specimens, the alignment was truncated to 888bp present in all sequences. Number of non-synonymous substitutions per non-synonymous site between the two species were compared using SLAC analysis implemented in Datamonkey (Weaver et al., 2018; Kosakovsky Pond & Frost, 2010).
HSP 40kDa allele Identification
To identify different allele types within the B. calyciflorus species complex, allele networks were generated from the resulting 88 phased sequences (B. calyciflorus s.s. n = 50 and B. fernandoi n = 38) using the program popART version 1.7 (Leigh & Bryant 2015). Subsequently, a Circos plot (Krzywinski et al., 2009) was generated based on allele types identified by popART, species, and specimens’ origin.
HSP 40kDa gene diversity relative to established barcoding markers
Species recognition within the B. calyciflorus complex relies on species-specific ITS1 sequences and clonal lineages are further characterized by their mitochondrial COI haplotype. For 43 specimens (n = 43; B. fernandoi n = 18, B. calyciflorus s.s. n = 22, B. rubens n = 1, B. angularis n = 1, and B. diversicornis n = 1), sequences were available for both these marker genes and the HSP 40kDa gene. For these specimens, congruence in genetic affinity across genetic markers was assessed using a tanglegram approach. Dendrograms and the tanglegrams were generated in R v.4.0.5 with the package dendextend v.1.15.1 (Galili, 2015).
Selection tests and divergence time
To test whether the expressed genetic variation among the different Brachionus species is under selection, a selection test was performed based on the 29 HSP 40kDa species-specific alleles using the Program PAML v.4.9 (Yang, 2007). A specific focus was on the divergence of the two species B. calyciflorus s.s. and B. fernandoi. This analysis needs a phylogeny as input which was based on the 534 bp long ITS1 fragment and calculated using RaxML v.8 (Stamatakis, 2014), performing 1000 bootstrap iterations (Supplementary figure 3). This ITS1 phylogeny as well as the HSP 40kDa unique alleles were used to infer sites under selection according to the codeML site model implemented in PAML, testing all seven different models of selection (table 1) with a subsequent site under selection inference via Bayes Empirical Bayes (BEBs) (for parameter setting see Supplementary table 8). An additional branch site test was performed to identify patterns of selection on three different branches: (I) the shared branch of B. calyciflorus s.s. and B. fernandoi, (II) B. calyciflorus s.s. only and (III) B. fernandoi only (Supplementary figure 3). Subsequently, to further compare the two formerly cryptic species of B. calyciflorus s.s. and B. fernandoi, a codon-based McDonald-Kreitman test (MKT) and Tajima’s D statistics (allele frequency based) were calculated based on inferred the unique alleles using DNAsp v. 6 (Rozas et al., 2017).
Species-specific fixed amino acid substitutions were identified (Supplementary figure 5) and the ancestral amino acid was inferred under parsimony, using sequence information from B. rubens, B. angularis and B. diversicornis and the ITS1 phylogeny (Supplementary figure 3). The functional effects of the specific amino acid (aa) substitutions were predicted with Polypen2 (Adzhubei et al., 2010) using B. calyciflorus s.s. as a reference. Note here that Polyphen2 does not consider the possibility of a positive effect of an aa substitution, such that it ranks any substitution without a predicted change in protein function as "benign" (i.e., no effect), while any predicted change is ranked as "damaging" (i.e., predicted to impact protein function).
To estimate the divergence time between B. calyciflorus s.s. and B. fernandoi the program BEAST v.1.10.4 (Suchard et al., 2018) was used. Whole mtGenome coding sequences (cds) of different rotifer species were downloaded from NCBI: B. calyciflorus s.s. (Kiemel et al., 2022), B. fernandoi Kiemel et al., 2022), B. angularis (Kim et al., 2020), B. manjavacas (Kim et al., 2021), B. plicatilis (Suga et al., 2008), B. koreanus (Hwang et al., 2014), B. rotundiformis (Kim et al., 2017), B. rubens (Choi et al., 2019) and P. similis (Choi et al., 2020) (for Genebank accession numbers cf. Supplementary figure 8). In absence of a published mutation rate for rotifers, the standard divergence rate for invertebrates/insects (2.3% Myr-1) with a constant substitution rate of 0.0115 per million years per lineages (Brower, 1994) was used to calibrate the phylogeny. Cds were aligned in Geneious v. 8.1.9 using ClustalW and the IQ-Tree webserver (Trifinopoulos et al., 2016) was used to determine the best fitting substitution model. The BEAST run was conducted with 10 million MCMC iterations, with GTR+G4+I chosen as the substitution model. Convergence was checked with Tracer v.1.7.2 (Drummond & Rambaut, 2007), using the first million MCMC iterations as the burn-in value.