An average of 4.82 GB of clean data was generated for each sample. The average mapping rate was 97.076% (0.40 SD) for S. scurra, 96.69% (0.26 SD) for S. araucana, and 96.105% (0.48 SD) for S. ceciliana with respect to the reference genome of S. scurra (Table S1). Dunn's post hoc test revealed specific comparisons with significant differences. Between S. araucana and S. ceciliana, a significant difference was observed with an adjusted p-value of 0.0179. Similarly, between S. araucana and S. scurra, the difference was also significant with an adjusted p-value of 0.0290. The comparison between S. ceciliana and S. scurra showed the most notable difference, with an adjusted p-value of 0.00000264 (Figure S1). A significant number of SNPs were analyzed for each species and population (Table S2). For S. araucana, 6,888,985 SNPs were analyzed in the Temblador population (n=10) and 3,801,687 SNPs in the Concepción population (n=5). For S. scurra, 5,756,211 SNPs were analyzed in the Temblador population (n=10) and 6,027,035 SNPs in the Concepción population (n=10). For S. ceciliana, 5,939,690 SNPs were analyzed in the Chiloé population (n=10) and 4,471,003 SNPs in the Magallanes population (n=6)
The principal component analysis (PCA) revealed that the first two components (PC1 and PC2) explained 51.89% of the total variability in the data (Figure 3). These components clearly discriminated three groups: S. scurra (Temblador and Concepción), S. araucana (Temblador and Concepción), and S. ceciliana (Chiloé and Magallanes). For S. araucana, there were five samples from Concepción that clustered entirely with S. ceciliana and were excluded from further analyses. Another five individuals, which were slightly separated from those in the Temblador population and slightly associated with the samples of S. ceciliana, were included in the data set (Figure 3).
Distribution of FST and DXY Values
FST values (0-0.504) and DXY values (0-0.061) varied substantially among the three studied Scurria species (Figure 4). The Kruskal-Wallis analysis revealed significant differences between the species regarding the FST index (Kruskal-Wallis χ² = 6697.9, df = 2, p < 0.05) and the DXY index (Kruskal-Wallis χ² = 10.44, df = 2, p < 0.05). When comparing populations on either side of the first transition zone (30°S and 34°S), S. scurra had the highest average FST = 0.074, followed by S. araucana = 0.047; while when comparing populations of S. ceciliana on either side of the second transition zone (41°S and 43°S), the average FST was lower =0.009 (Figure S2-A). Chromosomes with FST outliers varied among species (Table 1 and Figure S2-B): ch02, ch03, ch04, ch05, ch06, ch07, ch08, and ch09 in S. scurra; ch01, ch02, ch03, ch04, ch05, ch06, ch07, ch08, and ch09 in S. araucana; and ch02, ch03, ch04, ch05, ch06, ch08, ch09 and in S. ceciliana. Average DXY values were similar across species. Unlike the observed trend in average FST values, the species that displayed the lowest DXY values was S. scurra (0.014), followed by S. araucana and S. ceciliana, which had the same average (0.017) (Figure S2-A). Although all chromosomes had DXY outliers, some had significantly more outliers, and this varied among species (Table 1 and Figure S2-B): ch01, ch02, ch04, and ch08 in S. scurra; ch01, and ch02 in S. araucana; and ch01, and ch02 in S. ceciliana. In all species, chromosome 1 (ch01) showed the highest average DXY.
Divergence patterns revealed through FST and DXY outliers
In all species, the predominant model was DA, i.e. regions with high FST (P ≥ 0.99) and low DXY (P ≤ 0.95) were the most abundant (Table 1 and Figure 4). The DGF model in S. scurra presented 45 regions with outlier values distributed across chromosomes ch02 (1), ch03 (1), ch04 (18), ch05 (3), ch06 (5), ch07 (13), ch08 (3), and ch09 (1); in S. araucana it was represented by 5 regions with outlier values distributed across chromosomes ch02 (3), ch04 (1), and ch08 (1); in S. ceciliana, the 72 regions with identified outlier values were distributed across chromosomes ch02 (8), ch05 (1), ch08 (61), and ch09 (2). For the DA model, in S. scurra 114 regions with outlier values were observed distributed across chromosomes ch02 (12), ch03 (9), ch04 (15), ch05 (23), ch06 (32), ch07 (12), ch08 (1), and ch09 (10); in S. araucana, 154 regions with outlier values were observed distributed across chromosomes ch01 (6), ch02 (39), ch03 (14), ch04 (27), ch05 (7), ch06 (4), ch07 (36), ch08 (7), and ch09 (14); while in S. ceciliana, the models of DGF (72) and DA (87) were almost equally represented. The 87 regions with outlier values associated with DA in S. ceciliana were distributed across chromosomes ch02 (1), ch03 (1), ch04 (1), ch05 (2), ch06 (1), ch08 (31), and ch09 (50).
Twelve genomic regions with outliers were shared between the species S. scurra and S. araucana. The first genomic region (DGF model) is located on chromosome 4 and contains a single gene (g22915; Table S3). The other 11 genomic regions are associated with the DA model and are located on chromosomes 4 and 7. They contain 3 genes (g24601, g24602, and g24603) and 8 genes (g16034, g16035, g16036, g16037, g16038, g16039, g1604, and g16072), respectively. All the genes shared between these species presented a high rate of non-synonymous mutations (pN/pS > 1; Table S3). The gene associated with the DGF model (g22915) is involved in lipid metabolic processes. Regarding the genes associated with the DA model, some exhibit unique functions while others share related functions. Among the genes with unique functions is g24603, which is involved in signaling and plasma membrane processes. The genes with related functions include: g16035 and g16038, involved in catalytic and hydrolase activity on proteins; g16036 and g16037, involved in metabolic processes of superoxide, copper ion binding, and oxidoreductase activity, as well as responding to oxidative stress and toxic substances.
S. scurra presented nine unique genes with a ratio pN/pS > 1 under the DFG model, which can be grouped into various functional categories (Table S3). In terms of membrane-related functions, several genes were identified: g1122 (pN/pS = 1.15), g15741 (pN/pS = 2.34), and g15747 (pN/pS = 2.29), related to the membrane; g19258 (pN/pS = 2.18), involved in transmembrane transport, symporter activity, and anion transport; and g9109 (pN/pS = 1.2), which participates in the G-protein coupled receptor signaling pathway, amine receptor activity coupled to G-protein, and the plasma membrane. On the other hand, gene g15576 (pN/pS = 1.48) is related to DNA binding, transcription initiation at RNA polymerase III promoter, the transcription factor TFIIIC complex, cellular anatomical entity, ncRNA transcription, and cellular macromolecule biosynthetic process. Regarding signaling and programmed cell death, gene g15578 (pN/pS = 2.01) is involved in these functions. Additionally, the gene g3038 (pN/pS = Inf) is associated with oxidoreductase activity, and the gene g3039 (pN/pS = 2.35), whose product, for example, a protein or a metabolite, is localized or has functions in the extracellular space.
Compared to S. scurra, S. araucana presented only two unique genes; however, these genes exhibit greater functional complexity and a wider diversity of specific cellular localizations (Table S3). The functions of these genes span various functional categories. In terms of ATP hydrolysis activity and transport, the genes g15029 (pN/pS = 1.01) and g15031 (pN/pS = 1.16) participate in ATP hydrolysis activity, transmembrane transport of xenobiotics, ABC transporter activity, long-chain fatty acid transport, and phospholipid transport. Additionally, they are involved in the transport and detoxification of xenobiotics across the plasma membrane and the blood-brain barrier. The blood-brain barrier protects the brain from potentially harmful substances present in the blood, but it also regulates the passage of nutrients and other elements necessary for brain function. In this way, they may participate in the response to stress and external compounds such as estradiol, oxidative stress, and lipopolysaccharide. Principio del formularioFinal del formulario
Regarding cellular and metabolic processes, these genes are associated with cobalamin metabolism, bile acid and bile salt transport, regulation of biological quality, nucleotide transport, and response to internal stimuli such as peptides and steroid hormones. Finally, in terms of cellular localization, these genes are associated with various cellular structures, such as the lateral and basolateral plasma membrane, the Golgi apparatus, cytoplasmic vesicles, the endosome, the perinuclear region of the cytoplasm, and fungal-type vacuoles.
For its part, S. ceciliana presented twelve unique genes, which are grouped into various functional categories (Table S3). In terms of biological regulation and stress response, gene g12028 (pN/pS = 1.11) is involved in biological regulation, transition metal ion homeostasis, superoxide metabolic processes, antioxidant activity, and cellular response to oxidative stress and toxic compounds. Regarding DNA binding and transcription regulation, gene g3630 (pN/pS = 1.21) participates in DNA binding and oxidoreductase activity, while gene g3855 (pN/pS = 1.18) is related to the nucleus, nucleosome binding, cellular component assembly, and chromatin organization. Additionally, gene g7048 (pN/pS = 1.03) is implicated in DNA repair and ubiquitin-mediated protein modification. Genes related to mitochondrial activity and energy processes include gene g3850 (pN/pS = 2), associated with the mitochondria, and gene g3892 (pN/pS = 1.76), involved in ATP hydrolysis activity, DNA replication, and double-strand break repair through homologous recombination. In terms of enzymatic and metabolic activity, genes g3859 (pN/pS = 1.54) and g3887 (pN/pS = 1.55) are related to transferase activity, phosphate metabolic process regulation, and endoplasmic reticulum stress response. Functionally, gene g3909 (pN/pS = 1.68) is associated with zinc ion binding and carbonate dehydratase activity in the extracellular region and plasma membrane. Additionally, some genes have structural and reproductive functions, such as gene g3833 (pN/pS = 1), which is a structural constituent of the ribosome, and gene g9865 (pN/pS = 1.15), which is involved in the reproduction of multicellular organisms. Finally, gene g9866 (pN/pS = 2.28) participates in biological quality regulation, retinal homeostasis, and chaperone-mediated protein complex assembly.
Size of genomic regions and number of genes
In all three species, 52.83% of the regions of genomic divergence were found in coding regions (Table 2 and Figure S3). Both S. scurra and S. araucana had the same number of genomic regions with coding regions (92). S. scurra had 23 coding regions associated with the DGF model and 69 associated with the DA model. On the other hand, S. araucana had 3 coding regions associated with the DGF model and 89 associated with the DA model. Finally, S. ceciliana had 68 coding regions, with 27 associated with the DGF model and 41 associated with the DA model. The Kruskal-Wallis test revealed significant differences between the studied species regarding the number of genes contained within the genomic regions (χ² = 21.576, gl = 2, p < 0.05). On the other hand, no significant differences were observed in the size of the genomic regions (χ² = 4.8446, df = 2, p = 0.08872). The distribution of the number of genes and the size of the genomic regions by model is illustrated in Figure 5. For the DGF model, the genomic regions in S. ceciliana contained the highest number of genes (136), followed by S. scurra (71) and S. araucana (10). The largest genomic regions also corresponded to S. ceciliana (45000 bp), followed by S. scurra (31515 bp) and S. araucana (26666 bp). For the DA model, S. araucana had the highest number of genes (225) compared to S. scurra (155) and S. ceciliana (148). Similar to the DGF model, the largest genomic regions were found in S. ceciliana (34800 bp), followed by S. araucana (29333 bp) and S. scurra (25618 bp).