DNA barcoding provides insights into fish diversity and molecular taxonomy of the Amundsen Sea

The Southern Ocean is experiencing complex climate change, and the Amundsen Sea is one of the regions that has responded most rapidly. Because of their role in ecosystems, environmental sensitivity and high endemism, Antarctic demersal fish are a favorable group for indicating how Antarctic organisms respond to climate change. However, our knowledge of the Antarctic fish fauna is insufficient, with gaps even in taxonomy. This situation is strongly influenced by the limitations of traditional taxonomy and calls for alternative approaches such as DNA barcoding. In this study, DNA barcoding analyses of 69 fish samples obtained from the Amundsen Sea were conducted using the mitochondrial COI gene. Based on the molecular species delimitation results, 13 fish species were found, belonging to two orders, six families, and 12 genera. Both the maximum likelihood and Bayesian inference methods indicated that Bathydraconidae is paraphyletic, which is consistent with previous phylogenetic research. Our research showed that the COI gene, as a DNA barcode, is not only suitable for the identification of Antarctic fish species but also reflects some phylogenetic characteristics that might provide evidence and support for studies of Antarctic fish phylogenetic relationships. In summary, our study provides an important reference for fish diversity and taxonomy in the Amundsen Sea, which may further enhance our understanding of the biodiversity, taxonomy and biogeography of fish in this area.


Introduction
The Southern Ocean occupies almost 10% of the ocean area on Earth (Joyner 1998). It is the only ocean that surrounds Earth and is not divided by continents. This gives it a unique ocean current system. The Antarctic Circumpolar Current (ACC) travels around Antarctica in a clockwise direction, driven by sustained westerly winds (Allison et al. 2010). It prevents warm water from flowing from lower latitudes to
Longshan Lin linlsh@tio.org.cn
Hai Li lihai@tio.org.cn
identification based on molecular biology has emerged to give taxonomists more choices and has the potential to become a universal method. This method is expected to become one of the most convincing types of classification evidence (Hebert et al. 2003a). DNA barcoding is increasingly advocated for species identification. DNA barcoding based on the mitochondrial cytochrome c oxidase subunit I (COI) gene has been applied to the identification of species (Hebert et al. 2003b). A COI fragment of 650 bp has enough sequence diversity to reflect significant species-level differences and has demonstrated high efficiency and accuracy in species identification on a global scale, for example in Japanese marine fish (Zhang and Hanner 2011), Indian marine fish (Lakra et al. 2011), Cuban freshwater fish (Lara et al. 2010), Indo-Pacific coral reef fish (Hubert et al. 2012), and even birds (Hebert et al. 2004), mammals (Francis et al. 2010), and bivalves (Mikkelsen et al. 2007). In this paper, the COI-based molecular identification method is applied to Antarctic fish of the Amundsen Sea. Our research aims to provide fundamental taxonomic information for fish species of the Amundsen Sea and thus a solid scientific basis for the ecological assessment and biological conservation of the Southern Ocean.

Specimen collections
All specimens were collected aboard the icebreaker research vessel Xuelong during the 36th Chinese National Antarctic Research Expedition (CHINARE) in 2020. Specimens were caught with a bottom trawl net (2.2 m wide, 0.65 m high, and 6.5 m long, with a 20 mm mesh diameter). Each trawl lasted approximately 10 to 15 min at a speed of 2 to 3 kn. All samples were collected from 5 stations (Fig. 1) in the Amundsen Sea. All caught fish were sorted at -20 °C and provisionally identified. Muscle samples were stored in 95% ethanol for DNA extraction. Morphological identification followed Gon's classification method (Graeme 1992). Finally, all fish were fixed in 10% formaldehyde and stored as voucher samples at the Third Institute of Oceanography, Ministry of Natural Resources.

DNA preparation, PCR and sequencing
DNA extraction was carried out with muscle tissue by using a DNeasy Blood and Tissue Kit [Qiagen, Hilden, Germany]. Some steps followed those of Hellberg et al. (2014)
(Mintenbeck et al. 2012). These fish live in cold, oxygen-rich, and stable ocean environments and are highly endemic (Mintenbeck and Torres 2017). These characteristics, along with the roles the fish play in the ecosystem, make Antarctic fishes a favorable group that can act as an indicator of environmental change in the Southern Ocean.
Even in the vast area of the Southern Ocean, only approximately 370 fish species have been described, accounting for ~2% of all fish species worldwide, and this number is an underestimate (Eastman 2000). Ice cover, a lack of deep-sea samples, low sampling frequency and the limitations of traditional taxonomy may be the reasons for this underestimation (Alt et al. 2021). The situation for the fish fauna of the Amundsen Sea is even worse because the Amundsen Sea is remote from scientific research stations and routes (Griffiths et al. 2011). There have been only limited observation records and one underwater observation survey report (Eastman et al. 2012), while studies based on molecular taxonomy have not yet been reported. Currently, the Amundsen Sea is among the regions of the Southern Ocean where sea temperature is rising most markedly (Kim et al. 2021). The rapid rise in sea temperature has led to a decrease in sea ice cover and a sustained decline in the ice shelf (Haumann et al. 2016). Meanwhile, the benthic ecosystem in Antarctica is vulnerable (Pineda-Metz et al. 2020), and glacier retreat (Sahade et al. 2015) and associated iceberg scouring (Gutt and Piepenburg 2003; Barnes and Souster 2011) have a huge impact on benthic communities, including Antarctic fish, most of which are demersal (Mintenbeck et al. 2012). Moreover, the decline in salinity and dissolved oxygen (Yager et al. 2012; Randall-Goodwin et al. 2015) also poses challenges to fish survival that cannot be ignored. Because fish are one of the important indicator groups of climate change, the lack of information on the composition of fish communities in the Amundsen Sea will seriously affect the evaluation of the structure and function of its marine ecosystem.
Therefore, a fish diversity baseline inventory is urgently needed, and clarifying the characteristics of Amundsen Sea fish diversity patterns can help us better understand the impacts of climate change on Amundsen Sea marine ecosystems.
Traditional fish classification is based on morphological identification, which is time consuming and depends on the experience of the taxonomist (Steinke et al. 2009). Moreover, the morphologies of sibling species are similar, which can easily lead to misidentification. In particular, the remarkable diversity of sizes, colors, and shapes across the life stages of fish is a challenge to taxonomists (Zhang and Hanner 2012). In addition, the taxonomic division of some fish in the Southern Ocean is controversial (De Broyer et al. 2014). All these problems require new solutions. Species
chain Monte Carlo (MCMC) tool for analyzing DNA sequences under the multispecies coalescent (MSC) model. The ultrametric tree with haplotypes was reconstructed using BEAST v1.10.4 (Drummond et al. 2012). In BEAUti, the GTR substitution model with a gamma site model (four gamma categories) and an uncorrelated relaxed clock were selected, and the MCMC chain length was set to 30,000,000 iterations. The taxonomic units delimited by ASAP and BPP were compared with the sequences of known species in the NCBI database to determine the taxonomic authenticity of the species. Taxonomic units with ≥ 98% similarity to known sequences were assigned to the same species (Murphy et al. 2016), and those with < 98% and ≥ 95% similarity to known sequences were assigned to the same genus (Ratnasingham & Hebert 2013).
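The similarity thresholds described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the simple per-site identity computed here is an assumption standing in for the similarity reported by a database search, which is what the study actually used.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent identity over aligned sites where both sequences have A/C/G/T.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q not in "ACGT" or r not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * matches / compared


def assign_rank(similarity_pct: float) -> str:
    """Apply the thresholds stated in the text (>= 98% species, 95-98% genus)."""
    if similarity_pct >= 98.0:
        return "same species"
    if similarity_pct >= 95.0:
        return "same genus"
    return "no assignment"  # label chosen here; the text gives no rule below 95%


# Toy example with made-up sequences: 97 of 100 sites identical
example = assign_rank(percent_identity("A" * 100, "A" * 97 + "CCC"))
```

In the toy example the similarity is 97%, so the unit would be assigned at genus level only.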
The suitable genetic distance model was determined using jModelTest v2.1.10 (Posada 2008). Genetic distances were calculated using the Kimura two-parameter (K2P) distance model (Kimura 1980) with 1000 bootstrap replicates and uniform rates in MEGA X (Kumar et al. 2018). Intra- and interspecies genetic distances and pairwise distances were considered. We used the online tool SMS to find suitable models of nucleotide substitution under the Akaike information criterion (AIC). A BI tree and an ML tree were used to reconstruct the phylogenetic relationships. The BI tree was constructed using MrBayes v3.1.2 (Huelsenbeck et al. 2001), and the MCMC analysis was run for 10,000,000 generations, sampling every 1000 generations. We used PhyML 3.0 (Guindon et al. 2010) to build the ML tree with the GTR substitution model and a gamma shape parameter of 0.186, NNI for tree improvement, and the aLRT SH-like fast likelihood method. Finally, the majority-rule consensus tree was reconstructed and displayed using FigTree v1.4.4.
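For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The following Python sketch illustrates this formula; it is not the MEGA X implementation, and the function name and the handling of gaps (sites are skipped pairwise) are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (pairwise deletion)
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

Multiplying the returned value by 100 gives the percentage distances of the kind reported in this study.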

Morphological and DNA identification
A total of 69 fish samples were collected in this study. Most of them were adults and well preserved, but some individuals were small or damaged during preservation and thus difficult to identify. Identification was also greatly limited by the scarcity of literature on Antarctic fish classification. In this study, 12 morphological species were identified using morphological characteristics and keys (Appendix 1).
All COI fragments were successfully amplified and sequenced. The high-quality COI sequences (no double peaks, short fragments or background noise) were aligned and contained no insertions, deletions, or stop codons. The length of the COI sequences
were prepared in advance. Muscle samples (approximately 30 mg) were weighed into 1.5 mL microtubes, and then the steps in the manufacturer's instructions were followed. Finally, DNA was stored at -20 °C until PCR amplification. The primers used for COI amplification were those designed by Ward (2005).
All PCRs had a total volume of 25 µL and included 17.25 µL of ultrapure water, 2 µL of dNTPs (2.5 mM), 2.5 µL of 10× PCR buffer (including Mg2+) (20 mM), 1 µL of each primer, 0.25 µL of Taq polymerase [TaKaRa, Kusatsu, Japan] (5 U/µL), and 1 µL of DNA template. Amplifications were performed using a SensoQuest LabCycler [SensoQuest, Germany] gradient thermal cycler. PCR cycling consisted of an initial step of 4 min at 95 °C and 35 cycles of 30 s at 94 °C, 30 s at 50 °C, and 30 s at 72 °C, followed by a final extension at 72 °C for 10 min. PCR products were loaded onto 1% agarose gels and selected for sequencing, and all PCR products were purified and sequenced by Personal Biotechnology Co., Ltd.
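As a quick check of the reaction recipe above, the listed component volumes can be summed and scaled to a master mix. The sketch below is illustrative only: it assumes that "each primer" means one forward and one reverse primer, and the `master_mix` helper with its optional pipetting overage is a hypothetical convenience, not part of the published protocol.

```python
# Per-reaction volumes in microlitres, taken from the recipe in the text.
# Assumption: "1 uL of each primer" refers to one forward and one reverse primer.
REACTION_UL = {
    "ultrapure water": 17.25,
    "dNTPs (2.5 mM)": 2.0,
    "10x PCR buffer (with Mg2+)": 2.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "Taq polymerase (5 U/uL)": 0.25,
    "DNA template": 1.0,
}


def total_volume(recipe: dict) -> float:
    """Sum of all component volumes for a single reaction."""
    return sum(recipe.values())


def master_mix(recipe: dict, n_reactions: int, overage: float = 0.0) -> dict:
    """Scale every component except the template, which is added per tube.

    `overage` is a fractional excess (e.g. 0.1 for 10%) to cover pipetting loss.
    """
    scale = n_reactions * (1.0 + overage)
    return {
        name: volume * scale
        for name, volume in recipe.items()
        if name != "DNA template"
    }


single_reaction_total = total_volume(REACTION_UL)  # components sum to 25 uL
mix_for_ten = master_mix(REACTION_UL, 10)
```

With two primers at 1 µL each, the components add up to the stated 25 µL total.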

DNA identification and phylogenetic analysis
All COI sequences were edited using DNASTAR Lasergene SeqMan Pro 7.1 and aligned manually using Sequencher 4.1. To facilitate the calculation of genetic distances, two additional sequences from the NCBI database were added for each species represented by fewer than three fish. We used two DNA identification methods to assess taxonomic units and infer putative species boundaries based on the COI gene: assembly of species by automatic partitioning (ASAP) (Puillandre et al. 2021) and Bayesian phylogenetics and phylogeography (BPP) (Yang et al. 2014). ASAP uses single-locus sequence alignments to create species partitions; it implements a hierarchical clustering algorithm and compares only pairwise genetic distances. All aligned COI sequences were analyzed with ASAP (https://bioinfo.mnhn.fr/abi/public/asap/asapweb.html) using the JC69 (Jukes-Cantor) model to compute distances and the default settings (split groups below probability 0.01, keep 10 best scores). BPP is a Bayesian Markov
.71%, and T = 31.36% on average, with a slight bias against G and C. The best classification result in ASAP (the second-best score) grouped the 69 sequences into 11 taxonomic units. Artedidraco lonnbergi and Dolloidraco longedorsalis were placed in one taxonomic unit, as were Lycenchelys sp. and Ophthalmolycus amberensis. However, BPP showed a different result from ASAP (Fig. 2). BPP assigned the 69 COI sequences to 13 taxonomic units, a result largely consistent with that of traditional morphological identification. Altogether, the molecular methods indicated that the 69 sequences belonged to 13 fish species in 12 genera, 6 families, and 2 orders (Table 1). The newly obtained nucleotide sequences were deposited in GenBank; accession numbers are listed in Appendix 1.

Genetic distance and phylogeny analysis
The K2P pairwise distance within species averaged 0.31% and ranged from 0 to 1.01%. The genetic distance between species varied between 1.84% and 29.9% (Fig. 3). The best-fitting model was GTR + G, and the gamma distribution shape parameter was 0.186. The two phylogenetic trees, the BI tree and the ML tree, showed similar topologies, and the majority-rule consensus tree was used to show the phylogenetic relationships of the fish. The tree supported a branch of Bathydraconidae nested within Channichthyidae. Most individuals in the tree clustered together in groups of the same species.
was 652 bp after alignment, including 237 polymorphic sites (223 parsimony-informative sites, 14 singleton variable sites). The average base composition was A = 21.03%,
The ultrametric tree with haplotypes was obtained from BEAST
determine its taxonomic status. Accurate taxonomic status and species identification require a combination of morphological and genetic findings. DNA barcoding shows the difference between the two species only at the genetic level but lacks support from morphological characteristics. The morphological characteristics of species are the scientific basis for their taxonomic status and biological studies, but traditional taxonomy relies on the experience of taxonomists. Therefore, a combination of molecular and traditional morphological methods for species identification is necessary.

Phylogenetic relationships
The COI gene is a short nucleotide fragment from mitochondria and is not the best choice for phylogenetic analysis; however, the topology of its phylogenetic tree might still have reference value (Steinke et al. 2009). The tree topology based on COI barcoding is usually related to the delineation of clusters. Although the ML tree was obtained by maximizing the likelihood and the Bayesian tree was based on posterior probabilities, the topologies supported by the two methods were basically the same (Fig. 4). In particular, both supported Bathydraconidae as paraphyletic. Previous studies reported similar results (Derome et al. 2002; Bargelloni et al. 2004). Multiple nuclear markers and multiple studies have also confirmed that Bathydraconidae is paraphyletic (Near et al. 2004; Rock et al. 2008). In terms of phylogenetic relationships, our COI-based phylogenetic signal is consistent with the topology revealed by other studies.

The demersal fish fauna in the Amundsen Sea
In recent decades, with deepening research and the emergence of commercial fishing, increasing information about

Effectiveness of COΙ barcoding and species identification
The accuracy of DNA barcoding is the key to species identification and depends on the degree of intra- and interspecific variation in the selected gene fragment. The less the intra- and interspecific variation overlap, the more effective the barcoding. Intraspecific variation is generally similar among species (Waugh 2007). However, the range of interspecific differences varies depending on the size of the selected group and on geographic populations. Comparing mean intraspecific and interspecific genetic distances does not allow problematic cases to be detected. Therefore, we compared the minimum interspecific distance with the maximum intraspecific genetic distance (Meier et al. 2008). In this study, the minimum interspecific distance was 1.84%, the maximum intraspecific genetic distance was 1.01%, and the barcoding gap was therefore between 1.01% and 1.84%.
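The comparison of the maximum intraspecific distance with the minimum interspecific distance can be expressed as a short Python sketch. The function name and the example lists are hypothetical; only the extreme values (1.01% and 1.84%) and the upper interspecific value (29.9%) come from the text, and the remaining entries are made-up placeholders.

```python
def barcoding_gap(intraspecific, interspecific):
    """Return (max intraspecific, min interspecific, gap width), all in percent.

    A positive gap width means the two distance distributions do not overlap.
    """
    max_intra = max(intraspecific)
    min_inter = min(interspecific)
    return max_intra, min_inter, min_inter - max_intra


# Illustrative distances in percent; 1.01, 1.84 and 29.9 are the values
# reported in the study, the other entries are placeholders.
intra = [0.0, 0.15, 0.46, 1.01]
inter = [1.84, 8.7, 15.2, 29.9]

max_intra, min_inter, gap = barcoding_gap(intra, inter)
has_gap = gap > 0  # True when every interspecific distance exceeds every intraspecific one
```

Using the extremes rather than the means is what allows a single overlapping pair of species to be detected.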
We used two different methods to infer the putative species boundaries, namely ASAP and BPP. ASAP is based on single-marker pairwise genetic distances and avoids the heavy computational burden of phylogenetic reconstruction. It does not require any biological a priori insights and can quickly propose relevant species hypotheses (Puillandre et al. 2021). BPP can accurately assign identity at the species level without knowing species boundaries in advance, even when analyzing rare taxa with only one locus available (Yang and Rannala 2017). The two methods agreed on the classification of most species. BPP and morphology gave similar results, while ASAP differed in some cases. As the BPP results were consistent with the BLAST results against the GenBank database, BPP likely gave the more accurate species identification. However, it is worth noting that ASAP reports ten candidate partitions, and we considered only those with the first- and second-best scores. If barcoding gaps or other prior conditions are considered, ASAP can achieve the same results as BPP. Overall, DNA identification can provide simple and reliable species classification results and is particularly valuable when morphological identification is difficult.
In this study, 12 morphological species were identified, and 13 species were identified by DNA barcoding. Lycenchelys sp. was misidentified as O. amberensis. It is important to note that Lycenchelys sp. was previously reported by Rock et al. (2008), who supported it as a valid species lacking a morphological description. There are few data related to this species in online databases. The morphological characteristics of the sample in this study were impaired, and more specimens and more detailed descriptions of this species are still needed to
Fig. 4 Bayesian inference COI phylogenetic tree for 69 Antarctic fish from the Amundsen Sea, obtained from MrBayes. Scale bars are proportional to substitution rates; support values are ML support/Bayesian posterior probability; ML supports for the clades are also present in the ML trees
identify by morphology; in contrast, our results are based on molecular taxonomy analysis of fish catches. From this perspective, our identification results are likely to be more reliable.
To the best of our knowledge, our study is the first on the molecular taxonomy of fish in the Amundsen Sea. Our results provide important taxonomic information on the demersal fish fauna in the Amundsen Sea. This is of great significance for understanding the biodiversity, taxonomy and biogeography of fish in the Amundsen Sea. However, we believe that there remain many unknowns about the diversity of demersal fish in this area that should be explored. Broader sampling of latitudes, deeper sampling depths, and higher sampling densities are all necessary for future research. Finally, the integration of molecular identification and morphological identification is suggested to ensure precise taxonomy in future studies of Antarctic fishes.

Conclusions
This study illustrates the fauna and phylogenetic relationships of fish in the Amundsen Sea based on the 36th CHINARE. The results show that DNA barcoding is an effective method for identifying Antarctic fish, especially when the morphology of a sample is damaged. Thirteen species from six families of Antarctic fishes were identified, and six species were recorded in the Amundsen Sea region for the first time. Our study provides reliable information on the distribution and classification of demersal fishes in the Amundsen Sea, whose fauna is highly similar to that of other parts of the Southern Ocean. The Amundsen Sea is geographically remote, but as it is one of the areas with the most rapid climate change, fish research in this area is an important part of exploring how the Antarctic ecosystem is affected by climate change. More surveys should be conducted to better understand fish in the Amundsen Sea and to explore the profound impact of climate change on fish in polar regions.
the community structure and classification of fish in the Southern Ocean has been discovered. In general, Notothenioidei, including Artedidraconidae, Bathydraconidae, Channichthyidae, Harpagiferidae, and Nototheniidae, dominates in number, accounting for most of the total species diversity (Eastman and McCune 2000; Eastman 2004). Additionally, there are some typical deep-sea fish groups, such as Liparidae and Zoarcidae. Antarctic fish diversity studies based on molecular taxonomy have been conducted in the Ross Sea (Smith et al. 2012), Prydz Bay (Li et al. 2018), the Scotia Sea (Rock et al. 2008), the Dumont d'Urville Sea (Dettai et al. 2011), and the Antarctic Peninsula (Mabragaña et al. 2016), and they verified the aforementioned Antarctic fish diversity pattern.
In this study, 13 species of fish were identified in the surveyed seas, most of which belonged to Artedidraconidae, Bathydraconidae, Channichthyidae, and Nototheniidae, in addition to Liparidae and Zoarcidae. Harpagiferidae did not appear in our study because these species are usually distributed in the sub-Antarctic region (Navarro et al. 2019), whereas the Amundsen Sea is located at high latitudes. There were only a few sampling stations, with relatively shallow sampling depths, which may be why we missed those typical deep-sea groups. To date, the fish fauna of the Amundsen Sea area has been studied by underwater observation. Our results supported that Notothenioidei dominates in both abundance and biomass. This is consistent with the aforementioned general pattern of the Southern Ocean fish fauna. The fish we caught were also roughly similar to the fauna observed by Eastman et al. (2012); however, our study provided more detailed assignment at the species level, with some additional species recorded. In particular, Ophthalmolycus amberensis, Chaenodraco wilsoni, Dacodraco hunteri, Akarotaxis nudiceps, Artedidraco lonnbergi and Vomeridens infuscipinnis might be recorded for the first time in the Amundsen Sea. It should also be noted that Eastman's data came from underwater photography, and some species are difficult to