The Phylogeographic Relationships and the Evolution History of Carassius Auratus Complex With a Newborn Homodiploid Crucian Carp-like Fish (2nNCRC)

Background: One of the important aspects of studying evolution is to understand how new species are formed and their uniqueness maintained. Hybridization can lead to the formation of new species with the reorganization of adaptive system and signicant changes in phenotype. It is wondrous that eight stable strains of 2nNCRC derived from the interspecies hybridization have been established in our laboratory. To examine the phylogeographical pattern of the wildly distributed genus Carassius in the Eurasia, and investigate the possible hybrid origin of Carassius auratus lineage, in light of past climatic events, the mitochondrial genome (mtDNA) were used to reconstruct the phylogenetic relationship between the C. auratus complex and the 2nNCRC, and to assess how demographic history, dispersal and barriers to gene ow have led to the current distribution of mtDNA lineages for C. auratus complex. Results: As expected, the 2nNCRC had a very close relationship with the C. auratus complex, which was distinctly separated with other three species of Carassius. The C. auratus lineage possibly originated from China during the Late Pliocene, far postdated the diversication of C. carassius in Europe and C. cuvieri in Japan. The admixture of mtDNA haplotype lineages of C. auratus detected across the whole Eurasia has experienced a rapid diversication since Early Pleistocene Conclusion: Combined the molecular dating analyses, species distribution modeling and ancestral area reconstruction, the speciation of C. auratus seemed not to be the processing of lineage diversication from the most recent common ancestor of C. carassius or C. cuvieri. The formation of 2nNCRC in our laboratory could be a good candidate explaining for the hybrid origin species for C. auratus lineage, as well as the paleoclimate oscillation and geological event during Pliocene and Pleistocene in China supplying an opportunity for the distant hybridization. The most wildly distributed C. auratus lineage could be attributed to the dispersal during the glacial period and the recent human-facilitated dispersal.

reconstruction, the speciation of C. auratus seemed not to be the processing of lineage diversi cation from the most recent common ancestor of C. carassius or C. cuvieri. The formation of 2nNCRC in our laboratory could be a good candidate explaining for the hybrid origin species for C. auratus lineage, as well as the paleoclimate oscillation and geological event during Pliocene and Pleistocene in China supplying an opportunity for the distant hybridization. The most wildly distributed C. auratus lineage could be attributed to the dispersal during the glacial period and the recent human-facilitated dispersal. Background A fundamental question in evolutionary biology is whether speciation is gradual or punctuated [1]. The hybridization is regarded as an 'evolutionary catalyst' which may be an important source for the origin of a new species or increase the quantum of genetic variability. Hybridization can result in the reorganization of adaptive system and lead to the formation of new species [2]. Traditional concept holds that the speciation was the processing of lineage diversi cation [3]. However, recent studies have con rmed that the hybridization speciation by means of Lineage fusion was also one of the important species diversi cation [4,5]. Hybridization between species can trigger vast genetic and genomic imbalances, including a high rate of DNA mutations and combinations [6], often results in signi cant changes in phenotypes and genotypes of hybrid offspring, which may facilitate speciation and adaptive radiation.
Drastic climatic and geological oscillations, such as the glacial expansion and mega-droughts forced freshwater species to contract their distribution ranges and reside in small refuge in many cases. The potential gene exchange and hybrid speciation was likely to happen owning to the close contact between distant species. Increasingly, studies suggest that it is not uncommon on the inter-species hybridization and gene introgression in sh, such as the families Poeciliidae, Atherinidae, Cyprinidae, Cobitidae, and even the Carcharhinidae [7][8][9], especially the East African cichlids, which have been attracted much attention in recent years [10][11][12]. As the most number of species in vertebrate, more than 34000 species have been recorded (https://www. shbase.de/). It was deduced that the formations of many shes were associated with hybridization [13,14]. The ancestral hybridization has been suggested to play a key role in facilitating species diversi cation of cichlids sh in the Lake Malawi, Lake Victoria, as well as Lake Tanganyika [10][11][12]. Such hybridization would be common when allopatric species come into second contact due to the drainage evolution or huge environmental changes, such as the mega-droughts caused strong water level changes and probably led to severe reduction of their natural habitat, which could increase the odds of interaction between sh species and eventually cause the hybridization between distant relatives. The ancient hybridization of Lake Victoria cichlids sh was considered to be related to the capture of Malagarasi (Congo) tributaries and the East African mega-droughts, providing an opportunity for the Congolese lineage colonized in the Lake Victoria region and Congo-Nilotic admixture event [10].
However, for a long time, it lacks of enough and direct evidences to prove that sh hybridization will produce new species. The distant hybridization can generate allotetraploid and autotetraploid sh, which supply an important and direct evidence for the hybridization speciation in sh [15,16]. It was reported that the 2nNCRC (2n = 100) could be derived from the interspecies hybridization between Cyprinus carpio (female, 2n = 100) and Megalobrama amblycephala (male, 2n = 48) in our lab. In addition, eight stable generations for 2nNCRC with a very clear genetic background have been founded in our lab since 2014, supplying a great value model system for studying the hybrid speciation in sh.
It is widely believed that there exist ve species in genus Carassius, including Carassius carassius, Carassius cuvieri, Carassius langsdor i, Carassius gibelio and Carassius auratus, which have been considered as valid species in Fishbase (https://www. shbase.de/). Well, whether the origin of the C. auratus is identical with that of 2nNCRC or not, and whether the C. auratus is evolved from 2nNCRC are unknown. It is very necessary to illuminate the historical evolutionary processes of C. auratus. The phylogeographic study might settle this issue because it is usually used to reveal the historical processes affecting vicariance, dispersal, extinctions and radiations that lead to the geographic distributions of genetic lineages [17,18].
The recent climatic oscillation and geological events during Pliocene-Pleistocene have played a fundamental role in shaping contemporary patterns of biodiversity and species diversi cation and distribution, which are of main interest in phylo-and biogeography [19,20]. Pleistocene glaciations are known to have exerted a far-reaching in uence on the evolution of organisms in the Northern Hemisphere [21,22]. For example, palaeontological and genogeographical studies indicate that European and North American species experienced repeated episodes of contraction and expansion of their ranges due to major climatic oscillations [23]. The Carassius species complex has a wide distribution across the Eurasian continent and neighboring islands [24,25]. According to the fossil record of Carassius species in Pliocene epoch (5.3-2.6 Mya, million years ago) discovered in north of China [26], the Quaternary palaeoenvironmental changes in East Asia and Europe would have a great in uence on the speciation and diversi cation of Carassius species. In freshwater shes, the dynamics of re-colonization are tightly linked to the history of river drainage systems [27]. During glacial melt periods, ephemeral rivers and periglacial lakes could arise, as well as the recon guration of the landscape caused by drastic climatic change and geological events may allow species to disperse into new habitats, providing opportunities for colonization and new species interactions.
These processes have resulted in complicated recolonization scenarios for the Carassius species complex in East Asia, where the haplotypes of mitochondrial control region and tf alleles were clustered into four and three major lineages respectively, and they both speculated that the Yangtze River basin was the potential origin center for Carassius species complex, and then radiated across East Asia [25,28], the highest genetic diversity in the Yangtze River basin suggesting that it should be the refuge for Carassius species during glaciations. Gao et al. (2012) deduced that the close relationship between the C. auratus complex from eastern mainland China and the south-central Ryukyus was the result of a natural Pleistocene dispersal [24]. However, the existence of two distinct lineages of C. carassius in Europe [27], was mainly due to the Danubian catchment separated with other river system by the Alps, the Sudety Mountains and the Carpathian Mountains.
According to the formation of the newborn of 2nNCRC, combined with the close phylogenetic relationship between C. carpio and Carassius species complex [29,30], we speculated that the existence of possible route through distant hybridization under natural conditions can generate C. auratus, which has happened during Pliocene-Pleistocene based on these hypotheses as follows: (1) the origin times of Cyprinus and Megalobrama are much earlier than that of Carassius [24,30,31]; (2) according to the distribution of current sh species and fossile data [26,32], as well as the divergence time estimation for species in Carassius, the speciation of C. auratus in China might be punctuated and not be derived from the processing of lineage differentiation with other species; (3) the land bridge between China and Japan during glacial cycling providing an opportunity for the expansion of C. auratus from China to Japan or other areas. Thus, establishing a robust time-calibrated phylogeny is a rst requirement for tracing the possible origin and the diversi cation patterns for Carassius species complex.
We use mitochondria DNA sequences and Bayesian inference to estimate phylogenetic relationships and lineage divergence times among the newborn of 2nNCRC and the genus Carassius, and investigate the ancestral geographic ranges and the biogeographical history of C. auratus.

Morphological traits
Twelve to Twenty samples of each generation of 2nNCRC were random selected to summarize the morphological characteristics on qualitative and quantitative traits. 2nNCRC have no barbells, 29-32 lateral scales, the rst gill rakers number ranged from 49-53, 3-4 Spines total and 15-18 Soft-rays total in Dorsal n, and 3 Spines total and 5-6 Soft-rays total in Anal n (Table S1 in Supplementary  Information). Especially, both of 2nNCRC and C. auratus have the same morphotype and formula of pharyngeal teeth, four tabular teeth on each side of the pharyngeal bone. All of that are very different from C. carpio and M. amblycephala.

Phylogenetic tree and haplotype network
The aligned and concatenated sequences (14 mitochondria genes) were 13634 bp with 5841 variable sites. Of these, 816 were parsimony-informative. The Maximum Likelihood (ML) and Bayesian tree showed a clear identi cation on the species of Carassius and most species of Cyprinidae (Fig. 1). All species from Carassius, containing the 2nNCRC formed a monophyletic clade with high support values, and three valid species of Carassius were observed ( Fig. 1, in bold). However, the 2nNCRC, together with the C. gibelio, were unable to disentangle from the C. auratus but form a single genetic cluster, and the genetic distances among 2nNCRC and C. auratus and C. gibelio were much lower than that among 2nNCRC and other species of Carassius (Table S2 in Supplementary Information). Consequently, the C. auratus and C. gibelio were named as C. auratus complex in the present study.
The Median Joining Network of cytb showed that there was no shared haplotype among the C. carassius, C. cuvieri and C. auratus complex, and the latter had an admixture distribution across the Eurasia and no phylogeographical pattern was found. However, the haplotype of C. carassius just distributed in Europe with two distinct sub-lineages, and the haplotype of C. cuvieri had a narrow distribution in Japan (Fig. 2). In addition, there existed shared haplotypes between 2nNCRC and C. auratus.

Divergence time estimation
The fossil calibrated molecular clock using the incorporated 14 genes from mitochondria and only cytb yielded similar results for the divergence between C. carassius and other four species of Carassius, indicating the C. carassius lineage diverged from other species of Carassius during the Late Miocene (about 8 Mya) (Fig. 3a, 3b). And they both dated the most recent common ancestor (TMRCA) of C. auratus during the Late Pliocene (at approximately 3.56 and 3.12 Mya, respectively). However, the split between C. cuvieri and C. langsdor i using the incorporated 14 genes was estimated at approximately 6.43 Mya, which was much earlier than that using only cytb gene (4.03 Mya).
Within C. auratus complex, C. gibelio from Russia (Far East) and Mongolia rstly diverged from other sublineages of C. auratus complex. C. auratus from Fujian in China and Vietnam were estimated to have separated at approximately 1.6 Mya, followed shortly thereafter by the divergences of C. gibelio widly distributed in Eurasia at approximately 1.49 Mya. While other separated sublineages of C. auratus complex distributed in Mainland China, Taiwan Island, Japan and Europe diverged during 1.26 Mya and 0.47 Mya, but without a distinct phylogeographic pattern (Fig. 3b).
In addition, based on the incorporated 14 genes, the Cyprinus most likely diverged from the Carassius at approximately 21.98 Mya, and the most recent common ancestor of Cyprinus was dated to 10.14 Mya (Fig. 3a), while the C. carpio and the species of Carassius seperated at approximately 11.86 Mya using only the cytb gene (Fig. 3b).

Ancestral range reconstruction
The results of the ancestral range reconstruction are depicted in Fig. 4. Among the six models of geographic-range evolution compared in a likelihood framework in Bio-GeoBEARS, the DIVA + j model was chosen according to the best likelihood and AICc associated scores (Table S3 in Supplementary  Information). According to the ancestral area reconstruction, 34 dispersal and 18 vicariance events occurred within the evolution of the studied Carassius. The TMRCA of Carassius probably diverged into two distinct clades in the Late Miocene (Fig. 4). One clade included the C. carassius mainly distributed in Europe, while the other clade further split into three separate lineages through three vicariance events, with the rst event corresponding to C. cuvieri and C. langsdor i, whose ancestor was distributed in Japan (node 43, G: 66.94%), and the second event corresponding to the C. gibelio, whose ancestor was likely to live in Siberia and Mongolia (node 39, D: 41.32%), and the third event corresponding to the C. gibelio and C. auratus, whose ancestor mainly distributed in China and Southeast Asia (node 38, F: 54.39%). Hence, the wildly distribution of C. auratus in Eurasia and North America can be explained by dispersal events.

Species distribution modeling
Maxent modeling predicted a current range similar to that known for C. auratus in East Asia with little variance ( The similar results were also found in each species of Carassius with high AUC values. For C. auratus, precipitation of warmest quarter (bio18; 74.9%), precipitation of coldest quarter (bio19; 8.8%), annual precipitation (bio8; 7.3%), and isothermality (bio3; 6.5%) were the largest contributors to the model contributing 97.5% as supported by jackkni ng. The similar bio18, bio19, bio3 and bio8 were the largest contributors for C. cuvieri and C. langsdor i to the model contributing 98.5% and 99.2%, respectively. The bio19, bio8, bio3 and mean diurnal range (bio2) were the largest contributors for C. carassius. While the bio 19, precipitation seasonality (bio15) and bio3 were the largest contributions for C. gibelio.
When the model used current conditions to predict suitable habitat for the species of Carassius during the last glacial maximum (LGM), the main area of suitable habitat with high probability for C. auratus was as follows: the Yunnan-Guizhou Plateau, the middle and lower reaches of the Yangtze River, Taiwan Island and Japan in East Asia for C. auratus ( Fig. S1b in Supplementary Information), and a considerable dispersal event between mainland China and Japan was observed. The main areas of suitable habitat for other four species of Carassius were shown in Fig. S2b-5b.
During the last interglacial period (LIG; ~120-140), suitable habitat re ected that of the present distribution, with greater levels of suitable habitat in east Asia, the southeast Asia and the central Europe for C. auratus (Fig. S1c in Supplementary Information), suggesting that C. auratus expanded into new areas after the ice sheets of the LGM receded; with great levels of suitable habitat in the whole Europe for C. carassius and C. gibelio ( Fig. S2c and S4c in Supplementary Information); the great levels of suitable habits in East Asia including China, Japan and the lower elevations of the Himalayas for C. cuvieri seem that this species had a wild distribution in East Asia during the LIG (Fig. S3c in Supplementary  Information), which was not consistent with current distribution; a very restricted distribution in the islands of the East Asia, including the Taiwan islands, Ryukyu Islands and South Japan (Fig. S5c in Supplementary Information), suggesting that the C. langsdor i distributed in Europe current was likely the introduced species, rather than a native species.

Discussion
This study represents the rst attempt to better understand the relationship of the arti cial hybrid sh species and the natural evolution sh. The 2nNCRC derived from the interspecies hybridization between C. carpio and M. amblycephala is very similar to the C. auratus in morphological and genetic characteristics with a very close phylogenetic relationship. Our phylogenetic analysis showed that the genus Carassius was a monophyletic clade in Cyprinidae and three species including C. carassius, C. cuvieri and C. langsdor i were further con rmed as valid species, however, C. auratus and C. gibelio were unable to disentangle from each other based on mtDNA. The admixture of mtDNA haplotype lineages and wildly distribution of C. auratus across the whole Eurasia seem that it has a very different evolution history. In the following, we discuss our ndings in the context of the paleoclimate oscillation occurred during Pliocene and Pleistocene.
Divergence and distribution pattern of Carassius re ect that the C. auratus complex was not likely to be the lineage differentiated from the common ancestor of C. cuvieri or C. carassius The Carassius species complex was widely distributed across Eurasia and exhibits remarkable morphological and genetic diversity [24,33], and even some lineages were endemic to particular geographical regions, such as C. gibelio was restricted to the northern Amur River systems and eastern Europe [34,35]. Both C. langsdor i and C. cuvieri occurred on the main islands of Japan [36, 37], C. carassius mainly distributed in Northwest and Central Europe [38]. According the molecular dating, the common ancestor of C. langsdor i and C. cuvieri distributed in Japan split with that of C. auratus complex during Late Miocene and Early Pliocene ( Fig. 3a and 3b), which was much earlier than that estimated by Gao et al. (2012) [24], but coincide with that estimated by Podlesnykh et al. (2012) [39] and Takada et al. (2010) [40], corresponding to the divergence of island and continental limnetic fauna and the settlement of Japanese islands by ancestral species of Carassius. The regressions of the ocean level were probably the diversifying factors that resulted in the formation of land bridges between Japanese archipelago islands and the continent, suggesting that dispersal rather than vicariance has created the current distribution pattern of Carassius in Eurasia.
It was believed that the freshwater ichthyofaunas from Japanese Islands and China resembled each other during Miocene and Pliocene as the main islands of Japan and mainland China are known to have formed a contiguous land mass in the late Pliocene [41]. Fossils indicate that in the late Pliocene, C. auratus was distributed in North of China and Japan [26,42], coinciding with the dating of TMRCA of C. auratus complex in this study. In that case, we speculate that the C. auratus complex lineage should be separated from C. cuvieri lineage or in turn, and they should have overlap distribution in East Asia at current and history. However, the C. cuvieri has a narrow distribution on the main islands of Japan at current, and the speciation of C. cuvieri was much earlier than that of C. auratus complex in the present and previous studies [24,25]. Furthermore, the C. auratus distributed in Japan at present should be dispersed from China during the glacial period and the recent human-facilitated dispersal according to the ancestral areal distribution and SDM in the present study ( Fig. 4 and Fig. S1 in Supplementary  Information). According to the divergence and current distribution pattern of Carassius was mainly caused by the dispersal with the help of the formation of land bridges between Japanese archipelago islands and the continent [43,44]. The common ancestor Carassius was likely to dispersal from Eurasia to Japan Island, rather than in reverse direction. The most recent common ancestor of C. cuvieri and C. langsdor i was distributed in Japan, while C. auratus in China (Fig. 4). We speculate that if there exist a recent common ancestor for the three species, it should be distributed in Chinese mainland, rather than in Japan Islands (Fig. 4, node 44, G: 25.97%). Hence, there are reasons to believe that the C. auratus should not be lineage diverged from the most recent common ancestor of C. cuvieri and C. langsdor i.
Similarly, the most recent common ancestor of C. carassius was distributed in Europe. However, the C. auratus distributed in Europe at present was veri ed dispersing from China or Southeast Asia. For instance, recent genetic evidence supports anthropogenic introduction of the crucian carp to the UK during the 15th century [45]. The C. carassius is native to parts of central, eastern and northern Europe and almost exclusively restricted to lentic ecosystems [38,46]. The strong geographic structure was found because of the susceptibility of C. carassius to genetic isolation and bottlenecks caused by their small population sizes and especially the low dispersal [27]. Two genetically distinct sub-lineages of C. carassius distributed in Europe was found in the present study, coinciding with the previous study that one found throughout northern and central-eastern European drainages and a second almost exclusively con ned to the Danubian catchment. Consequently, the Carpathian Mountains and the Central European Highlands was inferred to have acted as a barrier to the colonization of C. carassius into northern European drainages during the Pleistocene. According to the reconstructed ancestral areas and SDM, the C. carassius originated from Europe and was restricted to their native areas ( Fig. 4 and Fig. S2 in Supplementary Information). Consequently, very few records of C. carassius distributed in Asia were found, except West Siberia, Kazakhstan, Uzbekistan and Turkey, all of which were very close to Europe and there existed no signi cant geographical barrier with Europe. However, the Central Siberian Plateau, Mongolian Plateau, Altay Mountains, Tianshan Mountains, Pamir Mountains, Himalayas, Iran Plateau and Great Caucasus, with an elevation more than 1000 m, formed natural dispersal barriers like watersheds impeding freshwater species exchange. According to the geological event of Eurasia, the diversi cation of C. auratus from C. carassius far postdated the formation of high-elevation orogens such as the Tibetan Plateau, Mongolian Plateau and Tian-Shan, which happened during Miocene and even earlier [47], as well as the well-dated Carassius fossils have been found in North of China [26], all far postdating the formation of those mountains. Furthermore, in view of the current and paleogeographic distribution of C. carassius (Fig. S2 in Supplementary Information), the recent common ancestor of C. carassius was almost absent in East and Southeast Asia. This all suggests that the C. auratus should not be the lineage diverged from the most recent common ancestor of C. carassius.
The paleoclimate and geological event facilitated the distant hybridization Based on the above analysis, and the fact of the 2nNCRC derived from distant hybridization in our laboratory [16], our results also reinforce a possible hybrid origin route for C. auratus in nature. Rapid hybrid speciation was recently shown to occur by Lamichhaney et al. (2018) [5], and many species of cichlid shes in East Africa have been veri ed that the ancestral distant hybridization has been suggested to play a key role in facilitating species diversi cation [10][11][12]. Our nding further supported that this was very likely to have happened in Cyprinidae.
The morphological characteristics on qualitative and quantitative traits between 2nNCRC and C. auratus are very similar. Furthermore, it has been certi cated that the paternal (M. amblycephala) mitochondria DNA fragments were stably embedded in the mitochondria genomes of F 1 -F 3 generations of 2nNCRC to form a chimaera, and most of these mutation sites were similar and even consistent with the existing diploid C. auratus [48].
In the phylogenetic tree, the closest relationship between Carassius and Cyprinus was further revealed, suggesting the origin of Carassius species could be correlated with Cyprinus. The robust, time-calibrated molecular phylogeny suggested that the recent cyprinid sh fauna in East Asia probably originated at 21.98 Ma (Fig. 3a), coinciding with that was estimated from early to middle Miocene [49]. The separation of C. carpio and Carassius occurred about 11.86 Mya (Fig. 3b), close with the divergence between gold sh and common carp (10.0 Mya) [50], and the origin of the common ancestor of common carp and crucian carp (11.4-8.1 Mya) [31], all of that close to the time of hybrid speciation of allotetraploid C. carpio (12. [30]. The most recent common ancestor of C. auratus and C. gibelio was estimated approximately at 3.12 Ma, whereas there is no very signi cant differentiation between the two species in the present study, maybe correlation with the interspecies hybridization, as they are often sympatry in the same river system, and the hybridization between interspecies is not uncommon in genus Carassius [51,52]. Our results showed that the C. auratus most originated from Asia, especially in China (Fig. 4), and the speciation time seem to be related with the violent climate oscillation and geological event occurred during Pliocene and Pleistocene.
Landscape features and geological history were hypothesized to strongly affect the biogeographical processes [53,54], and this can be especially true in freshwater systems [55,56]. One major consequence of drastic climatic changes and geological activities are ecosystem and landscape recon gurations. A speci cally large event is the drastic uplift of the Tibet Plateau ultimately leading to the formation of owing eastward drainage system in China, such as Yangtze River [57,58], which was built between Oligocene and Miocene, but the uni cation of the upper and middle Yangtze River in the Three Gorges mountain region was happened between Late Pliocene and Early Pleistocene [59,60]. Followed by the Yangtze River and Han River owed into the middle of Jianghan Basin. The connection of the upper and middle Yangtze River would facilitate species dispersal from west to east, similar to what has been observed for the frog Quasipaa boulengeri [61] and the freshwater snail Bellamya aeruginosa [62, 63]. Therefore, the establishment of the east-owing Yangtze River and the changes of river systems could offer opportunities for the dispersal, gene ow and even for secondary contact and hybridization between closely related freshwater species [63], especially for sh. Furthermore, the intensi ed tectonic movement and glacial cycling in Pleistocene, as well as the prevailing monsoon had a major impact on the East Asian biota since the late Pliocene [64, 65], as a result the weather were cold and dry [66, 67], rivers and lakes experienced frequent range expansions and regressions in Jianghan Basin and Dongting Lake basin [68,69]. The regressions of lakes and rivers could dramatically reduce the habitat region of sh, and this precarious situation was further exacerbated in glacial period. Consequently, the restricted habitat region would cause hybridization between closely and even not closely related species. An example of a speci cally large event is in the LGM, the sea level would be 130-150 m lower than today, resulting in strengthened down-cutting processes of the rivers, most lakes in Plains region of Eastern China have opened and dried, which has been veri ed in Taihu Lake, Boyang Lake and Dongting Lake [70]. The cooler and drier climates intensi ed during this period, and the sh habitats in these lakes would be fragmented or even lost. The secondary contact of different species would be common when the repeated loss of habitat as the vast majority of freshwater shes were in vitro fertilization. Specially, the hybridization could be common when allopatric species come into secondary contact [71], hence creating opportunities for new speciation and diversi cation. The distant hybridization between C. carpio and M. amblycephala, with the similar ecological niches, most likely occurred during the late Pliocene. A typical example is the repeated range expansions and regressions of lakes likely contributed to the high diversity of African cichlids [72,73], since East Africa underwent dramatic climatic and geological changes in Pleistocene over the past few million years, the constant expansion and regression of the great East African lakes have led to the repeated loss of habitats or formation of new habitats [63,73], such as the hybridization might have facilitated these speciation bursts for the cichlids in Lake Tanganyika, and the Time-calibrated trees supported that the radiation of Tanganyika cichlids coincided with lake formation [11].
However, during the interglacial period with the warmer and wetter conditions, the C. auratus adapted to their environment thrive and leave more offspring and ultimately would have dispersed northward to Heilongjiang, eastwards to Zhejiang and Fujian, and south-westward to Yunnan and Vietnam. It was believed that C. auratus complex had a Pleistocene fast radiation [24], coinciding with the diversi cation of C. auratus lineage in present study (Fig. 3a, 3b). The divergence time among several sublineages of C. auratus complex was ahead of 0.34 Mya and posterior to 0.47 Mya (Fig. 3b) [74], while the divergence among most of sublineages were ahead of 0.86 Mya, which was earlier than the start of Wangkun Glaciation (0.70-0.50 Mya) in the Middle PIeistocene dated by ESR and paleo-magnetism, widely accepted as the earliest glaciation in China [75]. Consequently, the dispersal rather than vicariance was most responsible for the phylogenetic pattern of C. auratus in China.

Conclusions
The newborn homodiploid crucian carp-like sh derived from the hybridization of C. carpio and M. amblycephala was exactly the same as the diploid C. auratus in nature. The molecular phylogenetic analyses revealed an intraspecies relationship between the 2nNCRC and the diploid C. auratus. According to the reconstruction of ancestral areas and estimated divergence time, the C. auratus lineage most likely originated from the East Asia during the later Pliocene when the hydrological changes and connectivity happened in China, as well as the repeated expansion and regression of lakes facilitating the distant hybridization between the species of Cyprinidae, and especially the several glacial-interglacial cycles contributed to the dispersal and lineage differentiation. Considering the paleoclimate oscillation and geological event during Pliocene and Pleistocene, we speculated that the existence of possible route through distant hybridization under natural conditions generated the C. auratus due to the unlikeliness that the C. auratus lineage diverged from the most recent common ancestor of C. carassius or C. cuvieri. However, it's only one speculate that the C. auratus would have the similar hybrid origin with the newborn crucian carp-like sh, further studies should be executed to study the population genetics and biogeography on the C. auratus, C. carpio and M. amblycephala, in order to verify our hypotheses or negate it.

Material And Methods
Samples and sequence data preparation All of the 2nNCRC samples, including F 1 to F 5 generations used in this study were cultured in ponds at the Protection Station of Polyploidy Fish, Hunan Normal University, and fed arti cial feed. The F 1 generation of 2nNCRC is denoted as 2nNCRC-F 1 , the F 2 generation of 2nNCRC was the F 1 self-cross offspring and denoted as 2nNCRC-F 2 , and so on the 2nNCRC-F 3 , 2nNCRC-F 4 and 2nNCRC-F 5 . One specimens of each generation (F 1 to F 3 ) were collected and a total of three mitochondrial genomes of 2nNCRC were sequenced to construct a phylogeny tree based on two both rRNAs and twelve protein-coding genes (ND6 was excluded) (the methods for DNA extraction, ampli cation, and sequencing are given in additional le 1 in Supplementary Information). To perform a comprehensive phylogenetic analysis, another 85 specimens of representative Cyprinidae sh (including 31 specimens of Carassius, 19 of Cyprinus, 14 of Megalobrama and 24 of other cyprinid sh) and 10 Catostomidae sh complete mitochondrial genomes (both rRNAs and twelve protein-coding genes) were retrieved from the Genbank (see additional le 2 in Supplementary Information). In addition, 837 cytb genes of Carassius were also acquired from the Genbank (see additional le 3 in Supplementary Information) to obtain the phylogeographic structure of Carassius across the Eurasia.
Previous study showed that the morphometric of crucian carp-like sh (2nNCRC) were very similar with that of C. auratus on the lateral scales and the conventional measures [16]. In this study, the morphological characteristics on qualitative and quantitative traits were further compared among the species of Carassius and 2nNCRC, as well as its parents (see Table S1 in Supplementary Information). Furthermore, Pairwise genetic distance of inter-and intraspeci c variation in Carassius genus (see Table  S2 in Supplementary Information) were calculated under the K2P model for the cytb dataset using MEGA 7 [76].

Phylogenetic analysis
All sequences were aligned with MAFFT [77] as implemented in Geneious 4.8.3. For mitochondrial genomes, all alignments (twelve proteins and two rRNAs) were combined. We tested the 14 genes data for saturation using DAMBE v. 6.4.41 [78]. The test revealed an ISS-value that was signi cantly lower than the ISS.c in all cases (P < 0.0000), indicating the suitability of the fourteen genes for phylogenetic analysis. The homogeneity of the fourteen genes was tested in *PAUP [79], and the P value was 0.69 (> 0.05). To test our hypothesis that the 2nNCRC is precisely the C. auratus, the ML and Bayesian trees incorporating the mitochondrial genomes (14 genes data) were generated using RAxML v. 8.0 [80] with 1000 bootstrap replicates and MrBayes v. 3.2 [81], respectively. In the phylogenetic tree analysis, 98 sequecnes (see additional le 2 in Supplementary Information) were used, and the species of Catostomidae were used as outgroup. We determined the best-tting substitution models for each gene fragment using Moldetest 3.7 [82], which were shown in Table S3  To further visualize haplotype diversity and distribution of Carassius, we generated two haplotype network using 837 cytb genes, and colored each haplotype by the geographic region from where it was collected. The haplotype networks were constructed using network v. 5.0 (www. uxusengineering.com/sharenet.htm) and applying the median-joining and Maximum Parsimony options.

Divergence time estimation
To test our hypothesis that the divergence times of Cyprinus and Megalobrama are much earlier than that of Carassius. The Divergence time was estimated using a molecular clock approach as implemented in BEAST. We used the combined data in the phylogenetic tree and employed a (uncorrelated lognormal) relaxed clock, as a likelihood-ratio test (LRT) [85] rejected the strict molecular clock hypothesis for the data (P < 0.01), with three calibration points. We used the following calibration points with normal distribution priors: the oldest fossil of Plesiomyxocyprinus arratiae similar to Myxocyprinus asiaticus was constrained to from the middle Eocene or earlier, approximately 40-38 Mya [86], the minimum age of Catostomidae is 60 Ma based on a catostomid fossil from the Paleocene [87], and the timing of the drastic uplift of the Tibet Plateau between , which was utilized as one calibration point for the Schizothoracinae as the species endemic to Qinghai-Tibet Plateau [89][90][91]. The clade corresponding to each calibration point was constrained to be monophyletic. The GTR + I + G model was the best t for the combined dataset (Table S3 in Supplementary Information). The 'speciation: Yule process' tree prior was used to construct the tree. We ran four independent runs for 20 million generations logging trees every 2,000 generations. Convergence was checked with TRACER v. 1.7.1 [83]. Posterior trees from the four runs were combined after removing the rst 10% as burn-in in LogCombiner v. 2.5.2 (http://beast.bio.ed.ac.uk/logcombiner). The maximum credibility tree was created in TreeAnnotator v.

available in the BEAST package.
Furthermore, another divergence time was estimated using the cytb data. We used a reduced dataset of cytb: specimens for which two or more sequences from the same region were included. This resulted in 250 terminals -237 representatives of the ve species of Carassius and 13 outgroup sequences (C. carpio) from GenBank (see additional le 4 in Supplementary Information). We used a conservative approach by employing calibration points from previous studies [27]. The most recent common ancestor (TMRCA) of C. carassius in the northern and central-eastern European drainages and the Danubian catchment was constrained to 2.18-2.15 Mya. Another calibration point was the fossil of Carassius in Pliocene epoch (5.3-2.6 Mya) in north of China (Yushe, Shanxi province), The GTR + I + G model was also the best t for the cytb dataset (Table S3 in Supplementary Information). The 'speciation: Yule process' tree prior was used to construct the tree. We ran four independent runs for 50 million generations logging trees every 5,000 generations. The other settings are same with the above.

Reconstruction of ancestral areas
To test our hypothesis that whether the C. auratus was derived from the processing of lineage differentiation with other species in Carassius or not, we further performed a biogeographic reconstruction of ancestral areas for species of Carassius using BioGeoBEARS [92] in RASP 4.02 [93]. The analyses were conducted on a fully resolved topology from the BEAST analysis containing ve species of Carassius, eight major geographical areas were de ned based on the worldwide distribution of and BAYAREALIKE + j). The best-t model was assessed on comparing the Akaike's information criterion and likelihood-ratio tests and the DIVALIKE + j was choosed (Table S4 in Supplementary Information).
Ancestral distributions were reconstructed using 24 sequences in Carassius (see additional le 5 in Supplementary Information). To account for phylogenetic uncertainty, 4000 post-burn-in trees resulting from the BEAST analysis were integrated for inference. The maximum number of ancestral areas was set to four, as C. auratus and C. gibelio can be widespread.

Species distribution and paleodistribution modeling
We used species distribution modeling (SDM) to construct a model of current, Last Glacial Maximum (LGM, about 22,000 years ago), and Last Interglacial (LIG; ~120,000-140,000 years before present) Carassius distributions. The occurrences records for nearly all Eurasia species in the genus Carassius (including C. carassius, C. cuvieri, C. langsdor i, C. gibelio and C. auratus) were collected from the Global Biodiversity Information Facility (GBIF, http://data.gbif.org/), Fishbase (https://www. shbase.de/), the Fish database of Taiwan (http:// shdb.sinica.edu.tw/) and complementary literature (sources summarized in additional le 6 in Supplementary Information). Data records were inspected and occurrences outside of Eurasia, without geo-referencing, were excluded from the analyses. Furthermore, we performed a careful quality check and meticulously scrutinized all records for any geographic or taxonomic issues. Geographic duplicate and very adjacent records, as well as records from introduced species or those pre-dating 1970 were removed, as the current bioclimatic dataset ranges over a 30-year period (1970-2000), and a total of 1,499 unique, geo-referenced and quality-checked occurrence records were nally retained across Eurasia for species in the genus Carassius.
We extracted current bioclimatic data from the WORLDCLIM dataset (v 2.1, http://www.worldclim.org/) at 30 arc-seconds resolution, LGM bioclimatic data from the Model for Interdisciplinary Research on Climate (MIROC) dataset at 2.5-min resolution [94], and LIG bioclimatic data from Otto-Bliesner et al. (2006) [95] at 30 arc-seconds resolution. There exist nineteen bioclimatic variables are included in the WORLDCLIM current and LIG [95] and LGM [96] datasets. The ArcGIS 10.0 (ESRI: Redlands, CA) was used to extract climatic variable layers to include only Eurasia to improve predictive power of Maxent models [97]. Prior to constructing SDM, the ENMTools (v 1.3) [98] was used to determine which bioclimatic variables were correlated, using R > 0.90 as a cutoff, and twelve variables which were found to be correlated with at least one other variable were removed.
We used the Maxent (v 3.4.0) [99] to model current and past Carassius distribution, and used the following settings for the Maxent model: hinge features only, regularization multiplier of 1, 10,000 max number of background points, replicate run type of 10 cross-validations, 500 maximum iterations, and 0.00001 convergence threshold. We ran jackknife tests to measure the importance of each bioclimatic variable. Models used 700, 49, 56, 489 and 205 (total 1499) records for testing and six BIOCLIM environmental layers (bio2, 3,8,15,18,19) to produce models for present and paleodistributions of C. carassius, C. cuvieri, C. langsdor i, C. gibelio and C. auratus, respectively.   Ancestral range reconstruction for the Carassius. The colors of the charts correspond to the most likely ancestral areas inferred, and the black color means the unknown area. Letters represent the biogeographic regions same with that in Fig. 2. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.