Evolution of the Yangtze River reconstructed by the largest molecular phylogeny of Cyprinidae

The Yangtze River is the largest river in Asia, but its evolutionary history has long been debated. Diverse groups of endemic freshwater fishes have evolved in this river. Here we reconstruct the historical spatiotemporal pattern of the endemic East Asian cyprinid clade based on the largest molecular phylogeny of Cyprinidae, including 1420 species and fossil records. Based on the evolution of egg types adapted to different hydrological conditions, we show that the ancestors of this endemic clade (laying adhesive eggs) were distributed in southern East Asia before ~24 Ma and subsequently dispersed to the Yangtze River basin, where they spawned semi-buoyant eggs at ~19 Ma. These results indicate that the Yangtze River reversed its flow direction from southward to eastward to form the present river system at the Oligocene-Miocene boundary (~24-19 Ma). Some East Asian cyprinids evolved into fishes producing adhesive eggs again at ~13 Ma, together with an increased net diversification rate, indicating that the river began to form a potamo-lacustrine system during the Mid-Miocene. The new sketch of the formation history of the Yangtze River system through Cyprinidae phylogeny, together with the evolution of egg types in endemic East Asian cyprinids, has better spatiotemporal integrity than traditional geological studies.


Main Text
As the Qinghai-Tibet Plateau uplifted, the Asian climate was mainly controlled by the transition from the planetary wind system in the Paleogene to the South Asian monsoon and East Asian monsoon during the Neogene (1-3), which led to the evolution of drainage systems in South and East Asia (4-6). The Yangtze River is the largest river in Asia, and its origin and evolution have attracted wide attention. The upper reaches of the palaeo-Yangtze River (palaeo-Jinshajiang) flowed southward, then diverted eastward in response to tectonic movement and the monsoon climate, and finally incised through the Three Gorges to form the modern Yangtze River (4,5,7,8). However, establishing the exact spatiotemporal evolution of the Yangtze River system remains challenging and controversial (8). Previous geological surveys (Supplementary Tables 1 and 2) showed that the initial date when the palaeo-Jinshajiang began flowing southward ranged from the late Eocene (9) to the early Pleistocene (10), and the initial date when the Yangtze River diverted eastward with incision of the Three Gorges ranged from the Eocene 40-45 Ma (11) to as recent as the Pleistocene a few hundred thousand years ago (12), depending on sampling locations, dating methods and proxies. However, no reliable biological evidence is available.
The Yangtze River is one of the basins with the most diverse and endemic species of freshwater fish in the world (13,14). Historical changes in drainage basins governed the diversification and dispersal of freshwater fishes (15), and in turn, reconstruction of the phylogeographic dynamics of freshwater fishes can reflect the spatiotemporal evolution of the river system (16,17). It is known that an endemic clade of East Asian Cyprinidae evolved while adapting to unique climatic and hydrological conditions under the influence of a strong East Asian monsoon during the uplift of the Qinghai-Tibet Plateau (18). In this endemic clade, some fishes produce demersal eggs, while others produce adhesive or semi-buoyant eggs (Fig. 2a). Semi-buoyant eggs are considered a key trait used by East Asian cyprinids to adapt to monsoon and large river environments (19,20). The development of semi-buoyant eggs, e.g., those of the four major Chinese carp in the Yangtze River, requires a long riverway (>500 km) with a fast flow (>0.5 m s⁻¹) (21,22). Given the effects of the Ice Age, it is likely that the riverway exceeded 1000 km for the normal development of these semi-buoyant eggs (23). The earliest fossils of Ctenopharyngodon and Hypophthalmichthys were found in the Sihong Basin in the Yangtze River basin (24) (Fig. 2c). As these two carp spawn semi-buoyant eggs, the appearance of their ancestor can be used as an indicator of the formation of the modern Yangtze River. A few studies have focused on the evolution, radiation and key traits of endemic East Asian cyprinids that have adapted to the East Asian monsoon and large rivers (19,20), yet no attention has been given to the historical distribution and dispersal of cyprinids across East Asia or to the possible relation between the spatiotemporal development of the Yangtze River and the evolution of egg types of endemic East Asian cyprinids.
We reconstructed the phylogeny of Cyprinidae based on the largest molecular data set currently available for cytochrome b genes from 1,420 Cyprinidae species belonging to nine subfamilies and 284 genera (Supplementary Data). The topologies of the phylogenetic trees obtained by Maximum likelihood and Bayesian inference analyses were largely congruent (Fig. 1), and these trees were also similar to those constructed in previous studies (25-28). By using fourteen calibration points including sixteen fossil records (Supplementary Text), we estimated the divergence times for Cyprinidae (Supplementary Fig. 3). (Fig. 2a). Combined with the fossils of Ecocarpia ningmingensis in the Ningming Basin, Guangxi Province (29), and the main distribution of extant species of these cyprinids, we inferred that the ancestors of these cyprinids were distributed in the south of East Asia from the palaeo-Pearl River to the palaeo-Red River before ~24 Ma (Fig. 2b), indicating that East Asian drainage systems were mainly concentrated in the south at that time. This finding also suggests that the Yangtze River had not yet formed and that the palaeo-Jinshajiang likely flowed towards the south.
The cyprinid fossils Nanningocyprinus wui and Huashancyprinus robustispinus found in the Oligocene Formation of the Nanning and Ningming Basins are also in agreement with our results (30). Other biological and geological evidence also suggests that the palaeo-Jinshajiang once flowed southward and probably flowed through the palaeo-Red River into the South China Sea (Supplementary Table 1). River incision at Laojunshan, near the first bend of the Yangtze River, was dated to 20-30 Ma by bedrock apatite (U-Th-Sm)/He thermochronology (31), and this age is close to the timing of the southward-flowing palaeo-Jinshajiang inferred in this study. Recently, studies using ⁴⁰Ar/³⁹Ar dating and zircon U-Pb dating methods and statistical analysis suggested that a major Paleogene river probably originated in the southeastern Qinghai-Tibet Plateau and flowed through the Jianchuan Basin, extending to northern Vietnam during the late Eocene-Oligocene period, but disappearing by the early Miocene (32,33). Biological evidence from time-calibrated phylogenies of only one fish genus (34-36) indicated a younger age for the south-flowing palaeo-Jinshajiang than that inferred in this study from the endemic East Asian Cyprinidae.
The time-calibrated phylogeny of Cyprinidae (Supplementary Fig. 3) reveals that the schizothoracine fish endemic to the Qinghai-Tibet Plateau did not appear prior to ~20 Ma (Supplementary Fig. 7). Combined with palaeontological evidence (37), we infer that the palaeoelevation of the central Qinghai-Tibet Plateau was probably not above 2000 m a.s.l. in the Oligocene. In this period, as the southeastern part of the plateau was not uplifted significantly, the reorganization of rivers in this region did not occur. At the same time, southern East Asia was in a humid belt, while a broad arid belt stretched across central East Asia from west to east (1,38). These results indicate that the Yangtze River system had not yet become connected or begun to flow eastward. Our study provides new biological dating for the southward flow of the palaeo-Jinshajiang in the Oligocene.

The formation of the Yangtze River at the Oligocene-Miocene boundary
Fishes with semi-buoyant eggs, comprising squaliobarbins and hypophthalmichthyins, appeared in the early Miocene (~19 Ma; 95% CI: 17.1-21.3 Ma) (Fig. 2a). The earliest fossils of Hypophthalmichthys, Ctenopharyngodon and Elopichthys were found in the early Miocene formation in the Sihong Basin, Jiangsu Province (24). These results indicate that the endemic East Asian cyprinids dispersed to the position of the current Yangtze River and evolved into fishes laying semi-buoyant eggs at approximately 19 Ma (Fig. 2c), suggesting that the Yangtze River reversed its flow direction eastward and formed the present drainage system at the Oligocene-Miocene boundary (~24-19 Ma). The chemical weathering index, the mineralogical ratio chlorite/(chlorite + haematite + goethite) (CRAT), from Ocean Drilling Program (ODP) Site 1148 in the South China Sea (39) rose rapidly to a peak at approximately 19 Ma (Fig. 3b), indicating that the climate in East Asia became humid at that time and that abundant rainfall was conducive to the formation of the Yangtze River system. In geological studies, the age of the formation of the present Yangtze River system was constrained to 23-36.5 Ma based on ⁴⁰Ar/³⁹Ar dating of basalts and U-Pb dating of zircon sand grains from the lower reaches of the Yangtze River and the appearance of evaporites and lacustrine sedimentation in the Jianghan Basin (7). Based on LA-ICP-MS zircon U-Pb geochronology and heavy mineral analysis of the Cenozoic sedimentary provenance of the Jianghan Basin, the incision of the Three Gorges was found to post-date 32 Ma, with the best estimate represented by the ~24.6 Ma unconformity (40). These results are close to the date of Yangtze River formation estimated from the divergence time of the semi-buoyant egg group in our study.
The formation of the current Yangtze River system has hindered gene flow of some terrestrial species between the north and south sides, resulting in genetic diversification and speciation. The divergence between the primitively segmented spider genera Sinothela and Ganthela, which are distributed on the north and south sides of the Yangtze River, was estimated at 13-30 Ma, which supports the suggestion that the Yangtze River system was formed before the Miocene (41). This divergence estimate has a much broader range than the one we inferred, probably because fewer species were sampled and fossil calibrations were lacking.
In addition, based on the results of the time-calibrated phylogeny of schizothoracine fishes (Supplementary Fig. 7), the timing of the divergence between primitive and specialized grades was likely at approximately 18 Ma, indicating that the Qinghai-Tibet Plateau reached an elevation of approximately 3000 m a.s.l. in the early Miocene. The southeastern margin of the plateau was uplifted rapidly from the late Oligocene to the early Miocene, resulting in the reversal or capture of river systems in response to large-scale strike-slip faults (4,7). Therefore, the Yangtze River diverted its flow direction from southward to eastward and incised through the Three Gorges to form the modern river system at that time.

Formation of the potamo-lacustrine ecosystem in the Yangtze River in the middle Miocene
The formation of the potamo-lacustrine complex ecosystem greatly promoted fish diversification (42). Previous studies have mainly focused on the formation mechanisms and ages of several present lakes in the Yangtze River basin (43,44), while the earliest formation of the potamo-lacustrine ecosystem remains unclear.
Based on the rate-through-time plot from the Bayesian Analysis of Macroevolutionary Mixtures program (Fig. 3a), the net diversification rate of the endemic East Asian cyprinids increased quickly at approximately 13 Ma. This result indicates that the drainage network in the Yangtze River basin was extensive, which provided a large number of niches, facilitating rapid radiation and dispersal of fishes. In the middle Miocene (~13 Ma; 95% CI: 11.4-14.6 Ma), fishes laying adhesive eggs arose again, including xenocyprins and cultrins (Fig. 2a). This finding indicates that, to adapt to the lake environment, endemic East Asian cyprinids evolved into fishes spawning adhesive eggs that attach to aquatic plants to develop. These results suggest that the potamo-lacustrine ecosystem of the Yangtze River began to appear at that time (Fig. 2d). Coincidentally, the chemical weathering index CRAT of ODP Site 1148 in the South China Sea peaked at ~13 Ma (Fig. 3b) (39), which indicates that the climate in East Asia was more humid at that time and that a large amount of precipitation resulted in the formation of a potamo-lacustrine ecosystem in the Yangtze River, greatly increasing species diversification. Hence, understanding the formation history of the potamo-lacustrine ecosystem is important for protecting the Yangtze River ecosystem and its biodiversity.
In summary, we used the spatiotemporal evolutionary pattern of endemic East Asian cyprinids from the largest molecular phylogenetic tree of Cyprinidae, fossil records and information on egg type evolution while adapting to varied hydrologic conditions to reconstruct the formation history of the Yangtze River system. Our results indicate that the ancestors of East Asian cyprinids were distributed in the south of East Asia from the palaeo-Pearl River to the palaeo-Red River during the Oligocene, when the Yangtze River system had not yet formed, implying that the palaeo-Jinshajiang flowed southward at this time. The endemic East Asian cyprinids dispersed to the Yangtze River basin and evolved into fishes laying semi-buoyant eggs at approximately 19 Ma, which suggests that the Yangtze River system formed in response to large strike-slip tectonism and climate change, constraining the formation age of the Yangtze River to the Oligocene-Miocene boundary (~24-19 Ma). Notably, the endemic East Asian cyprinids evolved into fishes spawning adhesive eggs again at approximately 13 Ma, coinciding with a rapid increase in the net diversification rate of this endemic clade and a peak in the chemical weathering index CRAT in the South China Sea (39), indicating that the Yangtze River system probably developed to form a potamo-lacustrine ecosystem in the middle Miocene. Our study constrains the ages of the important geological events during the evolution of the Yangtze River from a biological perspective, helping us to understand the evolutionary history of the Yangtze River system.

Methods
Data collection and processing. Information on the scientific names, taxonomic position, distribution and egg type of Cyprinidae was obtained from FishBase (www.fishbase.org), Catalog of Fishes (http://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp) and additional dedicated publications (Supplementary Data). A total of 1423 cytochrome b sequences from 284 genera of Cyprinidae and three outgroup taxa were retrieved from GenBank (accession numbers of all the sequences are listed in Supplementary Data). Sequences were aligned using MAFFT version 7, and ambiguous regions in the alignments were removed using Gblocks v.0.91 (45).

Phylogenetic analyses. The phylogenetic analyses were conducted with Maximum likelihood (ML) and Bayesian inference (BI) in RAxML v. 8.2.12 (46) and MrBayes v.3.2 (47), respectively. The ML analyses were implemented under a GTRGAMMA model with 100 rapid bootstrap inferences using a thorough ML search. For BI analyses, the best-fitting nucleotide substitution model GTR+F+I+G4 was calculated with ModelFinder in PhyloSuite (48). Two independent runs were performed through 20,000,000 generations with four Markov chains. The first 25% of trees were removed as burn-in. Chain convergence was inspected in Tracer 1.5 (http://tree.bio.ed.ac.uk/software/tracer/), and results were accepted when the effective sample size (ESS) of each parameter exceeded 200. A consensus tree was produced.
Divergence time estimation. Based on the optimal ML tree topology obtained with RAxML v 8.2.12 (46), penalized likelihood dating analysis was conducted in treePL (49) to estimate divergence time. We sorted the available fossils from the literature and selected sixteen fossils as fourteen calibration points (Supplementary Text). To identify the appropriate level of rate heterogeneity in the phylogram, cross-validation analysis was conducted in treePL while testing 37 smoothing parameter values from 10^18 to 10^-18. To calculate the confidence intervals for the dating estimates of each node, 100 bootstrap replicates were generated by RAxML, with topology fixed to the best ML phylogram but with varying branch lengths. We then conducted treePL on these 100 replicates. Age statistics of all nodes were summarized with TreeAnnotator v. 1.10.4 (50).
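The smoothing-parameter grid and the bootstrap summary described above can be sketched in Python. This is a minimal illustration, not the treePL or TreeAnnotator implementation: it assumes the 37 smoothing values are spaced one per order of magnitude (which matches the count between 10^18 and 10^-18), and it summarizes replicate node ages with a simple percentile interval, which is an assumption about how such an interval might be computed. The function names and the example ages are hypothetical.

```python
def smoothing_grid(max_exp=18, min_exp=-18):
    """Candidate smoothing values, one per order of magnitude (assumed spacing)."""
    return [10.0 ** e for e in range(max_exp, min_exp - 1, -1)]


def percentile_interval(ages, lower=2.5, upper=97.5):
    """Percentile interval of replicate node ages, using linear interpolation."""
    values = sorted(ages)
    if not values:
        raise ValueError("need at least one replicate age")

    def pick(p):
        # Fractional rank of percentile p in the sorted list.
        rank = p / 100.0 * (len(values) - 1)
        lo = int(rank)
        hi = min(lo + 1, len(values) - 1)
        return values[lo] + (rank - lo) * (values[hi] - values[lo])

    return pick(lower), pick(upper)


grid = smoothing_grid()
print(len(grid))  # 37 candidate values

# Hypothetical node ages (Ma) from 100 bootstrap replicates, for illustration only.
replicate_ages = [17.0 + 0.04 * i for i in range(100)]
low, high = percentile_interval(replicate_ages)
print(f"95% interval: {low:.2f}-{high:.2f} Ma")
```

In practice the smoothing value with the best cross-validation score would be selected from this grid, and the interval would be computed separately for every node of the fixed topology.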
Rate-through-time analysis. The net diversification rate of cyprinids was inferred based on the chronogram of Cyprinidae using the program Bayesian Analysis of Macroevolutionary Mixtures (BAMM) v2.5 (51). The BAMM analysis was run for 100 million generations with a Poisson rate prior of 0.1, sampling event data every 10,000 generations. Prior distributions were set based on the setBAMMPriors function in the BAMMtools R package (52). The first 20% of samples were discarded as burn-in. The rate-through-time plot of the endemic East Asian cyprinids was extracted from the Cyprinidae data set and visualized using BAMMtools.
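The sampling arithmetic above can be made concrete with a short Python sketch. It assumes one event-data sample per 10,000 generations over 100 million generations (10,000 samples, ignoring any sample at generation 0) and drops the first 20% as burn-in. The function name is hypothetical, and the analysis itself relied on BAMMtools in R; the sketch only illustrates the bookkeeping.

```python
def discard_burnin(samples, percent):
    """Return the samples left after dropping the leading burn-in percentage."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding in the cut-off index.
    n_drop = len(samples) * percent // 100
    return samples[n_drop:]


total_generations = 100_000_000
sampling_interval = 10_000
n_samples = total_generations // sampling_interval  # assumed sample count

# Stand-in for the sampled generations; real runs store event data per sample.
sampled_generations = [(i + 1) * sampling_interval for i in range(n_samples)]
kept = discard_burnin(sampled_generations, 20)
print(len(kept))  # 8000
```

The same helper covers the 25% burn-in applied to the Bayesian inference trees by passing `percent=25`.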
Ancestral state reconstruction. The egg types of cyprinids from 507 species were coded as A (adhesive), B (bivalve), C (demersal), D (nesting), and E (semi-buoyant) (Supplementary Data). The ancestral state of egg types was reconstructed using BI. BI analysis was implemented in RASP v.4.0 (53) with Bayesian Binary Markov Chain Monte Carlo (BBM). Ten MCMC chains were run simultaneously for 5,000,000 generations with the JC (Jukes-Cantor) fixed model, and the maximum number of areas was set to 1.
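The five-state coding above maps directly onto a lookup table. The Python sketch below is illustrative only: the dictionary and function names are hypothetical, and the two example taxa use the egg type stated in the main text (semi-buoyant for Ctenopharyngodon and Hypophthalmichthys); the actual per-species codes are those listed in the Supplementary Data.

```python
# State codes as given in the text: A-E for the five egg types.
EGG_TYPE_CODES = {
    "adhesive": "A",
    "bivalve": "B",
    "demersal": "C",
    "nesting": "D",
    "semi-buoyant": "E",
}


def encode_egg_types(egg_type_by_taxon):
    """Map each taxon's egg type to its single-letter state code."""
    coded = {}
    for taxon, egg_type in egg_type_by_taxon.items():
        key = egg_type.strip().lower()
        if key not in EGG_TYPE_CODES:
            raise ValueError(f"unknown egg type for {taxon}: {egg_type!r}")
        coded[taxon] = EGG_TYPE_CODES[key]
    return coded


# Example taxa with the egg type reported in the main text.
example = {
    "Ctenopharyngodon": "semi-buoyant",
    "Hypophthalmichthys": "semi-buoyant",
}
print(encode_egg_types(example))
```

Rejecting unknown labels rather than guessing a state keeps miscoded taxa out of the reconstruction input.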

Declarations
Online content Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available in the online version of the paper.

Data availability
We have chosen not to deposit the data at this time but declare that data supporting the findings of this study are available within this article and its Supplementary Information, and all additional data are available from the corresponding author on request.
Code availability
All additional computer code is available from the corresponding author on request.

Figure 3. Net diversification rate and chemical weathering index CRAT as a function of time. a, The rate-through-time plot of the endemic East Asian cyprinids (see details in Supplementary Fig. 4). b, The chemical weathering index CRAT of ODP Site 1148 in the South China Sea as a function of time (modified from Clift et al. (39)). The yellow line indicates the appearance of the semi-buoyant egg group at ~19 Ma. The purple line marks ~13 Ma, when fishes with adhesive eggs appeared again and the net diversification rate of the endemic East Asian cyprinids increased rapidly.

Supplementary Files
This is a list of supplementary files associated with this preprint. SupplementaryInformation.pdf