Origin and evolution of the Yangtze River reconstructed from the largest molecular phylogeny of Cyprinidae


The Yangtze River is the longest river in Asia, but its evolutionary history has long been debated, and no robust biological evidence has so far been available to resolve it. Here we reconstruct the spatiotemporal and diversification dynamics of endemic East Asian cyprinids based on the largest molecular phylogeny of Cyprinidae, including 1,420 species, and show that their ancestors, which laid adhesive eggs, were distributed in southern East Asia before ~24 Ma and subsequently dispersed to the Yangtze River, where they spawned semi-buoyant eggs by ~19 Ma. This indicates that the Yangtze River diverted eastward around the Oligocene-Miocene boundary. Some of these cyprinids reverted to producing adhesive eggs at ~13 Ma, coinciding with a peak in net diversification rate, indicating that the river formed a potamo-lacustrine ecosystem during the mid-Miocene. Our reconstruction of the history of the Yangtze River has higher time resolution and much better continuity than those derived from geological studies.


Introduction
The Asian climate evolved from a planetary wind system in the Paleogene to a monsoon setting during the Neogene (1-3), linked in part to the uplift of the Qinghai-Tibetan Plateau (4,5). This same process led to the evolution of drainage systems in South and East Asia (6-8). The Yangtze River is the longest river in Asia, and its origin and evolution have attracted wide attention. It has been suggested that the upper reaches of the palaeo-Yangtze River (palaeo-Jinshajiang) originally flowed south from its source towards the South China Sea, but at some point diverted eastward in response to tectonic movements, and finally incised through the Three Gorges to form the modern Yangtze River (6,7,9,10). However, an exact understanding of the spatiotemporal evolution of the Yangtze River system has been challenging (10).
Earlier geological surveys (Supplementary Tables S1 and S2) placed the initial eastward diversion of the palaeo-Jinshajiang at the First Bend anywhere from the late Eocene (11) to the early Pleistocene (12). Likewise, ages for the incision of the Three Gorges range from the Eocene (40-45 Ma) (13) to as recently as the Pleistocene, a few hundred thousand years ago (14), depending on sampling locations, dating methods and proxies. However, no robust biological evidence has been available to resolve this question.
Compared to other river basins, the Yangtze is one of the most diverse worldwide in terms of endemic species of freshwater fish (15,16). Historical changes in drainage basins governed the diversification and dispersal of freshwater fishes (17), and in turn, reconstruction of the phylogeographic dynamics of freshwater fishes can help constrain the spatiotemporal evolution of a river system (18,19). It is known that an endemic clade of East Asian Cyprinidae evolved while adapting to unique climatic and hydrological conditions under the influence of a strong East Asian monsoon during the uplift of the Qinghai-Tibetan Plateau (20). In this endemic clade, some fishes produce demersal eggs, while others produce adhesive or semi-buoyant eggs (Fig. 2a). Semi-buoyant eggs are considered to be a key trait used by East Asian cyprinids to adapt to monsoonal and large river environments (21,22). The development of semi-buoyant egg production, e.g., by the four major Chinese carp in the Yangtze River, required a long riverway (>500 km) with a fast flow (>0.5 m s⁻¹) (23,24). The earliest fossils of Ctenopharyngodon and Hypophthalmichthys were found in the Lower Miocene of the Sihong Basin in the Yangtze River basin (25) (Fig. 2c). As these two carp spawn semi-buoyant eggs, the appearance of their ancestor can be used as a constraint on the formation of the modern Yangtze River. A few studies have focused on the evolution, radiation and key traits of endemic East Asian cyprinids that have adapted to the East Asian monsoon and large rivers (21,22), yet no attention has been given to the historical distribution and dispersal of cyprinids across East Asia, or to the possible relationship between the spatiotemporal development of the Yangtze River and the evolution of egg types in endemic East Asian cyprinids.
We reconstructed the phylogeny of Cyprinidae based on the largest molecular data set currently available for cytochrome b genes, from 1,420 Cyprinidae species including representatives of all nine subfamilies and 284 of the 367 genera (77%). The topologies of the phylogenetic trees obtained by Maximum likelihood and Bayesian inference analyses were largely consistent, and these trees are also similar to those reconstructed in previous studies (26-29). Using fourteen calibration points based on sixteen fossil records, we estimated the divergence times for Cyprinidae and the diversification dynamics of the endemic East Asian cyprinids. Subsequently, we obtained the time-calibrated phylogeny of endemic East Asian cyprinids with different egg types from the ancestral egg type reconstruction of Cyprinidae. Based on the fossil records and the main distribution of the extant endemic East Asian cyprinids, the ancestral distribution of this endemic clade was inferred. By combining the evolution of different egg types with the historical distribution and dispersal of the endemic East Asian cyprinids, we estimated the age at which the southward-flowing palaeo-Jinshajiang first connected with the middle reaches via the First Bend and formed the Yangtze River system, including the potamo-lacustrine system in the middle and lower reaches of the Yangtze River.

Results and Discussion
The palaeo-Jinshajiang flowing southward in the Oligocene
Our results show that the ancestors of the endemic East Asian cyprinid clade that produces adhesive eggs, including metzins, aphyocyprins and opsariichthyins, appeared in the late Oligocene (~24 Ma; 95% credibility interval (CI): 22.3-26.7 Ma) (Fig. 2a). Combining this with the fossils of Ecocarpia ningmingensis in the Ningming Basin, Guangxi Province (30), and the primary distribution of extant species of these cyprinids, we inferred that their ancestors were distributed in southern East Asia, largely within the palaeo-Pearl River and palaeo-Red River, before ~24 Ma (Fig. 2b). This implies that the modern Yangtze River had not yet formed and that the palaeo-Jinshajiang likely flowed towards the south, connecting with a stream similar to the modern Red River.
The cyprinid fossils Nanningocyprinus wui and Huashancyprinus robustispinus found in Oligocene formations of the Nanning and Ningming Basins are also consistent with our results (31). Other biological and geological evidence suggests that the palaeo-Jinshajiang once flowed southward and probably connected through the palaeo-Red River into the South China Sea (Supplementary Table S1).
The river capture at Laojunshan near the First Bend of the Yangtze River occurred prior to 24 Ma, perhaps as early as the Early Oligocene, as suggested by the change in sediment composition in the Red River delta (32). Incision of the gorge in the First Bend area started between 20 and 30 Ma based on bedrock apatite (U-Th-Sm)/He thermochronology (33). Recent studies using ⁴⁰Ar/³⁹Ar mica dating and zircon U-Pb dating coupled with statistical analysis suggest that a major Paleogene river probably originated in the southeastern Qinghai-Tibetan Plateau and flowed through the Jianchuan Basin, extending to northern Vietnam during the late Eocene-Oligocene, but had disappeared by the early Miocene (34,35). Biological evidence from time-calibrated phylogenies of a single fish genus (36-38) yielded younger dates for the south-flowing palaeo-Jinshajiang than those inferred in this study from the endemic East Asian Cyprinidae.
Schizothoracine fishes commonly live on the Qinghai-Tibetan Plateau and in the surrounding area at elevations of 1250-4750 m a.s.l. (39,40). In this study, the time-calibrated phylogeny of Cyprinidae (Supplementary Fig. S3) reveals that the schizothoracine fishes endemic to the Qinghai-Tibetan Plateau did not appear before ~20 Ma (Supplementary Fig. S6). Combining this with palaeontological evidence (41), we infer that the palaeo-elevation of the central Qinghai-Tibetan Plateau was fairly low in the Oligocene, possibly not above 2000 m a.s.l. During the Eocene, reorganization of rivers did not occur because the southeastern part of the plateau was not uplifted significantly until the Oligocene (34). At the same time, southern East Asia was in a humid belt, while a broad arid belt stretched across central East Asia from west to east (1,42). These results indicate that the middle and lower reaches of the Yangtze River system had not yet been connected to the Jinshajiang. Our study provides new biological dating for the southward flow of the palaeo-Jinshajiang in the Oligocene.
The formation of the Yangtze River close to the Oligocene-Miocene boundary
Fishes with semi-buoyant eggs, consisting of squaliobarbins and hypophthalmichthyins, existed in the Yangtze during the early Miocene (~19 Ma; 95% CI: 17.1-21.3 Ma) (Fig. 2a). The earliest fossils of Hypophthalmichthys, Ctenopharyngodon and Elopichthys were found in the Lower Miocene of the Sihong Basin, Jiangsu Province (25). These results indicate that the endemic East Asian cyprinids had dispersed to the position of the current Yangtze River and evolved into fishes laying semi-buoyant eggs by approximately 19 Ma (Fig. 2c), suggesting that the Yangtze River had reversed its flow direction eastward and formed the present drainage system before that time, close to the Oligocene-Miocene boundary (~24-19 Ma). The hematite/goethite proxy from Ocean Drilling Program (ODP) Site 1148 in the South China Sea (43), which is positively correlated with the strength of the East Asian monsoon, rose rapidly to a peak at approximately 19 Ma (Fig. 3d). This indicates that the climate in East Asia became humid at that time, and abundant rainfall was conducive to the formation of a major Yangtze River system with high discharge.
Geological studies constrain the age of formation of the present Yangtze River system to 23-36.5 Ma based on ⁴⁰Ar/³⁹Ar dating of basalts and U-Pb dating of zircon sand grains from the lower reaches of the Yangtze River and the appearance of evaporites and lacustrine sedimentation in the Jianghan Basin (9).
Detrital zircon U-Pb geochronology and heavy mineral analysis of the Cenozoic sediments of the Jianghan Basin define their provenance and indicate that the incision of the Three Gorges must have postdated 32 Ma. The best date for initiation of the modern river is likely after the ~24.6 Ma unconformity (44). These results are close to the date of Yangtze River formation estimated from the timing of divergence of the semi-buoyant egg group in our study. Because of differences in sampling locations, dating methods and proxies, however, these ages for the connection and formation of the Yangtze River system are incompatible with other geologically based models (Supplementary Table S2).
The formation of the current Yangtze River system hindered gene flow in some terrestrial species between the north and south sides of the mainstream, resulting in genetic diversification and speciation. The divergence of the primitively segmented spider genera Sinothela and Ganthela, which are distributed on the north and south sides of the Yangtze River, was dated to 13-30 Ma, which is consistent with the suggested formation of the modern Yangtze River system before the Miocene (45). This divergence timing has a much broader range than the one we inferred, probably because fewer species were sampled and fossil calibrations were lacking.
In addition, the specialized schizothoracine fishes mostly live on the Qinghai-Tibetan Plateau at elevations of >2750 m a.s.l. (39,40). Based on the time-calibrated phylogeny of schizothoracine fishes (Supplementary Fig. S6), the divergence between the primitive and specialized grades likely occurred at ~18 Ma, indicating that the Qinghai-Tibetan Plateau had reached a high elevation in the early Miocene. The surface of the southeastern Qinghai-Tibetan Plateau was uplifted when ductile lower crust beneath the central plateau flowed towards the plateau margin from the late Oligocene to the early Miocene (46,47). At the same time, large-scale strike-slip faults linked to the extrusion of crustal blocks from Tibet by the colliding Indian block resulted in the reversal or capture of river systems (6,9). Therefore, the Yangtze River diverted its flow from southward to eastward and incised through the Three Gorges to form the modern river system at that time.

Based on the diversification dynamics inferred from Bayesian Analysis of Macroevolutionary Mixtures (BAMM) and RPANDA analyses (Fig. 3a-c), the net diversification rate of the endemic East Asian cyprinids peaked at approximately 13 Ma, coinciding with a significant rate shift in the maximum a posteriori shift configuration. These results imply that the Yangtze River basin had a rich drainage network that provided a large number of niches, facilitating rapid radiation and dispersal of fishes. In the middle Miocene (~13 Ma; 95% CI: 11.4-14.6 Ma), fishes laying adhesive eggs arose again, including xenocyprins and cultrins (Fig. 2a). This finding indicates that, to adapt to the lake environment, endemic East Asian cyprinids evolved into fishes spawning adhesive eggs that attach to aquatic plants to develop. These results suggest that the potamo-lacustrine ecosystem of the Yangtze River had appeared by that time (Fig. 2d).
Coincidentally, the hematite/goethite proxy of sediment at ODP Site 1148 in the South China Sea, which is related to humidity and temperature, peaked at ~13 Ma (Fig. 3d) (43), indicating that the climate in East Asia was humid at that time. Strong precipitation would have sustained a potamo-lacustrine ecosystem in the Yangtze River, greatly increasing species diversification.
In summary, we used the spatiotemporal evolutionary pattern and diversification dynamics of endemic East Asian cyprinids, derived from the largest molecular phylogenetic tree of Cyprinidae, together with fossil records and information on egg type evolution during adaptation to varied hydrologic conditions, to reconstruct the formation history of the Yangtze River system. Our results indicate that the ancestors of East Asian cyprinids were confined to the south of East Asia between the palaeo-Pearl and palaeo-Red rivers during the Oligocene, prior to formation of the Yangtze River system. At that time the palaeo-Jinshajiang flowed southward to the South China Sea roughly along the course of the modern Red River. Endemic East Asian cyprinids had dispersed to the Yangtze River basin and evolved into fishes laying semi-buoyant eggs by ~19 Ma, which suggests that the Yangtze River system had formed by that time in response to regional surface uplift, large-scale strike-slip tectonism and climate change. The formation of the Yangtze River is constrained to around the Oligocene-Miocene boundary (~24-19 Ma), when the monsoon climate strengthened. Notably, the endemic East Asian cyprinids evolved into fishes spawning adhesive eggs again by approximately 13 Ma, coinciding with a peak in the net diversification rate of this endemic clade and in the intensity of the East Asian summer monsoon (43), indicating that the Yangtze River system had probably developed into a potamo-lacustrine ecosystem with high productivity by the middle Miocene. Our study constrains the ages of important geological events during the evolution of the Yangtze River from a biological perspective, helping us to understand the evolutionary history of the Yangtze River system.

Methods
Data collection and processing. Information on the scientific names, taxonomic position, distribution and egg type of Cyprinidae was obtained from FishBase (www.fishbase.org), Catalog of Fishes (http://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp) and additional dedicated publications (Supplementary Data). The cytochrome b gene is the most commonly used genetic locus in species identification of vertebrates, particularly fishes (51,52). A total of 1,423 cytochrome b sequences from 284 genera of Cyprinidae (representing 77% of the 367 living genera) (53) and three outgroup taxa were collected from GenBank (accession numbers of all the sequences are listed in Supplementary Data). Sequences were aligned using MAFFT version 7, and ambiguous regions in the alignments were removed using Gblocks v.0.91 (54).
Phylogenetic analyses. The phylogenetic analyses were conducted with Maximum likelihood (ML) and Bayesian inference (BI) in RAxML v. 8.2.12 (55) and MrBayes v.3.2 (56), respectively. The ML analyses were implemented under a GTRGAMMA model with 100 rapid bootstrap inferences followed by a thorough ML search. For BI analyses, the best-fitting nucleotide substitution model, GTR+F+I+G4, was selected with ModelFinder in PhyloSuite (57). Two independent runs were performed for 20,000,000 generations with four Markov chains. The first 25% of trees were removed as burn-in. Chain convergence was inspected in Tracer 1.5 (http://tree.bio.ed.ac.uk/software/tracer/), and results with an effective sample size (ESS) >200 for each parameter were accepted. A consensus tree was produced.
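The burn-in and convergence criteria described above can be illustrated with a minimal Python sketch. It assumes a single parameter trace stored as a list of floats, discards the first 25% of samples, and estimates the ESS by summing autocorrelations up to the first non-positive lag. The function names are hypothetical, and this generic estimator is not necessarily the one Tracer uses, so values may differ from those reported by that program.

```python
def drop_burn_in(samples, fraction=0.25):
    """Discard the first `fraction` of an MCMC trace as burn-in."""
    start = int(len(samples) * fraction)
    return samples[start:]


def effective_sample_size(samples):
    """Generic ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation.
    This is an illustrative estimator, not Tracer's exact algorithm.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def passes_ess_threshold(samples, threshold=200.0, burn_in=0.25):
    """Apply the burn-in, then check the ESS > threshold acceptance rule."""
    return effective_sample_size(drop_burn_in(samples, burn_in)) > threshold
```

A strongly autocorrelated trace yields an ESS well below the number of retained samples, which is why the raw sample count alone is not used as the acceptance criterion.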
Divergence time estimation. Based on the optimal ML tree topology obtained with RAxML v. 8.2.12 (55), penalized likelihood dating analysis was conducted in treePL (58) to estimate divergence times. We compiled the available fossils from the literature and selected sixteen fossils as fourteen calibration points (Supplementary Text). To identify the appropriate level of rate heterogeneity in the phylogram, cross-validation analysis was conducted in treePL while testing 37 smoothing parameter values from 10¹⁸ to 10⁻¹⁸. To calculate the confidence intervals for the dating estimates of each node, 100 bootstrap replicates were generated by RAxML, with the topology fixed to the best ML phylogram but with varying branch lengths. We then ran treePL on these 100 replicates. Age statistics of all nodes were summarized with TreeAnnotator v. 1.10.4 (59).
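Two numerical steps in this procedure can be sketched in Python. The first function builds a grid of 37 smoothing values, one per power of ten from 10^18 down to 10^-18, which matches the count stated above. The second summarizes the ages of one node across bootstrap replicates as a central 95% percentile interval. The percentile summary is an assumption made for illustration: the text does not state which interval TreeAnnotator reported, and all function names are hypothetical.

```python
def smoothing_grid(max_exp=18, min_exp=-18):
    """Powers of ten from 10^max_exp down to 10^min_exp, inclusive."""
    return [10.0 ** e for e in range(max_exp, min_exp - 1, -1)]


def percentile(sorted_values, q):
    """Linearly interpolated quantile q (0..1) of an already sorted list."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def node_age_interval(ages, level=0.95):
    """Central interval of node ages (in Ma) across dated bootstrap trees.

    Assumption: a simple percentile interval stands in for whatever
    summary the original analysis used.
    """
    values = sorted(ages)
    tail = (1 - level) / 2
    return percentile(values, tail), percentile(values, 1 - tail)
```

In use, `ages` would hold the 100 ages estimated for the same node, one from each dated bootstrap replicate.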
Diversification dynamic analyses. The net diversification rate of the endemic East Asian cyprinids was inferred with the program Bayesian Analysis of Macroevolutionary Mixtures (BAMM) v2.5 (60). The BAMM analysis was run for 100 million generations with a Poisson rate prior of 0.1, sampling event data every 10,000 generations. Prior distributions were set using the setBAMMPriors function in the R package BAMMtools 2.5.0 (61). The first 10% of samples were discarded as burn-in. The posterior distribution of rates for the endemic East Asian cyprinid data set, which allows estimation of diversification rates through time and of the best rate shift configuration, was analyzed and visualized using BAMMtools. To cross-validate the BAMM inferences, we used the R package RPANDA (62) to study diversification dynamics. Within RPANDA, we fitted nine time-dependent diversification models (Supplementary Table 3) to detect whether rates varied through time. We also fitted nine paleoenvironment-dependent models (Supplementary Table 3) to test for a relationship between changes in diversification rates and the East Asian monsoon. East Asian monsoon data were obtained from the hematite/goethite proxy measured at Ocean Drilling Program (ODP) Site 1148 in the South China Sea (43). After comparing likelihood support, we chose the model with the smallest corrected Akaike information criterion (AICc) value as the best diversification model.
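The model selection rule can be sketched in Python using the standard definition AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), where k is the number of free parameters, ln L the maximized log-likelihood and n the sample size. Taking n as the number of tips in the tree is an assumption here, and the function names, model labels and numbers in the usage note are hypothetical placeholders, not results from this study.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion for one fitted model."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_model(fits, n_obs):
    """Return the name of the model with the smallest AICc, plus all scores.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    scores = {
        name: aicc(log_lik, k, n_obs)
        for name, (log_lik, k) in fits.items()
    }
    return min(scores, key=scores.get), scores
```

For instance, calling `best_model({"constant": (-100.0, 1), "exponential": (-95.0, 2)}, n_obs=50)` with these made-up values favours the second model, because its gain in likelihood outweighs the penalty for the extra parameter.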

Declarations

Data availability
Fossil calibration points, phylogenetic trees, summary of literature, models of RPANDA diversification analyses and information on fishes of the Cyprinidae are available in Supplementary Information.

Code availability
All code used in this study is deposited in Dryad (https://datadryad.org/stash/share/bUZQ8TepzMJczH2QjUC7mc8JyjHYGDg2GDAOdDPc0V8).

Diversification dynamic and macroevolutionary patterns of the endemic East Asian cyprinids. a, A single rate shift configuration with the maximum a posteriori probability represented as a phylorate plot of the endemic East Asian cyprinids showing variation in speciation rates. Warmer colors represent higher rates. Red dot denotes estimated position of the single rate shift configuration that has highest posterior probability in 95% credible set of shift configurations. b, The rate-through-time plot of the endemic East