Phylogenomics and biogeography of Wisteria (Fabaceae), with implication on plastome evolution of inverted repeat-lacking clade

Background: The inverted repeat-lacking clade (IRLC) of Fabaceae is characterized by loss of an IR region in plastomes. Both the loss of an IR region and the life history may have affected the evolution of the plastomes in the clade. Nevertheless, few studies have been done to test the impact explicitly. Wisteria , an important member of IRLC and has a disjunct distribution between eastern Asia and eastern North America, has confused interspecific relationships and biogeography, which need to elucidate in depth. Results: The plastome of six newly sequenced Wisteria species and a Millettia japonica ranged from 130,116 to 132,547 bp. Phylogenetic analyses recognized two major clades in IRLC: Glycyrrhiza - Millettia - Wisteria clade and a clade containing the remaining genera. North American Wisteria species and Asian species formed reciprocal clades. Within Asian clade, each of the two Japanese species was sister to a species in the Asian continent. A ~16kb inversion from ndh F to ycf 1 in all IRLC species. Wisteria and Millettia japonica have two intron of rps 12 gene but all other IRLC species just have one. Synonymous substitution rates ( d S ) of protein coding genes were higher in the IRLC species than non-IRLC species. Woody species have lower substitutions rates than herbs. Wisteria may have originated in East Asia by the boundary of Oligocene and Miocene and the eastern Asian-eastern North American disjunction formed in the Late Miocene, while two vicariance events formed the disjunct distributions between the Asian continent and the Japanese islands in the Quaternary. Conclusions: In the IRLC clade, Wisteria , Milletia japonica and Glycyrrhiza form a clade to the remaining genera, most of which are herbaceous. Both the loss of one IR region and the herbaceous habit elevated mutation rates of the plastomes. Multiple vicariance events between eastern Asia and eastern North America, and between the Asian continent and the Japanese Islands may have promoted speciation of Wisteria since the Late Miocene. Plastomes contain rich genetic diversity for studying genetic for the systematics and biogeography of Wisteria

3 structure and migration of populations in response to climatic changes, which benefits conservation of rare and endangered species.

Background
The plastomes of most vascular plants usually have a conservative quartile structure with a large single-copy (LSC) region and a small single-copy (SSC) region separated by two inverted repeat (IR) regions [1][2][3][4][5]. They are about 150 kb in size and contain around 114 unique genes, including four rRNA genes, 30 tRNA genes, and 80 protein genes [6].
Comparative analyses of plastomes have identified some unique features, such as large segment inversion [7], high frequency of repeats [8], gene gain and loss events (including pseudogenization) [9], and expansion or contraction of IRs [10]. The IR region varies greatly among plant lineages, which may have affected the stability of chloroplast structure resulting in an elevated variation of single copy regions [11,12]. One of the most notable changes of the IR region is the loss of a repeat in a lineage of Fabaceae, namely the inverted-repeat-lacking clade (IRLC), which has been supported by various molecular systematic studies [15][16][17][18][19][20][21].
Wisteria, one of the most romanticized and spectacular garden plants, has been widely cultivated around the world. Species of Wisteria are woody, deciduous lianas, and bear odd pinnate compound leaves, pendulous racemes, typical papilionaceous corolla in purple, violet or white [31][32][33]. Wisteria composed of 6-8 species with a disjunct distribution in Eastern Asian and Eastern North American temperate deciduous forest [16,21,[34][35][36][37][38] [19,40]. Previous studies support the reciprocal monophyly of eastern Asian and eastern North American species, but the interspecific relationships of Asian species have not been resolved [37], largely due to the small amount of data. Plastomes have been widely used in phylogenetic studies of closely related species [1,4,9,41]. Within Wisteria, plastomes of W. sinensis and W. floribunda have been reported before [42]. Thus, we obtained plastomes from other species of Wisteria for generating a robust phylogeny of the genus.
The objectives of the study were 1) exploring the impact of the loss of an IR and life history traits on the evolution of plastomes in the IRLC clade, 2) resolving phylogenetic relationships of the IRLC clade and Wisteria, and 3) examining the implications of the phylogeny for the systematics and biogeography of Wisteria.
The forward, palindromic and reverse repeats were detected in W. floribunda, W. venusta and W. villosa, while only forward and palindromic repeats were found in W. brachybotrys, W. frutescens and W. sinensis (Fig. 2B, Table S4). The total number of repeats ranged from 43 (W. villosa) to 70 (W. venusta) and most repeats were in no-coding regions (82.09%-87.14%, except 65.85% for W. villosa). Among the six plastomes of Wisteria, the proportion of forward repetition was the highest, ranging from 31 in W. villosa to 65 found in W. venusta (Fig. 2C, Table S5).

Phylogenetic relationships
The matrix of 28 plastome CDS sequences was 71,545 bp in length. In both the ML and BI

Substitution rates
The mean d S of SC genes was 0.022 ± 0.001 in the IRLC pairs and 0.009 ± 0.000 in the non-IRLC pairs. For IR genes, they were 0.014 ± 0.003 and 0.002 ± 0.000 respectively. All herbaceous pairs had significant higher d S than the woody pairs (P < 0.01). The herbaceous pairs in the IRLC had 2.07-fold d S in SC region than woody pairs, whose average d S of the SC genes was 0.010 ± 0.000 (Table S2).

Divergence time estimation
The estimated divergence time of IRLC was 40.11 (35.73-44.36) Ma (node c, Fig. 5, Table   2). The stem age and crown age of  Table 2.

Ancestral area reconstruction
The result of BMM analysis indicated that the most recent common ancestor (MRCA) of Wisteria was distributed in China with 33.57% probability, in Japan with 33.22% probability, meanwhile, in eastern North America with 12.22% probability (Fig. 5). There may be a dispersal event from China to eastern North America, then followed by a vicariance event, with the probability of 12.78%. For Asian wisterias, the ancestral range was indistinct, with 37.09% chance in China, a 31.18% chance in Japan and 31.43% probability in China and Japan. Vicariance events were also identified between W.

Phylogenetic relationships among IRLC and within Wisteria
Previous studies have employed some plastid and nuclear sequences to determine the phylogeny of legume family, however, the phylogenetic relationships among IRLC taxa have not been resolved satisfactorily [16,[34][35][36]. In this study, The IRLC contained two strongly supported subclades (BS = 100, PP = 1; Fig. 4). Glycyrrhiza was the basal group of the IRLC in the matK gene tree [16], or formed a clade with Vicioid species (such as  Fig. 4), which is sister to the remaining genera [42,62]. Millettia japonica has a similar phenotype with Wisteria, and was initially described as Wisteria japonica Siebold & Zucc. Recently, it was recognized as a separate genus Wisteriopsis J.
Compton & Schrire [19]. In addition, Caragana and (Astragalus + Carmichaelia) form a clade, which is sister to the clade of the remaining seven genera (Fig. 4). This is consistent with previous research based on matK gene [38].
Our plastome sequence data provide strong support for the monophyly of Wisteria and for the sister relationship of the American W. frutescens and the Asian species (Fig. 4).
Furthermore, The Japanese W. floribunda and W. brachybotrys are not sister species, instead they form sister relationships with W. sinensis and W. venusta, respectively.
Wisteria brachybotrys and W. venusta have been considered as conspecific by Valder [39], or different species because of different ovule numbers and floral structures [31,40].
However, the divergence time estimation show that they diverged very recently (0.06 Ma).
Additional sampling of multiple populations of the two species is needed to test the species boundary. In previous research, the phylogenetic relationships among W. sinensis, W. villosa and W. floribunda were not resolved [37], most likely due to the small amount of data. Here the two W. floribunda from Japan and Korea form a clade sister to the unresolved clade of W. sinensis and W. villosa. Leaf pubescence, pedicel length, and flower size of W. sinensis and W. villosa are highly variable and even overlap, thus it is unreliable to distinguish them using the so-called diagnostic characteristics [39,63]. This suggests that future studies on population genetics should be used to test whether they represent two separately evolving lineages or species.

Plastome evolution in the IRLC clade
Plastomes have experienced various evolutionary changes in plants, one of which is the loss of one copy of the IR in the IRLC clade of Fabaceae, which may disrupt the gene stability and increase the rates of substitution in single copy regions [8,12,15,18,[64][65][66]. The plastomes of six species of Wisteria and Millettia japonica have a single IR region (Fig. S1), and share similar size (130,116 to 132,547 bp), overall structure, gene order and content (Table 1). In addition, compared to the non-IRLC lineages of Fabaceae, the IRLC plastomes contain the traditional LSC and IR region plus a reversed SSC (a ~16kb inversion from ndhF to ycf1). However, the loss of one IR region may not be the major driving force for the genomic changes in the IRLC [15], because gene orders from Glycyrrhiza glabra to Medicago truncatula are consistent and have not experienced much rearrangement compared with non-IRLC members (Fig. 1). The gene order rearrangements appear to be associated with the herbaceous life history since the woody genera in the IRLC clade did not show the rearrangements (Fig. 1). Short generation time with great opportunities for recombination may have led to the structural changes in the herbaceous genera [29]. The localized hypermutation in Vicia, Lens, Lathyrus, Pisum and Trifolium ( Fig. 1) may be due to the existence of a large number of repeat sequences [15,28]. Lens, Trifoium and Vicia have 9% repetitive DNA on average, however, G. glabra has 4.1%, Wisteria has around 3% and Lotus japonica just has 2.4% [15].
We estimated the synonymous rates of SC for the IRLC species pairs and relative taxa in Fabaceae and the results show that the difference was about 7 times for IR regions, whereas 2.4 times for SC regions. And herbaceous pairs have significant higher d s than wood pairs. The loss of an IR region may have disrupted the stability of the plastome leading to the higher substitution rates in IR genes [1,12]. The impact may be further enhanced by life history in the herbaceous genera.
Structural rearrangement and gene/intron loss (including pseudogenization) may be correlated with increases in the rate of molecular evolution [30]. In the IRLC, the rps16-accD-psaI-ycf4 region shows a higher rate of transfers, substitutions, or losses of genes [15]. For example, ycf4 gene shows an acceleration in Millettioids, Robinioids, and the IRLC than in other legumes, especially 20-fold in Lathyrus. Plastome gene losses are rare in photosynthetic species, the lost gene may have been transferred to the nuclear genome or substituted by a nuclear gene [28]. The loss of rps16 in all IRLC species may be compensated by dual targeting of mitochondrial ribosomal protein S16, which is encoded by a nuclear gene [28]. The accD gene functioning in plant development and possessing recombinationally active repeats has been transferred to the nucleus in two Trifolium species [20,25,67,68]. Pisum and Lathyrus have lost ycf4 and psaI gene, respectively, which can attribute directly to hypermutation of ycf4 and its neighboring genes [28].
Some intron losses may also have occurred with the origin of the IRLC clade [8]. For example, most angiosperms have two introns in the rps12 gene, whereas the IRLC species (except Wisteria and Millettia japonica) just have one, which indicates that the loss of this intron occurred subsequent to the loss of one copy of the IR [21]. There is a loss of intron Our results show that Wisteria has originated at 23.85 (23.26-24.86) Ma (node d, Fig. 5), which is much earlier than a previous estimate of 13.4 (9.7-17.5) Ma [37]. Lacking samples of Callerya may result in the older estimated age of Wisteria. Wang et al. believed that Wisteria may have a more northern distribution in Neogene and the diversification in this genus had taken placed in the middle Miocene [69]. We inferred the ancestral area of Wisteria as Eastern Asia with a high probability (78.10%). However, it is equally likely that ancestral area of Wisteria may be in China (33.57%) or Japan (33.22%).
A subsequent dispersal to eastern North America and a vicariance event has produced the disjunct distribution at 8.66 (2.40-17.38) Ma (Fig. 5). The result is consistent with Li et al., and supports the "out of Asia" hypothesis for the majority of eastern Asian and eastern North American disjunct plant genera [37,70]. Fossils of Wisteria have been found in Asia and Europe indicating a wide distribution until middle Miocene [69]. The Bering land bridge is likely the migratory pathway for Wisteria because it accommodated many temperate components in the middle Tertiary and later times, whereas North Atlantic land bridge disconnected at that time [71][72][73].
Eastern Asia is the center of species diversity for Wisteria. The diversification time of Asian wisterias might be 2.22 (0. 75-5.67) Ma in Pleistocene. Climatic oscillations of the Quaternary at around 2 million years ago have deeply affected plant distribution and genetic structure [74,75]. Asian wisteria is native in 'Sino-Japanese Floristic Region', which has never been directly impacted by extensive ice sheets [76][77][78]. But drastic climate and environmental changes, such as sea-level fluctuations, promoted fragmentation of species ranges, population isolation and variation, thereby providing opportunities for allopatric speciation through selection or genetic drift [79]. Albeit with a limited sampling of population of the Asian species of Wisteria, the sister relationships of the Japanese and Asian continental species support the significant role the allopatry has played in the generation of biological diversity in Asia [80].

Conclusions
In this study, we compared the plastome variation inside and outside the IRLC clade at both the generic and species levels. Our results suggest that several genomic and substitution rate changes might happened after the loss of an inverted repeat disrupting the genomic stability and may be further affected by life history traits of the taxa. The

Plant Samples, DNA extraction and Sequencing
Twenty-eight species were included in the study representing all tribes and 13 genera of the IRLC clade and outgroups Robinia L. and Lotus L.. Within Wisteria , six species from China, Japan and USA were included. Collection information was provided in Table 1

Genome Assembly and Annotation
All seven plastome sequences were assembled in NOVOPlasty 2.6.3 with the matK sequence of Wisteria sinensis (AF142732) as seed and the complete plastome sequence of W. sinensis (KT200359) as reference [43]. Whole sequences were assembled using MAFFT v7 in Geneious 10.2.3 (http://www.geneious.com) [44], then annotated with Glycyrrhiza glabra L. (KF201590) as reference, followed by verifying the start and stop codons manually. In addition, tRNAscan-SE was used to confirm the tRNA genes with default parameters [45]. Finally, the circular maps of the plastomes were drawn using the OrganellarGenome DRAW (ORDRAW) [46], followed by manual editing.

Phylogenetic analyses
Akaike Information Criterion (AIC) in jModelTest v2.1.4 was used to determine the optimal nucleotide substitution model for the protein coding genes [52]. Maximum likelihood (ML) analysis was performed in RAxML-HPC v8.2.8 on CIPRES Science Gateway website with 1000 bootstrap replicates [53,54]. Bayesian analysis was also constructed using MrBayes as implemented on XSEDE 3.2.6 with two independent Markov Chain Monte Carlo chains for 10,000,000 generations and sampling every 1000 generations [55]. The first 25% of calculated trees were discard as burn-in and the remaining trees were used to construct a consensus tree to estimate the posterior probability (PP). Tracer 1.7 was used to make sure that the likelihood scores have reached the plateau [56].

Estimation of nucleotide substitution rates
In order to investigate the difference of mutation in and outside of the IRLC clade, 10 pairs of plastomes from the same genus (Vicia L., Lathyrus L., Trifolium L., Astragalus L., Medicago L., Caragana Fabr., Wisteria , Glycine Willd., Vigna Savi, Dalbergia L. f.) of Fabaceae were chosen to estimate the substitution rates (Table S1). Synonymous substitution rates (d S ) were estimated for each CDS via MEGA 7.0 [57]. We also compared the differences between herbs and woody species pairs. Since the dataset did not conform to the normal distribution, a wilcoxon.test in R package was used to test the significance.

Estimation of divergence time
Divergence times were estimated in BEAST v1. 10

Reconstruction of ancestral regions
To infer the ancestral area of Wisteria, Bayesian Binary MCMC (BBM) analysis was performed in RASP v3.2 [61].

Consent for publication
Not applicable. p l a s t o m e s .

Additional Files
Additional file 1: Table S1-S6 Table S1. Information of species used for in this study. Table   S2. List of synonymous substitution rates (d S ). Additional file 2: Figure S1. Gene map of six Wisteria and Millettia japonica plastomes   Phylogenetic chronogram and distribution range of IRLC species as inferred from Beast analyses based on four calibration points (nodes a-d). The divergence times for selected nodes are shown in Table S6.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download. Figure S3.pdf Figure S2.pdf additional file1.docx Figure S1.pdf