Chloroplast genome and historical biogeography of the three Magnolias

Magnolia ocinalis, M. ocinalis subsp.biloba and M. hypoleuca are all typical medicinal plants, belonging to genus Magnolia and Family Magnoliaceae. Their molecular information, particularly genetic difference, were known less. In this study, the platform Illumina HiSeq was used to sequence and assemble a novel cp (chloroplast) genome of M. hypoleuca followed by cp purication. Combined with two published cp rawdata, gene cycles and function annotations were comparably performed for the three plant species. The results indicated that 19 791 019 clean reads was assembled for M. hypoleuca cp, Q30 being 91.33%, and genome 160 051 bp. Its GC content is 39.2%, including 37 tRNAs and 8 rRNAs. The M. hypoleuca had smaller chloroplast genome and more introns (or exons) than M. ocinalis and M. ocinalis subsp.biloba. And there were respectively 11 and 8 more functional genes in M. hypoleuca cp than that in the other two. Based on cp complete genomes sequences, we constructed the phylogenetic relationship and estimated the divergence time of the three species by ML (Maximum likelihood) method, with other 10 published Magnoliaceae species. The results showed that M. ocinalis subsp.biloba and M. ocinalis might diverge from M. hypoleuca around 18.98 Ma, then they diverged from each other around 15.00 Ma. Additionally, the middle Miocene warming period might play an important role in the demographic and evolutionary histories of the three Magnolias, which provided a novel insight of the origin and dispersal routes of M. hypoleuca.


Introduction
M. o cinalis, M. hypoleuca and M. o cinalis subsp.biloba are the deciduous tree belonging to the genus Magnolia [1]. Their bark have been used for thousands of years in Chinese and Japanese traditional medicines and are still widely employed as herbal preparations for their sedative, antioxidant, antiin ammatory, antibiotic [2], dyspepsia [3] and antispastic effects [4]. Besides, three Magnolias have many other uses besides medicine. For example, M.hypoleuca, famous in Korea, Japan and China [5], is widely utilized as a natural packaging material for traditional foods in Japan [6]. Meanwhile, a comparision of three Magnolias found that M. hypoleuca owned unique biological characteristics, such as stronger cold resistance ability [7][8], higher ß-cineol content [9] and faster growth and maturity rate.. Former researchers [10] found that M. hypoleuca was the northernmost of all Magnolias plants and its cold tolerance was better than other Magnoliaceae plants,which is native to the south of the Kuril islands [11].
Till now, M. hypoleuca was widely distributed in Janpan, China and Korea. Instead, M. o cinalis subsp.biloba and M. o cinalis are native to China and were widely distributed in China. The origin of Janpanese islands was the eastern margin of the Eurasian Continent rifted approximately 700-750 Ma [12]. After the formation of Japan, those continental islands have experienced various paleogeographic and paleoclimatologic changes resulting in high endemism and biodiversity of terrestrial species [13][14][15], such as Liriodendron [16], Magnolia section Rytidospermum [17], Some of them had been reported with the molecular phylogenetic techniques to study evolutionary patterns or estimate divergence times of disjuncts. It has been proposed that = biogeographic studies of widely distributed plants can be performed to provide insight into the broader patterns of the evolutionary history and geographic diversi cation [18]. However, there is no analysis of the evolutionary history and biogeography of the three Magnolias.
Cp-genome (chloroplast genome) possess a highly conserved tetrad structure, containing two inverted repeat (IR) regions (IRa and IRb), a small single-copy (SSC) region and a large single-copy (LSC) region [19][20]. In addition to photosynthesis, cp genome-encoded proteins are involved in other metabolic processes, such as responses to heat, drought, salt, and light [21]. Besides, cp-genome were widely used in species evolution [22], phylogenetic [23][24][25] and biogeographic studies [26][27][28]. Meanwhile, organismic and environmental processes played a major role in organismal evolution [29][30]. Currently, with the rapid development of high throughput sequencing technologies, made it possible for reseachers to obtain cp genomic sequences to study species evolution, phylogenetic and biogeographic [31].Therefore, whole cp genome of M. hypoleuca was urgently needed, which might signi cantly provide an insight in plant phylogenetic relationships and contributed to the, domestication, and utilization of Magnoliaceae plants.
In this study, we sequenced cp genome of M. hypoleuca and performed the comparative genomes analysis to obtain comprehensive understanding the structure of cp genomes within three Magnolias.
Meanwhile, we studied divergence time of three Magnolias, which indicated that it was consistent with Darwinian evolutionary theory. Our study will provide genetic resources for future research in the genus, and decipher the genetic relationship of three Magnolias to provide some reference value for molecular breeding of superior variety, and also provide useful information for identi ng three Magnolias species, and providing insight into their evolutionary history.

Results
Genome Sequencing, assembly and annotation A length of 19 816 708 raw reads of M.hypoleuca was obtained by Illumina HiSeq PE14 sequencing platform. Through removing the connector and low-quality reads, a length of 19 791 019 clean reads of M.hypoleuca were obtained and the Q30 was 91.33%. After sequence assembly and annotation, the results showed that the total length of the cp genome of M.hypoleuca, M. o cinalis subsp.biloba and M. o cinalis were 160 051 bp, 160 099 bp and 160 183 bp, respectively. (Fig 1 and Table 1). Among them, the genome of M.hypoleuca had a quadripartite structure with an SSC of 18 771 bp and an LSC of 88 146 bp, which were separated by two IR regions (IRa and IRb, each 26 562 bp). The GC content of the overall cp genome was 39.2%, comprising 8 rRNA and 37 tRNA, respectively (Table 1). Through the comparision of the cp genomes of three Magnolias, we concluded that the cp genomes structure had a typical quadripartite structure, with a circular molecule of 160 051 bp to 160 183 bp in length, and the content of GC, rRNA and tRNA were same in the cp genes of three Magnolias. On the contrary, the total number of genes and coding region were distincted in three Magnolias, with its total number of M. hypoleuca were 13 and 14 times higher than M. o cinalis and M. o cinalis subsp.biloba, respectively (Table 1).

Comparative analysis of cp genomes
Chloroplast is important organ for plant photosynthesis, which is very conservative in structure.
Comparative analysis of cp genomes is an essential step in genomics [32][33]. A comparison of the structural differences among cp genomes of three Magnolias indicated that the cp genome of M. hypoleuca was the smallest (Fig 1; M. hypoleuca, 160,051 bp; M. o cinalis subsp.biloba, 160,099 bp; Magnolia o cinalis, 160,183 bp). Instead, Comparison of the functional genes quantity showed that the cp genome of M. hypoleuca was the most which indicated M. hypoleuca had more functional effects. The result showed that M. hypoleuca had 11 more functional genes than M. o cinalis, including psbC ycf14 ycf1 rps14 rps12 rps12 rpl22 rpl23 petD petL ropC2, respectively, and had 8 more functional genes than M. o cinalis subsp.biloba, including psbC ycf14 ycf1 rpl22 rps12 rpoC2 petD petL, respectively (Table 3). Studies on ribosomal protein rpl22 and rpl23 revealed they were essential of metabolism of organisms, whatever the plant was the stage of developmental or light phase. Moreover, the above all different genes were intensively located in Photosynthetic System Subunit, large ribosomal subunit and small ribosomal subunit, which mainly control photosynthesis and polypeptide formation in plants [34][35].
Except for the length of total cp complete genome sequences and amount of functional genes, the frequently divergent regions between three Magnolias were mainly in the introns (or exons) content in genes ( Table 2). Introns play important roles in regulating gene expression [ 36]. The genes containing introns (or exons) of cp genomes of M. hypoleuca had 2 more than that in M. o cinalis and M. o cinalis subsp.biloba. They were respectively petD and rpl16 gene ( Table 2). It indicated that petD and rpl16 genes of M. hypoleuca were more than that in M. o cinalis and M. o cinalis subsp.biloba. In this study, the length of exon of ycf3 and rps12 gene of M. hypoleuca were 226 bp longer and 114 bp smaller than the other two species. In all species, the two rps12 gene were trans-spliced. There were two rps12 gene in cp genomes of three Magnolias. Those were, the length of intron and exon of another rps12 gene of M. hypoleuca were 535 bp longer and 227 bp smaller than the other two species. Moreover, rps12 gene encoded the ribosome S12 protein [37], which is usually highly conserved, and its structural change was thought to be the result of evolution [38]. It revealed differences among the three Magnolias.

Divergence time estimation based on cp complete genomes
We used 14 cp complete genomes of Magnoliaceae and Ginkgoaceae for phylogenetic reconstruction and estimation of the divergence times of three Magnolias. Partial phylogenetic tree of 14 species was constructed by ML method (Fig. 2). Results showed the ML phylogenetic tree based on the cpDNA of partial Magnoliaceae was divided into two main clades (Fig. 2). The rst clade was Magnoliaceae and the second clade was outgroup, Ginkgo biloba. Among them, the rst group could be further divided into two secondary groups, including 11 species of Magnolia genus as the rst groups and Linriodendron genus as the second groups. The phylogenetic relationship of 14 species was mostly consistent with the study of Lirianthe hodgsonii [39][40] tulipifera and L. chinense. Besides, the result showed that the degree of genetic divergence between M. o cinalis and M. o cinalis subsp.biloba were later than M. hypoleuca in three Magnolias. The divergence time of M. o cinalis was estimated to be 15.00 MYA and it was same as M. o cinalis subsp.biloba, later than divergence time of M. hypoleuca which was estimated to be 18.98 MYA. Meanwhile, Liriodendron was estimated to have diverged in the mid Miocene based on allozyme and restriction fragment length polymorphism (RFLP) analyses of cpDNA and paleobotanical evidence [41]. It was consistent with the conclusion in this study (Fig 2). The divergence time of 13 Magnoliaeace we selected was subdivided at the period of 14.20-21.78 MYA. It indicated that the evolutionary and biogeography of partial modern Magnoliaceae species mainly occurred in mid Miocene.

Discussion
The cp genomes of the three Magnolias species were same in the content of GC, rRNA and tRNA, including 37.2% GC, 37 tRNA and 8rRNA. The analysis of codon usage in cp genomes of Actinidia chinensis showed that codon content was mainly affected by multiple factors such as base composition, natural selection and gene mutation [42]. And there was a negative correlation between the variation degree of cp genome sequence and the GC content of the sequence [43]. Thus, the rRNA, tRNA and IR regions were relatively conservative in structure due to the GC content of all regions of Magnoliaceae was high in this regions. The stable cp complete genomes structure of three Magnolias indicated that its variation degree was very small. Furthermore, it showed that the relationship of three Magnolias was relatively close and slow in evolution. It was consistent with the discovery that Magnoliaceae evolved respectively slowly [44].
Instead, there were different numbers of functional genes of cp genomes of three Magnolias. Especially functional genes numbers of M. hypoleuca were more 11 than M. o cinalis, and more 8 than M. o cinalis subsp.biloba. Among them, psbC and psbD genes were key genes of photosynthetic system . Some researchers found the synthesis of psbC gene product might occur in the transcript containing psbD sequence. That indicated that psbD and psbC genes jointly coordinated the effect on light [45]. Some scholars [46] found that psbD gene in the Phyllostachys japonicus leaves played the protective role of photosynthetic system and reduce strong sunlight damage, when the plants were in a stage of growth and color differences. In this study, psbC gene number of M. hypoleuca was more than that of M. o cinalis and M. o cinalis subsp.biloba. Meanwhile, the different number of psbC gene of three Magnolias might be one of the factors affecting photosynthesis and growth cycle.
In this study, the total number of tRNA in the cp genomes of three Magnolias was same, but the corresponding amino acid types were diversi ed. M. hypoleuca had 3 more tRNA in alanine transport than the other two species and it had unique types of tRNA in histidine transport. Some studies showed that the types and quantities of tRNA will change in different periods at the same organism and will be adjusted accordingly with environmental changes [47]. For example, the natural structure formation and expression of tRNA was controled and affected by temperature and ionic strength [48]. Alanine was one of the main products of anaerobic metabolism in plants, which could accumulate rapidly under stress conditions [49], and its free form could resist environmental stimulation. The content of free histidine in cp was positively correlated with the SOD (superoxide dismutase) content and it counld resist cold and oxidation. In short, the unique types and amounts of tRNA in the cp genome of three Magnolias correspond to their various biological characteristics.
Comparing three Magnolias species cp genes introns (or exons ), the results showed that rps12 gene introns of M. o cinalis and M. o cinalis subsp.biloba was missing, and its exons content was 0.50 times of M. hypoleuca. Furthermore, the exons and introns of petD and rpl16 genes were found only in M. hypoleuca. The plastid ribosomal protein S12 encoded by the rps12 gene was a highly conserved protein, which was located in the functional center of the 30S subunit of the ribosome. Some scholars found that rps12 was a trans-splicing gene in ferns [50], and the evolution rate of species was affected due to its change in intron deletion and exon position. It indicated that intron loss accelerated the evolution rate. In this study, the different length of rps12 introns of three Magnolias was consistent with the previous study, revealing the evolutionary relationship between three Magnolias. Furthermore, some scholars [37] found that the diversi ed expression of rps12 gene caused the different phenotypes in plants, such as sharp and narrow leaves, serrated or defective leaves.. What's more, according to the petD gene of Ginkgo biloba [51], it found that the expression level of petD was highest in leaves, next to the stem, which was related to the formation of cytochrome b/f complex. The main function of cytochrome b/f complex was to participate in the electron transport process of photosynthesis and enhanced the photosynthetic ability of leaves. In this study, we found that the petD gene of M. o cinalis and M. o cinalis subsp.biloba was probably pseudogenes that they did not gene expression. That revealed that the photosynthetic capacity and growth ability of M. hypoleuca were stronger than the other two species. Besides, the phylogenetic tree of 14 species showed that the Magnoliaceae was divided into two branches, Liriodendron and Magnolia. Among the three Magnolias, the Phylogenic relationship of M. o cinalis and M. o cinalis subsp.biloba were closer, next to M. hypoleuca. It indicated the divergence of three Magnolias was due to geographic movements. Thus, the main disjunction we dated in this study was within three Magnolias based on estimation of the divergence times and phylogenetic reconstruction.
We used the cp complete genomes of 13 Magnoliaeace species and 1 outgroup for phylogenetic reconstruction and estimation of the divergence times. Through a comparision of the divergence time of three Magnolias, the divergence time between M. o cinalis and M. o cinalis subsp.biloba was estimated to be 15.00 MYA, and divergence time between M. hypoleuca and other two species was estimated to be 18.98 MYA. It indicated that divergence time of M. hypoleuca was earlier than the other two. Moreover, the leaf shape of Magnolias was divided into three types: the rst type of leaf shape was sharply pointed or blunt, the second type had slightly absent or obtuse leaves at the apex and the apex of the leaves of the third type was concaved into 2 shallow lobes, but there was no obvious concave at the apex of the seedlings, which were more obtuse or slightly emarginate [52]. According to the theory of species evolution, the rst type is relatively primitive, and the third type has a higher degree of evolution.
Among them, the leaf shape of M. hypoleuca, M. o cinalis and M. o cinalis subsp.biloba belonged to the rst type, second type and third type, respectively. It indicated that M. hypoleuca was relatively more primitive than M. o cinalis and M. o cinalis subsp.biloba by the diversity of leaf shape of three Magnolias. Moreover, the Miocene was a period with globally warmer climates than those in the preceding Oligocene, or the subsequent Pliocene [53]. Among them, the middle Miocene warming period was from 13 to 18 MYA [54][55]. Some researchers found that the Miocene oras had many elements in common with the modern mesophytic oras of eastern Asia and eastern North America, supporting the proposal that the divergence of the modern north temperate elements occurred during that period. There are some else to support this proposal. such as, Manta, devil rays, Pterocarya [55] and Magnoliaceae in the Northern Hemisphere and so on. This study indicated that time-division of three Magnolias might occur in the period of middle Miocene warming period.

Conclusions
In this study, we reported and analyzed the cp complete genomes of M. hypoleuca, which is the components of deciduous broad-leaved forest in the cold temperate zone. And we compared cp complete genomes of three Magnolias, the results revealed that the genome size of M. hypoleuca was smaller than M. o cinalis subsp.biloba and M. o cinalis. Moreover, we found the distinct difference of three Magnolias from the content and length of introns and exons in genes, and the types and quantity of functional genes. We detected the GC content of three Magnolias cp genome was 39.2%, comprising 8 rRNA and 37 tRNA. In addition, the result showed that M. hypoleuca had 11 and 8 more functional genes than M. o cinalis and M. o cinalis subsp.biloba, respectively. And through constructing the relationship between phylogenetic and divergence time of the three Magnolias, it indicated that the divergence time of three Magnolias might take place during the middle Miocene (13-18 Ma).

Plant material
The fresh and insect-free leaves samples of M. hypoleuca were collected in the State Bank of Chinese Drug Germplasm Resources, Chengdu University of Traditional Chinese Medicine (Chengdu, China).
According to check out previous studies, no cp complete genome of M. hypoleuca has been reported, even the comparision of cp complete genomes of three Magnolias. Therefore, we decided to analyse and compared the difference of cp complete genomes of three Magnolias. In this study, all methods were performed in accordance with the relevant guidelines and regulations of China.

Genome assembly and annotation
Raw sequences were submitted to China National Center for Bioinformation (BioProject number PRJCA004348). Pair-end Illumina raw reads were cleaned from adaptors and barcodes, and then quality ltered by Trimmomatic [57]. After quality ltering, we retrieved the cpDNA sequence of M. o cinalis (NC_020316) and M. o cinalis subsp.biloba (JN867581) as the references. The plastome was assembled using online platform of Galaxy (https://usegalaxy.org) [58]. Then, the cp complete genomes of M.hypoleuca, M. o cinalis and M.o cinalis subsp.biloba were annotated using the online program CpGAVAS2 [59]. And we drawed a circular representation of both sequences by using the online tool Genome VX [60]. The 14 cp complete genomes sequences were aligned using megaX. The program megaX was used to infer ClustalW, with Pairwise alignment of the sequence, according to the pairwise alignment to calculate the pairwise distance matrix. Moreover, model detection tool, "Model",was used for optional model detection. We used the optional model to construct the evolutionary tree using the ML method, with the bootstrap number set as 1000. Among them, only the bootstrap number of Phylogenetic tree was bigger or equal to 50%, it just valued to be reliable. The clades division in gures was mainly based on the clustering results of phylogenetic trees and the morphological classi cation of Magnoliaceae. Then the tree with ".nwk" format was downloaded for next analysis.

Divergence time estimates
We used the cp complete genomes data set for dating the divergence times. ML approaches based on a global clock model was usually used in the time estimates. A likelihood ratio test ruled out a global molecular clock (P < 0.05) for our data [61]. Time estimates were done based on a global molecular clock and fossil data. That was, we used the nal aligned sequences with 14 species, which were converted to MEGA format by using MEGAX software [62], and the phylogenetic tree of 14 species with .nwk format.
An ML tree with lengths inferred from GARLI was used in the PL estimate with steps. All analyses were performed using the GTR model of nucleotide substitution with a gamma distribution with four rate categories. The tree calculation models was implemented in each analysis, with rate variation across branches assumed to be uncorrelated and lognormally distributed. Among them, three clades with an expanded outgroup species was used in the dating analyses, meaning the that individual DNA alignments were pruned to eliminate multiple accessions of the same species [63]. The fossil times of L. chinense vs L. Tulipifera was 14.2 MYA, M. zenii vs L. chinense was 55 MYA and M. sprengeri vs M. zenii was 28.3 MYA.The outgroup fossil times Ginkgo biloba vs Magnoliaceae was 313 MYA. Therefore, the phylogeny was calibrated using four Magnoliaceae fossils that were applied to the stem nodes of L. chinense, L. Tulipifera, M. zenii and M. sprengeri. Additional fossil of Ginkgoaceae species was used to calibrate the stem node of Magnoliaceae and selecteed as an outgroup. All analyses were outputed in nwk format. The nal estimates were obtained using the model that yielded the highest posterior probability. Posterior distributions of parameters were approximated using two independent analyses of 20 000 000 generations with 10% burn-in. Samples from the two chains which yielded similar results were combined.

Additional Information
The corresponding author is responsible for submitting a competing interests statement on behalf of all authors of the paper. Tables   Table 1 Basic