The Chloroplast Genome Comparative Characteristic of Articial Breeding Tree, a Case About Broussonetia Kazinoki × Broussonetia Papyifera

Background: Broussonetia kazinoki × Broussonetia papyifera (ZJGS) is a hybrid species, which has a very complicated hybrid origin. Its excellent characteristics make it both ecological benets and economically valuable. Results: This study aimes to further understand ZJGS and Moraceae taxa through the ZJGS chloroplast (cp) genome structure and the comparative with 12 closely related Moraceae species, especially the cross parent B. kazinoki and B. papyrifera. The analyses show that ZJGS cp genome is 160,903 bp in length. Among the 13 Moraceae species, the cp genome length of seven Broussonetia species (ranges from 160,239 bp to 162,594 bp) is larger than that of six Morus species (ranges from 158,459 bp to 159,265 bp). However, the average GC content of Broussonetia species is lower than that of Morus species, which is 35.72% and 36.26%, respectively. Compared with Moraceae species, the ZJGS cp genome has a high degree of sequence similarity with B. kazinoki and B. monoica. In the comparison of repeated sequences, ZJGS and its maternal species B. kazinoki shows similar simple sequence repeats (SSRs) frequency. Among the 77 shared protein-coding genes (PCGs) in Moraceae species, the obvious positive selection of Ka/Ks ratios acted on petD and rpl16 genes of B. kazinoki and B. papyrifera, respectively, and the Ka/Ks ratio > 3. Phylogenetic analysis based on shared PCGs from 28 species shows that ZJGS is closely related to maternal B. kazinoki. Conclusions: These ndings provide data support for the origin of ZJGS hybridization, and provide genomic resources for future ZJGS resource development and molecular breeding.


Background
There are many hybridization phenomena in nature, which are regarded as effective methods to produce "positive species" [1][2][3]. Hybrid breeding is widely used in forestry production, specially in horticulture. At present, there are already many hybrid breeds used in agriculture and forestry production based on the heterosis. The hybrid aspen (Populus tremula × Populus tremuloides) has a higher yield than the parent species, and can be quickly regenerated from the root sucker [4][5][6]. Hybrids of Abies genus have the advantage of strong anti-pollution ability and resistance to pests and diseases [7,8]. The heterosis is predicted by both positive gene contribution and environment. Hybridization is also an important way to produce gene diversity. Biologists have always focused on the role of hybridization in evolution. In-depth analyses of heterozygous genotypes showed that heterosis mainly came from the accumulation of a large number of rare superior alleles with positive dominance [9]. Lin et al.
identi ed ve candidate genes involved in high yield in super hybrid rice LYP9, among which heterozygous segments containing qSS7 and qHD8 showed superiority and contributed to heterosis [10]. In maize, the silent expression of Cell Number Regulator 1 gene (CNR1) can increase the size of plant and organ, and become a direct contributor to heterosis [11]. The nucleotide sequence of the plastid gene rbcL and the nuclear gene PgiC were used to study the reticulated evolution of the Dryopteris varia complex in Japan and a haplotype not belong to any existing species was found. It was speculated that it came from an extinct species [12]. This indicates that hybrids may have higher tness due to heterosis, and become an effective way to preserve their genetic diversity. However, the above genetic information related to hybridization are mostly concentrated in nuclear genome. The chloroplast (cp) genome is the second largest genome after the nuclear genome, and has signi cant advantages in the study of phylogenetic evolution of species.
Chloroplast is a uniparentally inherited and stable organelle in green plants, who is often considered to be an informative resource for genetic diversity of plant [13][14][15]. The cp genome always consists of four conserved regions: a pair of inverted repeats (IRs), a small single-copy region (SSC) and a large single-copy region (LSC) [16][17][18]. Among these regions, the length of IRs regions is always diverse and determines the total length of cp genome [19,20]. The cp genome contains a large number of functional genes, which can be divided into three categories: genes related to photosynthesis (Photosystem I: psa; Photosystem II: psb), genes related to gene expression (ribosomal RNA genes and transfer RNA genes, etc.), and other genes related to biosynthesis (ATP synthase gene, NADH dehydrogenase gene, etc.). It has been reported that matK and ccsA genes evolve faster than other gene groups [21,22]. As the two largest genes, ycf1 and ycf2 located at the IR/SC junction and the IRs region, respectively, which proved to be useful for analyzing cp genome variation in higher plants [23]. As the development of sequencing technology, cp genome sequences are more and more available. The phylogenetic analyses based on the cp genome are achievable and revealed qualitative and valuable information to con rm the genetic relationship of different species [24,25]. For instance, rice is an important cereal crop, so cultivating abundant varieties sounds vital for world's population. The new cultivars are always evaluated by comparing to other ones on the a nity that according to the cp genome, and more than 20 species were categorized into 10 genome types [26]. To elucidate the evolution within neotropicalpaleotropical bamboos, the cp genome of woody bamboo Guadua angustifolia was characterized and compared with other species, the results showed that the speciation events of extant species occurred during or after the Pliocene [27]. The complete cp genome sequence of Morus cathayana and Morus multicaulis were obtained and compared with other genus Morus, the results indicated that both natural selection and mutational bias have contributed to the codon bias [28].
Paper mulberry is a member of genus Broussonetia in Moraceae family. The tree displayed the features on fast-growing, easy breeding, resistant to pruning and widely distributing [29][30][31]. Vegetation survey in tailing's area showes that the tree is an excellent native plant having a comprehensive absorption effect for various heavy metals [32]. Meanwhile, the leaves are an important source of pig feed and some chemical substances are also important resources for medicine [33]. Most importantly, as the name implies, because of its long ber and ease of preparation, paper mulberry contributed to the invention of papermaking and then was also used in barkcloth [34]. Based on the multiple uses of paper mulberry, paper mulberry has historically been an important economic tree species and even accompanied by human expansion [35]. At the same time, humans continue to select varieties of paper mulberry. Now, there is some confusion in the name of paper mulberry. Paper mulberry, Broussonetia papyifera, B. kazinoki × B. papyifera are all used to called the tree, while it may not the same tree [36]. In recent year, a hybrid variety for paper mulberry was bred [15,37]. The breeding process is showed in Fig. 1 and in order to standardize the name, we called it as the hybrid paper mulberry (ZJGS) in the study. ZJGS is a hybrid between B. kazinoki and B. papyrifera, whose maternal lineage is B. kazinoki and paternal is B. papyrifera, respectively. With a long breeding process of more than ten years, the researchers rst used modern breeding technology and then loaded the hybrid materials into the spacecraft and bred by space mutation. The mutant materials were optimized and cultivated after the spacecraft returned to the ground, and nally hybrid plants with excellent traits were obtained. However, in recent years, B. kazinoki was found to be probably a naturally occurring hybrid, with B. monoica as the female parent and B. papyifera as the male parent [15]. The complicated hybridization process has blurred the genetic background of paper mulberry and hindered the determination of the classi cation status of the tree and its utilization.
Due to the tree unclear genome is always large, a few forest tree genomes have been sequenced. Among the genetic resources of ZJGS, only unclear genome of B. papyrifera was assembled [38]. Lacking of molecular genetic information about ZJGS may prevent optimal breeding of the tree and the exploration of adaptability mechanisms. In order to further determine the characteristics of the cp genome during the hybridization process, the cp genome of ZJGS was compared with 12 Moraceae species, including B. papyrifera and B. kazinoki. The research is not only helpful for us to further determine the status of hybrid paper mulberry, but also for us to explore the role of chloroplasts in functional selection.

Results
Chloroplast DNA (CpDNA) structure of ZJGS and comparative cp genomic analyses The cp genome of ZJGS is a circular molecule of 160,903 bp in length (Additional le 1: Table S1). 135 functional genes were identi ed in the ZJGS cp genome, including 81 PCGs, 46 tRNA genes, and eight rRNA genes (Table 1). Of the 81 PCGs, 77 unique genes were generated, of which four genes, rps7, rpl23, ndhB, and ycf2, are distributed in the IRs repeat regions (IRa/IRb). 15 tRNA genes and four rRNA genes are also repeated in the IRs regions. In the ZJGS cp genome, a total of eight different genes contain introns, even the longest intron is ndhA and has a length of 1,117 bp (Additional le 2: Table S2). Table 1 List of genes encoded by ZJGS cp genome.
In the Moraceae species of comparison, the cp genome length of all seven Broussonetia (ranges from 160,239 bp to 162,594 bp) is higher than that of the six Morus (ranges from 158,459 bp to 159,265 bp), but the total GC content is relatively lower in the genus Broussonetia, the average value is 35.72%, which in Morus is 36.26% (Table 2). From the perspective of cp genome length, the species that are closer to ZJGS are B. papyifera, B. kazinoki, B. monoica and B. kaempferi, and the GC content of B. kazinoki is most similar to ZJGS. The length ratio of the IRs regions in ZJGS cp genome is the lowest among all the comparative species, which is 16.04%. In addition, the ZJGS cp genome contains the largest number of genes, while the number of genes that direct synthetic proteins is the least, only 81. CpDNA variation and conservation of ZJGS In order to detect the global sequence variability of the ZJGS cp genome, the sequence of ZJGS was used as a reference to compare with the other Moraceae species (Fig. 2). It is found that the ZJGS cp genomic sequence has the highest similarity with B. kazinoki and B. monoica by comparison. Furthermore, the sequence of its paternal B. papyrifera exhibits higher divergence. These highly different regions include matK, rps16, atpF, rpoC2, rpoC1, ycf3, clpP, ndhF, ccsA, ndhA and ycf1. It has become apparent to us that these genes are good sources for interspecies phylogenetic analysis and evolutionary studies. There are also some genes that are relatively conserved among these species, such as atpH, petN, psbE, rpl2, psbH, psbM and rps7. As a whole, the non-coding sequence of the Moraceae plants is more divergent than the coding sequence, and the sequence variation in the IRs regions is smaller than that in the LSC and SSC regions, which also shows that the IRs regions are more conservative than the LSC and SSC regions.
The BRIG image shows that the cp genome sequence of the Moraceae species is conserved (Fig. 3). We use the ZJGS cp genome as the reference sequence, which is distributed in the innermost circle of the concentric circle, and the outermost circle is the PCGs of ZJGS. Here, it can be clearly seen that the cp genome of the six innermost Broussonetia genus species is more closed than the other Morus species in the outer circle compared to the reference cp genome sequence, indicating that the cp genome sequence of Broussonetia species have high similarity with ZJGS. In addition, from the gure we can also intuitively see that the most conservative sequence is located in the IRs regions, and the GC content is also the highest among all species.
Through the previous study of sequence diversity, we know that the IRs regions are the most conserved regions in the cp genome. Most studies have shown that the contraction and expansion of the IRs regions determine the length of the cp genome, and this change directly affects the length and distribution of genes at the boundary of this region [23,[39][40][41]. Intuitive results show that the gene composition in IR/SC junction of ZJGS cp genome is somewhat different from that of several other species (Fig. 4). Among them, the most obvious is that on the left of the IRb/LSC region is the gene rpl23, while the remaining species are rpl2. Same as B. monoica, B. kaempferi, B. luzonica and B. kurzii, in IRa /SSC region, ycf1 is replaced by trnN in ZJGS cp genome. However, the genetic constitution in the two parent species of ZJGS at the IR/SC junction is similar. In terms of gene length,the rpl2 gene (401 bp) in the ZJGS cp genome is signi cantly shortened in the IRa region and is more than three times shorter than other species (1,509 bp), which is related to the IRa boundary contraction.

Repeat sequence comparison
In this study, the SSRs of ZJGS and 12 Moraceae species cp genome were detected (Fig. 5) (Table 3). Mainly re ected in the following SSRs: C/G, AAT/ATT, AAG/CTT, AAAT/ATTT, AGAT/ATCT, ATCG/ATCG, AAATT/AATTT, AATAG/ATTCT, AATAT/ATATT, and AAATGT/ACATTT, which shows the characteristics of maternal inheritance of ZJGS cpDNA. In order to obtain more genetic regulation information, the long repeat sequences of 13 Moraceae plants were further analyzed (Fig. 6A). Four types of repeats were detected in the cp genome of M. atropurpurea and Broussonetia species except for B. luzonica. And the number of palindromic repeat is the largest, then the type of forward repeat and reverse repeat, the least is the complementary repeat. The forward repeats, palindromic repeats and complementary repeats in the ZJGS cp genome all contain the most repeting elements. Repetitive sequences of 20-30 bp in length reach 50% or more ( Fig. 6B-E). It can be clearly seen that the long repeat sequences of the ZJGS cp genome is dominant in this interval, and these repeat elements can be used as important genetic resources for the genetics and phylogenetic of the ZJGS population.  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  total   A/T  ZJGS  -------15  18  20  5  3  6  1  -68   B. papyrifera -------27  11  2  6  2  4  1  1  54   B. kazinoki  -------18  17  19  3  7  3 --67 The form of sequence evolution: Ka\Ks ratio Ka/Ks ratio represent the ratio between non-synonymous substitution (Ka) and synonymous substitution (Ks) at a particular site, which is used to infer the direction and magnitude of natural selection acting on PCGs. A ratio > 1 means positive selection; < 1 means purifying selection; and = 1 indicates no selection [42,43]. According to previous study, except for the most genes with faster evolution rate, the frequency of synonymous nucleotide substitutions is higher than non-synonymous substitutions due to the puri cation selection [40]. In this study, the cp genome sequence of ZJGS was used as reference, the Ka/Ks ratio of Moraceae species was presented (Fig. 7, Additional le 3: Table S3). The Ka/Ks ratio of such genes: atpH, pabC, petG, petN, psaC, psbE-psbI, psbL-psbT, rps7 and rps16 is 0, which re ects the strong puri cation options and also indicates that these genes are highly conserved. On the contrary, most of the NADH genes (ndhA-ndhK) have higher Ka/Ks ratio, and in the comparison of all species, the Ka/Ks ratio of atpF, ndhB and rpoC1 is > 1, these three genes are strongly positively selected. Except for the above three genes, clpP, ndhA, ndhD, petB and petD genes are also positively selected compared with maternal B. kazinoki of ZJGS. While compared with paternal B. papyrifera, ndhA, ndhD, ndhF, petB, rpl16, rpoA and ycf1 genes were positively selected. Among them, petD and rpl16 have the largest Ka/Ks ratio, and both are > 3. These two genes are highly evolved and can be used for subsequent gene identi cation.

Phylogenetic analysis
In the phylogenetic tree based on 77 shared PCGs, ZJGS and its maternal B. kazinoki formed a single clade with high bootstrap support (99%) through three different methods (Fig. 8). All species are divided into four branches according to their evolutionary history. The rst branch is the family of Moraceae, which was divided into three genera, namely, Broussonetia, Ficus and Morus. Here, Broussonetia and Ficus species are highly clustered with bootstrap values ≥ 99%. In addition, ZJGS, B. kazinoki and B. monoica formed a well-supported clade, as a natural hybrid, B. kazinoki also has the closest phylogenetic relationship with its maternal B. monoica. The second branch is Cannabaceae includes C. sativa and H. lupulus, which is closely to Moraceae. Studies have shown that both Cannabaceae and Moraceae belong to the Urticalean Rosid clade [44,45]. The remaining two families, Rosaceae and Fagaceae, belong to Rosanae. In general, the classi cation results provide important information for the phylogenetic of ZJGS, as well as con rm the source of ZJGS hybridization.

Discussion
In this study, we determined and analyzed the cp genome of ZJGS (a hybrid plant) and compared it with the Moraceae species for structural comparison and sequence alignment. From the structure composition of the cp genome, it can be seen that the seven Broussonetia species and the six Morus species are grouped independently and each of them have similar compositions. Among the seven Broussonetia species, total GC content of ZJGS cp genome sequence is the closest to the maternal B. kazinoki, shows high species a nity [41], which is also re ected in the comparison of sequence diversity and SSRs. In addition, the sequence comparison analysis of ZJGS cp genome and other 12 Moraceae plants at the genomic level re ects the high sequence conservation of the IRs regions, so the IRs regions changes can be used as a marker of species evolution [46,47]. At the boundary of the IRs regions, the changes in the length of the rpl2, rps19, trnN and ycf1 re ect the contraction and expansion of the IRs regions. In the evolution of higher plants, all kinds of repetitive elements have a common feature, which tends to show "coevolution", which plays a crucial role in the genome sequence diversity and gene rearrangement [48,49]. Studies have shown that the number of repeats is related to the degree of genome rearrangement [50][51][52]. Here, long repeat sequences in ZJGS cp genome are the largest, and the number of 20-30 bp repeats may promote the rearrangement of ZJGS [53], these repeats can serve as good molecular markers for species evolution, and play an important role in the variation of cp genome sequence.
The breeding process of ZJGS is long and complicated. It has undergone many cross-breeding and space mutation to form excellent plants with stable inheritance. And its maternal B. kazinoki has experienced controversy [15,54], therefore, it is necessary to compare ZJGS and its hybrid parent systematically. Unlike the previous study [36], the cp genome of ZJGS was analyzed in detail. Through comparison, it is found that the total length and total GC content of ZJGS cp genome sequence (160,903 bp and 35.74%, respectively) are closest to its maternal B. kazinoki (160,841 bp and 35.73%, respectively), while its paternal B. papyrifera is 160,239 bp and 35.83%, respectively. This re ects a high degree of species a nity in the history of system evolution [41,55]. Similarly, in the sequence diversity comparison, we also found that the cp genome sequence of ZJGS and maternal B. kazinoki remains highly consistence. This is related to sequence conservation and provides a basis for the species evolution of ZJGS [40,56]. The construction of the phylogenetic tree shows us the most intuitive results, closely related species are grouped into the same branch. And ZJGS is closest to maternal B. kazinoki, which strongly supports the characteristics of cp maternal inheritance [57,58].
As we have known before, when Ka/Ks > 1, the gene is strongly positive selection [42,43]. We can screen some genes according to Ka/Ks ratio and then carry out functional studies, which have been commonly applied to the eld of molecular evolution [59,60]. In our research, three genes of atpF, ndhB and rpoC1 show a complete positive selection. In previous studies, some genes have been reported with a faster evolution rate, including ycf1, ycf2, accD, clpP, ndhA, rbcL, matK, ccsA, and cemA [22,61]. Compared with two cross parents, the same genes (atpF, ndhA, ndhB, ndhD, petB and rpoC1) with positive selection were screened out. Most of the NADH genes (ndhA-ndhK) have a higher Ka/Ks ratio than photosynthesis genes (Photosystem I: psaA, psaB, psaC, psaI, psaJ; Photosystem II: psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbM, psbN, psbT, psbZ), which is similar to the previous research [55,62], shows that photosynthesis genes have strong puri cation options. NADH genes have higher activity during cellular senescence and oxidative stress [63,64], and early hybrid selection experiments showed that most proteins synthesized by chloroplasts in the early stages of aging are NDH polypeptides [65]. It can be seen that these genes under positive selection belong to the advantageous genes of hybrid plants, and may be the key to speciation.

Conclusions
This study is based on the cp genome of ZJGS, a hybrid species, whose structure and composition are the same as most angiosperms, and the IRs regions are highly conserved.

Comparative analyses of cp genomes structure
Our research group rst reported the complete cp genome of ZJGS and the sequence was submitted to Genebank with the accession number of MF496038. The more experiment details can be got from the announcement [36]. At the same time, 12 Moraceae plants, including six Broussonetia species and six Morus species were selected to compare cp genomes structure with ZJGS. The cp genome information of 13 Moraceae plants was downloaded from NCBI database, and the GenBank accession numbers were listed in Additional le 4: Table S4. Firstly, the characteristics of cp genomes, such as GC content, length of IRs regions and number of genes were compared manually. Secondly, mVista in Shu e-LAGAN mode and Blast Ring Image Generator (BRIG) were employed to compare cp genomes of the above selected 13 plants and ZJGS was set as the reference [66,67]. To explore the evolutionary event of ZJGS, the details of IRs junction regions were detected and displayed, for the IRs regions were considered as the most conserved regions and these junctions were regarded as an index of cp genome evolution.
Simple sequence repeats (SSRs) and long repeat sequences analysis SSRs, which is one of the most widely used molecular marker systems in plant genetic breeding and revealed more polymorphism [68][69][70]. The SSRs within the cp genome of each species were predicted by MISA with the parameters were set as: ≥ 10 for mono-nucleotides, ≥ 4 for dinucleotides, ≥ 3 for tetra-nucleotides, penta-nucleotide and hexa-nucleotide. The number and details of SSRs were showed. Long repeat sequences, including forward match, reverse match, palindromic match and complementary match, within the cp genome, were identi ed by REPuter [71]. The related settings were showed as follows: (1) 90% or greater sequence identity; (2) a minimal repeat size of 20 bp. The graphs were all plotted using software Simplot 12.5.

Molecular evolution analysis
The Ka/Ks ratio is a good indicator of selective pressure at the sequence level, which can calculate selective pressure within protein coding regions.
Here, the Ka/Ks ratio of 77 shared protein-coding genes (PCGs) of 12 Moraceae plants cp genomes were compared with ZJGS. Firstly, the multiple nucleotide sequences of the homologous genes that code for proteins were aligned by MEGA 7 [72], and then the Ka/Ks ratio was calculated using DnaSP v5 [73].

Phylogenetic analysis
In order to clarify the phylogenetic relationship of ZJGS, 27 related species were downloaded from the NCBI for phylogenetic analysis (the GenBank accession numbers were listed in Additional le 4: Table S4). The software MEGA 7 was used for multiple sequences alignment and evolutionary analysis [72]. Three methods were used to infer the evolutionary history and construct phylogenetic tree, including neighbor-joining method (NJ), maximum likelihood method (ML) and maximum parsimony method (MP). The evolutionary distances of NJ method were calculated using the Poisson correction method [74] and were in the units of the quantity of amino acid substitutions per site. In the ML method, initial tree was obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using JTT model, and then selecting the topology with superior log likelihood value. The MP tree was obtained using the Subtree-Pruning-Regrafting (SPR) algorithm with search level 1 in which the initial trees were obtained by the random addition of sequences (10 replicates

Availability of data and materials
All datasets supporting the conclusions of this article are included within the article (and its Additional les). All sequences can be downloaded from NCBI Genebank (https://www.ncbi.nlm.nih.gov/) with the accession number which has been mentioned in the article.

Competing interests
All the authors declare that they have no con ict of interest.

Authors' Contributions
YG and ZY are the lead investigator and conceived of the study. XZ and ZW wrote the paper and analyzed all the data. HH and ZJ is the lead investigator for the genome sequence. XZ and YG designed and fund for the current study. All authors have read and commented on the article.    The Ka/Ks ratios of 77 shared PCGs in 12 Moraceae cp genomes were compared with ZJGS.