Quick Identification of Genetic Provenances and Best Practice Woodland Management of Chinese Pine (Pinus tabuliformis)

Chinese pine (Pinus tabuliformis) is one of the most widespread and ecologically and economically important tree species in North China. In this study, we analyzed the genetic diversity and population structure of 158 individuals from 17 populations of P. tabuliformis using group III, a new mitochondrial marker system (nad1-2, nad4-3, nad5-1 and nad7-1), and compared it with two other marker systems. Although mitochondrial sequences are generally conserved in evolution, the new mitochondrial marker system shows extremely high polymorphism in P. tabuliformis: it resolves 25 haplotypes, more than three times the 8 haplotypes resolved by the group I marker system (nad1-2, nad4-3 and nad5-1). Although group II, nad7-1 alone (19 haplotypes), already shows high resolution in the provenance identification of P. tabuliformis, the new mitochondrial marker system is more accurate in detecting specific populations such as HL, WT and NS, and is able to differentiate the populations GD and SS. The results suggest that the resolution of the new mitochondrial marker system is comparable to that of GBS (genotyping by sequencing). It is much more accessible and will be of great help for provenance identification and molecular-assisted breeding of P. tabuliformis. This study lays a theoretical foundation for subsequent studies on the evaluation, cultivation and germplasm management of P. tabuliformis populations and will aid breeding, biodiversity and conservation programs for forest species.


Introduction
Chinese pine (Pinus tabuliformis Carr.), one of the dominant conifers and most important reforestation tree species in northern China, is ecologically and economically important (Wu, 1995; Ying et al. 2004; Mao and Wang, 2011). So far, genetic improvement research has focused mainly on provenance tests, seedling breeding, reforestation programs and tree physiology, and has been carried out to maintain the stable development of woodland and germplasm management (Chen et al. 2019; He et al. 2020; Zhang et al. 2020). Large breeding and tree improvement programs were initiated recently with the primary aim of increasing forest coverage in harsh climate zones (Zhang et al. 1998; Zhang et al. 1999). Long-term provenance trials, based on phenology, phenotypic variation, growth, germination and other indicators, found that P. tabuliformis across its whole geographic distribution can be divided into seven climatic types (Xu, 1992; Chen et al. 2019; He et al. 2020). Previous studies show evidence of strong local adaptation, as transplantation across climate zones results in high mortality and inferior growth of the transplanted populations (Xu, 1992; Gray et al. 2016; Zhao et al. 2014). Therefore, mining and using natural local population resources, or populations with a close genetic relationship, should increase survival rates and be more beneficial (Chen et al. 2019; He et al. 2020).
However, long-term field experiments and observations are time-consuming and labor-intensive (Xu, 1992; Rony et al. 2009; Pan et al. 2020). A complementary method is genomic analysis, including the analysis of organelle genomic data, which reveals differences at the DNA level between geographical populations without interference from unstable environmental conditions and is effective for evaluating the genetic diversity of germplasm and population dynamics in conservation and breeding programs (Meng et al. 2007; Neale et al. 2014; Luo et al. 2020; Zurn et al. 2020). Considering the cost and difficulty of genomic analysis (Holliday et al. 2017; Xia et al. 2018), especially for non-model forest plants, organelle sequence analysis is a convenient choice for identifying particular populations and determining their purity (Du et al. 2009; Zhou et al. 2010). There have been many reports and studies on the diversity of P. tabuliformis germplasm resources based on molecular markers (Chen et al. 2008; Wang et al. 2011; Hao et al. 2018). However, most of these studies focused on the pedigree genetic structure and demographic history of P. tabuliformis. The reported mitochondrial markers nad1-2, nad4-3 and nad5-1 and the chloroplast gene fragments rpl16 and trnS-trnG reflect a low level of genetic variation in P. tabuliformis (Chen et al. 2008; Wang et al. 2013; Hao et al. 2018), especially compared with the genetic variation revealed by the nuclear genome sequence (Xia et al. 2018). The resolution of the classical mitochondrial and chloroplast markers is low, and the limited information they provide makes it difficult to locate the origin of ancient P. tabuliformis. Although the resolution of nuclear genome data obtained by GBS (genotyping by sequencing) is good, the cost of sequencing and the difficulty of data analysis remain obstacles to technology extension and widespread use. Although a high level of polymorphism was observed in nad7-1 of P. tabuliformis (Xia et al. 2020), some populations, such as GD (Guandi Mountain) and SS (Songshan Mountain), could not be differentiated. In this study, we combined nad1-2, nad4-3, nad5-1 and nad7-1 to form a new marker system. We compared and analyzed the genetic diversity of natural populations of P. tabuliformis from different geographic regions using three groups of mt marker systems (nad1-2, nad4-3 and nad5-1; nad7-1; nad1-2, nad4-3, nad5-1 and nad7-1). We expect the new marker system to offer the highest resolution for identifying the provenance of P. tabuliformis while remaining easy to operate and well suited to technology extension. Identifying the provenance of natural populations is a powerful tool for the genetic improvement of forest species. The results will provide direct evidence for provenance identification and for the origin of P. tabuliformis. This study will provide a theoretical basis for tree conservation and the utilization of germplasm resources, which is of great significance for protecting high-quality germplasm of forest species.

Data collection
The data in this study were taken from Wang et al. (2011) and Xia et al. (2020). The data of Wang et al. (2011) included the sequences of nad1-2, nad4-3 and nad5-1 of 184 individuals from 17 populations of P. tabuliformis. The data of Xia et al. (2020) included the sequences of nad7-1 of 158 individuals from 17 populations. After removing individuals with missing data, 158 individuals from 17 populations, all of which contained the four mitochondrial segments (nad1-2, nad4-3, nad5-1 and nad7-1), were retained for the following analysis.
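The filtering step described above can be sketched as follows. This is a minimal illustration, assuming each data set has been loaded as a dictionary that maps an individual ID to its per-segment sequences. The function name `complete_individuals`, the dictionary layout, and the toy IDs and sequences are hypothetical and are not taken from Wang et al. (2011) or Xia et al. (2020).

```python
SEGMENTS = ("nad1-2", "nad4-3", "nad5-1", "nad7-1")

def complete_individuals(group1, group2, segments=SEGMENTS):
    """Merge two per-individual records and keep only individuals that
    have a non-empty sequence for every required segment."""
    merged = {}
    for ind in set(group1) | set(group2):
        record = {}
        record.update(group1.get(ind, {}))
        record.update(group2.get(ind, {}))
        # Missing keys, None and empty strings all count as missing data.
        if all(record.get(seg) for seg in segments):
            merged[ind] = record
    return merged

# Toy records (hypothetical IDs and sequences, for illustration only).
three_segment_data = {
    "GD_01": {"nad1-2": "ACGT", "nad4-3": "TTGA", "nad5-1": "CCAT"},
    "GD_02": {"nad1-2": "ACGT", "nad4-3": "TTGA", "nad5-1": "CCAT"},
    "SS_01": {"nad1-2": "ACGA", "nad4-3": "TTGA", "nad5-1": None},
}
nad7_data = {
    "GD_01": {"nad7-1": "GGCA"},
    "SS_01": {"nad7-1": "GGCT"},
}

kept = complete_individuals(three_segment_data, nad7_data)
print(sorted(kept))
```

In this toy case only GD_01 is retained: GD_02 has no nad7-1 sequence and SS_01 lacks nad5-1.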

Data analysis
The sequences were aligned in Clustal X 1.81 (Thompson et al. 1997), and distinct sequences were then identified by manual proofreading. Point mutations and insertions/deletions (indels) were given equal weight, and each indel was treated as independent of the other indels. A haplotype can be understood as the combination of all nucleotides on a DNA fragment that differs from other haplotypes. Vcftools v1.012 (Danecek et al. 2011) was used to calculate the genetic diversity parameters of each population: number of individuals (N) and number of haplotypes (nh). Finally, three groups of marker systems were compared: group I (nad1-2, nad4-3 and nad5-1), group II (nad7-1) and group III (nad1-2, nad4-3, nad5-1 and nad7-1).
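The per-population parameters N and nh can be illustrated with a short sketch. It is not the Vcftools workflow used in the study; it assumes aligned sequences held in memory and treats a haplotype as the tuple of aligned sequences across the chosen markers, so that any substitution or gap character makes two haplotypes different. The function name `population_summary` and the toy sequences are hypothetical.

```python
from collections import defaultdict

def population_summary(samples, markers):
    """samples: list of (population, {marker: aligned sequence}).
    Returns {population: (N, nh)}, where N is the number of individuals
    and nh is the number of distinct multi-marker haplotypes."""
    haplotypes = defaultdict(set)
    counts = defaultdict(int)
    for pop, seqs in samples:
        # A haplotype is the combination of sequences over all markers.
        hap = tuple(seqs[m] for m in markers)
        haplotypes[pop].add(hap)
        counts[pop] += 1
    return {pop: (counts[pop], len(haplotypes[pop])) for pop in counts}

# Toy aligned sequences (hypothetical); '-' marks an indel position.
samples = [
    ("HL", {"nad1-2": "ACGT", "nad7-1": "GG-A"}),
    ("HL", {"nad1-2": "ACGT", "nad7-1": "GGCA"}),
    ("HL", {"nad1-2": "ACGT", "nad7-1": "GGCA"}),
    ("WT", {"nad1-2": "ACGA", "nad7-1": "GGCA"}),
]

single = population_summary(samples, ("nad1-2",))
combined = population_summary(samples, ("nad1-2", "nad7-1"))
print(single["HL"], combined["HL"])
```

With nad1-2 alone the toy HL population shows one haplotype among three individuals; adding nad7-1 splits it into two.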

Results
The genetic composition of the populations ZW1, ZW2, NS and GY in the southern zone is complex and differs among populations; they are composed of M8-H19, M3-H5, M9-H16 and M8-H14, respectively.
These results are consistent with the population structure obtained with marker group I (nad1-2, nad4-3, nad5-1) (Table S1) and marker group II (nad7-1) (Table S2): the mitochondrial genetic structure of P. tabuliformis can be divided into three large groups, which indicates the stability of the new marker system. However, marker group III, the new marker system, can further distinguish some isolated and independent groups that could not be separated with marker groups I and II (Fig. 1). For example, GD (Guandi Mountain) and SS (Songshan Mountain), which so far could only be distinguished at the level of nuclear genomic sequence (Xia et al. 2018), can now be identified simply with mitochondrial markers. Moreover, compared with marker group I (nad1-2, nad4-3, nad5-1) (Table S1) and marker group II (nad7-1), the new marker system identified more haplotypes in HL (Helan Mountain), WT (Wutai Mountain) and NS (Ningshan Mountain), showing much higher resolution.
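A toy example may help show how a combined marker system can separate two populations that neither component separates on its own. The sketch below uses a deliberately simple criterion, namely that two populations are "distinguishable" when their sets of observed haplotypes differ. This criterion, the function names, and the sequences are assumptions made for illustration; they are not the authors' procedure or the real GD and SS data.

```python
def haplotype_set(samples, population, markers):
    """Set of multi-marker haplotypes observed in one population."""
    return {
        tuple(seqs[m] for m in markers)
        for pop, seqs in samples
        if pop == population
    }

def distinguishable(samples, pop_a, pop_b, markers):
    """True when the two populations carry different haplotype sets."""
    return (haplotype_set(samples, pop_a, markers)
            != haplotype_set(samples, pop_b, markers))

# Toy data (hypothetical): both populations carry the same variants at
# each marker, but in different combinations.
samples = [
    ("GD", {"nad5-1": "CCAT", "nad7-1": "GGCA"}),
    ("GD", {"nad5-1": "CCAG", "nad7-1": "GG-A"}),
    ("SS", {"nad5-1": "CCAT", "nad7-1": "GG-A"}),
    ("SS", {"nad5-1": "CCAG", "nad7-1": "GGCA"}),
]

by_nad5 = distinguishable(samples, "GD", "SS", ("nad5-1",))
by_nad7 = distinguishable(samples, "GD", "SS", ("nad7-1",))
by_both = distinguishable(samples, "GD", "SS", ("nad5-1", "nad7-1"))
print(by_nad5, by_nad7, by_both)
```

Each marker alone reports identical haplotype sets for the two toy populations, while the combination does not, because the combined haplotype records which variants occur together in the same individual.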

Discussion
Polymorphism distribution of mitochondria
Generally, the variation pattern of the mitochondrial genome is similar to that of nuclear genes, in that it is divided into three major distribution regions: the south, the north and the west. From the perspective of population genetic diversity, the genetic variation of P. tabuliformis populations in the central and western regions is higher than in the eastern regions, and the southwestern populations have the most abundant genetic resources. Among the southwestern populations, the genetic variation of NS (Ningshan) is the highest. These results are consistent with observations from previous long-term provenance experiments and field studies (Chen et al. 2019; He et al. 2020; Xu et al. 1992; Zhao et al. 2014). This suggests that NS might be one of the origins of ancient P. tabuliformis. Southwest China, as part of the "Sino-Japanese Floristic Region", has complex topography and high plant diversity. Its superior natural conditions, complex habitats and climate are very suitable for the growth of local P. tabuliformis. In recent years, owing to intentional protection, humans have caused less damage to the original habitat, which has had a positive effect on the maintenance and accumulation of genetic variation.
A new mitochondrial marker system for efficient provenance identification of P. tabuliformis
So far, the mitochondrial fragments widely used in P. tabuliformis are nad1-2, nad4-3 and nad5-1. They have relatively low levels of variation among populations, with only two to four haplotypes when each fragment is used alone and eight haplotypes when the three fragments are combined (Wang et al. 2011). In a recent study, we found a high level of variation in the nad7-1 region of P. tabuliformis, which can identify the maternal sources of different provenances and yields roughly twice as many haplotypes as nad1-2, nad4-3 and nad5-1 together (Xia et al. 2020). In this study, we combined the four markers (nad1-2, nad4-3, nad5-1 and nad7-1) to form a new system and found that the genetic structure of P. tabuliformis has an obvious geographical pattern. Comparing the three mitochondrial marker systems shows that the group III system (nad1-2, nad4-3, nad5-1 and nad7-1) identifies 25 haplotypes, considerably more than group I (8 haplotypes) and nad7-1 alone.
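The haplotype counts reported in this paper (8 for group I, 19 for nad7-1, 25 for group III, from 158 individuals) can be checked against a simple consistency bound. When two marker sets are combined, the number of combined haplotypes cannot be smaller than the larger of the two component counts, and cannot exceed either their product or the number of individuals. The helper name `combined_bounds` is hypothetical, and the sketch only checks arithmetic; it does not reproduce the analysis.

```python
def combined_bounds(n_a, n_b, n_individuals):
    """Lower and upper bounds on the number of haplotypes obtained when
    two marker sets with n_a and n_b haplotypes are combined."""
    lower = max(n_a, n_b)                   # combining never merges haplotypes
    upper = min(n_a * n_b, n_individuals)   # at most one haplotype per individual
    return lower, upper

# Counts as reported in the text.
group1, group2, group3, n_individuals = 8, 19, 25, 158

lower, upper = combined_bounds(group1, group2, n_individuals)
consistent = lower <= group3 <= upper
ratio_vs_group1 = group3 / group1   # group III relative to group I
ratio_vs_group2 = group3 / group2   # group III relative to nad7-1 alone
print(lower, upper, consistent, ratio_vs_group1)
```

The reported 25 haplotypes lie within the bounds of 19 and 152, and correspond to a little over three times the group I count.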
Although nad7-1 achieves relatively high resolution (19 haplotypes), the new marker system (nad1-2, nad4-3, nad5-1 and nad7-1) is more accurate in detecting specific populations such as HL, WT and NS, and can even identify populations such as GD and SS, which could not be distinguished by nad7-1. We previously used GBS (genotyping by sequencing) to study the population structure and genetic variation of P. tabuliformis (Xia et al. 2018). The resolution of the new marker system (nad1-2, nad4-3, nad5-1 and nad7-1), which is rich in length variation, is comparable to that of GBS. However, the new mitochondrial system does not require outsourced sequencing; it is as simple as running PCR products on a gel and can be completed in any laboratory. The results of this study provide a highly efficient method for identifying the maternal origin of P. tabuliformis and technical support for molecular-assisted breeding of improved varieties.
Through years of investigation and collection of germplasm resources, we have long been engaged in the management and protection of P. tabuliformis. In this study, the new mitochondrial system composed of the recently developed nad7-1 and three classical markers (nad1-2, nad4-3 and nad5-1) is reported for the first time; it identifies genetic differences and population genetic structure among P. tabuliformis populations more effectively and more easily. The molecular marker system obtained in this study identifies genetic variation well and provides primer support and technology extension for subsequent studies of P. tabuliformis plantations. Studying the origin will support the introduction and growth of local P. tabuliformis populations. The new marker system will provide a powerful tool and analysis platform for promoting the identification and utilization of P. tabuliformis germplasm resources and for selecting breeding programs. Provenance selection is very important for the survival rate and quality of plantations. The adaptability and growth performance of different geographical provenances differ significantly among introduction areas. Therefore, this study provides a marker system and platform for the analysis of different provenances of P. tabuliformis plantations and lays a foundation for further directional breeding and planting promotion.

Prospect
Because P. tabuliformis is distributed in areas with a long history of cultural development, most natural forests have been damaged by various economic and social disturbances. The remaining natural forests are precious for research and production, as a repository of genetic resources for breeding programs. We should protect the decreasing wild resources of P. tabuliformis and expand high-quality plantations. In production and afforestation, more attention should be paid to provenance selection. For example, in seed testing and seeding treatment, different measures and standards should be adopted according to provenance, to save seeds and improve the yield and quality of seedlings, so that greater economic benefits can be achieved.

Disclosure statement
No potential conflict of interest was reported by the authors.

Tables
Table 1. Geographical locations, sample size (N) and number of haplotypes (nh) of the P. tabuliformis populations identified by the marker system (nad1-2+nad4-3+nad5-1+nad7-1).

Fig. 1. Group III marker system (nad1-2+nad4-3+nad5-1+nad7-1). Pie charts show the proportions of mitotypes identified by the nad1-2, nad4-3, nad5-1 and nad7-1 system. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.