Population Transcriptomes Reveal the Interspecic Adaptive Genetic Differentiation of Liriodendron by Landscape Genetics

Background: Adaptive genetic differentiation is a hotspot in the research of speciation mechanisms in evolutionary biology. Genomic resources are important for detecting ecological adaptive evolution of non-model plants. Using RNA-seq for non-model plants is a good approach to obtain their genomic resources. The combination of population transcriptome resources and environmental data can provide insights into the genetic mechanism of adaptive genetic differentiation. Results: Based on the population transcriptome data, we investigated the spatial distribution of genetic variations in Liriodendron to detect relationships between ecological factors and genetic differentiation. Environmental data and genetic variations from 17 populations were integrated to detect the population structure, adaptive genes and key environmental factors that shape the population genetic structure by landscape genetic approach. Here, we identied 16592 high-quality single nucleotide polymorphisms (SNPs). The population structure analysis results showed that 17 populations were divided into three groups: L. tulipifera, eastern group and western group of L. chinense. Redundancy analysis and latent factor mixed model analysis suggested that precipitation seasonality, precipitation in the driest quarter, diurnal temperature, and solar radiation in May were closely associated with the adaptive genetic differentiation of Liriodendron. Ecological niche differentiation analysis implied signicant ecological niche divergence between L. chinense and L. tulipifera habitats. In total, 858 environment-related loci were identied, which were associated with 464 genes. Pathway enrichment analysis revealed that these genes were signicantly enriched in multiple biological pathways. Related studies conrmed that these biological pathways play vital roles in plant growth, development, stress, immune response and photosynthesis. Conclusions: Our research provided empirical evidence that environmental factors may play a key role in driving adaptive genetic differentiation of species. Furthermore, the combination of population transcriptome resources and environmental datasets provides new insights into the study of adaptive genetic differentiation of species.


Background
Adaptive genetic differentiation is a hotspot in the research of speciation mechanisms in evolutionary biology [1][2][3][4][5][6][7]. Ecological adaptive differentiation is a crucial mechanism of natural selection in the process of speciation, especially in plants [8][9][10]. Due to the limited propagation distance of plant pollen and seeds, it is impossible to seek advantages and prevent harm through rapid migration, similar to animals. Therefore, ecological adaptive differentiation is more common in plants than in animals [11][12][13][14][15]. Ecological processes are key to adaptive differentiation and speciation when the selection of environmental differences leads to the continuous evolution of barriers to gene ow between populations [1]. Adaptive differentiation refers to the differentiation of species from heterogeneous environments under selection pressure [16]. In response to these selective pressures, plants will undergo adaptive evolution. During the long-term adaptive evolution of plants, signal perception, signal transduction, and gene expression eventually manifest phenotypic, stress resistance and phenological changes [17]. The essence behind these changes is the adaptive evolution of genes involved in the formation of these traits.
Identifying these genes related to ecological adaptation and ecological factors that affect the local adaptation of plants will provide insight into the genetic mechanism of ecological adaptation [18].
In recent years, the complex evolutionary relationship and interaction mechanism between biological evolution and the environment have been widely con rmed in molecular ecology. The environment can affect the neutral evolutionary process (mutation, drift and gene ow) through various evolutionary factors (such as ecological isolation and chemical mutagenesis), thereby affecting the evolution of species or populations [19]. Furthermore, continuous environmental selection pressure can also act on the xation process of speci c genes, thus affecting the adaptive evolution of species or populations and even leading to the formation of new species [20][21]. Adaptive evolution is an evolutionary force that has received wide attention in recent years. According to relevant studies, it has become one of the most important factors affecting the evolution of species in addition to mutation, drift and gene ow, and its contribution to the evolution of species generally reaches 5-10% [22]. However, it is di cult for researchers to comprehensively understand the mechanism of biological evolution and its relationship with the environment due to the limitations of the current human cognitive level and technological means. In addition, there is no comprehensive and systematic method to reveal the genetic mechanism of adaptive genetic differentiation due to the short period of research on ecological adaptation and different research focuses [23].
With the development of high-throughput sequencing technologies, it is convenient to obtain genomic data from populations. Combining environmental factor data with genomic data and using population genetics statistical methods will provide researchers with rich information about the correlation between environmental characteristics and species adaptive evolution and details of their interactions [24][25][26][27]. Although genomic data cover a wealth of genetic information, for forest tree species and non-model species with large genomes, population genome sequencing also attracts a huge cost. For researchers, using RNA-seq strategy for forest tree species and non-model species is a good potential approach to obtain their genomic resources. Therefore, using population transcriptome sequencing may be a suitable strategy for adaptive evolution studies. First, transcriptome data have the advantage of moderate data size compared with genomic data and traditional genetic marker data, and the exon regions of genes are mainly obtained [23]. Second, transcriptome data can be used for various downstream analyses, such as genetic marker exploiting, candidate gene identi cation, QTL mapping, and comparative genomics. Third, transcriptome sequencing is more economical than whole genome sequencing. Currently, research on the ecologically adaptive evolution of species through population transcriptome sequencing strategies is increasingly favored by investigators. Studies have been conducted on the adaptive evolution of species using population transcriptome strategies. In these studies, the researchers used transcriptome sequencing to identify genes involved in adaptive evolution and even found that some species have undergone adaptive differentiation [28][29][30][31][32]. These studies further con rm that it is feasible to study the adaptive evolution of species using transcriptome sequencing strategies. At present, although the transcriptome method has been recognized and developed to study the adaptive evolution of species, most of the studies are mainly on animals or model plants, whereas few studies have been conducted on non-model plants, especially tall forest trees.
Nearly half of the Earth's land is covered by forests, and many tree species play foundations or key roles [33]. Forest trees not only provide survival resources for other organisms but also play vital roles in maintaining ecological balance. Approximately 3/4 of the terrestrial biomass on earth comes from forests, and forests are closely related to the carbon budget in the atmosphere [34]. Therefore, the stability of forest ecosystems plays a key role in the survival of other organisms. Ecological adaptive differentiation is a very important mechanism of natural selection in the process of speciation, especially in the formation of forest trees [8][9][10]. Ecological adaptive differentiation affects the stability of forest ecosystems by changing the diversity of forest trees [34][35][36][37]. Therefore, it is important to understand the genetic mechanism of the ecological adaptation differentiation of forest trees.
Liriodendron is a deciduous broad-leaved tree of Magnoliaceae. The genus includes two extant sister species, namely, Liriodendron chinense and Liriodendron tulipifera [38][39][40]. These two sister species represent a well-known example of an eastern Asian and North American disjunct distribution of temperate deciduous forest species [41]. L. chinense occurs in small, widely scattered populations in subtropical China and northern Vietnam, while L. tulipifera is proli c throughout the southeastern United States [40][41][42]. Extensive fossil evidence indicates that Liriodendron was common throughout the Northern Hemisphere in the Tertiary period [39,[43][44][45], while the genus currently consists of only two sister species. Liriodendron is a shade-intolerant tree that requires adequate sunlight and precipitation during the growing season [46]. Liriodendron generally blooms in May. L. tulipifera is a fasting-growing tree indigenous to southeastern North America but is planted worldwide because acclimates to various environments and sequesters large amounts of carbon dioxide due to its developed root system [47]. Compared with L. tulipifera, L. chinense grows more slowly [48]. Due to endangered habitats, intense interspeci c competition, low seed viability, and arti cial interference, L. chinense is restricted to southern areas of the Yangtze River of China and has been listed as a secondary threatened species in China [40,42]. Previous studies found that the adaptation of the two species is very different [49][50][51]; consequently, these are ideal materials for studying the evolution and adaptive differentiation of East Asia-North American discontinuously distributed plants. Because there is no research on the adaptive evolution of Liriodendron, the genetic mechanism of the interspeci c adaptive genetic differentiation of this genus is unclear.
Although the genetic differentiation between and within species of Liriodendron has been identi ed using genome-wide markers, only one individual was used per population, and most of the SNP markers obtained were not in the exon regions of genes [52]. In this study, 80 Liriodendron individuals from 17 populations were sampled, with 2-5 individuals per population. Here, we report a population-level transcriptome study to investigate the relationship between interspecies genetic differentiation and the adaptive divergence of Liriodendron using a landscape genetics approach. First, we obtained at least 8 Gb of clean data for each of the 80 individuals and implemented SNP calling of cDNA. These single nucleotide polymorphisms (SNPs) were used to examine the population genetic structure. Then, constrained linear ordination analysis, i.e., redundancy analysis (RDA), was performed to identify environmental factors related to adaptive genetic differentiation. In addition, spatial evolutionary and ecological vicariance analysis (SEEVA) was performed to detect ecological niche divergence between groups of Liriodendron. We identi ed the potential adaptive loci by performing correlation analysis between environmental factors and genetic variations in the background levels of population structure.
Finally, the genes where these adaptive loci are located were identi ed, and these genes were subjected to pathway enrichment analysis to mine genes related to environmental adaptation. Furthermore, the characteristics of leaf phenotypes were also investigated and analyzed. The study aimed to (1) identify population genetic structure of Liriodendron, (2) evaluate the role of environmental factors on the adaptive genetic differentiation of Liriodendron, (3) exploit adaptive genes, and provide an empirical reference for studying the genetic basis underlying the adaptation. The combination of population transcriptome resources and environmental data is expected to provide new insights into the study of adaptive genetic differentiation of species.

RNA-seq and SNP calling
The cDNA library of 80 individuals of Liriodendron was successfully constructed, and RNA sequencing generated a total of 832.16 Gb of clean data, with each individual having at least 8 Gb of clean data (Additional le 1: Table S1). With the genome sequences of L. chinense as the reference genome [52], 16592 high-quality SNPs were identi ed by SNP calling with strict ltering.

Population genetic structure
Based on the 13990 neutral SNPs (removing outlier loci identi ed by Arlequin and BayeScan from 16592 SNPs) and using Admixture, we observed three separate ancestry groups: L. tulipifera in North America (NA), L. chinense in eastern China (CE) except for the XN population, and L. chinense in western China (CW) (Fig. 1A). The Admixture plots and cross-validation error of each K (K = 2 to 10) are shown in Additional le 2: Fig. S1 and Additional le 3: Fig. S2, respectively. For the PCA (principal component analysis), the rst principal component (PC1) separated L. chinense from L. tulipifera (P = 1.68 E-15), while the second principal component (PC2) separated CE from CW except for the XN population (P = 4.27 E-36) (Fig. 1B). The neighbor-joining tree further revealed that 17 populations were divided into three clusters, NA, CE, and CW, except for the XN population of CE clustering into CW (Fig. 1C). NA was genetically closer to CW than to CE. The AMOVA results also showed that genetic variation mainly existed among groups (68.78%, F CT = 0.69, P < 0.00001; Table 1). The genetic diversity analysis results showed that the genetic diversity of L. tulipifera populations is higher than that of L. chinense populations as a whole ( Table 2). The genetic differentiation statistics results showed a high degree of genetic differentiation between L. tulipifera and L. chinense (Fig. 1D) (Fst = 0.86111). In addition, obvious genetic divergence was observed between CE and CW (Fst = 0.16935). Furthermore, we found that the genetic differentiation between NA and CE was slightly higher (Fst = 0.76841) than that between NA and CW (Fst = 0.75842).    To provide insight into the roles of environmental factors in population genetic structure and their contribution to this population structure, RDA was performed. The RDA results are shown in Table 3 and Fig. 2. First, the interactive-forward-selection analysis detected the top 16 most in uential environmental variables from 55 environmental variables. The 16 environmental variables included two temperature variables (bio2 and bio7), three precipitation variables (bio13, bio15 and bio17), 6 solar radiation variables (srad1, srad2, srad5, srad6, srad7 and srad10) and 5 wind speed variables (wind2, wind7, wind8, wind9 and wind10). The RDA results showed that the correlation between 16592 alleles and 16 environmental variables was 1.000 on axes 1 and 2. The eigenvalue ratios of axes 1 and 2 were 32.10% and 8.38%, respectively. Sixteen environmental factors divided the 17 populations of Liriodendron into three groups. Axis 1 (RDA 1) separated L. chinense from L. tulipifera, while axis 2 (RDA 2) separated CE from CW except for the XN population ( Fig. 2). Among the 16 environmental factors, 8 environmental factors had a signi cant in uence on the population genetic structure of Liriodendron (Table 3). The eight environmental factors included one temperature variable (bio2), three precipitation variables (bio13, bio15 and bio17), three solar radiation variables (srad2, srad5 and srad6) and one wind speed variable (wind2). The results suggested that precipitation and light play key roles in shaping the population structure of Liriodendron.  Ecological niche differentiation analysis The SEEVA results showed that 22 ecological factors undergo signi cant divergence (D > 0.75, P < 0.0016) between L. chinense and L. tulipifera habitats ( Fig. 3; Additional le 4: Table S2). In addition, 6 ecological factors have diverged signi cantly (D > 0.75, P < 0.0016) in the eastern and western habitats of L. chinense ( Fig. 3; Additional le 4: Table S2).

Leaf shape analysis
The results of leaf shape analysis showed that the leaf shape variation was the most abundant in CW, and the variation range covered almost all the leaf variation in CE and NA (Fig. 4). However, CE and NA have common leaf variations and unique leaf variations. Furthermore, the statistical results of the number of cracks in leaves showed that the number for L. chinense ranged from 3 to 6, and 92.14% of the leaves had 3 cracks. However, the number of cracks in the leaves of L. tulipifera ranged from 3 to 9, among which 5 and 3 were common, accounting for 63.10% and 16.67%, respectively (Additional le 5: Table S3). The vertical distance from the right lateral lobe to the primary vein. y1: The vertical distance from the right lateral sinus to the leaf base. y2: The vertical distance from the right apical lobe to the leaf base) (Con dence interval: 95%).

Characterization of environment-associated loci
There were 1917 and 1808 outlier sites identi ed by Arlequin and BayeScan, respectively (Additional le 6: Table S4 and Additional le 7: Table S5). In total, 1123 outlier loci were identi ed by both Arlequin and BayeScan. The results of the environment-associated analysis showed that 858 EAL associated with at least one environmental factor were identi ed by LFMM (Additional le 8: Table S6). Furthermore, LFMM analysis showed that precipitation seasonality (bio15), elevation, precipitation in the driest quarter (bio17), precipitation in the warmest quarter (bio18), the mean diurnal range of temperature (bio2), solar radiation in May (Srad5) and water vapor pressure in July (vapr7) were associated with the highest numbers (> 70) of EAL (Additional le 8: Table S6).

GO and KEGG enrichment analysis
The genomic contexts of the 858 EAL were determined based on the L. chinense reference genome [52]. We found that 667 EAL were associated with 464 genes, and 191 EAL corresponded with the intergenomic regions (Additional le 8: Table S6). The results of GO annotation showed that these genes represented a broad range of biological processes, molecular functions and cellular components (Additional le 9: Fig. S3). Furthermore, some genes are considered to be involved in the response to biotic and abiotic stresses, phenology, disease resistance and hormonal responses (Additional le 10: Table S7), such as ARF1 (ethylene and cytokinin responses) [53], RAP2-7 (ethylene-responsive transcription factor), RP/EB family member 1B-like (response to wounding) [54], FRIGIDA-like ( owering time) [55], UVR8 (photomorphogenesis and stress acclimation) [56] and RGA2-like (disease resistance protein) [57], were identi ed in other plants. In addition, we found some genes involved in the response to biotic and abiotic stresses, stimuli, immune responses, growth and development, and programmed cell death by Gene Ontology annotation (Additional le 11: Table S8).
The enriched GO terms biological process, cellular component and molecular function are shown in Additional le 9: Fig. S3. Genes of GO terms such as 'inorganic diphosphatase activity', 'protein transport', 'deoxyribonucleotide metabolic process', 'protein secretion', and 'ubiquitin-like protein-speci c protease activity' were highly enriched. Pathway enrichment analysis revealed that genes involved in plantpathogen interactions, phosphatidylinositol signaling systems, ubiquitin-mediated proteolysis, carotenoid biosynthesis, alpha-linolenic acid metabolism and phagosomes were signi cantly enriched (Fig. 5).
Related studies have shown that these biological pathways play vital roles in plant growth, development, stress, immune response and photosynthesis.

Discussion
Origin and population genetic differentiation of Liriodendron Based on allozyme polymorphisms, cpDNA, and fossil evidence, Parks and Wendel rst proposed genetic differentiation between L. tulipifera and L. chinense [39] and suggested that the separation between L. chinense and L. tulipifera occurred 10-16 Ma ago [39]. In our study, signi cant genetic differentiation was also detected based on SNPs from transcriptome data in Liriodendron. Furthermore, we also found an obvious east-west genetic divergence in L. chinense (Figs. 4A-C). This result is consistent with previous reports [52,[58][59][60]. Additionally, Yang used cpDNA and inferred that the divergence time of the east-west lineage was 0.443 Ma ago [60]. In this study, we found that the genetic differentiation between NA and CE was slightly higher (Fst = 0.76841) than that between NA and CW (Fst = 0.75842). The neighbor-joining tree further revealed that NA is genetically closer to CW than to CE (Fig. 1C). Interestingly, the results of leaf shape analysis showed that the leaf shape variation was the most abundant in CW, and the variation range covered almost all of the leaf variations in CE and NA (Fig. 4). The results suggested that CW may be an older ancestral group. Both CE and NA originated from CW.
In addition, geological paleontologists found Liriodendron pollen fossils of the early Cretaceous Aptian-Albian in the white crane cave in GuangZhou, China [61]. This is the oldest fossil record of Liriodendron ever found. Later, researchers found more Liriodendron fossils of the late Cretaceous in Asia, Europe and North America [62,63]. Fossil evidence has shown that Liriodendron is widely distributed across the Northern Hemisphere and reached its maximum distribution during the Tertiary period [39,[43][44][45]. Therefore, we inferred that Liriodendron probably originated in southern China and moved westwards via West Asia, Europe, and the North Atlantic Road Bridge to North America and eastwards via Japan, Russia, and the Bering Land Bridge to North America. The genus was widespread in the Northern Hemisphere in the Late Cretaceous and reached its maximum distribution in the Tertiary period. With mid-Cenozoic global cooling [64], Liriodendron species migrated to low and middle latitudes, which caused the extinction of populations in Europe and Siberia. The populations that were continuously distributed in the Northern Hemisphere were divided into two groups and retreated to eastern Asia and North America. This resulted in the interruption of gene ow and the genetic differentiation of Liriodendron between East-Asia and North America. Subsequently, with the repeated glacial and interglacial cycles during the Quaternary period, North America, Europe and East Asia were covered by ice sheets at mid-high latitudes [65,66]. Ultimately, the populations of Liriodendron mainly retreated to the subtropical areas of China and southeastern United States, forming the current geographical pattern. Furthermore, the statistical results of the number of cracks of leaves showed that leaves with three cracks are the primary leaf shape in L. chinense, compared with ve cracks in L. tulipifera ( Fig. 3; Additional le 5: Table S3). This result con rmed the obvious leaf shape differentiation between L. chinense and L. tulipifera.
For L. chinense, the long-term in situ refugia pattern may have caused genetic differentiation among the isolated populations [67]. In this study, the population structure analysis showed an obvious genetic divergence between CE and CW, but not the XN population (Figs. 4A-C), which is consistent with the pattern of their natural geographical distribution [40,42,68]. This result is consistent with our previous research showing that there are shared haplotypes between the XN population and CW based on cpDNA and adaptive gene sequences [67]. This may be related to the geographical location of the XN population, which is closer to CE between CE and CW and may have experienced recent historical gene ow with CW.
Therefore, we guess that ecological processes may play key roles in adaptive differentiation and speciation when the selection of environmental differences leads to the continuous evolution of barriers to gene ow between populations.

Genetic differentiation and environmental adaptive divergence
In previous studies, it was found that the heterogeneity of the species' habitats would cause the species to undergo adaptive evolution under different environmental gradients and could even lead to adaptive genetic differentiation and the formation of new species [6,[28][29][30][31][32][69][70][71]. In this study, the RDA results showed that 16 environmental factors divided 17 populations of Liriodendron into three groups. Axis 1 separated L. chinense from L. tulipifera, while axis 2 separated CE from CW (Fig. 2). Among the 16 environmental factors, 8 had a signi cant in uence on the population genetic structure of Liriodendron (Table 3). Furthermore, LFMM analysis showed that seven environmental factors were associated with the highest number of EAL (Additional le 8: Table S6). Here, we found that four environmental factors (bio2, bio15, bio17 and srad5) play key roles in the environmental adaptive genetic differentiation of Liriodendron according to RDA and LFMM analyses. In addition, SEEVA results showed that these four environmental factors were signi cant differentiation between L. chinense and L. tulipifera, and had different levels of divergence between eastern and western groups of L. chinense (Additional le 4: Table  S2). In conclusion, we speculate that these four environmental factors play the most important roles in the adaptive genetic differentiation of Liriodendron.
Diurnal temperature plays an important role in the growth and development of plants. Plants photosynthesize during the day to produce sugars and other organic matter. At night, there is no light, and plants consume organic matter through respiration. If the temperature is lower at night, the plant's respiration is weakened, and organic matter accumulates, providing more nutrients for the growth and development of plants. Therefore, a large diurnal temperature will be more conducive to plant growth and development than a small diurnal temperature. In our study, we found a wide difference in the mean diurnal range of temperature (bio2) between southeastern America and subtropical China. Compared with subtropical China where L. chinense is located, the southeastern United States where L. tulipifera is located has a larger average diurnal temperature range (Additional le 4: Table S2). This also con rms why L. tulipifera grows faster than L. chinense [47,48].
Subtropical China has a subtropical monsoon climate with four distinct seasons: hydrothermal same season, dry in winter and rainy in summer. The southeastern United States has a subtropical monsoon humid climate and is warm and humid in winter and rainy in summer. Different climate types in the two regions cause discrepancies in precipitation seasonality (bio15) (Additional le 4: Table S2), which will inevitably have various effects on species evolution in the two regions. We also found a wide difference in precipitation of the driest quarter (bio17) between the southeastern United States and subtropical China. Compared with subtropical China, the southeastern United States has higher precipitation in the driest quarter (Additional le 4: Table S2). This suggests that L. chinense in subtropical China may have better drought resistance. Previous studies also found that L. chinense is sensitive to water conditions and is not resistant to ooding [49,50].
Light plays vital roles in the growth, development and morphogenesis of plants. In this study, there was an obvious difference in solar radiation in May (srad5) between the southeastern United States and subtropical China. Compared with subtropical China, solar radiation in the southeastern United States is stronger in May (Additional le 4: Table S2). Interestingly, the owering period of Liriodendron is also in May. This implies that different levels of solar radiation may have various effects on the owering period of L. chinense and L. tulipifera. An investigation found that the owering period of L. tulipifera begins in late spring to summer and lasts for 2-6 weeks [46], which may be related to the stronger solar radiation in the southeastern United States. In addition, the results of the ecological niche differentiation analysis indicated signi cant ecological niche divergence between L. chinense and L. tulipifera habitats. Conclusively, our research suggests that the divergent selection of environmental differences plays a key role in the interspeci c genetic differentiation of Liriodendron.
Furthermore, the results of genetic diversity analysis showed that the genetic diversity of L. tulipifera populations is higher than that of L. chinense populations. This may be because L. tulipifera is widely and continuously distributed in the southeastern United States, with frequent gene ow between populations, while L. chinense is distributed in small and scattered populations in subtropical China, with isolation among populations, which suggests the reason why L. tulipifera has stronger adaptation. Greater genetic variations provide the genetic basis for the widespread adaptation of L. tulipifera to various environments, which is consistent with previous research [51].

Signatures of natural selection
Both the results of the outlier analysis and environmental association analysis identi ed 5.17% (858/16592) candidate loci that deviate from selective neutrality and relate to environmental adaptation.
Among the 858 loci, 667 loci are associated with 464 genes, and 191 loci correspond to the intergenomic regions. We found that some genes are involved in the response to biotic and abiotic stresses, phenology, disease resistance and hormonal response in other plants (Additional le 10: Table S7). For example, UVR8 is involved in photomorphogenesis and stress acclimation in Arabidopsis [56], and RP/EB family member 1B-like is involved in the response to biotic wounding in Nicotiana attenuata [54]. FRIGIDA-like is involved in owering time in Arabidopsis [55], and RGA2-like encodes a disease resistance protein in Hevea brasiliensis [57]. In addition, ARF1 plays an important role in ethylene and cytokinin responses in Arabidopsis thaliana [53]. MPK6 is involved in ethylene biosynthesis, oxidative stress and osmotic stress in plants. Furthermore, we found that the loci of these genes were associated with speci c environmental factors (Additional le 10: Table S7). For example, FRIGIDA-like (Scaffold580_7851536) is associated with solar radiation, water vapor pressure and wind speed. UVR8 (Scaffold3419_3526874 and Scaffold3419_3527115) is associated with elevation, solar radiation and diurnal temperature. MPK6 (Scaffold291_7154415) is associated with elevation, precipitation seasonality and water vapor pressure.
The identi cation of these adaptive genes suggests that it is feasible to exploit adaptive genes using a landscape genetics approach.
Furthermore, pathway enrichment analysis revealed that genes involved in plant-pathogen interactions, phosphatidylinositol signaling systems, ubiquitin-mediated proteolysis, carotenoid biosynthesis, alphalinolenic acid metabolism and phagosomes were signi cantly enriched. (Fig. 5). The plant-pathogen interaction pathway plays vital roles in the immunity of Arabidopsis [72]. In addition, the phosphatidylinositol signaling system pathway relates to a multitude of cellular processes with key importance for plant function, development and regulation of energy, carbon metabolism and stress response [73][74][75]. The ubiquitin-mediated proteolysis pathway is involved in the immune response in Arabidopsis [76], and the carotenoid biosynthesis pathway plays important roles in fruit development, petal coloration and photosynthesis [77][78][79]. The alpha-linolenic acid metabolism pathway relates to salt tolerance and cold stress [80,81], and the phagosome pathway is involved in environmental stress [82]. These results suggested that most of the variation loci obtained from environmental correlation analysis are indeed closely related to the adaptation of species. It was further con rmed that the combination of population transcriptome resources and environmental datasets is feasible to study the adaptive genetic differentiation of species by landscape genetic approach. These results also suggested that environmental factors related to habitats of species play keys roles in driving adaptive genetic differentiation of species genome. Furthermore, resistance-related genomic loci will provide useful molecular marker resources for forest tree molecular marker-assisted breeding.

Conclusions
In this study, environmental data and SNPs from population transcriptomes were integrated to investigate the spatial distribution of genetic variations to detect relationships between ecological factors and genetic differentiation. The results showed that the population genetic structure of Liriodendron was closely related to the divergence of multiple environmental factors. The high degree of interspeci c genetic differentiation and ecological niche divergence suggests that L. chinense and L. tulipifera may be have experienced selective pressures from different environments in East Asia and North America. Our results suggested that combining population transcriptome and environmental factor datasets is potential feasible to explore ecological adaptation differentiation of species by landscape genetic approach. The combination of population transcriptome resources and environmental datasets also provides new insights into the study of adaptive genetic differentiation of species. This study is an important step towards understanding the genetic mechanism of ecological adaptation differentiation of species. The resistance-related genomic loci will also provide useful molecular marker resources for forest tree molecular marker-assisted breeding. For researchers, using RNA-seq strategy for Liriodendron populations is a good potential approach to obtain their genomic resources. The high-quality SNP library and population transcriptome data will provide the basis for future population genomic analyses of Liriodendron and understanding the genetic mechanism of plant ecological adaptive differentiation of plants. Furthermore, our research also provided an empirical reference for studying the adaptive differentiation of allopatric species.

Methods
Sampling and RNA extraction In order to obtain more orthologous genes, we adopted the strategy of mixing shoot apex samples. Shoot apex materials of 80 individuals from 17 populations (12 populations of L. chinense and 5 populations of L. tulipifera) were collected from a provenance trial plantation of Nanjing Forestry University located in Zhenjiang, Jiangsu Province (119°13′20″E, 32°7′8″N) [48]. Fresh shoot apexes were quick frozen, transported to the laboratory and stored at -80 ℃ pending RNA extraction. All shoot apex materials were collected on the same day in May 2019 (Fig. 6). The origins of the 17 populations are shown in Fig. 7.
There is a voucher specimen for each population, and the voucher specimens and depositary of 17 populations are shown in Table 2. All samples were collected following current Chinese regulations and international guidelines.  RNA was extracted using an RNAprep Pure Kit (Tiangen, China). RNA purity was checked using a NanoPhotometer® spectrophotometer (Implen, CA, USA). A total amount of 1 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using the NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample.

Data generation
The cDNA libraries were sequenced on an Illumina HiSeq platform with a 150 bp paired-end read length at Novogene-Tianjin. Raw reads were ltered by removing read adaptors and reads featuring unknown nucleotides and those with low quality (where PHRED values were < 20 for more than 50% of the bases). After ltering, ~8 Gb of clean data were generated for each individual (Additional le 1: Table S1). Clean data were mapped to the L. chinense reference genome [52] using Hisat2 v2.0.4 [83].  Tables S9 and Additional le 13: Table S10). In addition, occurrence records of L. chinense and L. tulipifera were collected from the Chinese Virtual Herbarium (CVH), the Global Biodiversity Information Facility (GBIF) and eld investigation to identify their niches.
Repetitive geographic coordinate information was deleted, and 145 and 478 geographic coordinate locations of L. chinense and L. tulipifera were obtained, respectively. A total of 56 environmental variables (55 climatic variables and 1 topographical variable) for the 623 point locations were extracted and downloaded from Worldclim v1.4 (Additional le 13: Table S10 and Additional le 14: Table S11).
Population genetic structure analysis First, Admixture v1.3.0 (http://www.genetics.ucla.edu/software) [85] was used to determine the population genetic structure of Liriodendron. Admixture is a program for estimating ancestry in a maximum likelihood model that allows for faster and more e cient when thousands of SNPs are employed. We ran 10 replicates for each genetic cluster K as 2 to 10. We chose the K value that best t the data by detecting the minimum cross-validation error of each K. The SNP sites used for population genetic structure analysis were all neutral sites (13990 SNPs); outlier loci identi ed by Arlequin and BayeScan were removed from all SNPs (16592 SNPs). We performed principal component analysis (PCA) using Eigensoft v7.2.1 [86]. Eigenvectors were generated from the covariance matrix with the R program.
The signi cance level of the eigenvectors was determined using the Tracey-Window test, which was implemented in the program twstats of Eigensoft v 7.2.1 [86]. In addition, the neighbor-joining tree was constructed based on the distance matrix by phylip v3.695 [87], with 1000 bootstraps.
To provide insight into the distribution of genetic variation and genetic diversity at different levels in the 17 populations of Liriodendron, molecular variance (AMOVA) and genetic diversity analysis were performed in Arlequin v3.5 [88]. Fst (F-statistics) was calculated using Arlequin for the 17 populations of Liriodendron to detect population differentiation.

Redundancy analysis (RDA)
To infer the in uence of the environmental variables on population genetic structure, we performed redundancy analysis, a constrained linear ordination analysis. The analysis was implemented in Canoco 5.0 (http://www.canoco5.com/) [89]. Here, allele frequencies per locus of each population were used as response variables (Additional le 15: Table S12), and the 56 environmental variables were used as explanatory variables. Before performing redundancy analysis, an interactive forward-selection analysis was implemented using Canoco 5.0 to select the top 16 environmental variables that had the greatest impact on the population genetic structure. Subsequently, RDA was implemented for 16 environmental variables to detect those that had signi cant effects on the population genetic structure. To reduce the false positive rate, the P-value was corrected. Finally, environmental factors with corrected P-values less than 0.01 were considered to have a signi cant impact on the population genetic structure.
Spatial evolutionary and ecological vicariance analysis (SEEVA) To detect ecological divergence among groups of Liriodendron, SEEVA [90] was performed, which can analyze environmental data using statistical methods to detect ecological vicariance in genetic differentiation and speciation. The divergence index (D: 0-1) and impact index (I: 0-1) were calculated for each of the 56 environmental factors in interspecies of Liriodendron and east-west groups of L. chinense. Meanwhile, Fisher's exact test [91] was used to obtain an appropriate P-value for tests with small sample sizes. D > 0.75 indicates high-level divergence. A larger impact index implies that an ecological factor has a greater in uence on the differentiation between sister groups and sister species. A P-value less than 0.0016 indicated signi cant divergence for splits between sister groups or sister species. SEEVA v1.01 can be obtained from http://seeva.heiberg.se.

Acquisition and analysis of leaf phenotypic data
To understand whether the phenotypes of the two species diverged, the characteristics of leaf phenotypes were also investigated and analyzed. We collected at least 10 mature leaves from different parts of each of the 80 individuals; a total of 947 leaves were collected, 336 leaves from L. tulipifera and 611 leaves from L. chinense. The relative positions of the lateral sinuses on the right side of the leaf (Fig. 8) and the number of leaf cracks were measured and recorded, and their positions were plotted using the ggplot2 package in R (https://www.r-project.org/). y1: The vertical distance from the right lateral sinus to the leaf base. y2: The vertical distance from the right apical lobe to the leaf base).

Environment association analysis of individual loci
To reduce the false positive rate, three methods were used to detect sites related to adaptive genetic differentiation. First, Arlequin v3.5 [88] and BayeScan v2.01 [92] were used to identify the outlier loci for subsequent environment association analysis. The hierarchical island model in Arlequin was used to detect outlier loci. The advantage of this method is that it is very sensitive to individuals with common histories and substructures. The parameters adopted by Arlequin are as follows: 20,000 simulations, 100 demes to simulate (per group), and the number of simulated groups suggested by the results of Admixture. The loci with FstP-value ≤ 0.05 were identi ed as outlier loci. The second method was performed in BayeScan using the Bayesian approach. The parameters used were as follows: 50,000 burnin iterations, thinning interval of 10, sample size of 5000, 20 pilot runs, each pilot run had a length of 5000, and 10,000 prior odds for the neutral model. As a result, a Bayes factor (PO) of 100 corresponded to a posterior probability (P) of 0.99. Therefore, loci with log (PO) ≥ 2 were considered outlier loci. As a way of detecting SNPs potentially under natural selection for adaptation, we tested for associations between outlier loci and each of 55 environmental factors using a latent factor mixed model in LFMM 1.2 [93]. The advantage of this approach is that it can effectively avoid false positives and false negatives caused by demographic history and population structure. The parameters used were as follows: 10,000 sweeps, 1000 burn-in sweeps, and the number of latent factors suggested by the results of population structure analysis. SNP loci with |Z|-score ≥ 4 and P-value ≤ 1.0 E-5 were considered environmentassociated loci (EAL).

GO and KEGG enrichment analysis
To detect whether the genes where these EAL are located are related to environmental adaptation, functional annotation and enrichment analysis were performed on these genes. First, for the potential genes related to environmental adaptation, we searched the NT (NCBI nucleotide sequences) and NR (NCBI non-redundant protein sequences) databases based on homologous gene theory and performed functional annotations. Then, GO (Gene Ontology) enrichment analysis of these genes was implemented by the GOseq R package (www.bioconductor.org/packages/) [94], and GO terms with P-values less than 0.05 were considered signi cantly enriched. KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis was also performed using KOBAS software (http://kobas.cbi.pku.edu.cn/kobas3/download/) [95] to test the statistical enrichment of these genes in KEGG pathways.

Consent for publication
Not applicable.

Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional les. Material samples are available from authors.   Leaf shape variation distributions in three groups of Liriodendron (X-axis: The ratio of x1 and x2; Y-axis: The ratio of y1 and y2) (Con dence interval: 95%).
Page 30/33 Figure 5 The plots of the KEGG enrichment analysis results.

Figure 6
Shoot apex materials of Liriodendron.

Figure 7
Page 32/33 The original population distributions of 17 provenances of Liriodendron. (The map was created by software DIVA-GIS 7.5, the software and free spatial data were downloaded from http://www.diva-gis.org) Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.

Figure 8
Common leaf shapes of L. chinense and L. tulipifera (x1: The vertical distance from the right lateral sinus to the primary vein. x2: The vertical distance from the right lateral lobe to the primary vein. y1: The vertical distance from the right lateral sinus to the leaf base. y2: The vertical distance from the right apical lobe to the leaf base).

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.