Sample collection
Our study was conducted in accordance with the laws of the People’s Republic of China, and field collection was approved by Chinese Government. All researchers received permission letters from the College of Life Science, Shanxi Normal University, to collect the samples. The samples were taxonomically identified based on their phenotype by Junxia Su (Associate Professor of systematic botany) at the Shanxi Normal University. The voucher specimens were deposited in the herbarium of College of Life Science, Shanxi Normal University (No:20170105030-20170105050).
Eleven populations of O. longilobus and thirteen populations of O. taihangensis were sampled, which covered the distribution ranges of Opisthopappus (Additional file 10: Table S4, Fig.1). Individuals growing at a common site were regarded as a single "population". Fresh young leave devoid of disease or insect pests were selected for each of the sample sites, where 10-15 individuals from each population were collected. The collected samples were placed into sealed bags filled with silica gel, dehydrated/quickly dried, and stored at 20°C for later use. A global positioning system (GPS) was employed to demarcate each sample site and record the longitude, latitude, and altitude of each population.
PCR amplification, sequencing and genotyping
The total genomic DNA was extracted using the modified 2 × CTAB method [70]. The quality of DNA was measured using an ultraviolet spectrophotometer and 0.8 % agarose gel electrophoresis and stored at -20°C for further use. The SNP and InDel primers were designed according to the predicted locus from the transcriptome sequences of Opisthopappus (Additional file 11: Table S5).
For the SNP primers, the 20 µL PCR reaction contained 10 µL 2 × MasterMix, 2 µL template DNA (30 ng/µL), 1 µL primer S (10 µM), 1 µL primer A (10 µM), and 6 µL ddH2O. The PCR procedure proceeded as follows: pre-denaturation at 94°C for 5 min., denaturation at 94°C for 1 min, annealing temperature based on each primer (Table 2) setting for 1 min, elongation at 72°C for 1.5 min., repeated for 35 cycles, last elongation at 72°C for 10 min, and preservation at 4°C. The PCR products detected using 2% agarose gel electrophoresis were confirmed via an automatic analysis electrophoresis gel imaging system, then sent to Sangon Biotech (Shanghai) for sequencing.
For the InDel primers, the PCR reaction was 20 µL, which contained 10 µL 2 × MasterMix, 3 µL template DNA (30 ng/µL), 1 µL primer S (10 µM), 1 µL primer A (10 µM), and 5 µL ddH2O. The PCR procedure was as follows: pre-denaturation at 94°C for 1 min, denaturation at 94°C for 1 min, annealing temperature based on each primer (Additional file 11: Table S5) setting for 1 min, elongation at 72°C for 1 min, repeated for 35 cycles, last elongation at 72°C for 10 min, preservation at 4°C. The PCR products were detected using 8% polyacrylamide gel electrophoresis. The presence or absence of each InDel fragment were coded as ‘1’and ‘0’ respectively.
Population genetic differentiation analyses
For the SNP sequences, haplotypes, haplotype frequencies, haplotype diversity (Hd), and nucleotide diversity (π) were calculated using DNASP 5.10 [71]. The genetic differentiation parameters GST and NST were examined by PERMUT 2.0 [72] based on the haplotype frequency. For the InDel data, the genetic characteristics, Nei's gene diversity index (H), Shannon’s information index (I), and the percentage of polymorphic loci (PPL), were calculated by POPGENE 1.31 [73].
An analysis of molecular variance (AMOVA) was implemented by ARLEQUIN 3.5 [74] and GENALEX 6.5 [75] to detect the distribution of genetic variations within and between populations or species. Subsequently, the values of FST, FCT, and FSC [76] were calculated based on hierarchical AMOVA, and the permutation test was set to 1000.
Cluster analysis based on the maximum likelihood (ML) method and Nei’s genetic distance, respectively, was performed using MEGA 7.0 [77]. Bayesian clustering analysis (BCA) was employed to examine the similarity and divergence of genetic components between populations, and performed using STRUCTURE 2.2 [78] for both SNP sequencing and InDel data. The posterior probability of grouping number (K =2–24) was estimated through 10 independent runs using 500,000 step Markov chain Monte Carlo (MCMC) replicates, following a 1,000,000-step burn-in for each run to evaluate consistency.
The best grouping number was evaluated by △K [79] in STRUCTURE HARVESTER 0.6.94 [80]. These 10 runs were aligned and summarized using CLUMPP 1.1.2 [81] and the visualization of the results was plotted using DISTRUCT 1.1 [82].
To test the genetic divergence between populations or species, a discriminant analysis of principal components (DAPC) was implemented by the function dapc in the R package ‘adegenet’ [83], which initially transformed the genetic data using principal components analysis (PCA) results, and subsequently performed a discriminant analysis on the retained principal components.
The properties of the “without a priori”, using partial synthetic variables to minimize variations within groups [84], might assist with objectively evaluating the artificial classification of O. taihangensis and O. longilobus. Kruskal-Wallis tests for the first two principal components (PCs), and the first two linear discriminants (LDs) of DAPC, were conducted to examine the genetic divergence between populations and species.
Population demographic history inferring
A network relationship was generated through the median-joining method in Network 5.0 [85], to investigate the evolutionary relationships between the Opisthopappus haplotypes. BEAST 1.84 [86] was employed to estimate the differentiation and diversification time between haplotypes, and Chrysanthemum indicum was selected as an outgroup. JMODELTEST 1.0 [87] was utilized to calculate the best nucleotide substitution model, and GTR was finally confirmed as the optimal model. An uncorrelated lognormal relaxed molecular clock model was adopted. Markov chain Monte Carlo (MCMC) was repeated 8×107 times by sampling every 80,000 generations. Since the rate of base substitution in Opisthopappus has not been calculated as yet, we estimated the differentiation time by using the reported average value 2×10−9 of the angiosperm genome. TRACER 1.5 [86] was used to check the convergence of the framework, which ensured that every tested parameter was greater than 200.
To assess whether the species had experienced significant expansion, we used ARLEQUIN 3.5 [74] to calculate the Tajima’s D [88] and Fu’s FS [89] values. Moreover, the sum of square deviation (SSD) and raggedness index (Rag) in the mismatch distribution analysis (MDA) was also performed in ARLEQUIN 3.5 [74]. The process employed a 1000 step permutation test.
Approximate Bayesian computation (ABC) analysis, provided by DIY-ABC 2.1.0 [90], enabled the estimation of complex evolutionary population histories. Based on the estimated genetic variations, genetic structures, and current geographic distributions, three evolutionary scenarios were proposed. Scenario 1: O. longilobus and O. taihangensis were differentiated from a common ancestral population during the same period. Scenario 2: O. taihangensis was an ancestral population, and the O. longilobuswas differentiated from O. taihangensis. Scenario 3: O. longilobus was the ancestral population, and O. taihangensis was differentiated from O. longilobus. Each scenario was performed with 1,000,000 simulations. There was up to 95% confidence intervals through logistic regression, and the posterior probability of each evolutionary event was estimated using the observation data closest to 1% of the simulated data. An optimal evolutionary scenario was selected according to the maximum posterior probability value, and the parameters including effective population size and divergence generation was estimated under the optimal scenario.
In addition, the historical and contemporary gene flows were estimated within the two Opisthopappus species by MIGRATE-N 3.6 [91] and BAYESASS 3.0 [92]. In MIGRATE-N 3.6, maximum-likelihood analyses were performed using 10 short chains (104 trees) and three long chains (105 trees) with 104 trees discarded as an initial burn-in’ and astatic heating scheme at four temperatures (1, 1.5, 3, and 1000,000). To ensure the consistency of estimates, we repeated this procedure five times and reported average maximum-likelihood estimates with 95% confidence intervals. The parameters θ and M were estimated using a Bayesian method, which could be employed to estimate the number of migrants per generation (Nm) into each population using the equation 4Nm = θ*M. When estimating the contemporary gene flow using BAYESASS 3.0, the parameters were examined including migration rates (m), allele frequencies (a) and inbreeding coefficients (f) to ensure that the optimal acceptance rates of the three parameters fell within the range of 20-60%. Ten independent runs were executed to minimize the convergence problem. The result with the lowest deviance was adopted according to the method of Meirmans [93], where the 95% credible interval was estimated as m ± 1.96 x standard deviation (SD).
Environmental variables influence analyses
Nineteen bioclimatic variables (Bioclim) representing Grinnellian niches [94], which are defined as the scenopoetic environmental variables of a species required to survive, were downloaded from the WorldClim database (http://www.worldclim.org/) with a resolution of 30 arc-sec (~1×1 km) and extracted using the R package ‘raster’. Subsequently, a correlation analysis of the bioclimatic variables was performed prior to further analysis to reduce multicollinearity. The variance inflation factor (VIF) was computed for climate variables using the “vif” function in the R package ‘usdm’ [95]. Climate variables with VIF > 20 [96], and those highly correlated with other variables (|r| > 0.8) were removed. Finally, five climate variables were retained, namely bio3 (Isothermality), bio6 (Min Temperature of Coldest Month), bio8 (Mean Temperature of Wettest Quarter), bio14 (Precipitation of Driest Month), and bio16 (Precipitation of Wettest Quarter) (Additional file 7: Table S3, Additional file 8: Figure S5).
The statistics and distribution of reserved climate variables were displayed using box-plots grouped by the two species (Additional file 8: Figure S5) and the significance was tested by one-way ANOVA (Additional file 7: Table S3). The principal component analysis (PCA) of independent climatic variables to reduce the dimensionality that defined the niche space allowed for the comparison of the integrity of climate variables between O. longilobus and O. taihangensis.
To test how the geographical distance and environmental differences impacted genetic differentiation, the Mantel test of Pearson correlation was performed between genetic, geographic, and environmental (climatic) distances. Pairwise FST distance calculated in ARLEQUIN 3.5 [74] was used as genetic distance. The geographic distance was estimated using the GENALEX 6.5 [75] according to three-dimensional factors (latitude, longitude, and altitude). The environmental distance was calculated using the Euclidean distance with PASSAGE 2.0 [97].
A partial Mantel test between genetic, environmental, and geographic distances was conducted under the condition of the geographic or environmental effect, respectively. Further, a multiple matrix regression with randomization (MMRR) was performed to test whether the genetic distance responded to variations in the geographic and/or environmental distances. The Mantel test was performed in the R package ‘vegan’ [98], and MMRR analysis was performed using the R package ‘PopGenReport’ [99]. Regression coefficients of the Mantel test (r) and MMRR (r2) and their significance were determined based on 9,999 random permutations.
Distance based redundancy analyses (dbRDA) were performed to elucidate whether the climatic variables conditioned on the geographic distribution explained the genetic differentiation of the populations using the R package ‘vegan’ [98]. Firstly, a distance-based principal coordinate analysis (PCoA) of the genetic data at the species level was performed to generate several principal coordinates (PCs) using the R package ‘ape’ [100]. Next, five retained climatic variables were employed as explanatory variables conditioned on geographic factors, and permutation tests for each climatic variable were performed using the “anova. cca” [101] function in the R package ‘vegan’.
The distribution of climate variables along the ordination axes was further analyzed using a generalized linear model (GLM). Finally, two different methods were employed to select the key factors which might be essential for driving the variation of Opisthopappus populations: (i) by the Akaike Information Criterion (AIC) value and the significance level P value of F-statistic [102] using the “ordistep” function of R package ‘vegan’. During each selecting step, only one factor was retained with the most significant P value. If the P value of the two variables was equal, the variable with the smaller AIC value was selected. (ii) by the double stopping criterion [103] using the “forward.sel” function of R package ‘packfor’ [104]. Significance was determined using 999 permutations in the RDA analysis. The first two RDA axes and the selected key factors were employed to construct the ordination and ordisurf plots of the dbRDA.