Identifying Genetic Variation Associated With Environmental Variation and Drought-tolerance Phenotypes in Ponderosa Pine

Background Genotype-to-environment (G2E) association analysis coupled with genotype-to-phenotype (G2P) association analysis promises exciting advances towards discovering genes responsible for local adaptation. We combine G2E and G2P analysis with gene annotation in Pinus ponderosa (ponderosa pine), an ecologically and economically important conifer that lacks a sequenced genome, to identify genetic variants and gene functions that may be associated with local adaptation to drought. Results We identied SNP markers in 223 genotypes from across the Sierra Nevada by aligning GBS sequence fragments to the reference genome of Pinus taeda (loblolly pine). Focusing on SNPs in or near coding regions, we found 1458 associated with 5 largely-uncorrelated climate variables, with the largest number (1151) associated with April 1st snow pack. We also planted seeds from a subset of these trees in the greenhouse, subjected half of the seedlings to a drought treatment, and measured phenotypes thought to be associated with drought tolerance, including root length and stomatal density. 817 SNPs were associated with the control-condition values of six traits, while 1154 were associated with responsiveness of these traits to drought. to be included in Six out of the eight measured phenotypic traits were signicantly different in the drought treatment versus the wet treatment: height growth (GR), shoot weight (SW), root length (RL), root-shoot dry mass ratio (R2S), stomata density on adaxial side (SD_AD), and number of stomatal rows on abaxial side (NR_AB). We therefore focused on these traits for the G2P association. We measure the association of SNPs to either the control treatment family breeding value (BV) for each trait, or the average change in the trait from wet to dry conditions (drought responsiveness).


Introduction
Genomics promises exciting advances towards exploring adaptive genetic variation and evolutionary potential under rapidly changing and often unpredictable environment [1][2][3]. Intraspeci c genetic variation represents the potential for adaptive change in response to new selective challenges, which is critical for local species persistence under environmental change [4,5]. Adaptation to local climate conditions is common in tree populations [6][7][8][9]. However, tree populations with long life cycles may become maladapted if environmental shifts rapidly [10][11][12]. Understanding the distribution of genetic variation related to environmental responses may help us better predict changes and manage forests in a changing climate [13,14]. This includes selecting seed sources for restoration or breeding that have desirable characteristics such as drought tolerance [15,16].
Landscape genomics offers enormous potential to discover genes responsible for local adaptation by investigating the statistical association between genetic variation at individual loci and the causative environmental factors [17][18][19][20]. This approach is sometimes known as genotype-to-environment (G2E) association analysis. Prior studies in Arabidopsis -the primary plant model organism -have found that environmentally-associated SNPs can predict performance in common gardens [21], and a study in Pinus pinaster suggests this could be true in trees as well [22]. However, G2E studies often don't by themselves reveal why certain allele are more prevalent or favored in particular environments -for example, are they responsible for selectively favored traits? Genotype-to-Phenotype (G2P) association identi es loci linked to a particular phenotype [23,24]. To eliminate the effects of environment on phenotypes, traits must be measured in a common environment, such as a greenhouse. However, G2P association study does not reveal whether a trait variant would be favored in the eld. G2E and G2P association are thus complementary, and by combining them one might identify both the loci and traits that are selectively favored in particular conditions [18,25].
The large genome size of conifer trees (> 19 GBP) represents a challenge for analysis. Most association studies in conifers have focused on SNPs within a few hundred genes [18, 23,24,[26][27][28], or fewer than 2,000 genome-wide SNPs [29]. One notable exception is a recent study on lodgepole pine that made use of a sequence capture dataset created by mapping the Pinus contorta transcriptome to the P. taeda genome sequence [25], but most conifers have neither a published genome sequence nor a full transcriptome. Though targeted sequencing is e cient, candidate gene approaches may miss other important genes with previously unsuspected roles in local adaptation, while focusing on variants within genes may miss important variants within regulatory regions.
Several approaches to identifying more genetic variants for genome-wide association studies (GWAS) utilizing next generation sequencing (NGS) have been proposed in recent years [30,31]. Genotyping-by-Sequencing (GBS), which can generate tens of thousands of SNP markers (Single Nucleotide Polymorphisms) without the need for a reference genome or full transcriptome, has emerged as a costeffective strategy [32,33]. By combining the power of multiplexed NGS with restriction-enzyme-based genome complexity reduction, GBS is able to genotype large populations of individuals for many thousands of SNPs in an increasingly rapid and inexpensive way [31,34].
Despite the high economic and ecological importance of ponderosa pine (Pinus ponderosa) in the western United States [35], no previous study has attempted to identify the relationship between gene sequence variation and drought tolerance in this species. Some studies have investigated P. ponderosa evolutionary history and phylogeography using mitochondrial DNA markers; these re ect the long-term biogeographical process contributing to the modern distribution of the species, but have little adaptive signi cance in themselves [36,37]. Other studies have emphasized the importance of intraspeci c variation of P. ponderosa in environmental responses, but focus on the phenotypic variation within and among populations without identifying the underlying genetic variation [38,39]. California's historic 2012-2016 drought may represent an increasingly common condition as climate changes [40,41]. Such "hot droughts" can lead to mass tree mortality, negatively impacting the sustainability of conifer forests [42]. A deep understanding of the genetic basis of adaptation in ponderosa pine and other western conifers is critical for successful reforestation and conservation programs.
In the 1970's the Forest Service's Paci c Southwest Regional Genetic Resources Program planted clones of 302 wild ponderosa pines from diverse climate conditions in the central portion of the Sierra Nevada mountains in an orchard located in Chico, California. We chose 223 individual P. ponderosa genotypes from the orchard that span the full climatic range included in the collection for the G2E analysis, and seedlings of a subset of 50 genotyped parent trees for a G2P analyses of putative drought-response traits based on a greenhouse experiment. We ran gene annotation to ascribe biological function to the genes that the associated SNPs were in or near. Then we assessed overlap in SNP identity or gene functions among G2E and G2P association analysis that might indicate particular importance for local adaptation.

Results
Genetic diversity and population structure A total of 4,155,896 SNPs were identi ed from GBS data of the 223 genotypes after initial ltering. With these SNPs, we ran both principal component analysis (PCA) and admixture analysis to determine the number of populations (K) represented by these individuals. Two principle components best explained the genetic variation between our samples, but nearly all individuals clustered together ( Fig. 1 and Additional le 1: Table S1), and according to the admixture analysis result the best K value was one (Additional le 1: Fig. S1). We also plotted the admixture of each individual tree and found that "populations" completely overlapped geographically ( Fig. 1 and Additional le 1: Fig. S2). Thus, we concluded that the sampled genotypes belong to one interbreeding population and used K = 1 for the association analysis.
Environmental and phenotypic associations at individual loci Initial gene annotation revealed that many of the 4,155,896 SNPs fall between genes and regulatory regions (in the intergenic regions) -likely have no direct effect on gene expression or function. To eliminate false positives that might arise from this, we ltered out the intergenic SNPs, leaving 927,740 (22.3%) SNPs in or near genes for the association analyses. This is similar to the approach used in Jordan et al. for Eucalyptus [43]. We used latent factor mixed model 2 (LFMM2) for G2E and G2P association analysis with these 927,740 SNPs and K = 1.
The ve 1921-1950 mean climate variables used in the G2E analysis -climatic water de cit (CWD); minimum winter temperature (TMIN); maximum summer temperature (TMAX); monthly winter precipitation (PPTW); and April 1st snow pack (PCK4) -had low-to-moderate correlations with one other (Additional le 1: Fig. S3). After the running of LFMM2 (q < = 0.05) for G2E, we found 1,458 signi cant associations with the environmental variables out of the 927,740 ltered SNPs (Table 1). PCK4 had the most associations by far, with TMIN having the next highest number of associations. The number of SNPs associated with more than one climatic variable was low, with the highest degree of overlap between PCK4 and TMIN (64 SNPs) and between CWD and TMIN (17 SNPs) (Fig. 2). For both PCK4 and TMIN, there were roughly similar numbers of associated SNPs in upstream and downstream regions versus with the gene itself, with 14% of associated SNPs being missense (nonsynonymous) mutations (Table 2). SNPs associated with CWD were also roughly evenly split between anking regions and the main gene sequence, but only 3% were missense mutations. A higher proportion of SNPs associated with TMAX were within the gene, with 22% being missense mutations, while PPTW showed the opposite pattern, with 69% of SNPs being in the anking regions.
Before running G2P analysis, seedlings from a subset of the 223 genotyped trees were grown in the greenhouse and subjected to wet (control) and drought treatments, with multiple phenotypic traits thought to be associated with drought response being measured for both groups. Fifty families were initially selected, although only 42 had su cient germination for measurements to be included in analyses. Six out of the eight measured phenotypic traits were signi cantly different in the drought treatment versus the wet treatment: height growth (GR), shoot weight (SW), root length (RL), root-shoot dry mass ratio (R2S), stomata density on adaxial side (SD_AD), and number of stomatal rows on abaxial side (NR_AB). We therefore focused on these traits for the G2P association. We measure the association of SNPs to either the control treatment family breeding value (BV) for each trait, or the average change in the trait from wet to dry conditions (drought responsiveness).
More SNPs were associated with the trait drought responses (1,154) than with the control traits (817).
While control R2S had the most associations and SW the least (Table 2), the opposite was the case for drought responsiveness ( Table 3). The number of SNPs associated with more than one trait was low in both G2P analyses. The highest degree of overlap was in control traits of RL and R2S (12 SNPs) and of R2S and NR_AB (9 SNPs) (Fig. 3). The proportion of associated upstream SNPs was similar across control traits (32-40%), but proportions of other categories varied widely, with the proportion of missense SNPs ranging from 8-25%. For drought response, the distribution of SNPs in all categories differed, with proportion of upstream being 19-34% and proportion of missense being 7-16% for traits other than R2S. R2S was only associated with 6 SNPs, 5 upstream and 1 downstream.
Total 473 280 28 15 12 9  Gene annotation for the signi cantly associated SNPs Of the 1458 SNPs associated with environmental gradients, functions could be assigned for 788 (54%), while the rest had no matches in available gene ontology databases. We found that 283 SNPs belonged to protein types that have functions that may be directly related to drought tolerance or other environmental responses (Fig. 4). We categorized these genes into ve main functional groups: (a) the ubiquitination pathway, (b) seed, pollen and ovule formation, (c) cell wall formation, (d) stress responses, and (e) cell division and growth.
Many of the SNPs associated with TMAX, TMIN, CWD, and PCK4 were in or near genes in the protein ubiquitination pathway or the jasmonic acid synthesis response pathways ( Fig. 4 and Additional le 2: Table S2), both of which are involved in responses to biotic or abiotic stress [44][45][46]. CWD and PCK4 were also associated with SNPs in or near genes involved in seed dormancy and the abscisic acid (ABA) signaling pathway, both of which have been previously linked to drought responses in trees [47]. Genes involved in reproduction, including pollen and ovule formation, were associated with TMAX, TMIN, and PCK4. CWD and PCK4 were associated with genes involved in cell wall organization. Both TMAX and PCK4 were associated with genes involved in vascular tissue formation, growth regulation, and stress responses, while TMIN and PCK4 were associated with genes involved in stomatal regulation and pathogen responses. Further biotic and abiotic stress response genes were associated with PCK4, as were genes involved in nutrient transport, photosynthesis, respiration, sugar synthesis, and light responses (Additional le 2: Table S2), Of the 817 SNPs associated with seedling control trait values and 1,154 SNPs associated with trait drought responsiveness, 43% and 51% could be assigned functions by gene ontology (Additional le 2: Table S3 and Table S4), respectively. Many of the same functional categories of genes that were found to be associated with the environment were also associated with measured phenotypes, though there was no overlap in speci c SNPs identi ed. This includes ubiquitination, seed development, cell wall organization, stress response, and cell division (Fig. 4, 5, 6).
In the control treatment, the two stomatal traits were associated with genes involved in ubiquitination, cell wall organization or modi cation, growth and development, and ABA response. Control root-to-shoot ratio was associated with biotic & abiotic stress responses, cell wall organization or modi cation, cell division or differentiation, lateral root formation and ubiquitination. Control height growth had no associated SNPs and root length was only associated with one SNP located in a gene involved in ubiquitination (Fig.  5). However, drought responsiveness of height growth, shoot weight, and root length were associated with all ve functional categories (Fig. 6). Drought responsiveness of the two stomatal traits was associated with genes involved in stress responses, cell wall formation/organization, cell division/differentiation, and root formation.
Besides the ve main functional groups of genes with SNPs associated with climatic, phenotypic and drought response variables, several other functional groups were identi ed in both the G2E and G2P annotation results (Additional le 2: Table S2, Table S3, and Table S4). For example, 111 (14%) of the environmentally-associated SNPs, 53 (6%) of SNPs associated control traits, and 121 (12%) of the SNPs associated with trait drought responses were in genes relating to ATP binding or protein kinases. Associated SNPs in genes associated with RNA/DNA binding, metal ion binding, translation, and protein transport were also fairly common.

Discussion
We identi ed 1458 SNPs associated with 5 climate variables, with April 1st snow-pack associated with most of the SNPs. We also identi ed 817 SNPs associated with the control-condition values of six phenotypic traits, while 1154 associated with responsiveness of these traits to drought. No individual SNPs overlapped between the genotype-to-environment (G2E) and genotype-to-phenotype (G2P) analyses. But the associated SNPs did share similar gene functional categories including (a) the ubiquitination pathway, (b) seed, pollen and ovule formation, (c) cell wall formation, (d) stress responses, and (e) cell division and growth. Other shared categories including ATP binding or protein kinases, RNA/DNA binding, metal ion binding, translation, and protein transport. , the GBS sequence fragments were quite short (90-100 bp or less) and were trimmed further before SNP calling, so a linked non-synonymous variant could have been missed. We also found quite a few upstream and downstream SNPs in both G2E and G2P analysis that might either directly affect gene expression or be linked to a protein-altering variant.
For the G2E association, we used 1921-1950 average climate values to estimate the selective environment under which these genotypes established as seedlings and saplings prior to their collection in the 1970s. We chose to focus on raw environmental variables rather than environmental PCA axes, as a number of previous studies have done [17,18], because PCA associations can be di cult to interpret if, for example, the axes include both temperature and moisture variables. We selected ve climate variables that exhibit low correlation with one another across the collection area. The number of SNPs associated with more than one climatic variable was low (Fig. 2), which may indicate that we were successful in selecting semi-independent climatic variables which require different genetic adaptations. The highest degree of overlap was between PCK4 and TMIN (64 SNPs) and between CWD and TMIN (17 SNPs). The former SNP set might be related to adaptation to cold and/or snow depth, while the latter SNP set might be related to how quickly the site warms up in spring, drying out the soil.
In the G2E analysis, over half of the SNPs were associated only with April 1st snowpack (PCK4). Winter minimum temperatures (TMIN) -affecting the depth and duration of snowpack -shows the next highest number of associations. In this Mediterranean climate region, most of the annual precipitation occurs during the winter, and melting of winter snow accumulation at high elevations feeds spring and summer stream ow [49]. However, a heavy snowpack may also delay the start of the growing season for juvenile trees. Consistent with this, at least one of the associated SNPs was in a gene involved in light responses.
In the G2P analysis, most of the SNPs associated with control phenotypic traits were linked with root-toshoot ratio (R2S) and number of abaxial stomatal rows (NR_AB,) while most of the SNPs associated with phenotypic responses to drought were linked with shoot weight (SW), root length (RL), and R2S ratio. Drought-stressed ponderosa pine seedlings allocated more to their root system, with longer root length, higher root to shoot dry mass ratio, less dry shoot mass and less height growth. Other studies in pines have found similar patterns [50][51][52][53]. This may indicate acclimation to drought at the cost of overall low growth of aboveground structures. Many of the SNPs associated with phenotypic drought responses were in genes associated with cell division & differentiation and with root growth, both of which make sense in light of the observed changes in allocation to root vs. shoot growth. The number of SNPs associated with more than one trait was low in both G2P analyses. The highest degree of overlap was in drought responsiveness of RL and R2S and of R2S and NR_AB (Fig. 6). Eckert et al. (2015) found two SNPs associated with both environmental PCAs and measured phenotypes out of 31 and 6, respectively -a low rate, but one which led us to think we might see more with a higher number of associations. We found no overlaps in speci c SNPs between our G2E and G2P analyses, but there was substantial overlap in functional categories, which directly related to drought tolerance or other environmental responses in previous studies [44,46,47,54]. The prevalence of genetic associations related to abscisic acid (ABA)-signaling pathways and ubiquitination in both G2E and G2P analyses is consistent with prior studies of drought response in conifers [47]. Increasing ABA concentrations are used as a signal to keep stomata closed during dry conditions, reducing water loss [55]. In addition, ABA signaling can also affect shoot growth and water uptake [56,57]. Ubiquitination has been found to be involved in drought responses in model species by playing a role in ABA-mediated dehydration stress responses [58, 59], or through the downregulation of plasma membrane aquaporin levels [60]. The study of the role of ubiquitin in conifer drought response is still somewhat limited. A study in black spruce (Picea mariana) identi ed 16 out 313 candidate genes correlated with precipitation, including the genes in the ubiquitin protein handling pathway [61]. The association between ubiquitin protein and roots and stomatal density may indicate previously unidenti ed roles in drought response.
Moreover, genes associated with seeds and seed dormancy can also be directly involved in drought tolerance; for instance, dehydrins can protect proteins from desiccation in both seeds and other plant tissues [47]. However, reproduction-related genes might also show associations with environmental gradients if they are involved in reproductive timing. Genes involved in xylem & phloem differentiation or cell wall formation could play a role in shaping the hydraulic safety of water-transporting cells, something that can be quite plastic in pines (Lauder et al. 2019). Other than these functions directly related to drought tolerance or other environmental responses, the other overlapping functions among G2E and G2P analysis are involved in gene expression (RNA or DNA binding, transcription factors, helicase activity, ribosome components, methylation) or ATP binding (motifs found in membrane transporters, microtubule subunits, enzymes, and other cell components that require energy). Our ndings suggest the e ciency of combining G2E and G2P analysis with GBS to uncover potentially important adaptive genetic variation.
For many of the other loci associated with environmental gradients, gene ontology results were too vague to draw many conclusions about their function or why the association might exist.

Conclusions
In conclusion, by investigating adaptive genetic variation in ponderosa pine with G2E and G2P association analysis, our study found thousands of genomic variants associated with response to climate and physiological traits. Some of these have previously-identi ed functions associated with drought responses, but for others the function -or how that function is relevant for environmental responses -is still unknown. Molecular tools based on the associated genetic markers could be developed to assist breeders and land managers in speeding up selection for drought tolerance or selecting appropriate seed sources for a changing climate. In addition, our results should open up new opportunities for functional studies to determine the molecular roles of the genes underlying these associated genetic makers in in uencing trees adaptation.

Sampling
The original source locations for the 223 P. ponderosa genotypes in the Chico orchard ( Fig. 1) fall within just one of the several genetic subdivisions previously identi ed in ponderosa pine [66-68]. Fresh needles were collected from these individuals and placed in labeled tea bags over silica gel to quickly dry them and preserve the DNA for extraction.
Seeds were collected from a subset of 50 genotyped parent trees in the summer of 2018. Because pines are wind-pollinated and outcrossing [67], seeds from the same tree are mostly half-siblings, occasionally full-sibs. We placed 2-3 mature cones from each mother tree into paper bags and placed them in a warm dry place until seeds were released. The seeds were stored in a refrigerator until the greenhouse experiment was carried out (see greenhouse experiment section below

SNP calling
No reference genome is available for ponderosa pine (Pinus ponderosa), but one does exist for loblolly pine (Pinus taeda) [69,70]. Of the conifers that have been sequenced to date, P. taeda is the most closely related to P. ponderosa [71,72]. Furthermore, the P. taeda reference genome was used to successfully used to design probes for sequence capture in P. contorta [73,74] upstream SNPs, downstream SNPs, synonymous, and missense variants in the gene coding sequence. In Snp Eff, "intragenic" refers to SNPs in introns, while "missense" refers to any non-synonymous mutation in the transcribed region.
Many SNPs identi ed by GBS fall between genes and regulatory regions (in the intergenic regions) and likely have no direct effect on gene expression or function. In addition, because of the low amount of linkage disequilibrium in conifers [79,80], any associations identi ed between such intergenic SNPs and a phenotype or environment of interest are likely false positives, rather than re ecting linkage between the SNP and a causal variant. We therefore ltered out the intergenic SNPs rst before running the association analysis using a Python script (https://github.com/shumengjun/LFMM).

Environmental associations
We used LFMM2, which was developed for G2E association and has been shown to outperform other similar approaches with several orders-of-magnitude faster computing [82], and which also controls for the effects of demographic processes and population structure on the distribution of genetic variation [83]. This approach is robust to high amounts of missing data, such as GBS sequencing tends to produce, when sample sizes are > 100 [84]. LFMM2 regression models combine xed and latent effects with the following equation where Y is a matrix of genetic information measured from p genetic markers for n individuals, and X is a matrix of d environmental variables measured for n individuals. The xed effect sizes are recorded in the B matrix, which has dimension p * d. The E matrix represents residual errors with the same dimensions as the response matrix. The matrix W is a matrix of rank K, de ned by K latent factors where K can be determined by model choice procedures. The K factors represent unobserved confounders -usually geographical structure in the genotypes of the samples -represented as an n*K matrix, U. V is a p × K matrix of loadings. The matrix U is obtained from a singular value decomposition (SVD) of the matrix.

Greenhouse experiment
We used fty half-sib families that span the climatic range of ponderosa pine in California for the greenhouse experiment and G2P analyses. We aimed to have 10 seedlings from each maternal family in both wet and dry treatments, 1000 seedlings in total. During winter 2018, the seeds were strati ed to break dormancy by placing them in aerated water for 48 hours, then surface-drying them and placing them in plastic bags in the refrigerator (~ 1.7°C) for 6 weeks. Forty-eight of the 50 families had enough seeds in their cones to be included in the experiment (Additional le 1: Fig. S4).
Because maximum seedling root length in a pilot experiment conducted in 2017 was more than 110 cm, we used plastic tubes with an 8-cm width and 120-cm depth for planting. The bottom of each tube was capped with mesh to prevent the soil from falling out while allowing drainage, and the lightweight clear tubes were wrapped in black plastic to keep roots in the dark. The planting soil was a mixture of 70% sand, 20% vermiculite, and 10% organic-rich potting mix to mimic the coarse texture of the soil of many Sierra Nevada conifer forests [89]. To keep tubes upright we used PVC pipes to build 10 frames that could each hold 100 tubes. Two seeds from each family were planted in each tube in February 2019, and two tubes from each family were randomly placed within each frame. In April 2019, we replanted more strati ed seeds of the correct family in any tubes without seedlings. All the tubes were watered every other day during the germination and seedling establishment period (February through June).
At the end of June 2019, all but one seedling per tube was removed, and alternating frames were assigned to the wet treatment and the dry treatment (5 frames containing up to 500 seedlings per treatment) (Additional le 1: Fig. S4). The wet treatment group was watered twice every week and the drought treatment group was watered once every three weeks until mid-October (3.5 months). While wild ponderosa pine seedlings would receive little to no precipitation during the summer months, this occasional watering was necessary in the greenhouse environment to prevent complete mortality.
Temperatures inside the greenhouse in the low-elevation environment of Merced, CA reached as high as 37°C on the hottest days and the soil volume of the tubes was limited, with no access to groundwater, both of which make evaporation and drought stress more intense than the no-precipitation condition in the wild.
Multiple phenotypic traits were measured during and after the greenhouse experiment. We calculated shoot growth as nal height minus height at the initiation of the treatments. The length of fresh roots was measured from soil surface to taproot tip immediately after the harvesting, to avoid shrinkage. Following harvest, needles, fresh stem and fresh roots of all the seedlings were separately put into paper lunch bags and dried at 75°C for 48 hours. We measured root dry mass (RW) as well as shoot weight (SW, total of stem and needles). We then calculated root-shoot ratio (R2S) as RW/SW. Speci c Root length (SRL) was calculated as root length/root weight.
Before harvest, we also collected 3-4 fresh needles from living seedlings to calculate stomatal density. In pines, stomata are arranged into longitudinal rows. We put each needle on a slide and photographed it at 100x magni cation using a Leica DME compound microscope equipped with a Leica DFC290 digital camera. All counts were conducted near the middle of the needle to avoid variation that might occur at the base and at the tip. Approximately 1.96 mm lengths of needle were surveyed for number of stomata and stomatal rows on their adaxial (upper) and abaxial (lower) surfaces. Needle width was measured in magni ed images using the line measure tool in the Leica software. Then we calculated the stomata density on each side as the number of stomata divided by 1.96*width of needle. Individual seedling means were calculated by averaging adaxial (AD) stomatal density and number of stomatal rows on both sides (AB & AD) across sampled needles. Only 42 out of 48 mother trees had enough germination to carry out these measurements across both treatments.
Genotype-phenotype association analysis The 42 individual mother trees had already been genotyped, and we used these same SNPs for the G2P association analysis, focusing on the traits signi cantly associated with drought treatments. The breeding value (BV) of a tree re ects the tendency of an individual to produce offspring with high values of that trait and is estimated by measuring relatives [16,90]. For the wet treatment traits, we use the average trait value across all members of each family in the wet treatment as the BV. For the drought response traits, we deduct the average trait value for a given family in the wet treatment from the value for each offspring of that family in the drought treatment and then use the mean difference as the BV. We used LFMM 2 (Caye et al. 2019) to run the genotype to phenotype association analysis, and then identi ed associations based on p (< 10 − 5 ) value.

Gene annotation
After identifying the signi cantly associated SNPs, we aligned the gene sequences for these regions against the nonredundant protein sequences database using UniProt to identify the gene and protein with the implemented Blastx (2.9.0+, E < 1e − 10 ). The Gene Ontology Annotation Database [91,92] was used to further identify the potential functions of the genes. If a SNP is located in the intragenic region, we performed a search by querying the anking sequence 400 bp from the beginning position of the gene.
This had to be done separately because the "start" and "end" positions given for the genes containing the introns were too far apart; no hits could be obtained by Blastx.  -1548562). The funding organizations paid the experimental fees and computing resources for this research, but did not play any role in the design of the study nor in the collection analysis and interpretation of data, nor in the writing of the manuscript.
Authors' contributions MS: Research design, performed research, analyzed data, wrote paper (corresponding author) EM: Research design, edited paper.
All authors have read and approved the manuscript. Figure 1 Location and the admixture analysis of the 223 ponderosa pine genotypes. Left: Original geographic distribution of the 223 ponderosa pine genotypes. Right: Proportion of each individual's genome allocated to "population 1" (green) and "population 2" (orange) by admixture analysis when K=2, illustrating lack of geographical isolation. Trees were subsequently treated as part of a single population.

Figure 2
Venn diagram comparing overlap in environmentally associated SNPs. The number of overlapping SNPs that are associated with four climatic variables between April 1st snow pack (PCK4); mean monthly winter precipitation (PPTW); Mean climatic water de cit (CWD), and mean minimum winter temperature (TMIN). The 50 SNPS associated with mean maximum summer temperature (TMAX) did not overlap with the other four climatic variables, and so are not included in this gure.