RNA-Seq reveals adaptive genetic potential of the rare Torrey pine (Pinus torreyana) in the face of Ips bark beetle outbreaks

The ability of tree species to adapt to water stress and increased frequency of bark beetle outbreaks with climate change may increase with population size and standing genetic variation, calling into question the resilience of small, rare plant populations. The Torrey pine (Pinus torreyana) is a rare, genetically depauperate conifer that occurs naturally in a mainland and island population in southern California. Due to recent declines in the mainland population coinciding with drought and Ips paraconfusus bark beetle outbreaks, the species would benefit from an assessment of adaptive genetic diversity. Here, we use RNA-Seq to survey gene-coding diversity across 40 individuals to (1) characterize patterns of genetic diversity in the species and (2) test for genetic differentiation between trees that succumbed to beetle attack or survived following an outbreak. Consistent with previous studies, we found few genetic variants, with most SNPs occurring as fixed differences between populations. However, we found structure within the mainland and polymorphisms segregating in both populations. Interestingly, we found differentiation in genotypes between attacked and surviving trees and 11 SNPs associated with survival status, several of which had defense-related functions. While low diversity suggests limited adaptive capacity, genetic associations with survival in functionally relevant genes suggest adaptive potential for bark beetle defense. This initial study prompts future research to explore the genetic basis of putative resistance and suggests conservation efforts should protect surviving genotypes and the full spectrum of genetic diversity across populations to preserve the evolutionary potential of the species.


Introduction
A major question in evolutionary and conservation biology is whether species will be able to adapt to changing environmental conditions. This question is particularly relevant for long-lived, immobile trees that may not migrate rapidly enough to track suitable climates (Aitken et al. 2008). In many conifer forests in western North America, prolonged drought and warmer temperatures due to climate change have weakened trees and created favorable conditions for bark beetle outbreaks, leading to large-scale tree mortality (Raffa et al. 2008;Meddens et al. 2012;Anderegg et al. 2015). The ability of tree species to adapt to ongoing beetle outbreaks and concomitant evapotranspirative stress is expected to increase with the strength of selection, population size, and the degree of standing genetic variation (Alberto et al. 2013). Large, highly outcrossing populations typical of trees tend to harbor intermediate to high standing genetic variation (Petit and Hampe 2006;Savolainen and Pyhäjärvi 2007), and may contain at least some resistant individuals, allowing potential adaptation to increasing pressure from bark beetles (Budde et al. 2016). However, for rare plants with small populations, evolutionary potential may be limited by the eventual loss of genetic variants to drift (Leimu et al. 2006). Thus, the resilience of small, threatened tree populations in the face of increasing pressure from climate-driven bark beetle outbreaks is of particular conservation concern, warranting an assessment of their adaptive genetic potential.
For tree populations to adapt to increasing beetle pressure, there must be variation in the population for resistance and such variation must be heritable (Telford et al. 2015). In conifers, resistance includes constitutive or induced mechanical or chemical cues or defenses that deter, kill, or limit the spread of beetles (Franceschi et al. 2005;Telford et al. 2015) or their fungal (Paine et al. 1997) and bacterial (Raffa 2014) symbionts. Conifer secondary chemical profiles signal species, age, stress, nutritional quality (Raffa et al. 2015), and defensive capacity (Emerick et al. 2008;Schiebe et al. 2012) and are used by beetles to locate and select suitable hosts (Raffa and Schlyter 2016). For trees identified as prospective hosts, viscous, terpene-rich oleoresin is a key defense that can physically and chemically defend against beetle attack (Franceschi et al. 2005;Keeling and Bohlmann 2006). Studies have shown that multiple chemical and resin-related traits are genetically variable and heritable (Rosner and Hannrup 2004;Yanchuk et al. 2008;Westbrook et al. 2013), indicating that genes underlying chemical biosynthesis pathways, defense structures, or pathogen detection may be subject to selection. Additionally, prolonged drought can reduce tree defenses against bark beetles (Gaylord et al. 2013;Netherer et al. 2015), suggesting that genes involved in water regulation can further contribute to host resistance if defenses can be maintained under water-limiting conditions.
Field studies have indeed demonstrated that selective tree mortality by bark beetles can shift phenotypic (Schiebe et al. 2012;Ferrenberg et al. 2014;Zhao and Erbilgin 2019) and genetic variation (Six et al. 2018) of natural conifer populations. In a recent study of two pine species that experienced a mountain pine beetle outbreak, Six et al. (2018) identified genetic differences between mature surviving trees and the general population, suggesting an evolutionary response to beetle pressure. While the genetic markers in the study were unable to pinpoint a potential resistance mechanism, genome-wide sequence data may be able to identify putative loci and associated functions differentiating trees that withstand or succumb to beetle attack.
For conifers lacking extensive genomic resources, detecting genetic associations with tree survival following a beetle outbreak presents a challenge. While whole genome sequencing is becoming increasingly tractable for many species (Fuentes-Pardo & Ruzzante, 2017), conifer genomes are exceptionally large (18-35 Gb) (Mackay et al. 2012) and consist predominately of repetitive elements (82% of the Pinus taeda genome) (Wegrzyn et al. 2014), complicating assembly for even well-studied species. RNA-Seq, however, offers a feasible approach for obtaining single nucleotide polymorphisms (SNPs) (sensu Geraldes et al. 2011;Gugger et al. 2016), by acting as sequence reduction method and capturing variants in expressed genes that may underlie adaptive phenotypes. RNA-Seq is a suitable means for studying adaptation in species of conservation interest that often lack genome assemblies.
Torrey pine (Pinus torreyana Parry) is an iconic species of pressing conservation concern that would benefit from RNA-Seq studies to assess its adaptive potential to survive bark beetle outbreaks. The rarest pine in North America, it occurs naturally in only two localities in Southern California; in coastal San Diego County (mainland population) and on Santa Rosa Island off the coast of Santa Barbara (island population). The native California five-spined ips beetle (Ips paraconfusus Lanier, hereafter Ips) poses a substantial threat to the mainland population, although beetles are absent on the island. The Torrey Pines State Natural Reserve, which protects a substantial portion of mainland trees, experienced a 12% population decline from 1988 to 1991 (Shea and Neustein 1995) and a 12% decline from 2006 to 2018 with approximately 3700 individuals remaining (Smith, personal communication). Each decline was a likely consequence of major bark beetle outbreaks, the latter of which coincided with a period of prolonged drought. The capacity for Torrey pines to adapt to re-occurring beetle outbreaks with changing climate remains in question. Allozymes (Ledig and Conkle 1983), chloroplast markers (Waters and Schaal 1991;Provan et al. 1999;Whittall et al. 2010), and the chloroplast genome (Whittall et al. 2010) indicated complete withinpopulation monomorphism and few between-population differences. However, surveying putative adaptive variation across the genome will be essential to understanding the evolutionary potential of the species.
Our goal is to conduct a preliminary study to evaluate the potential of Torrey pines to adapt to increased bark beetle pressure amid climate change. While traditionally measured as additive genetic variance, genome-wide measures of genetic diversity may be reasonable proxies for evolutionary potential, especially for polygenic traits (e.g. resistance) (Harrisson et al. 2014;Ørsted et al. 2019). However, assessing regions of the genome underlying adaptive phenotypes will be most informative (Hoffmann et al. 2017;Mable 2019). To this end, our goals were to use RNA-Seq to: (1) assess patterns of genome-wide functional genetic variation within and among Torrey pine populations as a general indicator of evolutionary potential, and (2) test for genetic differentiation between 'case' trees that succumbed to Ips attack and living 'control' trees to understand the capacity to adapt to ongoing beetle pressure. For the latter goal, we test whether cases and controls are genetically diverged at multivariate genotypes, identify specific variants associated with case/control status, and test for enrichment of functionally relevant gene ontology terms.

Field site and sampling scheme
We sampled trees at the Torrey Pines State Natural Reserve (TPSNR, the mainland population) near La Jolla/ Del Mar in San Diego County, California. The reserve occurs along coastal bluffs and comprises multiple habitat types including coastal pine forest, chaparral, and coastal sage scrub. A lagoon separates the reserve into two sites including the main reserve and the Torrey Pines Extension (hereafter TPSR and TPEX populations, respectively). Our sampling design included pairs of proximate trees with and without signs of Ips in order to reduce environmental variation that may influence beetle attack. Characteristic signs of Ips infestation included small entrance holes exuding red pitch and frass on the trunk and branches, and needle yellowing or browning throughout the tree crown as infestation progressed. We sampled tree pairs throughout the reserve to capture a representative pool of variation for diversity and structure estimates.
We sampled a total of 40 trees from October to November 2017 and from November 2018 to April 2019 ( Fig. 1, Table S1). Fall of 2017 had high Ips activity with widespread tree mortality in clusters throughout the reserve, while the season beginning in Fall of 2018 had lower Ips activity in scattered localities. In 2017, we sampled 22 trees including seven pairs of trees (n = 14) in TPSNR with and without signs of Ips, four trees throughout the reserve to capture additional genotypes, and four trees with a Santa Rosa Island origin planted prior to 1960 at the nearby Sanford Burnham Prebys Discovery Institute (SBPDI). Beginning in Fall of 2018, we sampled nine mainland tree pairs (n = 18) with and without signs of Ips. Because mortality can proceed rapidly once needle yellowing begins, we sampled beetle-colonized trees with sufficient living tissue opportunistically as we identified infested trees. Where possible, we matched beetle-colonized trees to nearby asymptomatic trees of similar diameter at breast height (DBH), and in the second season, we matched environmental variables between tree pairs using 2018 census data provided by TPSNR and geographic information system data. Tree pairs occurred within approximately 30-50 m of one another (or up to 150 m for more isolated trees). A multiple factor analysis indicated comparable size, spatial, and environmental conditions between groups ( Fig. S1) despite sampling across a heterogeneous landscape.
In November 2019, we re-surveyed all 40 trees, assigning dead trees with signs of Ips infestation as 'cases' and living trees as 'controls.' Some controls had potential signs of past beetle attack limited to isolated branches, though no evidence of ongoing infestation. One Ips-infested individual identified in 2017 was still living in 2019 and one asymptomatic individual had succumbed to beetle attack by 2018, and they were reassigned as a 'control' and 'case', respectively. A second bark beetle, Dendroctonus valens LeConte (red turpentine beetle) also occurred extensively throughout . Colors indicate population while shape reflects case/ control status. 'Unpaired' trees indicate additional trees sampled to capture genetic diversity across the species but were not included in association tests. Map created in the R package ggmap (Kahle and Wickham, 2013) the reserve, although it is largely restricted to the lower tree bole. Prior to sampling, 21 of 40 trees displayed signs of Dendroctonus including 13 case trees and 7 control trees (infestation could not be determined for three trees whose lower bole was covered by shrubbery).
For each of the 40 trees, we sampled both twig and needle tissue to maximize transcript diversity. We selected small, terminal branches healthy in appearance when possible, although case trees were in various stages of Ips colonization. From each branch, we sampled needles and twig segments 15-20 cm behind the oldest needle growth. We preserved tissues on liquid nitrogen or dry ice prior to transporting and storing at − 80 °C at the San Diego Zoo Wildlife Alliance until further processing.

RNA extraction, library preparation, and sequencing
We extracted total RNA from twig and needle tissue using the Spectrum Plant Total RNA Kit (Sigma-Aldrich, St. Louis, MO) according to the manufacturer's protocol. For twigs, we used a razor blade to remove the outer bark layer to reduce lichen and other potential contaminants. We verified RNA quality and quantity using a Nanodrop One spectrophotometer (Thermo Scientific, Waltham, MA), Qubit BR Assay kit (Life Technologies, Waltham, MA) and an Agilent 2100 Bioanalyzer, and cleaned extracts with AMPure XP beads as needed. Of two cases with dry twigs, needles were exclusively (sample tpsr0195) or predominately (sample tpsr0912) used for sequencing. However, RNA integrity number (RIN) scores did not differ significantly between groups (cases: 7.1-9.9, mean 8.43; controls: 7.2-9.2, mean 8.35; mixed-effects ANOVA, sample as random effect, F 1,30 = 3.16, P = 0.578).
We prepared mRNA libraries with a TruSeq Stranded mRNA Kit (Illumina, San Diego, CA) and unique dual index adapters following the manufacturer's protocol (October 2013). Libraries were PCR amplified for 15 cycles, quantified by Qubit, and validated on an Agilent 2100 Bioanalyzer. We performed double size selection with AMPure XP beads as needed to reduce minor secondary peaks in the Bioanalyzer trace. We sequenced final libraries as paired end 75 bp reads across four runs on an Illumina NextSeq500 platform at SBPDI (San Diego, CA). We included six libraries as technical replicates across runs to control for potential batch effects and re-sequenced seven low coverage samples (Table S1).

Read filtering, mapping, and variant calling
We filtered raw, demultiplexed fastq files using the HTStream pre-processing pipeline (https:// bioin forma tics. ucdav is. edu/ softw are). We removed 28 s and 18 s Pinus taeda rRNA, PCR duplicates, adapters, and poor quality sequence from 3ʹ read ends (with quality score < 20 in a sliding window of 10 bp). Finally, we removed remaining N characters and any processed reads under 50 bp.
Following the Genome Analysis Tool Kit (GATK) best practice guidelines for RNA-Seq short variant discovery (https:// gatk. broad insti tute. org/), we aligned reads to the Pinus taeda genome v2.01 (Zimin et al. 2017) downloaded from the TreeGenes database (Falk et al. 2018) with modified steps as follows. Due to the large and fragmented nature of the genome assembly (22 Gb, 1.76 million contigs), we used a custom perl script (available upon request) to sort contigs by size and concatenate them into 222 ∼100 Mb 'pseudocontigs' with contigs separated by 10 N's. Using two-pass mode with the splice aware aligner STAR v2.6.0 (Dobin et al. 2013), we aligned processed reads to the P. taeda pseudocontigs with settings -genomeSuffixLength-Max 1000, -outSJfilterCountTotalMin 9 3 3 3, and -align-IntronMax 160,000, where maximum intron length was based on estimates for P. taeda (Wegrzyn et al. 2014). We used Samtools v1.9 (Li et al. 2009) to retain only uniquely mapping reads and to remove PCR/optical duplicates, secondary or supplementary alignments, reads failing QC, and unpaired reads.
To call variants, we ran two iterations of GATK v4.1.4.1 HaplotypeCaller and GenotypeGVCFs with option -standcall-conf 30 and base quality score recalibration (BQSR). During the first variant calling iteration, we applied hard filtering thresholds in GATK to remove SNPs with the following INFO fields: QD < 5.0, FS > 30.0, ReadPosRank-Sum < − 4.0 or > 6.0, BaseQRankSum < − 4.0. We then used VCFtools (Danecek et al. 2011) to remove SNPs with minimum mean depth < 5 and call rate < 50%, resulting in over 419 K SNPs for BQSR. During the second iteration, we performed hard filtering as above with an additional filter to remove SNPs with ExcessHet > 16.43 (Z-score 2). Using VCFtools, we set genotypes with depth (DP) < 5 and genotype quality (GQ) < 20 to missing, and removed sites with mean DP < 15, minor allele frequency < 0.05, > 2 alleles, mean GQ < 44, and call rate < 95%. We modified further filtering steps from the dDocent pipeline (http:// www. ddoce nt. com/ filte ring/) to remove sites with high read depth that may contain potential paralogs (all loci with DP > 2 standard deviations (SD) from the mean; loci with DP > 1 SD from the mean if QUAL < 2 × DP). Finally, we used BCFtools v1.9 to prune SNPs in linkage disequilibrium (r 2 > 0.5 in a window of 5 SNPs) using the original genome coordinates. We subset SNPs with and without LD-pruning by population for use in downstream analyses (Table 1).

Kinship, population structure, and genetic diversity
Our first goal was to assess functional genetic diversity within and among populations of Torrey pines. To test for family groups that could result from sampling proximate case/control pairs, we estimated relatedness among samples with the KING inference method in VCFtools (Manichaikul et al. 2010). We then assessed genetic differentiation controlling for kinship through Principal Components Analysis in Related samples (PC-AiR) (Conomos et al. 2015) in the R package GENESIS v.2.12.4 (Gogarten et al. 2019) for all individuals, mainland individuals, and case/control pairs using the LD-pruned SNP set. PC-AiR first identifies an unrelated subset of individuals in the data representing all ancestries. It then uses those individuals to infer principal components (PCs) and projects the remaining samples onto those PCs based on genetic similarity. We ran one to two iterations of PC and kinship estimation with a scaling constant of 1 when estimating kinship, a kinship threshold of 0.0884 (second degree relatives) for identifying unrelated individuals, and two ancestry informative PCs to estimate kinship accounting for ancestry.
For each sample group, we calculated genetic diversity and structure estimates for all individuals and for unrelated individuals identified by PC-AiR. We estimated the mean expected (H E ) and observed multi-locus heterozygosity (H O ) per population with the R package Adegenet v2.1.1 (Jombart et al. 2010) and the average inbreeding coefficient (F) per individual within population with VCFTools. Using the R package HIERFSTAT (Goudet 2005), we calculated pairwise F ST based on Weir and Cockerham's θ (1984). We performed all analyses in R v.3.5.2 (R Core Team 2018) using the total LD-pruned SNP set.

Multivariate analysis of genetic divergence between cases and controls
To test for divergence between multivariate genotypes of cases and controls, we employed Discriminant Analysis of Principal Components (DAPC) in Adegenet. To control for population structure, we first performed DAPC on the LDpruned SNP set with population (TPEX or TPSR) as group for the 32 paired samples. We then regressed the full SNP set on the first discriminant function (DF) summarizing 2 PCs from the population model. We used residuals from the regression in a second DAPC model with case/control as group and missing data set to the mean. To select the optimal number of PCs for the DF, we performed cross validation, assigning 80% of the data to a training set and the remaining 20% to a validation set in a stratified random manner, with 50 replicates. Highest mean success and lowest root mean squared error estimates indicated an optimum of 12 PCs. To determine whether the difference between case and control groups was significant, we ran a MANOVA Pillai test with 12 PCs as the dependent variable. We repeated DAPC for subsets of the data dropping a PCA outlier or samples with high missing data. All analyses were limited to SNPs with a 95% call rate across the case/control dataset.

Association of individual SNPs with case/control status
To identify individual SNPs associated with case/control status, we ran the linear mixed model GEMMA which calculates an exact association test statistic (Zhou and Stephens 2012). We selected a 'centered' covariance matrix to control for hidden kinship and population stratification and included latitude, longitude, and population (TPEX, TPSR) as additional covariates. Only SNPs with a 95% call rate among the 32 paired samples were included, amounting to 854 LD-pruned SNPs to estimate the relatedness matrix and 1526 SNPs from the full SNP set for the association test. We repeated the association test separately for TPSR, TPEX, and subsets of the data excluding a PCA outlier alone or in combination with samples with high missing data. Using the R package qvalue v.2.14.1, we adjusted P-values from the likelihood ratio test by employing the false discovery rate method to correct for multiple testing (Storey and Tibshirani 2003). Additionally, PCs for differing tiers of 'top SNPs' in the tail of the distribution were visualized based on estimates from the R package

Functional annotation and gene ontology enrichment analysis of top SNPs
We matched SNPs to functional annotations for 51,751 gene models in the Pinus taeda reference genome from the TreeGenes Database (Pita.2_01.final_annotations_lvl0.tsv downloaded April 2020: https:// treeg enesdb. org/ FTP/ Genom es/ Pita/ v2. 01/ annot ation/). The top 5% of SNPs tested in GEMMA lacking a functional annotation in the reference were supplemented with custom annotations (Table S4) To identify GO terms overrepresented among SNPs associated with case/control status, we performed singular enrichment analysis in AgriGO v2.0 (Tian et al. 2017). We ran two sets of enrichment tests, using either GO terms from the P. taeda reference annotations or from custom annotations with EnTAP (633 vs 1170 of 1526 SNPs with at least one GO term, respectively). If a SNP matched multiple isoforms, we selected GO terms from the longest gene or transcript. We separated gene ids and corresponding GO terms into a query set (SNPs identified in the top 5% or 10% of the distribution of the GEMMA association test) and a reference set (all SNPs tested in GEMMA). Only one SNP per gene was considered. We performed Complete GO enrichment analysis separately for biological process, cellular component, and molecular function levels. To assess significance, we used a Fisher's exact test for GO terms matching at least 5 genes and adjusted for multiple testing with the Benjamini and Hochberg (1995) false discovery rate (FDR) method.

Raw reads, mapping, and variant calling
We obtained approximately 14-54 million paired-end reads per sample after quality filtering. On average, 70.2 ± 3.8% of reads mapped uniquely to the Pinus taeda reference genome per library (excluding one degraded library), covering 23.84 million base pairs at greater than 15× average coverage per sample. GATK Haplotype Caller produced over 4.75 million raw SNPs, of which 9599 total biallelic SNPs and 4192 LD-pruned SNPs remained after filtering (Table 1). Minimum depth (DP), genotype quality (GQ), and minor allele frequency filters had the largest effect on number of SNPs retained. The final SNP set exhibited a transition/transversion ratio of 1.51, high mean coverage (71.7 ± 21.9×) and generally high mean GQ (86.43 ± 8.14) and low mean number of missing SNPs per sample (1.04 ± 1.49%, Fig. S2). The number of missing SNPs increased with decreasing DP and GQ, but was not associated with heterozygosity (results not shown). There was no significant difference in DP, GQ, and number of missing SNPs between cases and controls (results not shown), although two cases had an elevated number of missing SNPs (less than 6.5%; Fig. S2C).

Kinship, genetic diversity, and population structure
Consistent with previous studies, Torrey pines displayed low genetic variation within and between populations. Of the 9599 SNPs called across samples, most were fixed differences between the island and mainland populations (Table 1). Only 1888 SNPs were present among 36 mainland individuals and 750 SNPs were present among four island individuals with a minor allele count > 1 (Table 1) (Table 1), despite greater sampling across a wider spatial area (Fig. 1). Surprisingly, although SRI had the lowest H E among groups (0.036), it had higher H O (0.064) and lower inbreeding (-0.565) than either mainland population (Table 1). F ST between the island and mainland was extremely high (0.915) and was high between TPEX and TPSR (0.523) ( Table 2).
To account for potential bias due to hidden kinship, we tested for relatedness among samples and re-estimated genetic diversity and structure among unrelated individuals. The KING inference method indicated relatedness between samples within the island and mainland and within TPEX and TPSR (Fig. S3). However, limiting estimates to unrelated individuals (beyond second degree relatives) identified by PC-AiR had little effect on genetic diversity statistics (Table S2) and F ST (Table S3). Furthermore, PC-AiR plots accounting for hidden kinship revealed distinct clustering within the island and mainland populations along PC1 (84.66% of variation explained; Fig. 2a). When only the mainland was considered, TPSR and TPEX separated into distinct populations along PC1 (44.37% of variation explained; Fig. 2b), with TPEX more diffusely distributed along PC2.
We next examined genetic structure and diversity in cases and controls to ensure that case/control status was not confounded with population structure. PC-AiR indicated no structure between cases and controls (Fig. 2c), which was similarly reflected by low F ST (− 0.013, Table 2). H E , H O , and inbreeding were similar for cases and controls, while greater H E than H O in both groups reflected population structure between TPEX and TPSR ( Table 1). When we considered only unrelated individuals, similar patterns emerged (Table S2,

Multivariate analysis of genetic divergence between cases and controls
The discriminant analysis showed differentiation between multivariate genotypes of cases and controls after controlling for population (Fig. 3). Discriminant function (DF) 1 was largely non-overlapping between groups (Fig. 3a) and the 12 PCs summarized by DF1 were significantly different between cases and controls (F (12, 19) = 2.567, P = 0.0322, Pillai's trace = 0.619) though the assumption of no outliers was not met. Generally, membership probability to group reflected the case/control status of the sample (Fig. 3b). Individuals were correctly assigned to group with an 84.38% probability, with higher assignment probability to controls (93.75%) than cases (75%). Results were largely similar after dropping a PCA outlier alone or in combination with two samples with higher missing data (results not shown).

Association of individual SNPs with case/control status
The GEMMA mixed model identified 11 SNPs significantly associated with case/control status after correction for multiple testing (Table 3, Fig. S4a). It should be noted, however, that the P-value distribution was not uniformly distributed (Fig. S4b, c), which could indicate genome-wide differences unaccounted for between cases or controls or alternatively, could reflect small sample size or number of SNPs. To obtain additional support for variants associated with status, we tested for SNPs consistently arising in the top 5% (76 SNPs) of the P-value distribution for each of four GEMMA models based on subsets of the data (TPEX, TPSR, or excluding a PCA outlier alone or in combination with two samples with high missing SNPs). We identified 15 shared SNPs, many of which overlapped the 11 significant SNPs, amounting to 19 SNPs (Table 3). All SNPs were within HWE in TPEX (P > 0.05), TPSR (P > 0.05), and the mainland population (P > 0.005). PCA indicated clear separation between cases and controls when plotting one SNP per gene among the top 19 SNPs (Fig. 4a) and among SNPs in the top 5% of the distribution (Fig. 4b), indicating their utility for differentiating cases and controls. Among the latter SNPs, population separated individuals along PC2, although it explained less variation in the data than case/control status along PC1. Among top SNPs, one group (cases or controls) tended to be homozygous at a given locus while the other group was predominately heterozygous (Fig. S5).

Functional annotation and gene ontology enrichment analysis of top SNPs
The top 19 SNPs occurred within 11 putative genes, six of which matched annotated gene models in the P. taeda reference and four of which were supplemented with functional annotations in the EnTAP pipeline (Table 3). The most significant SNP occurred in a probable aquaporin gene with GO terms including 'water transport' and 'response to osmotic stress' among others. Notably, genes with potential defense roles included cytochrome P450, DMR6-like oxygenase, polygalacturonase, peroxidase, and a major allergen (a pathogenesis-related protein). Remaining genes were either uncharacterized or displayed GO terms related to metabolic processes.   Table 3 with jitter of 0.005 to visualize overlapping points and b SNPs in the top 5% of the distribution. Only SNPs with the smallest P-value per gene are displayed Gene ontology enrichment analysis suggested that defense response variants may be related to case/control status, although results were not significant after correction for multiple testing. For both analyses using Pinus taeda or custom EnTAP annotations, the top 10% of SNPs tended to show enrichment for biological process GO terms related to response/defense response to bacteria, fungi, other organisms, and stimuli, with cell death additionally among top terms in the latter group. The top 5% of SNPs displayed similar but weaker signals of enrichment. No cellular component or molecular function terms were overrepresented, except for a molecular function term related to oxidoreductase activity among the top 5% of EnTAP annotations (FDR < 0.10). Because tests are limited to annotated SNPs, improved annotations across a larger number of variants would be required to detect definitive signals of enrichment.
Given that aquaporins and many defense-related genes belong to large gene families, we tested whether enriched GO terms could indicate variants in paralogous loci differentially expressed between cases and controls that were merged during read mapping. To test for multi-locus contigs (sensu McKinney et al. 2017;Rellstab et al. 2019), we ran a second SNP filtering pipeline with several pre-filtering steps followed by exclusion of (1) entire exons containing SNPs with excess heterozygosity (EH) and (2) SNPs with allele balance (AB) deviations from the expected 0.50 ratio (Supplemental methods). We then ran a GO analysis of SNPs that failed the EH and AB filters relative to the background set of pre-filtered SNPs. SNPs with EH and AB violations that could indicate potential multi-locus contigs were found among top SNPs and were significantly enriched for five GO terms, including terms related to defense repose and cell death (Supplemental results, Table S5). However, to make meaningful interpretations of our results possible, we retained our original set of 9599 SNPs to avoid discarding biologically relevant defense-related SNPs that are central to our question.

Discussion
With changing climate, conifer populations are predicted to face increased evapotranspirative stress and frequency of bark beetle outbreaks, warranting an evaluation of their ability to adapt to such challenges. Using RNA-Seq to survey functional regions across the genome of the rare Torrey pine, we found low genetic diversity within the species, predicting low adaptive capacity. However, our finding of genetic associations with Ips-colonization status and a suggested pattern of defense-related GO functions among top SNPs suggests potential selection by bark beetles on genetically-based traits and at least some adaptive potential for resistance. These initial findings prompt further controlled studies to dissect genetic variation underlying potential host resistance phenotypes and have implications for management of the species to support its persistence into the future.

Low genetic diversity for a pine with high population structure
Consistent with previous studies on Torrey pines, we found few genetic variants (9599 SNPs) expressed in transcripts across populations, with most SNPs found as fixed differences between the island and mainland. Expected and observed heterozygosity were low within the island and two mainland populations (H E of 0.036-0.053, H O of 0.039-0.064) while F ST was extremely high between the island-mainland and TPEX-TPSR groups (0.915 and 0.523, respectively). This structuring of genetic variation is unusual for outcrossing trees, typically characterized by high standing genetic variation partitioned largely within rather than among populations (Hamrick et al. 1992). For example, various locally and widely distributed conifer species (Dillon et al. 2013;Mosca et al. 2014;Nadeau et al. 2015) displayed H E and H O of 0.20-0.40 per population, with among-population structure below 10% based on EST-derived SNPs. On the other hand, Pinus resinosa, another genetically depauperate conifer, displayed high population divergence (F ST = 0.28) in microsatellites but still less than Torrey pine (Boys et al. 2005). The lack of diversity and high structure in Torrey pines may be due to one or more extreme bottlenecks, a founder event leading to colonization of the island, and genetic drift following population divergence (Ledig and Conkle 1983).
Novel to our study were findings that (1) polymorphisms are segregating within populations albeit at a low level, and (2) there is population structure between TPEX and TPSR within the mainland. The discovery of within-population polymorphisms is not wholly unexpected, given that we examined gene-coding regions on a genome-wide scale. Previous studies examining 59 allozyme loci (Ledig and Conkle 1983), chloroplast markers (Waters and Schaal 1991;Provan et al. 1999), and the chloroplast genome (Whittall et al. 2010) found complete monomorphism within populations. Fewer variants are expected in allozymes and in the haploid, non-recombining chloroplast that evolves at slower rates than the nuclear genome (Wolfe et al. 1987). Our finding of genetic structure between TPEX and TPSR may largely reflect restoration efforts in the mainland in the early 1900's and more recently in 1994 to replace trees lost during a preceding beetle outbreak (Ledig et al. 1998). Given that TPEX had higher heterozygosity and lower inbreeding than TPSR despite fewer individuals sampled, it is possible that individuals reintroduced into TPSR originated from fewer maternal sources than TPEX, that TPEX experienced less intervention, or that we disproportionately sampled reintroduced kin in TPSR. Increased kinship in TPSR could elevate structure between populations, though broader sampling across populations could determine whether these patterns hold on a reserve-wide scale. Additionally, the ~ 1.3 km wide lagoon separating these populations may restrict gene flow. While wind-dispersed pollen is expected to move widely, seed dispersal is often restricted (Latta et al. 1998) and can result in genetic structure (Schuster and Mitton 2000). In Torrey pines, large seeds are adapted for animal dispersal, and jays dispersed 73% of seeds within 50 m in one experimental study (Johnson et al. 2003). Structure on the mainland may thus reflect a combination of elevated kinship among planted trees and restricted seed dispersal across the lagoon.

Genetic differentiation between groups in relevant genes suggests selection by beetles
The discriminate analysis and association test indicated multivariate and single SNP differences between cases and controls, suggesting that beetles are responding in part to genetically controlled differences in the tree population. Beetles may be selectively avoiding well-defended trees or may fail to colonize resistant trees if they attempt attack. Although controlled experiments would be required to elucidate the mechanisms underlying potential host resistance, functional annotations of top genes suggest that defenses and water regulation may play a role. In the GO analysis, while not significant, responses against bacteria, fungi, and other organisms tended to be represented among top genes. Moreover, five of the top 19 genes from the association test have roles in biotic responses. Cytochrome P450, for example, belongs to a well-known gene family involved in synthesis of insect-defensive chemicals through the terpenoid and phenylpropanoid biosynthesis pathways (Keeling and Bohlmann 2006). Polygalacturonases play a role in cell wall remodeling in response to biotic stresses (Sénéchal et al. 2014). DMR6-like oxygenase, peroxidase, and major allergen, alternatively, are involved in active microbial defenses (Kolosova et al. 2012;Zeilmaker et al. 2015). Because bark beetles have known associations with fungi (Paine et al. 1997) and bacteria (Boone et al. 2013) that facilitate host colonization, microbial defenses in conifers could contribute to collective insect resistance. In a previous study in Torrey pines, Zavarin et al. (1967) found little indication that constitutive oleoresin chemistry contributes to resistance variation within populations. However, resistance can be achieved through multiple mechanisms, and our findings suggest other pathways could account for genetic differences between groups.
Notably, the SNP most significantly associated with case/control status was found in a putative gene encoding an aquaporin, a membrane channel protein involved in water, solute, and gas transport. In sweet orange, multiple aquaporins were down-regulated in susceptible genotypes infected with a common citrus pathogen, suggesting that aquaporins moderate disease progression by altering water and nutrient transport involved in normal growth (de Paula Santos Martins et al. 2015). Aquaporins may also facilitate the transport of H 2 O 2 into the cytoplasm, triggering immunity pathways (Tian et al. 2016). In general, drought is known to predispose trees to beetle attack (Netherer et al. 2015), suggesting a benefit to maintaining water potential and defenses under water limitation.
While our findings indicate a genetic basis contributing to bark beetle attack, a few caveats are warranted. Genetic differentiation between cases and controls could occur in merged paralogs, given that aquaporins and many defenserelated genes belong to large gene families. Mapping transcripts to a reference genome of another species, particularly a fragmented genome like that of Pinus taeda, can present challenges for multi-gene families. To account for this, we retained only uniquely mapped reads and applied stringent SNP filters to minimize the incidence of merged paralogs. However, supplemental analysis showed that SNPs with excess heterozygosity and allele balance deviations, signals of potential paralogs, were enriched for defense and cell death functions. While maintaining SNPs in biologicallyrelevant genes allowed us to better assess a large suite of genetically-controlled traits to which beetles may respond, our results should be interpreted cautiously. Genes involved in putative adaptive responses to insects or cell death following beetle colonization are also likely to be differentially expressed between cases and controls. If these differentially expressed genes belong to multi-gene families that map to the same locus, it could generate false genotypic differences between cases and controls. In order to disentangle true SNPs from potential artifacts of differential gene expression and paralogs, a high-quality reference genome and de novo transcriptome assembly for Torrey pine supplemented with long reads would be beneficial.
It is also worth noting that changes in bark beetle dynamics over time due to environmental and biotic factors can alter selection on hosts (Raffa et al. 2008) and thus the genetic composition of cases and controls. Given the larger Ips outbreak among Torrey pines in 2017 than 2018 and the positive relationship between beetle abundance and the ability to overwhelm vigorous hosts (Raffa 2014), selection may have favored trees with greater resistance in 2017. Additionally, biotic interactions can influence beetle dynamics and host selection. Dendroctonus valens was another bark beetle prevalent throughout the reserve that may have affected Ips colonization. In a long-term study of Pinus resinosa (Aukema et al. 2010), probability of attack by two Ips species increased with the number of Dendroctonus colonizing a given tree. While all trees with successful Ips colonization eventually died, mortality was unlikely if Dendroctonus was present without Ips. In Torrey pines, Dendroctonus may similarly affect tree ability to withstand Ips attack and given more time, Torrey pines colonized by Dendroctonus may eventually succumb to Ips. However, we found no further Ips-related mortality among control trees through late 2019, although we re-assigned one control that died by 2018 as a case. Because mortality proceeds rapidly following successful Ips colonization, it is unlikely that there were other asymptomatic, successfully Ips-infested trees among our control samples. Nevertheless, our study assessed selection over a finite outbreak period, and changes in the environment and biotic interactions over time can influence beetle behavior, host susceptibility, and the consequent genetic pool of individuals that survive or succumb to beetle attack. Future studies should link genetic variation to specific resistant phenotypes under different treatments to better understand tree capacity to respond to various degrees of selection pressure.
A final caveat is our study's small sample size. While human genome-wide association studies assessing the genetic basis of disease typically include thousands of individuals (Zondervan and Cardon 2007), studies of adaptive traits in plants may require many fewer samples since positively selected alleles underlying these traits are likely to occur at higher frequency (Ingvarsson and Street 2011;Kloth et al. 2012). This is especially true for pathogen resistance which is often controlled by a few genes of large effect, although the genetic architecture of insect resistance is less well characterized (Broekgaarden et al. 2011). Using a moderate sample size of 96 to 192 inbred Arabidopsis lines, for example, Atwell et al. (2010) identified common alleles of large effect associated with 107 adaptive phenotypes, including 23 pathogen and insect defense-related traits. Similarly, in a study of mountain pine beetle (MBP) susceptibility in Pinus contorta, Cullingham et al. (2020) identified one candidate gene (of 17 initial candidates) significantly associated with attack status in a sample of 356 attacked and unattacked trees. A study by Six et al. (2018) also identified genetic differentiation between survivors and the general population of younger trees among 66 Pinus albicaulis and 40 P. contorta trees sampled after an MPB outbreak. While Atwell et al. (2010) argues for greater sample sizes to increase statistical power, these studies demonstrate that even moderate sample sizes have utility to detect meaningful genetic associations for further study. Here, using a small sample of 32 Torrey pines, we identified genetic variation associated with survival following an Ips outbreak. Our preliminary work prompts large-scale follow-up studies to uncover and validate additional putative variants that may contribute to resistance.

Adaptive potential of Torrey pines
A principal goal of our study was to assess the adaptive genetic potential of Torrey pines, both to general selection pressures and ongoing bark beetle outbreaks. In the absence of information on specific traits involved in adaptation, genome-wide diversity can serve as a proxy for adaptive potential (Ørsted et al. 2019), and is especially relevant for polygenic traits (Harrisson et al. 2014). Low genetic variation across gene-coding loci in Torrey pines relative to other conifer species suggests limited capacity to respond to selection in a changing environment (Ørsted et al. 2019). TPEX, with the highest expected heterozygosity, may have the highest adaptive capacity of the populations. However, given strong genetic differentiation among TPEX, TPSR, and the island, the full evolutionary potential of Torrey pines is captured across the species distribution.
Our findings suggest that despite low genetic diversity, Torrey pines may have some capacity to respond adaptively to ongoing bark beetle pressure. Genetic differences between case and control trees in functionally relevant genes suggest that there is heritable variation in the population for traits that influence beetle attack. Variation at specific functional loci may be better predictors of fitness and a species evolutionary potential than genome-wide diversity (Teixeira and Huber 2021). Other systems have revealed that variation in adaptively important genes can be maintained even when neutral genetic diversity is low (Robinson et al. 2016;Mable 2019). In the case of defense adaptations, costly resistance variants may be maintained at low frequency due to past selection by bark beetles or exposure to similar herbivorous insects and pathogens (Budde et al. 2016). In theory, as beetle outbreaks increase in frequency and severity, selection against susceptible individuals can increase the frequency of resistance alleles and the resulting average fitness of the population (McKinney et al. 2014). However, if only a small percentage of the population has resistance, high mortality during outbreaks may lead to population bottlenecks, drift, and inbreeding (McKinney et al. 2014;Budde et al. 2016). Torrey pines have an uncertain coevolutionary history with Ips paraconfusus, and the native bark beetle is only suspected of being a significant mortality agent in pines since the 1940's (Murray et al. 2013). It is possible that unknown historic selection events by Ips or other insects and pathogens could have maintained resistance variants in Torrey pines that can increase in frequency with new outbreaks. Potential resistance variation may be lower in the naïve island population where bark beetles have been historically absent, unless other important selective agents have co-evolved with this population.

Future directions and conservation implications
We present an initial study using RNA-Seq as a conservation tool to assess gene-coding diversity in the rare Torrey pine as a proxy for adaptive potential to respond to environmental stressors, with a focus on pressure from Ips paraconfusus bark beetles. Low genetic diversity across the genome indicates limited evolutionary potential for the species. However, genetic differences between case and control trees associated with beetle attack suggests that beetles may be responding to genetically controlled traits, indicating potential for the population to shift towards more adaptive phenotypes. Putative candidate genes for resistance are related to defense against microorganisms and water regulation.
We caution that our findings are preliminary but provide reason to further explore the putative genetic basis of resistance in a common garden setting. A long-term common garden would allow screening for multiple resistance phenotypes, assessing their genetic basis, and evaluating the prevalence and degree of resistance in the island and mainland populations with contrasting Ips outbreak histories. Additionally, water treatments could disentangle the contribution of genotype and the environment to resistance phenotypes. A robust genome assembly for Torrey pine, when more technically and economically feasible, would benefit these analyses. It would further facilitate the discrimination of large gene families and allow one to test whether differential gene expression and paralogs are confounding genetic variation in the present study.
From a conservation standpoint, our finding of population structure suggests that genotypes across all populations are needed to preserve the evolutionary potential of the species. Given low genetic diversity and small population size, mixing individuals from TPEX and TPSR may minimize inbreeding and loss of alleles to drift. To validate this approach, fitness consequences of experimental crosses can be evaluated in a common garden. Interestingly, Hamilton et al. (2017) have demonstrated increased fitness among island-mainland F 1 hybrids relative to either parent population in growth and fecundity. Future studies can explore to what extent growth-defense trade-offs affect resistance. Furthermore, given that the mainland population has experienced multiple severe beetle outbreaks, the surviving pool of individuals may theoretically contain an increased proportion of resistance variants. Following further genomic assessment and validation, resistance variants from a large number of maternal sources should be represented in seed bank collections and reintroduction efforts. Finally, methods of long-term storage of viable germplasm for regenerating populations can provide increased assurance that the full extent of genome-wide diversity of the species is available in the future. By maximizing genome-wide and adaptive diversity, this and future studies can help preserve the evolutionary potential of the rare and iconic Torrey pine to respond to continued bark beetle outbreaks and other stresses with climate change.