Exome resequencing and GWAS for growth, ecophysiology, and chemical and metabolomic composition of wood of Populus trichocarpa

doi:10.21203/rs.2.9589/v1


Research article


https://doi.org/10.21203/rs.2.9589/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 20 Nov, 2019

Read the published version in BMC Genomics →

You are reading this older preprint version


Background: Populus trichocarpa is an important forest tree species for the generation of lignocellulosic ethanol. Understanding the genomic basis of biomass production and chemical composition of wood is fundamental in supporting genetic improvement programs. Considerable variation has been observed in this species for complex traits related to growth, phenology, ecophysiology and wood chemistry. Those traits are influenced by both polygenic control and environmental effects, and their genome architecture and regulation are only partially understood. Genome-wide association studies (GWAS) represent an approach to advance that aim using thousands of single nucleotide polymorphisms (SNPs). Genotyping using exome capture methodologies represents an efficient approach to perform GWAS.

Results: A GWAS using 461 P. trichocarpa clones, representing 101 provenances collected from Oregon and Washington, and 813K single nucleotide polymorphisms (SNPs), identified a variable number of significant SNPs in association with the assessed traits. Associated single-markers (q< 0.1) ranged from 3 to 110 per trait. The SNPs had a cumulative effect of up to 40.6% of the phenotypic variation of any given trait. Similarly, multiple-marker analyses detected between 16 and 291 significant windows for the phenotypes. The SNPs resided within genes that encode proteins belonging to different functional classes as well as in intergenic regions.

Conclusion: SNP-markers within and proximal to genes associated with traits of importance for biomass production were detected. They contribute to characterizing the genomic architecture of P. trichocarpa biomass, which is required to support the development and application of marker-based breeding technologies.

Epigenetics & Genomics

Populus

GWAS

sequence capture

growth

stable isotopes

lignin

cellulose

wood metabolome

Populus species and their hybrids are suitable feedstocks for second-generation biofuel production due to their rapid growth rates and favorable cell wall chemistry [1, 2]. In particular, the model species Populus trichocarpa Torr. & A. Gray (black cottonwood), native to western North America, has been used in breeding for generating commercial cultivars [3]. Biomass yield and chemical quality of P. trichocarpa cultivars, as well as their improvement, depend on multiple biological and environmental factors [4]. Considerable variation has been observed in P. trichocarpa for complex traits related to growth, phenology, morphology, ecophysiology and wood chemistry [5-8]. These phenotypes include diameter and height [9, 10], bud set and flush [6, 11], water-use efficiency (WUE) [12, 13], secondary xylem composition [14] and wood metabolome [5].

Association analyses based on SNPs have been applied in recent years to identify polymorphisms controlling variation in complex traits of interest for biofuel production in Populus species [14-18]. Different approaches (candidate gene or GWAS) as well as genotyping platforms have been used, with single SNP-markers accounting for a low percentage of the phenotypic variation (1-8 %) in studied traits. These results support the polygenic nature and complexity of inheritance patterns and justify increasing efforts to elucidate the genomic basis controlling those phenotypes.

Among “next-generation” sequencing alternatives, genome complexity reduction by sequence capture represents an efficient approach to performing genome-wide analysis [19]. This method can identify both genic and intergenic regions for selection, and supports investigations into the diversity, population structure and demographic history of unstructured natural populations, among others [20]. Particularly in Populus species, which have experienced a whole-genome duplication event [21], this probe-based analysis can circumvent problems arising from the presence of paralogous genes [22]. This was demonstrated by the application of an exome capture approach for analyzing the genomic architecture of clinal variation in P. trichocarpa [23].

In the present study, we employ exome capture for genotyping and performing a GWAS in a P. trichocarpa population of 461 clones from 101 provenances collected from the US Pacific Northwest (Oregon and Washington). Representatives of these clones were established in a clonal trial in California and characterized, both by traditional field measurements and high-throughput phenotyping, in describing a suite of traits involved in biomass production and wood chemical composition. We coupled these phenotypic measures with sequence capture-based genotyping to identify SNPs underlying observed trait variation. Understanding genetic variation at a genome-wide scale is fundamental for developing genome-based breeding technologies suitable for supporting the development of genetically improved plantations for bioethanol production.

Association population and growth conditions

The association population comprised a set of 461 P. trichocarpa clones. These represented 101 provenances, within 14 river systems located west of the Cascade Mountains in Oregon and Washington between 48°54’ N latitude (Nooksack River, Whatcom County, Washington) and 43°47’ N latitude (Middle Fork, Willamette River, Lane County, Oregon), collected by GreenWood Resources [14]. A clonal trial was established at the University of California, Davis, California (38°32’42” N, 121°47’42” W) for phenotypic measurements and sample collection. That trial has been described previously [5]. Briefly, plants were produced from rooted containerized cuttings and established in a 1.83 × 1.83 m array, following a randomized block design, with three blocks and one ramet per clone per block.

Growth and phenology measurements

Three growth parameters (diameter at breast height [DBH], total height [h] and volume index [Vol]), at age two, were measured in October 2010, as described by Guerra et al. [5]. In particular, Vol was estimated as Vol = π (DBH/2)² × h [11]. Additionally, days to bud flush (DBF) were recorded every three days from March to April 2011, as indicated previously [5].
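The volume index formula can be illustrated with a minimal Python sketch. The function name and the example tree are hypothetical, and the sketch assumes DBH and h are expressed in the same length unit, which the text does not state.

```python
import math


def volume_index(dbh: float, height: float) -> float:
    """Volume index Vol = pi * (DBH / 2)^2 * h.

    Assumption: dbh and height share one length unit, so the result
    is in that unit cubed.
    """
    return math.pi * (dbh / 2.0) ** 2 * height


# Hypothetical two-year-old tree: DBH = 0.04 m, height = 5 m.
print(round(volume_index(0.04, 5.0), 6))  # 0.006283
```

Because Vol scales with the square of DBH, small diameter differences have a larger influence on the index than comparable relative differences in height.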

Chemical composition and metabolome of wood

Wood cores collected from tree stems (0.3 m above ground level), at age three, in September 2011, were utilized for analyzing the chemical composition and metabolome of wood [5]. The content of 5- and 6-carbon sugars and lignin, as well as the syringyl:guaiacyl monolignol ratio (S:G), were determined from wood cores by high-throughput pyrolysis molecular beam mass spectrometry (pyMBMS), at the National Renewable Energy Laboratory (Golden, CO, USA). Simultaneously, wood metabolites were quantified from another set of wood cores by gas chromatography coupled with time-of-flight mass spectrometry (GC-TOF-MS), at the West Coast Metabolomics Center, at UC Davis. Methodological details have been described elsewhere [5]. For association analysis, five metabolites were selected. These corresponded to those with the highest heritability estimates according to Guerra et al. [5]: 4-hydroxybenzoic acid (HbA; ), galactinol (Gal; ), adenosine (Ade; ), galactonic acid (GAc; ), and alpha tocopherol (Toc; ). These estimates reflected significant genetic variation across the analyzed genotypes, indicating their importance for variation in wood composition. They were selected to maximize the power for detecting significant associations.

Ecophysiology traits

Morphological and ecophysiological characteristics are determinants of biomass productivity [4]. For that reason, independent leaf samples were obtained from the top of each tree, in August 2011 and 2012, and used for stable isotope analyses and leaf area estimation, respectively. Sampled leaves were processed to determine carbon (C) and nitrogen (N) concentrations and C and N stable isotope compositions by continuous flow isotope ratio mass spectrometry, at the UC Davis Stable Isotope Facility, as described elsewhere [5]. Additionally, a subset of leaves was utilized for measuring leaf area and dry biomass to estimate the specific leaf area (SLA), according to Easlon et al. [24]. From these measurements, a set of variables was generated, including: C and N content, C:N ratio, carbon isotope discrimination (Δ), δ¹⁵N, SLA, and the N content per SLA ratio (NArea). In the case of SLA and NArea, a subset of 177 clones, with three replicates per clone, was included in the analysis. These clones were chosen to include those with extreme geographical origins and minimal and maximal volume indices.

DNA isolation

Additional young leaves were collected for DNA isolation. Circle punches were obtained and deposited in plastic vials, along with silica gel packets, until desiccation. DNA isolation was performed using the Qiagen DNeasy Plant Mini Kit (Qiagen, Inc, Valencia, California, USA), according to the manufacturer’s instructions. Quality was assessed with a spectrophotometer (NanoDrop, Thermo Scientific). Acceptable extractions had a 260/280 ratio of 1.7-2.0 with a minimum concentration of 20 ng/μl.

Sequence capture and sequencing

Phytozome version 7.0 annotation and assembly files for P. trichocarpa (corresponding to assembly version 2.2 of the black cottonwood genome) were used to design oligonucleotide baits, complementary to short genomic regions that targeted exons, promoters, and intergenic control regions, similar to the approach described by Zhou and Holliday [22]. A total of 230,720 baits of 120 bp were designed using SureSelect eArray software (Agilent Technologies, Santa Clara, California, USA). The baits targeted more than 39,000 of the 40,668 annotated protein-coding transcripts of the P. trichocarpa genome. As the cumulative length of the predicted exons exceeded the available baits, following bait design in eArray, we looped through the gene list, selecting one bait for each gene at each pass, until the maximum number of baits (i.e. 230,720) was reached. In addition to exons, baits were included in the design for genes with an annotated 5’-UTR targeting the 240 bp upstream, as well as 1000 baits targeting intergenic regions to be used as selectively neutral control regions. These control regions were selected at random from non-repetitive intergenic intervals at least 1000 bp from any gene model. With this bait design strategy, capture efficiency was previously shown not to be significantly affected by the presence of paralogous genes [22]. After design, a custom biotinylated RNA bait library was synthesized. Library preparation and target enrichment were performed following the Agilent SureSelect^XT protocol (Version B). Briefly, 3.0 μg of poplar genomic DNA was sheared on an ultrasonicator (Covaris S220) at the Virginia Bioinformatics Institute (VBI), followed by end repair, 3’-end adenylation, adaptor ligation, and amplification. Agencourt AMPure XP beads were used to purify the libraries following each step, and library quality was assessed using an Agilent Bioanalyzer 2100 instrument, with Agilent DNA 1000 chips.
Samples were randomly assigned one of 96 available index sequences and subsequently randomly assigned to groups of 16, each corresponding to a HiSeq sequencing lane. The prepared libraries were hybridized to the RNA baits in solution at 65 °C in an Eppendorf Mastercycler PCR machine (Eppendorf, Hamburg, Germany), and subsequently purified on magnetic beads. The multiplexed libraries were sequenced using an Illumina HiSeq 2500 System in a 2x100 paired-end format at VBI.
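The pass-wise bait selection described above (one bait per gene per pass until the bait budget is exhausted) can be sketched as follows. This is an illustration only, not the authors' script: the function name, the dictionary layout and the bait identifiers are hypothetical, and the order of candidate baits within each gene is assumed to be given.

```python
from typing import Dict, List


def select_baits_round_robin(candidates: Dict[str, List[str]],
                             max_baits: int) -> List[str]:
    """Select one candidate bait per gene on each pass over the gene list.

    Stops when max_baits baits have been selected or when every gene
    has run out of candidate baits.
    """
    selected: List[str] = []
    pass_index = 0
    while len(selected) < max_baits:
        picked_this_pass = False
        for baits in candidates.values():
            if pass_index < len(baits):
                selected.append(baits[pass_index])
                picked_this_pass = True
                if len(selected) == max_baits:
                    return selected
        if not picked_this_pass:
            break  # no gene has candidates left
        pass_index += 1
    return selected


# Hypothetical toy gene list with a budget of five baits.
toy = {"geneA": ["a1", "a2", "a3"], "geneB": ["b1"], "geneC": ["c1", "c2"]}
print(select_baits_round_robin(toy, 5))  # ['a1', 'b1', 'c1', 'a2', 'c2']
```

A round-robin pass of this kind spreads the limited bait budget across as many genes as possible before any single gene receives a second bait.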

Data analysis and SNP calling

Short reads from poplar samples were pre-filtered using a collection of scripts in Biopieces (https://github.com/maasha/biopieces; version 2.0). First, interleaved paired-end sequences were filtered based on the Illumina filter flag, and subsequently trimmed of adapters and bases with quality < 35. Following trimming, very short reads (length < 35) were eliminated to prevent ambiguous alignments. Lastly, reads having poor local quality scores (score < 25, window size = 5) were removed from the analysis. The short reads were aligned to the Populus trichocarpa (version 3.0) reference genome with the Burrows-Wheeler Aligner mem algorithm [25]. Resulting alignment files (in Sequence Alignment/Map, SAM, format) were converted via SAMtools v3.1 [26] to their binary versions (Binary Alignment/Map, BAM) for variant calling. Prior to SNP calling, duplicate reads were identified and removed using Picard software (version 2.6; https://broadinstitute.github.io/picard/command-line-overview.html) and the Genome Analysis Toolkit v3.x (GATK; https://software.broadinstitute.org/gatk/), with the MarkDuplicates and DuplicateReadFilter functions, respectively [27]. Indels were realigned using the GATK IndelRealigner function. The HaplotypeCaller algorithm of GATK was then used to call SNPs (options: min_base_quality_score > 9, and standard_min_confidence_threshold_for_calling > 29) [28]. Variant calling was performed on individual chromosomes and scaffolds to reduce the run time. After merging all variant call format (VCF) files, the VariantsToTable tool of GATK was used to produce SNP tables for downstream analysis. SNPs with a minor allele frequency < 5% or showing departure from Hardy-Weinberg equilibrium were excluded from the analyses. Similarly, individuals (clones) that were missing genotype data across more than 5% of the genotyped SNPs were removed from subsequent stages. A final set of 813,280 SNPs was utilized for GWAS.
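The final filtering step can be illustrated with a short NumPy sketch. It is not the pipeline used in the study: the function name and the 0/1/2 dosage coding with NaN for missing calls are assumptions, samples are filtered before SNPs (the text does not give the order), and the Hardy-Weinberg test is omitted because no threshold is reported.

```python
import numpy as np


def filter_genotypes(geno, maf_min=0.05, max_sample_missing=0.05):
    """Filter a samples x SNPs dosage matrix (0/1/2, NaN = missing).

    Returns the filtered matrix plus boolean masks of kept samples and SNPs.
    """
    geno = np.asarray(geno, dtype=float)
    # Drop samples missing more than the allowed fraction of SNPs.
    sample_missing = np.isnan(geno).mean(axis=1)
    keep_samples = sample_missing <= max_sample_missing
    kept = geno[keep_samples]
    # Alternate-allele frequency from non-missing calls, folded to the MAF.
    alt_freq = np.nanmean(kept, axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep_snps = maf >= maf_min
    return kept[:, keep_snps], keep_samples, keep_snps


# Hypothetical toy matrix: the last sample is missing half of its calls.
toy = np.array([
    [0, 1, 2, 0],
    [0, 1, 1, 0],
    [0, 2, 0, 0],
    [np.nan, np.nan, 1, 0],
])
filtered, kept_samples, kept_snps = filter_genotypes(toy)
print(filtered.shape)  # (3, 2)
```

In the toy matrix, the first and last SNPs are monomorphic among the retained samples and are therefore removed by the minor allele frequency filter.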

GWAS analyses

The distributions of the different traits were checked for departures from normality. Logarithmic transformations were applied to normalize the variables Vol, DBF, C and N concentration, C/N ratio, SLA, NArea, lignin, C5-sugars and C6-sugars. For SLA and NArea only a subset of 177 clones was included in the analysis, as indicated above. Outlier observations were excluded from tests. Clonal means were adjusted by Best Linear Unbiased Predictor, using Proc MIXED in the software SAS v9.2 (SAS Institute, Cary, NC, USA). The prediction model included the factors clone and block, considered as random and fixed effects, respectively. The GWAS was based on Mixed Linear Models (MLM), implemented in the software GCTA v1.25 [29] (http://cnsgenomics.com/software/gcta/index.html). SNP data were first converted to PLINK format using TASSEL v3.0 [30] (http://www.maizegenetics.net/tassel), and then converted to binary PLINK (bed file) using the PLINK v1.9 command line tool [31]. The genetic relatedness (kinship) was determined with the genetic relationship matrix (GRM) option of the GCTA GRM module, based on identical-by-descent estimates [32]. The population structure matrix (Q) was estimated by classical multidimensional scaling [33], using the cmdscale function in the stats package in R (R Core Team, 2014). With the marker (M), kinship (K), population structure (Q) and predictor variables, we ran an “M+K+Q” mixed model and recorded the statistically significant associations with p-value < 10^-4. Marker and Q were assumed to be fixed effects, whereas K represented a random effect. Association tests were adjusted by false discovery rate (FDR) using the qvalue package [35] in R, with q-value < 0.1. We additionally used the Random Forest method to estimate the percentage of variance explained by the top SNPs for each trait (SNP contribution), using the randomForest package in R [36]. The SNP contribution standard deviation (SCSD) was also calculated with the same package.
We set the number of trees (ntree) grown as 2,000 and averaged 50 repeats per trait in our estimates.
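The classical multidimensional scaling step used for the Q matrix can be sketched in Python with NumPy. This is an illustrative re-implementation of the standard double-centering procedure, not the R code used in the study; the function name is hypothetical, and the toy distance matrix stands in for the genetic distance matrix, whose construction the text does not describe.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix.

    Returns an n x n_components coordinate matrix.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the largest eigenvalues; clip tiny negative values to zero.
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(vals)


# Hypothetical example: three points on a line at positions 0, 1 and 3.
positions = np.array([0.0, 1.0, 3.0])
toy_dist = np.abs(positions[:, None] - positions[None, :])
coords = classical_mds(toy_dist, n_components=1)
print(np.allclose(np.abs(coords - coords.T), toy_dist))  # True
```

The leading coordinates returned by such a decomposition are the kind of covariates that would enter the mixed model as the fixed Q term.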

Multiple SNP testing for each trait was carried out using an overlapping sliding-window analysis of the association results, based on the number of significant associations (q-value < 0.1) within a 10 k window with a 1 k slide, using a custom script. Empirical p-values for each slide were estimated assuming that the significant slides follow a Poisson distribution [37]. To this end, Poisson tests were carried out by comparing the genome-wide rate of significant SNPs (total significant SNPs over the total number of SNPs) with the corresponding rate for the tested window, using the probability formula below.

Due to technical limitations, Equation 1 has been placed in the Supplementary Files section.

Where lambda (λ) is the average number of significant SNPs, e is the constant Euler’s number and k is the average number of significant SNPs for the window being tested. P-values were recorded for each window per trait and then empirical p-values were adjusted with FDR at q < 0.1 and reported for each trait. Finally, p-values were visualized using a Manhattan plot, highlighting only the significant slides (q < 0.01).
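Equation 1 is not reproduced in this version of the text, but the symbols defined above (λ, e and k) match the standard Poisson probability, P(k) = λ^k e^(-λ) / k!. A minimal sketch of a window test built on that formula follows. The function names are hypothetical, and two choices are assumptions rather than details from the text: the expected count λ is taken as the genome-wide rate of significant SNPs multiplied by the number of SNPs in the window, and the p-value is taken as the upper tail P(X ≥ k).

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))


def window_p_value(sig_in_window: int, snps_in_window: int,
                   total_sig: int, total_snps: int) -> float:
    """Empirical p-value for one window under the assumptions stated above."""
    genome_rate = total_sig / total_snps
    expected = genome_rate * snps_in_window  # lambda for this window
    return poisson_upper_tail(sig_in_window, expected)


# Hypothetical counts: 14 significant SNPs among 64 in a window,
# against 270 significant SNPs among 813,280 genome-wide.
print(window_p_value(14, 64, 270, 813_280) < 1e-10)  # True
```

The per-window p-values produced in this way would then be adjusted for multiple testing, as described in the text.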

Linkage disequilibrium

Linkage disequilibrium (LD) was estimated among pairwise combinations of SNPs per chromosome. It was expressed in terms of the squared correlation of allele frequencies, r². The r² value between pairs of SNP markers, within each chromosome, was estimated using TASSEL 5.2 [30], utilizing the sliding window option (120 SNPs per window). To assess the extent of LD, the decay of LD with physical distance (base pairs) between SNPs, within each chromosome, was evaluated by nonlinear regression analysis of r² values [38]. The analysis was performed using the NLIN procedure in SAS 9.4.
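As a rough illustration of the r² statistic, the sketch below computes the squared Pearson correlation between two SNPs coded as genotype dosages (0/1/2). This is a common approximation for unphased data and is a substitute for, not a reproduction of, the TASSEL calculation; the function name and the coding are assumptions.

```python
import numpy as np


def ld_r2(snp_a, snp_b) -> float:
    """Squared Pearson correlation between two genotype dosage vectors.

    Missing calls (NaN) are dropped pairwise. Returns NaN when either SNP
    is monomorphic among the remaining samples.
    """
    a = np.asarray(snp_a, dtype=float)
    b = np.asarray(snp_b, dtype=float)
    mask = ~(np.isnan(a) | np.isnan(b))
    a, b = a[mask], b[mask]
    if a.size < 2 or a.std() == 0.0 or b.std() == 0.0:
        return float("nan")
    r = np.corrcoef(a, b)[0, 1]
    return float(r ** 2)


# Hypothetical genotypes for four clones.
print(ld_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # 0.0 (independent SNPs)
```

Pairwise values of this kind, plotted against the physical distance between the two SNPs, provide the input for a decay curve such as the one fitted by nonlinear regression in the text.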

Gene models, annotations and expression data

Gene models and gene ontologies were obtained from the Phytozome platform version 12 (Populus trichocarpa v 3.0) and Quick GO site (http://www.ebi.ac.uk/QuickGO-Beta/), respectively. Reference information about the expression of Similar to 5'-3' Exoribonuclease (XRN4) (Potri.005G048900) and Leucine Rich Repeat (LRR 1)//Leucine Rich Repeat (LRR 8) (Potri.005G015700) genes was determined utilizing the option “expression” available in the description of Populus trichocarpa gene models at the Phytozome-Phytomine platform.

We used GWAS to identify DNA polymorphisms associated with biomass production and wood chemical composition in P. trichocarpa, which determine its potential as feedstock for lignocellulosic ethanol. This approach complements our previous phenotypic characterization of the same association population [5] by identifying SNPs underlying traits of growth, ecophysiology and wood quality, the primary traits targeted for the development of genetically improved clones suitable for dedicated biomass and bioenergy plantations. An approach based on sequence capture allowed us to detect genotype-phenotype associations across most of the P. trichocarpa gene space.

Phenotyping

The association population used in this study consisted of 461 clones (from 101 provenances) comprising an important part of the natural distribution of P. trichocarpa in the US Pacific Northwest. Significant genetic variation was previously reported for growth, spring bud phenology, water use efficiency, C and N assimilation, as well as lignocellulosic components and metabolome of wood (Table 1) [5]. Expressed as coefficient of variation, this ranged from 3.6 to 78.3 %, for leaf Δ and the wood metabolite Hydroxybenzoic acid (HbA), respectively. Similarly, clonal repeatability, represented in terms of individual heritability estimates, varied among the traits, from 0.07 for C5-sugars to 0.9 for DBF. We hypothesized from this information that multiple polymorphic loci across the genome should be detected in association with phenotypes, and particularly, those with high heritability should reveal a large number of significant SNP-markers.

Genotyping

The processes of exome sequencing and genotyping identified 5.1 million SNPs across the P. trichocarpa genome in the association population. After the filtering steps described above, a set of 813,280 SNPs was used for association analyses (Table 2). The number of selected SNPs was proportional to chromosome size, ranging from 29,287 to 100,299 SNPs, for chromosomes 9 and 1, respectively (Table 2, Fig. 1a). Considering the full genome length, an average of one SNP every 482 bp (Table 2) was included in the analyses. Taking advantage of the full genome assembly, genotyping methodologies such as those based on sequence capture can target entire exons or genes across the genome, avoiding bias arising from a priori selection of candidate loci [20, 22]. In comparison to similar preceding studies that used SNP array platforms [6, 14, 16, 39], the number of SNPs in our analyses represents a significant increase in the power of the applied genomic scan.

Intra-chromosomal linkage disequilibrium

The extent of linkage disequilibrium (LD) was analyzed across each chromosome. On average, the LD over physical distance decayed below r² = 0.2 at 26.9 kbp. A representative example, for chromosome 12, is depicted in Fig. 1b. The complete set of chromosomes with their LD decay is included in Fig. S1. The decay varied among chromosomes (Table 2), with the most rapid decay observed on chromosomes 7 and 15 (r² = 0.2 at 18.9 kbp) and the slowest decay on chromosome 11 (r² = 0.2 at 51.6 kbp). High variation of LD across the genome (among and within chromosomes) has been reported for this species [20]. The extent of LD estimated in our study is greater than that observed by Wegrzyn et al. [14] (r² = 0.2 at ~0.5 kbp) and Wang et al. [40] (r² = 0.2 at ~8 kbp) for P. trichocarpa. Distinct methodologies, number of markers, population sizes, genetic origins and standard errors among the studies may account for the different findings. Compared with other tree species, the extent of LD estimated in this study is similar to that of species belonging to the genera Fraxinus [41], Prunus [42] and Eucalyptus [43].

Single SNP-marker associations

Considering the set of filtered SNPs and variables relating to growth, phenology, ecophysiology and wood chemical components, we conducted a total of 12.2 million tests. In the case of wood metabolites, the total number of tests was 4.07 million. Figure 2 (a, c, e and g) depicts the number of significant associations (p-value < 0.0001, q-value < 0.1) detected per chromosome for a selected set of traits. A detailed list for each trait is provided in Table S1. Similarly, Manhattan plots for each phenotype are included in Fig. S2. In general, and consistent with chromosome length, the highest and lowest numbers of significant associations were observed for chromosomes 1 and 9, respectively. The proportion of significant SNPs of the total analyzed ranged from 0.01 ‰ to 0.77 ‰, for C:N on chromosome 1 and height on chromosome 5 (Fig. 2a), respectively. In the case of growth traits, 32 and 270 associations were detected for DBH and h, respectively. For Vol, all associations (110) had a q-value over 0.1. The same was observed for the 95 significant markers detected for DBF. Within the ecophysiological traits, the number of significant associations ranged from 5 to 463 for SLA and leaf N-content, respectively. For traits related to the chemical composition of wood, associated SNP-markers (with p-value < 0.0001) ranged from 43 to 63 for C5-sugars and S:G ratio, respectively; however, these associations were over the q-value threshold of 0.1. In the case of wood metabolites, considering a selected subset of those with the top five highest heritability estimates, the number of associated SNPs varied from 29 to 79 for Hydroxybenzoic Acid (HbA) and Galactonic Acid (GAc) and Alpha tocopherol (Toc), respectively. No significant associations meeting the FDR criteria were identified for Adenosine (Ade) and Hydroxybenzoic Acid (HbA). The effect of SNPs on phenotypic variation depended on specific traits.
The proportion of variation accounted for by the cumulative effect of significantly associated SNPs (with q-value < 0.1) ranged from 1.5 % for SLA to 38.8 % for GAc (Table 1).
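The test counts reported at the start of this section follow from multiplying the number of filtered SNPs by the number of traits. The short check below assumes 15 non-metabolite traits (DBH, h, Vol, DBF, leaf C, leaf N, C:N, Δ, δ¹⁵N, SLA, NArea, C5-sugars, C6-sugars, lignin and S:G), a count inferred from the Methods rather than stated explicitly, plus the five selected metabolites.

```python
def n_tests(n_snps: int, n_traits: int) -> int:
    """Number of single-marker tests: one test per SNP per trait."""
    return n_snps * n_traits


N_SNPS = 813_280      # filtered SNP set reported in the text
N_MAIN_TRAITS = 15    # assumption: trait count inferred from the Methods
N_METABOLITES = 5     # metabolites selected for association analysis

print(n_tests(N_SNPS, N_MAIN_TRAITS))   # 12199200, about 12.2 million
print(n_tests(N_SNPS, N_METABOLITES))   # 4066400, about 4.07 million
```

Test numbers of this magnitude explain why a multiple-testing adjustment such as FDR control is applied to the single-marker p-values.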

Single nucleotide polymorphisms associated with phenotype were identified in both intergenic and genic regions. SNPs in the latter category are part of genes encoding proteins belonging to a variety of functional classes. Considering the three most significant SNPs per trait, across all traits, the most represented classes were Protein Synthesis/Modification and Unknown Function Proteins (24.5 %), DNA/RNA Metabolism (20.4 %) and Energy/Metabolism (12.2 %) (Fig. 3a). A list of the SNPs and genes included in these classes is given in Table S3. An example for the Protein Synthesis/Modification category was a gene encoding a Periodic Tryptophan Protein 1 (Potri.007G019500), which was associated with height, and leaf N and δ¹⁵N. Among genes related to proteins involved in DNA/RNA Metabolism, one encodes a cAMP-Regulated Phosphoprotein 21 (Potri.008G021900), significant for NArea and SLA. For genes in the Energy/Metabolism functional class, a representative was one encoding a 4-Hydroxyphenylpyruvate Dioxygenase (Potri.005G205200), which was associated with leaf C variation.

Considering the applied significance threshold (p-value < 0.0001 and q-value < 0.1), GWAS performed on single SNPs was successful in identifying polymorphisms associated with growth traits (DBH and h), leaf C and N-contents, as well as stable isotope parameters (δ¹⁵N) (Fig. 2, Table S1). For traits related to phenology (DBF), wood chemical components (C5 and C6 sugars, lignin) and wood metabolites (GAc, Gal and HbA), associations with p-value < 0.0001 were detected, but they did not reach the q-value threshold. The presence or lack of significant SNPs for these traits appears to be independent of the heritability estimates for each. For some traits with moderate to high H²_i (e.g. S:G ratio or DBF), GWAS detected few or no single-SNP associations. On the other hand, for traits with low to moderate H²_i (e.g. leaf C-content, δ¹⁵N, GAc), a relatively higher number of SNPs was identified. Similar situations were observed for phenology traits in previous studies with P. trichocarpa [16]. On average, for all traits with associations at q-value < 0.1, 22.8 % of phenotypic variation was accounted for by the cumulative effect of significant SNPs. The influence of multiple SNPs associated with phenotypes is particularly interesting in the context of the development of models for genomic selection, where large numbers of markers are utilized to predict the genetic merit of individuals [44]. Differences among traits in the number of significant SNP-markers suggest that both the number of SNPs influencing each trait and the individual impact of particular SNPs vary. In that sense, some individual SNPs could have such a low effect size that none reach statistical significance. Furthermore, the apparent lack of correspondence between estimates of H²_i and the phenotypic variance collectively accounted for by SNPs could be explained by non-additive effects (e.g. epistasis, GxE effects) or epigenetic factors acting on some traits.
These types of effects are usually underestimated because the MLM utilized for GWAS assume only additive effects [16]. Finally, another factor influencing the number of significantly associated SNPs (and their effect on phenotypes) is the complexity of analyzing thousands of single markers across the genome. Stringent thresholds for controlling FDR may not be appropriate for p-value adjustment, given the correlated nature of markers along a chromosome [45]. In this study, associations with p-value < 0.0001 (or even lower) in traits such as Vol, DBF, leaf Δ or S:G ratio had q-values of 0.1, 0.3 or higher, and they were considered non-significant. It has been suggested that the general applicability of FDR suffers from several problems when applied to association analysis of a single trait, and an alternative significance criterion (the genomewise k-error rate) has been proposed [46]. Thus, further data analyses will be required to establish whether associations excluded by p-value adjustment should be included in the set of SNPs that effectively control some specific traits.

Sliding window analyses

The multiple-marker analysis by sliding window allowed us to identify genomic regions containing different sets of SNPs jointly associated with each trait. Figure 4a depicts a representative Manhattan plot with the significant windows identified for leaf δ¹⁵N. Manhattan plots for other traits are included in Fig. S3. A variable number of windows per chromosome was detected among the phenotypes (Fig. 3 b, d, f, h). The total number of significant windows (q-value < 0.1) ranged from 26 for S:G ratio to 291 for N content (Table S2). For most traits, the main contribution came from chromosome 1. However, for DBH, DBF, C:N, δ¹⁵N and GAc, the chromosomes with the largest number of significant windows were 5, 6, 4, 5 and 10, respectively. The multiple-SNP approach applied by sliding window analysis has been proposed as a robust alternative for identifying clustered patterns of significant SNPs associated with complex traits, in a chromosomal context, in humans and plants [37, 47-49]. In our study, significant windows identified a series of SNP clusters that coincided with coding regions of multiple genes (Table S4). At the same time, and as expected, those clusters included SNPs identified by single-marker associations, as depicted in Fig. 4. Significant windows were also detected in gene regions where no significant single SNP-markers were present. Additionally, information from both detection approaches allowed us to define genomic regions with high LD that were significantly associated with phenotypic variation, revealing the presence of phenotypically relevant haplotypes (Fig. 4c and 5d). SNP clusters identified by sliding window analysis coincided with SNPs detected by single-marker tests that were significant for two traits simultaneously (Fig. 5).
For example, in the case of the negatively correlated wood traits C6-sugars and lignin content [5], both approaches detected the Leucine Rich Repeat (LRR1)//Leucine Rich Repeat (LRR8) gene (Potri.005G015700), which is important for the variation of these two wood components (Fig. 5c and d). Although more evidence will be necessary, haplotype blocks defined in this way could be indicative of polymorphic regions with pleiotropic effects.

Considering the top three most significant windows across all chromosomes and traits, the most represented functional classes were Protein with Unknown Function (26.3 %), Protein Synthesis/Modification (22.8 %), Energy/Metabolism (17.5 %) and Transcription (14 %) (Fig. 3b). A list of the windows and genes included in these classes is given in Table S4. Some of the detected genes encoding proteins with roles in Protein Synthesis/Modification were those encoding Similar to Threonyl-tRNA Synthetase (Potri.008G145600), Interleukin-1 Receptor-Associated Kinase 4 (Potri.008G145900) and Leucine Rich Repeat (LRR1) (Potri.005G015700), associated with wood C5-sugars, C5-sugars and lignin, respectively. Examples of genes encoding proteins belonging to the Energy/Metabolism class were Exostosin Heparan Sulfate Glycosyltransferase-Related (Potri.010G197900) and Similar to Aldehyde Dehydrogenase 1 Precursor (Potri.012G078700), associated with δ¹⁵N and DBF, respectively. Among genes encoding proteins involved in Transcription, Similar to Agamous-like MADS Box Protein AGL12 (Potri.019G076800) and WRKY Transcription Factor 10-related (Potri.013G086000) were associated with Gal and C5-sugars, respectively.

Genes detected by single-SNP association and sliding windows approaches

We also verified the consistency between SNP markers identified by single-SNP association and sliding-window analyses. To this end, we considered as an example the most significant window (#154; q-value 7.97E-23) detected on chromosome 5 for leaf δ¹⁵N (Fig. 4a). The SNPs identified within the window were part of a gene (Potri.005G048900) annotated as Similar to 5'-3' Exoribonuclease (XRN4), which is involved in disease resistance, response to ethylene, RNAi, and miRNA-mediated RNA decay [50]. According to the Populus trichocarpa genome site, this gene is expressed mainly in dormant buds (Fig. 4d). The associated window comprised 64 SNPs (Fig. 4b). Fourteen of these markers were also identified by the single-marker association, indicating the consistency between both approaches. In particular, markers such as S05_3547832, S05_3547864, S05_3547904 and S05_3548573 were in high LD (r² > 0.75) and were associated with significant differences in mean leaf δ¹⁵N (Fig. 4c). In addition, the alleles of those SNPs represent intronic and non-synonymous polymorphisms, with possible effects on transcript splicing and on protein structure and function.
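To make the LD criterion (r² > 0.75) concrete, the sketch below computes r² between two biallelic markers as the squared Pearson correlation of their genotype dosages (0, 1 or 2 copies of one allele). This genotype-based estimate is a common approximation that does not require phased haplotypes; it is an assumption here and may differ from the software actually used for Fig. 4c. The function name and the toy genotypes are hypothetical.

```python
from math import fsum


def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(dosage_a)
    if n != len(dosage_b) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_a = fsum(dosage_a) / n
    mean_b = fsum(dosage_b) / n
    cov = fsum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = fsum((a - mean_a) ** 2 for a in dosage_a)
    var_b = fsum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic marker carries no LD information.
        return float("nan")
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for six clones at three markers (not study data).
snp_1 = [0, 1, 2, 1, 0, 2]
snp_2 = [0, 1, 2, 1, 0, 2]  # identical pattern to snp_1
snp_3 = [0, 2, 0, 2, 1, 1]  # uncorrelated with snp_1

print(ld_r2(snp_1, snp_2))
print(ld_r2(snp_1, snp_3))
```

Marker pairs whose value exceeds the chosen cutoff (0.75 here, 0.50 for the LRR gene segment discussed later) would be grouped into the same high-LD block.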

The identification of SNP markers using both single-SNP association and sliding-window approaches was also analyzed considering the high correlation between some pairs of traits and the possible presence of pleiotropic effects. For example, Venn diagrams and Manhattan plots identified a set of SNP markers associated simultaneously with C6-sugars and lignin (Fig. 5 a, b). A significant negative correlation has been observed between C6-sugars and lignin in Populus trichocarpa wood, in both phenotypic (r_p = -0.81) and genetic (r_g = -0.79) terms [5]. These common SNPs belong to a gene encoding a Leucine Rich Repeat (LRR1)//Leucine Rich Repeat (LRR8) protein (Potri.005G015700), recently linked to cell elongation and wall extension [51]. This gene included a segment with markers exhibiting moderate to high LD (r² > 0.50), which was associated with significant variation in both traits (Fig. 5 c, d). This sort of relationship was also analyzed for DBH and h, and for leaf N and δ¹⁵N (Fig. S4). The results indicate the consistency between both association approaches for finding common markers underlying different traits. At the same time, significant sliding windows (q-value < 0.1) support detection by single SNP-marker tests, particularly for those traits in which the FDR threshold was not reached.
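The Venn-diagram comparison described above amounts to set operations on the lists of significant markers per trait. A minimal sketch follows, assuming each trait's significant markers are available as a list of identifiers; the function name and the marker identifiers are hypothetical placeholders, not markers from this study.

```python
def compare_marker_sets(markers_a, markers_b):
    """Split two lists of marker IDs into trait-specific and shared sets."""
    a, b = set(markers_a), set(markers_b)
    return {
        "only_a": sorted(a - b),
        "shared": sorted(a & b),
        "only_b": sorted(b - a),
    }


# Placeholder identifiers standing in for significant markers of two traits.
c6_hits = ["snp_01", "snp_02", "snp_03"]
lignin_hits = ["snp_02", "snp_03", "snp_04"]

overlap = compare_marker_sets(c6_hits, lignin_hits)
print(overlap["shared"])
```

Markers in the shared set are the candidates for pleiotropic effects, although shared association alone does not distinguish pleiotropy from close linkage.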

Significant associations for traits underlying growth, nutrient metabolism and xylem formation, among others, define SNPs and genes that might represent logical candidates for functional studies focused on confirming their role and impact on the phenotype of Populus species. Considering their high significance and their simultaneous detection by the single-SNP association and/or sliding-window approaches, we centered part of our analysis on the genes Similar to 5'-3' Exoribonuclease (XRN4) and Leucine Rich Repeat (LRR1)//Leucine Rich Repeat (LRR8), which included SNPs and windows associated with leaf δ¹⁵N, and with wood C6-sugars and lignin, respectively. XRN4 is expressed mainly in dormant buds (Fig. 4d), according to the information available for P. trichocarpa. The exoribonuclease function of this gene links it to transcription, RNA metabolism and RNA interference in eukaryotes [52]. In plants, it has been related to ethylene signaling [53] and to the response to abiotic and biotic stresses [54, 55]. Mutation of members of the XRN gene family produced sensitivity to N starvation in Saccharomyces cerevisiae [56] and morphological alterations in Arabidopsis thaliana [57, 58]. Thus, the association of XRN4 with leaf δ¹⁵N, an indicator of N use efficiency, could be related to N metabolism and mobilization in leaves, particularly during the last third of the growing season, when sampling was done. More studies will be required to detect possible effects of SNPs at XRN4 on photosynthesis and biomass production. On the other hand, the LRR1//LRR8 gene mentioned above, which encodes a Leucine-Rich Repeat Receptor-Like Kinase, is expressed mainly in stems, fully expanded buds and female flowers (Fig. S5). LRR genes play important and diverse roles in growth and development in poplars [59]. Evidence supports the role of these receptors as signaling components in the regulation of cellulose and lignin synthesis, which controls secondary cell wall thickening [60, 61]. This role would explain the association that we detected simultaneously for both chemical components of wood (C6-sugars and lignin).

Chemical composition of wood

Association analyses of the chemical composition of P. trichocarpa wood were previously performed in the same population by our group [14] using a candidate gene approach. We compared the results of both studies considering the 40 candidate genes utilized by Wegrzyn et al. [14], which encode enzymes from the cellulose and lignin biosynthesis pathways and cytoskeletal proteins. For these genes, association results in the present study were significant only at a p-value threshold of < 0.0001. The results indicated both overlap and divergence between the two studies (Fig. 6 and Table S5). SNPs within genes encoding cellulose synthase (CesA1A and CesA2B) were significantly associated with C6-sugars in both studies. For this trait, we also detected SNPs in genes not previously identified by Wegrzyn et al. [14]. These included SNPs within 4CL1, LAC2 and TUA5. In the case of lignin content, different members of the cellulose synthase gene family (CesA2B and CesA1A) were detected in the two studies. Moreover, we identified lignin-associated SNPs belonging to 4-Coumarate:CoA Ligase 4CL1, Laccase LAC2, Serine Hydroxymethyl Transferase SHMT6 and the cytoskeletal protein α-Tubulin TUA5. Finally, for the S:G ratio, our analyses detected 4-Coumarate:CoA Ligase 4CL1, Phenylalanine Ammonia-Lyase PAL5, Caffeoyl CoA O-Methyltransferase CoAOMT2 and Ferulate 5-Hydroxylase F5H1, and the cytoskeletal β-Tubulin TUB16, which were not identified by Wegrzyn et al. [14]. Although the genotypes and wood chemical characterization methods were mostly the same in both studies, distinct trial sites, sampling height or differential presence of juvenile wood, among other factors, might explain the differences between the findings reported previously [14] and those described in the present work.

Forest trees are an important source of multiple wood and non-wood products. Genetic improvement aimed at developing such products, including lignocellulosic biofuels, depends on the variation underlying commercial traits. This variation is characterized by complex genetic control and the influence of environmental factors. In this study, we identified a series of DNA polymorphisms associated, at different levels, with the phenotypic variation of growth, nitrogen use and wood composition of P. trichocarpa clones. Our results thus provide a starting point to define candidate genes suitable for functional characterization aimed at confirming their biological role. At the same time, the genome-wide scale applied in the association analyses revealed a large number of SNPs that could be utilized to develop genomic selection schemes. Further efforts to define the utility of SNPs in generating genomic breeding values will illuminate the path to breeding programs that incorporate molecular markers for bioethanol production. The upshot will be the estimation of parental hybridization values and an earlier, more precise genotypic selection from segregating F1 hybrid populations, which together will increase the overall magnitude of the realized genetic gain per unit of time.

The authors thank the California Agricultural Experiment Station for the support provided.

Funding

This study was funded by the Advanced Hardwood Biofuels Northwest Project, supported by Agriculture and Food Research Initiative Competitive Grant no. 2011-68005-30407 from the USDA National Institute of Food and Agriculture, and by the National Science Foundation Plant Genome Research Program (IOS Grant no. 1054444 to JAH).

Availability of data

The datasets are available online at the TreeGenes platform (https://treegenesdb.org/) under the accession number TGDR050.

Authors’ contributions

B.J.S., D.B.N. and F.P.G. planned and designed the research. F.P.G. and H.S. analyzed data. D.B.N., F.P.G., J.H. and H.S. interpreted data. F.P.G., J.H.R. and R.S. conducted fieldwork and data and sample collection. J.H.R., M.D., O.F., R.F. and R.S. processed and analyzed samples. B.J.S., D.B.N., F.P.G., J.H., J.H.R., H.S. and R.F. wrote the manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Author details

¹ Department of Plant Sciences, University of California at Davis, CA 95616, USA. ² Department of Forest Resources and Environmental Conservation, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. ³ Department of Land, Air and Water Resources, University of California, Davis, CA 95616, USA. ⁴ Department of Molecular and Cellular Biology & Genome Center, University of California, Davis, CA 95616, USA. ⁵ Biological Research Group, GreenWood Resources, Portland, OR 97201, USA. ⁶ National Renewable Energy Laboratory, Golden, CO 80401, USA. ⁷ Bioenergy Research Center, University of California at Davis, Davis, CA 95616, USA. ⁸ Instituto de Ciencias Biológicas, Universidad de Talca, P.O. Box 747, Chile.

1. Porth I, El-Kassaby YA: Using Populus as a lignocellulosic feedstock for bioethanol. Biotechnology Journal 2015, 10(4):510-524.
2. Davis JM: Genetic Improvement of Poplar (Populus spp.) as a Bioenergy Crop. In: Genetic Improvement of Bioenergy Crops. Edited by Vermerris W: Springer New York; 2008: 397-419.
3. Stanton BJ, Neale D, Li S: Populus breeding: from the classical to the genomic approach. In: Genetics and genomics of Populus. Edited by Jansson S, Bhalerao R, Groover A, vol. 8. New York: Springer; 2010: 309-348.
4. Mitchell CP: Ecophysiology of short rotation forest crops. Biomass and Bioenergy 1992, 2(1–6):25-37.
5. Guerra F, Richards J, Fiehn O, Famula R, Stanton B, Shuren R, Sykes R, Davis M, Neale D: Analysis of the genetic variation in growth, ecophysiology, and chemical and metabolomic composition of wood of Populus trichocarpa provenances. Tree Genetics & Genomes 2016, 12(1):1-16.
6. McKown AD, Guy RD, Klápště J, Geraldes A, Friedmann M, Cronk QCB, El-Kassaby YA, Mansfield SD, Douglas CJ: Geographical and environmental gradients shape phenotypic trait variation and genetic structure in Populus trichocarpa. New Phytologist 2014, 201(4):1263-1276.
7. Slavov GT, Leonardi S, Adams WT, Strauss SH, DiFazio SP: Population substructure in continuous and fragmented stands of Populus trichocarpa. Heredity 2010, 105(4):348-357.
8. Evans LM, Slavov GT, Rodgers-Melnick E, Martin J, Ranjan P, Muchero W, Brunner AM, Schackwitz W, Gunter L, Chen J-G et al: Population genomics of Populus trichocarpa identifies signatures of selection and adaptive trait associations. Nat Genet 2014, 46(10):1089-1096.
9. Scarascia-Mugnozza GE, Ceulemans R, Heilman PE, Isebrands JG, Stettler RF, Hinckley TM: Production physiology and morphology of Populus species and their hybrids grown under short rotation. II. Biomass components and harvest index of hybrid and parental species clones. Canadian Journal of Forest Research 1997, 27(3):285-294.
10. Zabek LM, Prescott CE: Biomass equations and carbon content of aboveground leafless biomass of hybrid poplar in Coastal British Columbia. Forest Ecology and Management 2006, 223(1–3):291-302.
11. Bradshaw HD, Stettler RF: Molecular genetics of growth and development in Populus. IV. Mapping QTLs with large effects on growth, form, and phenology traits in a forest tree. Genetics 1995, 139(2):963-973.
12. McKown AD, Guy RD, Quamme L, Klápště J, La Mantia J, Constabel CP, El-Kassaby YA, Hamelin RC, Zifkin M, Azam MS: Association genetics, geography and ecophysiology link stomatal patterning in Populus trichocarpa with carbon gain and disease resistance trade-offs. Mol Ecol 2014, 23(23):5771-5790.
13. Monclus R, Villar M, Barbaroux C, Bastien C, Fichot R, Delmotte FM, Delay D, Petit JM, Brechet C, Dreyer E et al: Productivity, water-use efficiency and tolerance to moderate water deficit correlate in 33 poplar genotypes from a Populus deltoides x Populus trichocarpa F1 progeny. Tree Physiology 2009, 29(11):1329-1339.
14. Wegrzyn JL, Eckert AJ, Choi M, Lee JM, Stanton BJ, Sykes R, Davis MF, Tsai C-J, Neale DB: Association genetics of traits controlling lignin and cellulose biosynthesis in black cottonwood (Populus trichocarpa, Salicaceae) secondary xylem. New Phytologist 2010, 188(2):515-532.
15. Guerra FP, Wegrzyn JL, Sykes R, Davis MF, Stanton BJ, Neale DB: Association genetics of chemical wood properties in black poplar (Populus nigra). New Phytologist 2013, 197(1):162-176.
16. McKown AD, Klápště J, Guy RD, Geraldes A, Porth I, Hannemann J, Friedmann M, Muchero W, Tuskan GA, Ehlting J et al: Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. New Phytologist 2014, 203(2):535-553.
17. Porth I, Klápště J, Skyba O, Hannemann J, McKown AD, Guy RD, DiFazio SP, Muchero W, Ranjan P, Tuskan GA et al: Genome-wide association mapping for wood characteristics in Populus identifies an array of candidate single nucleotide polymorphisms. New Phytologist 2013, 200(3):710-726.
18. Fahrenkrog AM, Neves LG, Resende MF, Jr., Vazquez AI, de Los Campos G, Dervinis C, Sykes R, Davis M, Davenport R, Barbazuk WB et al: Genome-wide association study reveals putative regulators of bioenergy traits in Populus deltoides. New Phytologist 2017, 213(2):799-811.
19. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C et al: Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotech 2009, 27(2):182-189.
20. Zhou L, Bawa R, Holliday JA: Exome resequencing reveals signatures of demographic and adaptive processes across the genome and range of black cottonwood (Populus trichocarpa). Mol Ecol 2014, 23(10):2486-2499.
21. Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A et al: The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313(5793):1596-1604.
22. Zhou L, Holliday JA: Targeted enrichment of the black cottonwood (Populus trichocarpa) gene space using sequence capture. BMC Genomics 2012, 13:703-703.
23. Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW: Evidence for extensive parallelism but divergent genomic architecture of adaptation along altitudinal and latitudinal gradients in Populus trichocarpa. New Phytologist 2016, 209(3):1240-1251.
24. Easlon HM, Nemali KS, Richards JH, Hanson DT, Juenger TE, McKay JK: The physiological basis for genetic variation in water use efficiency and carbon isotope composition in Arabidopsis thaliana. Photosynthesis Research 2014, 119(1-2):119-129.
25. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25(14):1754-1760.
26. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25(16):2078-2079.
27. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M et al: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011, 43(5):491-498.
28. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010, 20(9):1297-1303.
29. Yang J, Lee SH, Goddard ME, Visscher PM: GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 2011, 88(1):76-82.
30. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES: TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 2007, 23(19):2633-2635.
31. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007, 81(3):559-575.
32. Zheng X, Weir BS: Eigenanalysis of SNP data with an identity by descent interpretation. Theor Popul Biol 2016, 107:65-76.
33. Gower JC: Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis. Biometrika 1966, 53(3/4):325-338.
34. R Core Team: R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2014.
35. Bass A, Dabney A, Robinson D: qvalue: Q-value estimation for false discovery rate control. R package version 2.6.0; 2015.
36. Liaw A, Wiener M: Classification and Regression by Random Forest. R News 2002, 2:18-22.
37. Sun YV, Jacobsen DM, Turner ST, Boerwinkle E, Kardia SLR: Fast implementation of a scan statistic for identifying chromosomal patterns of genome wide association studies. Computational Statistics & Data Analysis 2009, 53(5):1794-1801.
38. Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, Kresovich S, Goodman MM, Buckler ES: Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proceedings of the National Academy of Sciences 2001, 98(20):11479-11484.
39. Muchero W, Guo J, DiFazio SP, Chen J-G, Ranjan P, Slavov GT, Gunter LE, Jawdy S, Bryan AC, Sykes R et al: High-resolution genetic mapping of allelic variants associated with cell wall chemistry in Populus. BMC Genomics 2015, 16(1):24.
40. Wang J, Street NR, Scofield DG, Ingvarsson PK: Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species. Genetics 2016, 202(3):1185-1200.
41. Sollars ESA, Harper AL, Kelly LJ, Sambles CM, Ramirez-Gonzalez RH, Swarbreck D, Kaithakottil G, Cooper ED, Uauy C, Havlickova L et al: Genome sequence and genetic diversity of European ash trees. Nature 2016, 541:212.
42. Campoy JA, Lerigoleur-Balsemin E, Christmann H, Beauvieux R, Girollet N, Quero-García J, Dirlewanger E, Barreneche T: Genetic diversity, linkage disequilibrium, population structure and construction of a core collection of Prunus avium L. landraces and bred cultivars. BMC Plant Biology 2016, 16(1):49.
43. Müller BSF, Neves LG, de Almeida Filho JE, Resende MFR, Muñoz PR, dos Santos PET, Filho EP, Kirst M, Grattapaglia D: Genomic prediction in contrast to a genome-wide association study in explaining heritable variation of complex growth traits in breeding populations of Eucalyptus. BMC Genomics 2017, 18:524.
44. Isik F: Genomic selection in forest tree breeding: the concept and an outlook to the future. New Forests 2014, 45(3):379-401.
45. Balint-Kurti P, Simmons SJ, Blum JE, Ballaré CL, Stapleton AE: Maize Leaf Epiphytic Bacteria Diversity Patterns Are Genetically Correlated with Resistance to Fungal Pathogen Infection. Molecular Plant-Microbe Interactions 2010, 23(4):473-484.
46. Chen L, Storey JD: Relaxed Significance Criteria for Linkage Analysis. Genetics 2006, 173(4):2371-2381.
47. Sun YV, Levin AM, Boerwinkle E, Robertson H, Kardia SLR: A scan statistic for identifying chromosomal patterns of SNP association. Genetic Epidemiology 2006, 30(7):627-635.
48. Asimit JL, Andrulis IL, Bull SB: Regression models, scan statistics and reappearance probabilities to detect regions of association between gene expression and copy number. Statistics in Medicine 2011, 30(10):1157-1178.
49. Morrison KM, Simmons SJ, Stapleton AE: Loci controlling nitrate reductase activity in maize: ultraviolet-B signaling in aerial tissues increases nitrate reductase activity in leaf and root when responsive alleles are present. Physiologia Plantarum 2010, 140(4):334-341.
50. Rymarquis LA, Souret FF, Green PJ: Evidence that XRN4, an Arabidopsis homolog of exoribonuclease XRN1, preferentially impacts transcripts with certain sequences or in particular functional categories. RNA 2011, 17(3):501-511.
51. Majda M, Robert S: The Role of Auxin in Cell Wall Expansion. Int J Mol Sci 2018, 19(4).
52. Chang JH, Xiang S, Xiang K, Manley JL, Tong L: Structural and biochemical studies of the 5′→3′ exoribonuclease Xrn1. Nature Structural & Molecular Biology 2011, 18:270.
53. Potuschak T, Vansiri A, Binder BM, Lechner E, Vierstra RD, Genschik P: The Exoribonuclease XRN4 Is a Component of the Ethylene Response Pathway in Arabidopsis. The Plant Cell 2006, 18(11):3047-3057.
54. Merret R, Descombin J, Juan Y-t, Favory J-J, Carpentier M-C, Chaparro C, Charng Y-y, Deragon J-M, Bousquet-Antonelli C: XRN4 and LARP1 Are Required for a Heat-Triggered mRNA Decay Pathway Involved in Plant Acclimation and Survival during Thermal Stress. Cell Reports 2013, 5(5):1279-1293.
55. Rymarquis L, Souret F, Green P: Evidence that XRN4, an Arabidopsis homolog of exoribonuclease XRN1, preferentially impacts transcripts with certain sequences or in particular functional categories. RNA 2011, 17(3):501-511.
56. Sinturel F, Bréchemier-Baey D, Kiledjian M, Condon C, Bénard L: Activation of 5′-3′ exoribonuclease Xrn1 by cofactor Dcs1 is essential for mitochondrial function in yeast. Proceedings of the National Academy of Sciences 2012, 109(21):8264-8269.
57. Kim B-H, Von Arnim AG: FIERY1 regulates light-mediated repression of cell elongation and flowering time via its 3′(2′),5′-bisphosphate nucleotidase activity. The Plant Journal 2009, 58(2):208-219.
58. Hirsch J, Misson J, Crisp PA, David P, Bayle V, Estavillo GM, Javot H, Chiarenza S, Mallory AC, Maizel A et al: A Novel fry1 Allele Reveals the Existence of a Mutant Phenotype Unrelated to 5′->3′ Exoribonuclease (XRN) Activities in Arabidopsis thaliana Roots. PLOS ONE 2011, 6(2):e16724.
59. Zan Y, Ji Y, Zhang Y, Yang S, Song Y, Wang J: Genome-wide identification, characterization and expression analysis of populus leucine-rich repeat receptor-like protein kinase genes. BMC Genomics 2013, 14:318-318.
60. Steinwand BJ, Kieber JJ: The Role of Receptor-Like Kinases in Regulating Cell Wall Function. Plant Physiology 2010, 153(2):479-484.
61. Huang C, Zhang R, Gui J, Zhong Y, Li L: The Receptor-Like Kinase AtVRLK1 Regulates Secondary Cell Wall Thickening. Plant Physiology 2018, 177(2):671-683.

Table 1. Summary statistics and contribution of significant SNPs (q < 0.1) to phenotypic variation in the studied Populus trichocarpa association population. Columns “Mean”, “Std. Dev.”, “C.V.” and “H²_c” were extracted from Guerra et al. [5]. “SNP contribution” (cumulative effect of significant SNPs on the phenotype) and “SCSD” (SNP contribution standard deviation, expressed as a percentage of contribution) were estimated by Random Forest analysis. “n” indicates the number of significant SNPs utilized for the estimation of SNP contribution. “n.s.”, not significant at q < 0.1. R.A., relative abundance.

Trait		Unit	Mean	Std. Dev.	C.V. (%)	H²_c	SNP contribution^a (%)	SCSD^b (%)	n
Growth	Diameter (DBH)	mm	53.2	7.9	14.8	0.52	27.1	0.2	225
	Height (h)	dm	67.1	4.1	6.1	0.42	12.1	0.3	1689
	Volume index (Vol)	m³	0.016	0.005	31.3	0.53	n.s.	-	-
Phenology	Days to bud flush (DBF)	Julian days	87.4	7.8	8.9	0.9	33.9	0.3	95
Ecophysiology	Leaf C content	% DW	44.4	1.6	3.6	0.09	n.s.	-	-
	Leaf N content	% DW	3.2	0.3	9.4	0.28	n.s.	0.2	-
	Leaf C:N ratio (C:N)	kg C/kg N	14.2	1.3	9.2	0.33	5.6	0.3	145
	Leaf Δ	‰	19.2	0.7	3.6	0.26	n.s.	-	-
	Leaf δ ¹⁵N	‰	2.5	0.4	16.0	0.25	18.7	0.3	864
	Specific leaf area (SLA)	m²/kg DW	12	1.5	12.5	0.27	1.5	0.6	111
	N content : SLA ratio (NArea)	g N/m²	2.8	0.4	14.3	0.28	4.6	0.6	103
Wood chem. components	Wood 5-carbon sugars (C5)	%	36	2.2	6.1	0.07	n.s.	0.3	-
	Wood 6-carbon sugars (C6)	%	42.3	3.3	7.8	0.08	n.s.	0.2	-
	Wood lignin	%	22.7	1	4.4	0.15	n.s.	0.2	-
	Wood syringyl:guaiacyl ratio (S:G)	fold	1.9	0.1	5.3	0.58	n.s.	0.2	-
Wood metabolites	Galactonic acid (GAc)	R.A.^a	0.6	0.4	62.8	0.22	38.8	0.3	82
	Galactinol (Gal)	R.A.^a	144.1	75.0	52.0	0.28	n.s.	0.3	-
	Alpha tocopherol (Toc)	R.A.^a	69.3	31.1	44.9	0.16	40.6	0.3	79
	Adenosine (Ade)	R.A.^a	2.8	1.0	33.4	0.25	n.s.	0.3	-
	4-Hydroxybenzoic acid (HbA)	R.A.^a	5.9	4.6	78.3	0.45	29.8	0.3	29

Table 2. Summary of the number of analyzed SNP markers and intrachromosomal LD decay across the Populus trichocarpa genome. ^a Linkage disequilibrium decay refers to the physical distance (kbp) at which LD = 0.2.

Chr.	Size (Mbp)	Analyzed SNPs	Frequency (bp/SNP)	LD Decay^a (kbp)
1	50.5	100,299	503.4	29.99
2	25.3	47,563	531.1	27.49
3	21.8	49,962	436.7	27.19
4	24.3	47,671	509.1	22.36
5	25.9	52,236	495.6	23.35
6	27.9	49,374	565.3	27.21
7	15.6	30,295	515.3	18.85
8	19.5	43,099	451.6	21.99
9	12.9	29,287	442.1	21.63
10	22.6	46,758	482.9	24.21
11	18.5	38,563	479.8	51.63
12	15.8	31,964	493.1	25.13
13	16.3	30,493	535.2	28.07
14	18.9	40,482	467.4	29.65
15	15.3	33,418	457.2	18.85
16	14.5	32,006	452.9	26.22
17	16.1	39,114	411.1	33.83
18	17	34,049	498.1	33.39
19	15.9	36,647	435.0	19.26
Total	394.5	813,280	-	-
		Mean	482.3	26.86
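As a consistency check on Table 2, the following sketch recomputes the total number of analyzed SNPs and the mean marker frequency and LD decay from the per-chromosome rows. The values are transcribed from the table above; the variable and function names are hypothetical. Because chromosome sizes are reported to 0.1 Mbp, a frequency recomputed as size divided by SNP count differs slightly from the tabulated bp/SNP value.

```python
# (chromosome, size in Mbp, analyzed SNPs, bp/SNP, LD decay in kbp) from Table 2
TABLE_2 = [
    (1, 50.5, 100299, 503.4, 29.99), (2, 25.3, 47563, 531.1, 27.49),
    (3, 21.8, 49962, 436.7, 27.19), (4, 24.3, 47671, 509.1, 22.36),
    (5, 25.9, 52236, 495.6, 23.35), (6, 27.9, 49374, 565.3, 27.21),
    (7, 15.6, 30295, 515.3, 18.85), (8, 19.5, 43099, 451.6, 21.99),
    (9, 12.9, 29287, 442.1, 21.63), (10, 22.6, 46758, 482.9, 24.21),
    (11, 18.5, 38563, 479.8, 51.63), (12, 15.8, 31964, 493.1, 25.13),
    (13, 16.3, 30493, 535.2, 28.07), (14, 18.9, 40482, 467.4, 29.65),
    (15, 15.3, 33418, 457.2, 18.85), (16, 14.5, 32006, 452.9, 26.22),
    (17, 16.1, 39114, 411.1, 33.83), (18, 17.0, 34049, 498.1, 33.39),
    (19, 15.9, 36647, 435.0, 19.26),
]

total_snps = sum(row[2] for row in TABLE_2)
mean_frequency = sum(row[3] for row in TABLE_2) / len(TABLE_2)
mean_ld_decay = sum(row[4] for row in TABLE_2) / len(TABLE_2)


def approx_bp_per_snp(size_mbp, n_snps):
    """Approximate marker spacing from the rounded chromosome size."""
    return size_mbp * 1e6 / n_snps


print(total_snps)
print(round(mean_frequency, 1), round(mean_ld_decay, 2))
print(round(approx_bp_per_snp(50.5, 100299), 1))
```

The recomputed totals and means agree with the "Total" and "Mean" rows of the table.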

Editorial decision: Major revision
17 Jun, 2019
Review #1 received at journal
29 May, 2019
Review #2 received at journal
29 May, 2019
Reviewer #1 agreed at journal
17 May, 2019
Reviewer #2 agreed at journal
17 May, 2019
Reviewers invited by journal
16 May, 2019
Submission checks completed at journal
10 May, 2019
Editor invited by journal
28 Apr, 2019
Editor assigned by journal
28 Apr, 2019
First submitted to journal
24 Apr, 2019
