*Plant material*

The study was conducted on two OP progeny trials: S21F9021146 (F1146) (Höreda, Eksjö, Sweden) and S21F9021147 (F1147) (Erikstorp, Tollarp, Sweden). Both trials were established in 1990 with a spacing 1.4m×1.4m. Originally, the experiments contained more than 18 progenies from 524 families at each of site, but after thinning activities in Höreda and Erikstorp in 2010 and 2008, respectively, about 12 progenies per family were left. In 2011 and 2012, six trees per site (524 * 12 ~ 6000 trees) were phenotyped 30. Standing tree-based measurements with Pilodyn and Hitman were performed on the same trees in 2011 and 2013, respectively, after which further thinning was performed. For this study, in 2018, we generated genomic (SNP) data from 484 remaining progeny trees after thinning which belonged to 62 of the OP families (out of the original 524 families) and on average eight progenies per family. This genotypic data was combined with available phenotypic data for the same trees that were used.

*Phenotypic data*

The phenotypic data was previously described in Zhou et al, 2019 31. Increment cores of 12mm diameter from pith to bark were collected from the progenies in 2011 and in 2012. These samples were analyzed for pith to bark variations in many woods and fiber traits with a SilviScan 32 instrument at Innventia (now RISE), Stockholm, Sweden. This data is referred as increment core-based measurements through the text. The annual rings of all samples were identified, as well as their parts of earlywood, transition wood and latewood, averages were calculated for all rings, as well as their parts and dated with year of wood formation 33.

The aim of breeding is not for properties of individual rings, but properties of the stem at harvesting target age. Therefore, this study focused on predictions of averages for stem cross-sections, and we chose tree age 19 years as the reference age, with models trained on trait averages for all rings formed up to different younger ages. Three types of averages were calculated and predictions compared for density, MFA and MOE: 1) area-weighted averages, relating to the cross-section of the stem, 2) width-weighted, relating to a radius or an increment core, and 3) arithmetic averages, where all ring averages are weighted with same weight. For the calculation of area-weighted average we assumed that each growth ring is a circular around the pith, calculated the area of each annual ring from its inner and outer radii, and when calculating the average at a certain age, the trait average for each ring was weighted with the ring’s proportion of the total cross-sectional area at that age. Similarly, for the calculation of the width-weighted average, the trait average for each ring was weighted with the ring’s proportion of the total radius from pith to bark at that age. Similar results were obtained with the three average methods. For this reason, only the estimates based on the area-weighted method (the most relevant for breeding) are shown. Tree age 19 years was used as the reference age. Thus, all the selection methods investigated for density, MFA and MOE, phenotypic and genetic, were compared based on how well they predicted the cross-sectional averages of the trees at this age, with their last ring formed during the vegetation period of 2009.

In addition, estimates of the three solid wood traits were calculated based on data from Pilodyn and Hitman instruments, measured on the standing trees without removing the bark at age 22 and age 24 years, respectively. Pilodyn measures the penetration depth with a needle pressed into the stem, which is inversely correlated with wood density. Hitman measures the velocity of sound in the stem, which correlates with microfibril angle, MFA 34, 35. MOE is related to wood density and velocity of sound 36-38 and can therefore be estimated by combining the Pilodyn and Velocity data, which estimates we here name MOEind (for standing-tree based). Further details on how this was performed in our study are given in Chen et. al 2015 39. The references show that these standing-tree-based measurements provide useful information and are very time and cost-efficient. However, they do not allow calculation of properties of the tree at younger ages. Therefore, we were not able to investigate from what early ages such data can be uses within genomic selection.

*Genotypic data*

Genomic DNA was extracted from buds or needles when buds were not available. Qiagen Plant DNA extraction protocol was utilized for DNA extraction and purification and DNA quantification performed using the Qubit® ds DNA Broad Range (BR) Assay Kit (Oregon, USA). Genotyping was conducted at Rapid Genomics, USA, using exom capture methodology same as the method used in Baison et. al 201940. Sequence capture was performed using the 40 018 diploid probes previously designed and evaluated for *P. abies*41 and samples were sequenced to an average depth of 15x using an Illumina HiSeq 2500 (San Diego, USA) 40. Variant calling was performed using the Genome Analysis Toolkit (GATK) HaplotypeCaller v3.6 42 in Genome Variant Call Format (gVCF) output format. After that, the following steps were performed for filtering: 1) removing indels; 2) keeping only biallelic loci; 3) removing variant call rate (“missingness”) < 90%; 4) removing minor allele frequency (MAF) < 0.01. Beagle v4.0 43 was used for missing data imputation. After these steps, 130,269 SNPs were used for downstream analysis.

*Population structure*

As a first step, we conducted a principal component analysis to determine the presence of structure in our population. The spectral decomposition of the marker matrix revealed that only about 2% of the variation was captured by the first eigenvector, indicating low population structure. Additionally, in previous study, low genotype by environment (GxE) interaction was detected for wood quality traits on these two trials 30. Therefore, population structure was not considered in the design of cross-validation sets (see Modelling and cross-validation chapter for further details on the cross-validation sets design).

*Narrow-sense heritability (h*2) estimation

For each trait, an individual tree model was fitted in order to estimate additive variance and breeding values:

**See formula 1 in the supplementary files.**

where *y* is a vector of measured data of a single trait, *b* is a vector of fixed effects including a grand mean, provenance and site effect, *b* is a vector of post-block effects and *u* is a vector of random additive (family) effects which follow a normal distribution *u*~N(0,As2u) and *e* is the error term with normal distribution N(0,Is2e). *X*, *Z* and *W *are incidence matrices, A is the additive genetic relationship matrix and I is the identity matrix. s2u equals to σa2 (pedigree-based additive variance) when random effect in equation 1 is pedigree-based in which case *u*~N(0,As2u), and s2u equals to σg2 (marker-based additive variance) when random effect in equation 1 is marker-based in which case *u*~N(0,Gs2u). The G matrix is calculated as **see formula 2 in the supplementary files. **

where M is the matrix of samples with SNPs encoded as 0, 1, 2 (i.e., the number of minor alleles), P is the matrix of allele frequencies with the ith column given by 2(pi − 0.5), where pi is the observed allele frequency of all genotyped samples.

Pedigree-based individual narrow-sense heritability (ℎ𝑎2) and marker-based individual narrow-sense heritability (hg2) were calculated as:

**see formula 3 in the supplementary files**

respectively, 𝜎2𝑝𝑎 and 𝜎2𝑝𝑔 are phenotypic variances for pedigree-based and marker-based models, respectively.

*Selection of the optimal training and validation sets ratio*

Cross-validation was conducted after dividing randomly the whole dataset into a training and a validation set. To find the most suitable ratio between the two, we divided the data into sets with five different ratios between the training and the validation sets: 50, 60, 70, 80 and 90%. 100 replicate iterations were carried out for each tested ratio and trait.

*Statistical method for model development*

In the same context we aimed to find optimal methods. Several statistical methods were compared: pedigree-based best linear unbiased predictions (ABLUP), and four GS methods: genomic best linear unbiased predictions (GBLUP) 44, random regression-best linear unbiased predictions (rrBLUP) 4, 45, BayesB 46, and reproducing kernel Hilbert space (RKHS).

rrBLUP used a shrinkage parameter lamda in a mixed model and assumes that all markers have a common variance. In BayesB the assumption of common variance across marker effects was relaxed by adding more flexibility in the model. RKHS does not assume linearity so it could potentially capture nonadditive relationships 47. R package *rrBLUP*48 was used for GBLUP and rrBLUP, package *BGLR *49 was used for BayesB and RKHS. The pedigree-based relationship matrix was obtained with the R package *pedigree *50.

*PA and accuracy estimation*

The adjusted phenotypes y’=y-*X**b* were used as model response in the genomic prediction models. Model quality was evaluated by predictive ability (PA), which is the mean of the correlation between the adjusted phenotype and the model predicted phenotypes, r(y’,yhat) from 100 times CV. Prediction accuracy (PC) was defined as PA/ (h2) 15, 51. In order to investigate whether GS model training can be conducted at earlier age, PA at each tree calendar age and cambial age were estimated. In this case, cross validation was conducted only using area-weighted values at each age, then the trait values at each age were estimated. PA at a specific age was calculated as the correlation between estimated trait values at that age and area-weighted values from pith to the last ring (for cambial age) and last year (for calendar age), respectively.

Genomic selection for well-performing trees with the use of marker information (G matrix) requires access to previously trained GS models. Thus, model training is a necessary part of GS integration into operational breeding. Model training can be conducted in already existing plantations with trees of relatively high ages, as illustrated in this work. It is, however, expected and desired that such model training can be conducted with high PAs also for younger trees. This would be especially useful if maturity (flower production) can be accelerated, to shorten the total breeding cycle.

Operationally, it is also important to develop protocols to assess wood quality in resources at minimum cost and time, and with minimal impact on the trees. Therefore, on coring, it is not only important to know the minimum age at which useful information can be obtained, but also from how many rings from the bark towards the pith information is required to train models with high predictive ability. To address these two practical questions for operational breeding, we trained prediction models based on data from different sets of rings, in order to mimic and compare PAs obtained when coring at different ages of the trees to different depths into the stem, or more precisely, using data from different numbers of rings, starting next to the bark. All the models were judged on, compared by their ability to predict the cross-sectional average of the trait at age 19 years across all trees in the validation set.