**Plant material. **The study was based on previously published data (Pégard et al., 2019, 2020). Seventeen male and 17 female *Populus nigra* were sampled in France, from 21 natural populations and 13 from breeding programs. These individuals were used as parents in a factorial crossing plan. Clones of both parents and progeny, and a hundred additional individuals from breeding programs, were grown in four experimental trials at the same location (Guéméné-Penfao, France, 47°37’59”N, 1°49’59”W), each with a random block design. In total, the experiment comprised 10,301 trees (including 7,169 trees with genotype information), 42 full-sib families, 5 half-sib families, 1,452 genotypes (each replicated on average 7.09 times), and an Unknown Parent Group (UPG) of 105 genotypes (Table S1). Family size ranged from 1 to 119, with an average of 30.2 genotypes per family. Twenty genotypes and 17 families were shared across the four trials.

**Phenotype measurements.** We focused on two phenotypes of one-year-old trees: height and rust vulnerability (both measured on field). Height was assessed with a graduation rod (in cm), and rust vulnerability was assessed on a 1 to 9 scale (Legionnet et al., 1999). Across the trials, at least 95.29%, 96.05%, and 95.13% of the trees were phenotyped for height, rust vulnerability, or both traits, respectively. Given the low missing rate, we assumed that missing values only marginally unbalanced the block design, and so we used default parameters of the software packages for handling missing values when analyzing the data.

In each trial, every phenotype was first corrected for micro-environmental heterogeneity with a multitrait mixed model, including as explanatory variables a random block effect and a B-spline spatial autocorrelation correction. We used the function *remlf90* (R package *breedR*; Munoz and Sanchez, 2014) with the parameters ‘model’ set to ‘splines’ and ‘method’ set to ‘em’. All further analyses were performed on the residuals of this model, i.e., on spatially adjusted phenotypes.

**Genomic data. **Out of the 1,452 genotypes, 1,034 were genotyped (25 parents and 1,009 offspring), using a *Populus nigra* 12K custom Infinium Bead-Chip (Illumina, San Diego, CA) (Faivre-Rampant et al., 2016). Details of the DNA extraction protocol, bioinformatic pipelines, and SNP mapping on the 19 chromosomes are given in Faivre-Rampant et al. (2016) and Pégard et al. (2019, 2020). It is worth noting that the SNP chip array included SNP associated with, among other traits, rust resistance, but not with vertical growth. We further filtered for a Minor Allele Frequency (MAF) of 5% − corresponding to a threshold of 50 individuals for a genotyping error; in total, 7,129 SNPs were retained for further analyses (out of 7,513 initial SNPs).

**Genomic relationship matrix. **In order to build the relationship matrix necessary for inferring breeding values, we combined pedigree and genomic information (Legarra et al., 2009; Christensen et al., 2010). The pedigree relationship matrix AΓ was built using the tabular rule method (Emik and Terrill, 1949), modified to account for a single metafounder (assuming a single base population for the Unkown Parent Group or UPG; Legarra et al., 2015), with a self-relationship of γ = 8σp2, with σp2 the variance of allele frequencies across markers (Garcia-Baccino et al., 2017). The combined relationship matrix H was summarized in Aguilar et al. (2019), and is:

H-1 = AΓ-1 + (0,0;0; G-1 − A22Γ-1) (1),

where A22Γ was the pedigree relationship matrix for genotyped individuals, G = (1−α) (ωa + ωb x GV) + α A22Γ, with GV the genomic relationship matrix estimated from the SNPs and scaled following the first scaling method of Van Raden (2008), α a scaling parameter (here equal to 0.05), and ωa and ωb chosen to equate the average inbreeding and the average relationships in GV and A22Γ as in Christensen et al. (2012). As the sample size was relatively small, the matrix H-1 was obtained by simply inverting H with the function *solve* (R package *base*).

**Population structure. **In order to control for population structure and avoid spurious associations, we computed a distance matrix 1 − H*, where H* is the correlation matrix obtained from H using the function *cov2cor* (R package *stats*), and the minus sign is to convert similarities to distances. We then used Multidimensional Scaling (MDS) on the distance matrix with the function *cmdscale_lanczos* (R package *refund*; Goldsmith et al., 2022; Miller, 2022) with the parameter ‘k’ set to 5, and ‘eig’ set to ‘TRUE’. Negative eigenvalues were set to null, and the variance explained by an axis was computed as the ratio between its squared eigenvalue and the sum of the squared eigenvalues. Additionally, to assess how well the family structure was captured, we measured the variable VF defined as the average within family variance on the first five axes (the lower the VF, the better the family structure is captured). To make MDSs comparable, we normalized VF by the sum of squares of the first five eigenvalues.

**Breeding value inferences.** The spatially adjusted phenotypes were then used to infer the breeding values with a multitrait single-step Genomic Best Linear Unbiased Predictor (ssGBLUP), as follows:

Yijk = μ + mds1j + mds2j + TRIALi + ANIMALj + eijk , (2)

where ‘Yijk’ is the phenotype of the k-th replicate of the j-th genotype in the i-th trial, μ is the grand mean, ‘mds1j’ and ‘mds2j’ the fixed effects on the j-th genotype of the first and second axes of the MDS respectively, ‘TRIALi’ the random effect of the i-th trial, ‘ANIMALj’ the polygenic effect of the j-th genotype, and ‘eijk’ the residuals. Both the trial effect and the residuals follow a Gaussian distribution with diagonal covariance matrices, and the polygenic effect follows a Gaussian distribution with a covariance matrix equal to k.H x Σ, where k is a scaling parameter equal to 1 - γ/2 (Legarra et al., 2015), H the combined relationship matrix, x the Kronecker product, and Σ the 2x2 covariance matrix between height and rust vulnerability. Estimates of coefficients and variance components of model (2) were obtained with the function *remlf90* (R package *breedR*) through one run with the ‘method’ parameter set to ‘em’ until convergence, then another run with the parameter ‘method’ set to ‘ai’ and the parameter ‘progsf90.option’ set to ‘maxrounds 1’. The output estimate of ‘ANIMALj’ is the Estimated Breeding Value (EBV) of the j-th genotype. The heritability of the traits was estimated as the diagonal of Σ divided by the sum of the diagonals of Σ and the estimated variance of the residuals. The predictive ability (PA) of the model was defined as Pearson's correlation coefficient between the breeding values re-estimated from the effect sizes (see below) and the adjusted phenotypes corrected by the trial effect (i.e., Yijk − TRIALi).

**Genome-Wide Association Study (GWAS).** Following the equations of Aguilar et al. (2019), the effect size of the SNPs was backsolved from the EBVs, along with their standard error and their p-values. This process was performed trait by trait (i.e., without accounting for the multitrait dimension) without loss of information, as the Pearson’s correlation between the ssGBLUP’s EBV and the breeding values re-estimated by the effect sizes was greater than 0.999.

With the estimated effect sizes and the standard errors, we also computed an alternative to the p-value accounting for multiple testing: the local false sign rate (lfsr) estimated with an empirical Bayes approach for adaptive shrinkage using the function *ash* (R package *ashr*; Stephens, 2017). In order to detect GWAS peaks (regions in the chromosomes with a high density of significant SNPs), we used a custom peak detection script on the profile of lfsr (Tiret & Milesi, 2021) with default parameters.

**Neighborhood.** In order to account for the effect of the neighborhood, we estimated the breeding values with an additional fixed effect, the local relatedness, defined as the average relationship (from the matrix H) between a focal individual and its neighbors (eight or less if on boundaries; as the king’s moves on chess). Local relatedness was orthogonal to any micro-environmental effect, because of the random block design. The re-estimated breeding values were denoted nEBV.

**SNP subsets.** We investigated the effect of sampling SNPs on the ssGBLUP (i.e., weighted ssGBLUP). We selected subsets based on population and quantitative genetics features computed after a first run of ssGBLUP: MAF, Ancestry Informativeness Coefficient (AIM; Rozenberg et al., 2003), and for each trait, significance in the GWAS, and heritability. For each feature, we kept one-tenth of the whole set (713 SNPs, a sample size close to previous studies on weighted GBLUP, e.g., Li et al., 2018) with either the smallest or the largest values, ending up with 12 subsets. AIM was estimated per family with the R script provided by Cappa et al. (2022), and SNP heritability was estimated as 2p(1-p)a2, where p is the MAF, and a the effect size (assuming no dominance nor interaction deviation of the genetic variance; p.129, Falconer, 1989). We re-estimated the combined relationship matrix for each SNP subset by re-estimating in the equation (1) the genomic relationship matrix GV with the SNP subset. With the re-estimated matrices H, we re-performed the MDSs, and re-estimated the PAs of the EBVs and nEBVs (where local relatedness were measured with the re-estimated H).

**Statistical tests.** For each SNP subset, we computed the confidence interval of the Predictive Ability (PA) with a 1000 iteration bootstrap (re-sampling individuals), without re-estimating the covariance matrix Σ. Here, the bootstrap was only used to assess the variance, so that we centered the bootstrapped PA around the true PA. To assess the effect of sampling, these subsets were compared to 1000 random samples of 713 SNPs (the size of the SNP subsets). Comparisons were performed with a Student’s one-sample *t-*test (denoted *t*1 with a degree of freedom or df of 999 corresponding to the number of bootstrap iterations), or a Welch two-sample *t-*test (denoted *t*2 with a varying df), both using the function *t.test* (R package *stats*). The sign of the *t *statistic is arbitrarily reported as positive when greater than a baseline (e.g., the true PA), and negative when not. Models were compared with a likelihood ratio test with a df of 1 using the function *lrtest* (R package *lmtest*; Zeileis & Hothorn, 2002). All scripts were written in R v4.1.3 (R Core Team, 2022).