Using multi-source national forest inventory data for predicting tree characteristics of individual stands

doi:10.21203/rs.3.rs-1722741/v1


Research Article


https://doi.org/10.21203/rs.3.rs-1722741/v1

This work is licensed under a CC BY 4.0 License

Version 1


Satellite-based remote sensing data are utilised in forest inventories and the results are typically presented by region and municipality. We evaluated if imagery pixel-based data could be used in generating trees of individual stands, and predicting diameter distributions, stem volumes and tree species proportions. We compared alternative methods (k-nearest neighbors, k-NN) varying k and applying either measured trees (k-NN_trees) or stand characteristics (k-NN_stand). In the k-NN_trees method, a stand was generated based on the measured trees of the National Forest Inventory plots, whereas in the k-NN_stand method a diameter distribution was recovered from the stand characteristics of the same inventory plots. Both methods performed well and resulted in 8–14% differences in the total volume compared with the field inventory of the 27 stands used for the evaluation. The RMSE in total volume ranged from 25% (5-NN_stand) to 31% (1-NN_stand), while the smallest RMSEs in volume by tree species were 61% for broadleaves (3-NN_trees), 65% for pine (3-NN_trees) and 65% for spruce (4-NN_stand). Moreover, the main tree species was correctly predicted for 74% of cases. In the k-NN methods, increasing value of k decreased the difference in most of the evaluated variables. The satellite-based data with NFI plots were useful for predicting trees for pixels of a stand. When generating input data for a long-term simulation, the choice of the method was less influential as the effect of the error in the initial stand characteristics decreased over time during the simulation period.

diameter distribution

k-nearest neighbors

stand simulation

tree species composition

Forest management planning has traditionally been based on field inventories collecting stand-level data on basal area, mean diameter and height by tree species. Thereafter, if tree-level data are needed, trees are generated using diameter distribution models and models predicting tree height (Kilkki and Päivinen 1986; Veltheim 1987; Maltamo and Kangas 1998). These tree-level predictions can then be used as input data for growth and yield simulators (Siitonen 1993; Salminen et al. 2005; Tokola et al. 2006). However, the development of remote sensing approaches, including satellite images and airborne laser scanning (ALS), has produced high-resolution pixel-based predictions of stand characteristics (Packalén and Maltamo 2008; Peuhkurinen et al. 2008). Remote sensing-based data on forest stands can be used in forest planning for quantifying alternative paths and bottlenecks for biomass production.

The aim of sample-based forest inventories, e.g., National Forest Inventories (NFI), has typically been to provide strategic information for national and regional forestry-related political decision-making, as well as for forest sector enterprises. In growth and yield simulations, these sample-based forest inventory data are usually utilized by using the sampled trees or tree distributions derived from stand-level measurements as input data for simulations (Wikström 2001; Ahtikoski et al. 2018). This method provides predictions of the average forest development over relatively large areas.

In recent decades, remote sensing data and other auxiliary information, e.g., digital map data, have been extensively utilized in forest inventories (Næsset et al. 2004; Maltamo et al. 2007; McRoberts et al. 2010; Barrett et al. 2016). In the Finnish NFI, the satellite image based Multi-Source National Forest Inventory (MS-NFI) was introduced at the end of the 1980s (Tomppo 1990; Tomppo et al. 2008). With the help of optical satellite images, numerical map data and the NFI field sample, up-to-date forest characteristics can be derived for 16 m × 16 m pixels covering the whole country. The results are typically presented by region and municipality, as well as numerical forest resource maps. However, the results can be calculated for any given area. As estimates for units of any size can be computed from the pixel-based data, they could also provide information for forest management planning at stand level (Mäkelä et al. 2011; Holopainen et al. 2014).

There are several alternative methods for the tree-level description of a forest stand (Maltamo 1998; Mehtätalo 2004; Packalén 2009; Siipilehto 2011b; Räty 2020). In the parametric methods, theoretical distribution functions are either predicted or recovered from stand characteristics. Owing to its flexibility and relative simplicity, the Weibull distribution (Bailey and Dell 1973) is the most widely used distribution function in forestry (Maltamo 1997; Siipilehto 1999; Poudel 2011). Using weighted and unweighted mean and median stand characteristics, Siipilehto and Mehtätalo (2013) presented parameter recovery equations for the Weibull function.

In addition to the parametric distribution models, diameter distributions can be based on measured field plots using non-parametric methods, such as k-nearest neighbors (k-NN) and k-most similar neighbors (k-MSN) algorithms (Maltamo and Kangas 1998; Maltamo et al. 2003). The non-parametric k-NN methods have been widely used in predicting forest variables employing satellite images and field plot data (Tomppo 1990; Chirici et al. 2016). In these methods, a set of the most similar sample plots from the training data are found based on a metric on the feature variables computed from auxiliary data (e.g., space-borne or airborne observations). The forest variables being estimated are then computed as a function of the forest variables of the sample plots. In the Finnish MS-NFI, the predictions are weighted averages or modes of the k-NN to the pixel. The weights of each plot to a pixel also enable the original values of the field plots to be retained in the calculations. Another, less common way to use the sample plots is to produce a set of trees directly from the measured trees, i.e., without using the predicted diameter distribution as an intermediate step.

The aim of this study was to evaluate whether the MS-NFI pixel-level data can be used for generating tree characteristics of individual stands. In the study, alternative prediction methods for diameter distribution were compared. For the MS-NFI data, the non-parametric k-NN method was used to predict the stand structure directly based on the trees measured on the NFI field plots. Alternatively, the stand characteristics of the selected k-NN plots were used to recover a Weibull distribution from which trees were sampled. The prediction methods were compared with the measured validation field data based on the total stand volume and tree species proportions. The potential of the generated tree sets was also evaluated as input data for stand-level simulations over a 30-year period.

2.1 The plot- and stand-level validation data sets

The study region was located in central Finland in the municipalities of Multia and Keuruu. In 2014, a set of 30 field plots of 32 m × 32 m was measured (Tomppo et al. 2017), and these plots were subdivided into four 16 m × 16 m subplots. The locations of these plots were selected subjectively from stands in which estimation with remote sensing methods usually leads to large RMSEs. Each 32 m × 32 m plot was entirely within one stand and belonged to one of the following three development classes: young thinning stand, advanced thinning stand or mature stand. A two-phase procedure using post-processed Global Navigation Satellite System (GNSS) observations was used to position the plots as close as possible to the planned locations. The stem diameter at breast height (dbh), as well as the distance and direction from the centre of the subplot, were measured for every tree with dbh ≥ 2.5 cm, while tree height and age were additionally measured from the sample trees. On average, 37 sample trees per 32 m × 32 m plot were chosen based on the cumulative dbh. For the reference data, additional sample trees were selected from basal area median trees by tree species groups. These plots were used for the preliminary study to find the best k value for the k-NN method as well as for the Kolmogorov-Smirnov (KS) goodness-of-fit tests to evaluate the predicted dbh distributions. For the validation of the distributions, the minimum dbh was set to 4.5 cm, which is the minimum taper tree dimension on the NFI fixed area plots. The plot-level validation was made against stand characteristics, which were basal area (G), number of stems (N), and basal-area-weighted and arithmetic mean dbh (DG, D) (Table 1).

Table 1

The characteristics of the validation data (27 stands measured in 2020) and plot-level characteristics (32 m × 32 m plots in the same stands, measured in 2014).
Variable	Mean	Stdev	Min	Max
Stand-level characteristics (in 2020)
Age, years	46.7	22.6	19	110
Dominant height, m	18.5	3.2	12.7	25.2
Basal area, m² ha⁻¹	26.6	7.6	16.2	45
Stem number, ha⁻¹	1278.8	622.9	439	2348
DG* (all species), cm	20.8	4.8	13.7	30
Total stem volume (V), m³ ha⁻¹	213.6	72.7	101.4	390.5
V for Scots pine, m³ ha⁻¹	74.8	64.0	0.0	209.5
V for Norway spruce, m³ ha⁻¹	93.0	79.6	0.0	278.9
V for broadleaves, m³ ha⁻¹	45.7	35.7	0.0	116.7
Plot-level characteristics (in 2014)
Basal area, m² ha⁻¹	23.1	8.5	9.1	42.0
Stem number, ha⁻¹	2200.4	1005.5	429.7	4619.1
DG* (all species), cm	15.1	5.6	7.7	28.9
D* (all species), cm	11.7	4.1	6.4	24.5
*DG is basal-area-weighted mean diameter
*D is arithmetic mean diameter

In 2020, the stands as a whole (including the 32 m × 32 m plots) were inventoried (Table 1) using the Trestima smartphone app and inventory method (Rouvinen 2014). A total of 27 stands were included, because two of the original stands were clear cut and two adjacent stands were merged into one stand. The Trestima system recognises tree species, estimates dbh, and calculates the mean diameters, basal area, number of stems and volume per hectare, which can be used for recovering diameter distributions and predicting the height curve. Because the Trestima system provides diameter frequency distributions by 2-cm classes, but not a tree list, diameter distributions were generated in order to sample trees and estimate stem volumes at tree level. In the inventory of the stands, fifteen sample photographs were systematically taken across each stand to diminish the standard error in basal area (Siipilehto et al. 2016). The Trestima system calculates the standard error in the basal area between the photographs taken within a stand compartment (Rouvinen 2014). In this data set, the standard error in the basal area was, on average, 8% (4.1%–11.4%).

With regard to tree species, eleven of the stands were dominated by Scots pine (Pinus sylvestris L.), fifteen by Norway spruce (Picea abies (L.) Karst.) and one by broadleaved trees, mainly birch (Betula spp.). The majority of the stands were naturally regenerated but there were almost as many planted stands, i.e., 15 and 12 stands, respectively. Three of the stands were on a grove-like site, 17 on a mesic heath site, and seven on a dryish site, i.e., OMT, MT and VT according to the Finnish site type classification (Cajander 1949). Three of the stands classified as MT were peatland sites of corresponding fertility. Four of the stands were thinned between the measurements in 2014 and 2020.

<Table 1 plot- and stand-level characteristics>

2.2 The Multi-Source NFI data

In the present study, the MS-NFI data were based on MS-NFI-2015 (Mäkisara et al. 2019), which provided pixel-wise estimates of several forest variables for July 31st, 2015. The remote sensing material consisted of Landsat 8 OLI and Sentinel 2 MSI images from 2015 (15 image windows). In addition, numerical map data and an elevation model provided by the National Land Survey of Finland were used to constrain the search in the k-NN method. The field data used in the training were from 2012–2016 (54,551 field plots in forest land, poorly productive forest land, and unproductive forest land). For this study, pointers to the nearest 1–5 field plots in feature space were produced for each pixel, which enabled the use of all the measured tree-level data from the field plots, in addition to the average pixel-wise estimates.

All the generated dbh distributions were computationally updated to 2020 with the Motti stand simulator to match the validation data (the Trestima inventory in 2020). In four stands, recent pre-commercial or first thinnings were mimicked in Motti during the update. Motti is a comprehensive analysis tool and decision support system for assessing the impacts of forest management alternatives on stand dynamics (Salminen et al. 2005; Hynynen et al. 2015).

2.3 The k-NN estimation method

The k-NN method was first used in classification tasks (Fix and Hodges 1951). The k-NN regression or estimation is a natural extension of the classification method (Rogers 1978), and the reference data for the k-NN estimation consists of pairs of vectors (x_i,y_i), where x_i consists of the auxiliary data (e.g., spectral channel radiances) and y_i consists of the forest variables associated with the observation i$\in$F, where F is the set of reference observations. When variable values corresponding to a vector of target auxiliary data x are predicted, the distances from the unknown vector to the learning data vectors are first computed and ordered. The set C_k of k prototypes corresponding to the smallest distances are selected. The prediction is then computed as a weighted sum

$$y=\sum_{i\in C_{k}} w_{i}y_{i} \tag{1}$$

where $\sum_{i\in C_{k}} w_{i}=1$.

The simplest choice is to make the weights equal, but slightly better results may be obtained by using inverse distances or squares of inverse distances as weights.
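The weighted sum above can be illustrated with a minimal sketch. It assumes Euclidean distance in feature space and the three weighting options mentioned in the text; the function name `knn_predict` and its arguments are hypothetical and are not taken from the MS-NFI software.

```python
import math

def knn_predict(x_target, x_ref, y_ref, k=3, weighting="equal"):
    """Predict forest variables for one target vector as a weighted sum
    over its k nearest reference observations (illustrative sketch only)."""
    # Euclidean distance from the target vector to each reference vector
    dists = [math.dist(x_target, x) for x in x_ref]
    # Indices of the k smallest distances: the set C_k
    nearest = sorted(range(len(x_ref)), key=lambda i: dists[i])[:k]
    if weighting == "equal":
        raw = [1.0 for _ in nearest]
    elif weighting == "inverse":
        # Guard against division by zero for an exact match (assumption)
        raw = [1.0 / max(dists[i], 1e-12) for i in nearest]
    elif weighting == "inverse_squared":
        raw = [1.0 / max(dists[i], 1e-12) ** 2 for i in nearest]
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    # Normalise so that the weights sum to one
    total = sum(raw)
    weights = [r / total for r in raw]
    # Weighted sum of each forest variable over the neighbours
    n_vars = len(y_ref[0])
    pred = [sum(w * y_ref[i][j] for w, i in zip(weights, nearest))
            for j in range(n_vars)]
    return pred, nearest, weights
```

With `k=1` the prediction is simply the forest variables of the single nearest reference observation, which corresponds to the property discussed for 1-NN in the text.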

The details of this basic k-NN method can be varied, e.g., with respect to distance metrics, weights attached to the nearest neighbours, and value of k. The distance measure most often used is the Euclidean distance, but other distance metrics can be used (Chirici et al. 2016). In the k-MSN method (Moeur and Stage 1995), the distance is based on the canonical correlations, which is one way to solve the problems with a large dimension of auxiliary data. Another popular method is to use a genetic algorithm to weigh and select the auxiliary variables (Tomppo and Halme 2004). When finding the smallest distances, the set of the plots used can be limited with some constraints for each target vector, making k-NN a very adaptive method.

For good results, the parameter k must be selected carefully. If k = 1, the predictions are real observations, so the dependencies between the forest variables in the real world are preserved in the predictions. However, one drawback is that the noise in the learning data is directly seen in the predictions. If k > 1, the noise in the learning data is reduced, but the dependencies between the predicted variable values may not hold. A well-known dilemma in k-NN is that error variance decreases and bias increases when k is increased (Geman et al. 1992). The choice of k has been discussed in many previous studies, but there are no universal rules for selecting k for a certain application and data.

In this study, predictions using NFI data were based on each of the k nearest neighbours independently without weighting, i.e., either the measured trees from each neighbour were used as a prediction of the stand structure for each grid cell, or the stand characteristics of each neighbour were used to recover a Weibull distribution for each grid cell. This way, it was possible to avoid the potential problem that the dependencies between stand characteristics may not hold if they were combined.

2.4 The alternative prediction methods

Alternative methods were compared for predicting tree characteristics of individual stands (dbh and height distributions). The estimates using the MS-NFI data were first derived at grid cell level (16 m × 16 m) and then assembled into entire forest stands based on the stand borderlines of the operative forest management inventories downloaded from the SMK service Metsaan.fi.

The methods were compared with the ground truth data measured in 2014 at plot-level and with the validation data measured by the Trestima approach in 2020 at stand-level. The methods were as follows:

1-NN_trees: using the measured trees of one nearest neighbor NFI plot per grid cell.

1-NN_stand: species-specific stand characteristics of one NFI plot for predicting the grid-level stand characteristics.

3-NN_trees: using the measured trees of three NFI plots per grid cell.

3-NN_stand: combining species-specific stand characteristics of three NFI plots to grid-level stand characteristics.

4-NN_stand: combining species-specific stand characteristics from four NFI plots to grid-level stand characteristics.

5-NN_stand: combining species-specific stand characteristics from five NFI plots to grid-level stand characteristics.

< Fig. 1 > workflow

The stand-level validation data provided estimates for stem number (N), basal area (BA), arithmetic mean dbh (D), basal-area-weighted mean dbh (DG), quadratic mean dbh (DQ) and total volume (Vtot) for the whole stand, as well as for tree species. The dbh distributions were recovered from the 2-parameter Weibull function based on the measured BA, N and DG (Siipilehto and Mehtätalo 2013). The Näslund’s height curves were estimated using the models by Siipilehto and Kangas (2015) using age, DG, basal-area-weighted mean height (HG) and BA as predictor variables. The stand volumes were then calculated from the generated trees by tree species using the models by Laasasenaho (1982).

In the methods based on MS-NFI, k-NN methods were applied using k = 1, 3, 4 and 5. The 1-NN and 3-NN methods were used in two different ways. In the 1-NN_trees and 3-NN_trees, the measured trees of the selected NFI plots were used as such for providing the trees to each grid cell, with each tree representing 1/k per hectare. In contrast, in the 1-NN_stand, 3-NN_stand, 4-NN_stand and 5-NN_stand methods, the dbh distributions to each grid cell were generated using each k species-specific stand characteristics (instead of the weighted average as shown in Eq. 1) of the measured NFI plots. In the k-NN_stand methods, the number of generated trees was restricted to N by selecting only N/k trees from a certain dbh distribution, with each tree representing 1 tree per hectare. Therefore, the difference between the methods was that in k-NN_trees the measured trees were regarded as a realisation of the existing dbh distribution, and the trees were randomly selected from that distribution in k-NN_stand. In both methods, the whole stand consisted of trees generated to each grid cell. Because the calculations in the 3-NN_trees were time consuming (three times the trees in a stand resulted in about 20,000 trees), k-NN_trees for k values over 3 was not used.

In the k-NN_stand method, the trees were sampled from the cumulative probability distribution by drawing the probability (P) at random from the uniform 0–1 distribution (Siipilehto et al. 2016). The cumulative Weibull distribution function is F(dbh) = 1 – exp(-(dbh/b)^c), and the tree dbh was solved as dbh = b(-ln(1-P))^(1/c), where b and c are the scale and shape parameters (Bailey and Dell 1973). Thereafter, the tree heights were predicted using the models by Siipilehto and Kangas (2015) for Näslund’s height curve. Näslund’s height curve for tree height h is h = (dbh/(b₀ + b₁ dbh))^p, in which b₀ and b₁ are the predicted parameters and the power p was set to 2 for Scots pine and broadleaves and to 3 for Norway spruce.
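As a minimal sketch of the sampling step described above, the following code inverts the cumulative Weibull function for uniform random probabilities and applies the height curve in the form written in the text. The function names and all numerical parameter values are hypothetical illustrations; the actual parameters come from the recovery equations and the height models cited above.

```python
import math
import random

def sample_weibull_dbh(b, c, n_trees, rng):
    """Draw n_trees dbh values by inverting F(dbh) = 1 - exp(-(dbh/b)^c)."""
    dbhs = []
    for _ in range(n_trees):
        p = rng.random()  # uniform probability in [0, 1)
        dbhs.append(b * (-math.log(1.0 - p)) ** (1.0 / c))
    return dbhs

def naslund_height(dbh, b0, b1, p):
    """Height curve as written in the text: h = (dbh / (b0 + b1 * dbh))^p."""
    return (dbh / (b0 + b1 * dbh)) ** p

# Hypothetical scale (b) and shape (c) parameters, for illustration only
rng = random.Random(42)
dbh_sample = sample_weibull_dbh(b=20.0, c=3.0, n_trees=1000, rng=rng)

# Hypothetical height-curve parameters; p = 2 as stated for Scots pine
heights = [naslund_height(d, b0=1.5, b1=0.16, p=2) for d in dbh_sample]
```

Passing a seeded `random.Random` instance keeps the generated tree list reproducible, which is convenient when the same stand is fed to a simulator several times.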

2.5 Comparison of the methods

The applied methods were first evaluated at plot level by comparing the predicted dbh distributions to the data of the 32 m × 32 m plots, including the four 16 m × 16 m grid cells per stand. The analysis was based on the sample plot data measured in 2014 and the MS-NFI data sets including satellite images and field plots. The aim of the plot-level comparison was to find the best number of neighbours (k) for the k-NN method to be used in the subsequent stand-level analyses. The effect of k on the accuracy of stand characteristics (G, N, DG and D) was analysed using k = 1, 2, 3, 4 and 5. Validation criteria were bias (absolute and relative bias%), standard deviation of prediction errors (stdev) and absolute and relative root mean square errors (RMSE, RMSE%). In addition to the common RMSE% related to the predicted values, we calculated RMSE%_OBS related to the observed ground truth value. In this case, each prediction was related to the same value.
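The validation criteria can be sketched as follows. This is an illustration under stated assumptions rather than the authors' exact computation: errors are taken as observed minus predicted, stdev uses the n − 1 divisor, and bias% and RMSE%_OBS are scaled by the mean of the observed values. The last assumption is at least consistent with Table 2, where the 1-NN RMSE for G (8.83) divided by the observed plot-level mean (23.1, Table 1) gives about 38.2%. The function name `accuracy_metrics` is hypothetical.

```python
import math

def accuracy_metrics(observed, predicted):
    """Bias, stdev of errors, RMSE and their relatives (illustrative sketch)."""
    # Assumed sign convention: error = observed - predicted
    errors = [o - p for o, p in zip(observed, predicted)]
    n = len(errors)
    bias = sum(errors) / n
    # Sample standard deviation of the prediction errors (n - 1 divisor assumed)
    stdev = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    obs_mean = sum(observed) / n
    return {
        "bias": bias,
        "bias%": 100.0 * bias / obs_mean,
        "stdev": stdev,
        "RMSE": rmse,
        "RMSE%_OBS": 100.0 * rmse / obs_mean,
    }
```

The RMSE% related to the predicted values is left out of the sketch because its exact scaling is not spelled out in the text.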

The Kolmogorov-Smirnov (KS) goodness-of-fit test at the α = 0.1 level was used for the predicted dbh distributions (Siipilehto et al. 2016). Because large samples were used, the critical value of the KS test was calculated as D_n,m = √(−ln(α/2) × (1 + m/n)/(2m)), where n and m are the sizes of the two samples (i.e., the number of measured and the number of predicted trees) and α is the selected risk level (α = 0.1). The results are reported as a KS quotient, i.e., the observed KS test statistic relative to this critical value. Thus, the smaller the KS quotient, the better the fit, and a KS quotient > 1 indicates a rejected case, i.e., the predicted distribution did not fit the observed distribution according to the KS test at the α = 0.1 level.
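A minimal sketch of the KS comparison is given below. The critical value follows the formula in the text; the quotient is assumed here to be the two-sample KS statistic (the maximum distance between the two empirical cumulative distributions) divided by that critical value, which is the reading implied by "quotient > 1 means rejected". All function names are hypothetical.

```python
import math
from bisect import bisect_right

def ks_critical_value(n, m, alpha=0.1):
    """Large-sample critical value: sqrt(-ln(alpha/2) * (1 + m/n) / (2m))."""
    return math.sqrt(-math.log(alpha / 2.0) * (1.0 + m / n) / (2.0 * m))

def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # share of a not exceeding x
        fb = bisect_right(b, x) / len(b)  # share of b not exceeding x
        d_max = max(d_max, abs(fa - fb))
    return d_max

def ks_quotient(measured, predicted, alpha=0.1):
    """Assumed definition: KS statistic divided by the critical value."""
    crit = ks_critical_value(len(measured), len(predicted), alpha)
    return ks_statistic(measured, predicted) / crit
```

For two samples of 100 trees each and α = 0.1, the critical value is about 0.173, so a maximum distance between the cumulative distributions larger than that would count as a rejected case.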

At stand level, the applied methods were evaluated by comparing the characteristics of the predicted tree sets, updated to 2020, with the validation data for the whole stand on a per-hectare basis. The differences (validation – prediction) and RMSEs were calculated for the total stem volume of the whole stand and by tree species (pine, spruce, broadleaves). Based on the predicted total stem volumes, we counted how many times each method was the best or the worst. In addition, the rank sum was calculated after ranking the methods in each stand. The best method was given rank 1 and the worst rank 6.
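The ranking procedure can be sketched as below. The sketch assumes that methods are ranked within each stand by the absolute difference in total volume and that ties are broken by the order of the methods; the text does not state how ties were handled. The function name and the input layout are hypothetical.

```python
def rank_methods(abs_diff_by_stand):
    """Count best/worst cases and rank sums per method.

    abs_diff_by_stand: list with one dict per stand, mapping
    method name -> |validation - prediction| in total volume.
    """
    methods = list(abs_diff_by_stand[0])
    best = {m: 0 for m in methods}
    worst = {m: 0 for m in methods}
    rank_sum = {m: 0 for m in methods}
    for diffs in abs_diff_by_stand:
        # Smallest absolute difference gets rank 1 (ties keep input order)
        ordered = sorted(methods, key=lambda m: diffs[m])
        for rank, method in enumerate(ordered, start=1):
            rank_sum[method] += rank
        best[ordered[0]] += 1
        worst[ordered[-1]] += 1
    return best, worst, rank_sum
```

A method that is rarely the best but often second or third can still reach a low rank sum, which is why both the counts and the rank sums are informative.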

As well as comparing the methods at the initial state, the manner in which the differences in the stand characteristics evolved over time was studied. The development of all stands was simulated over a 30-year period using the predicted trees as an initial state for the Motti simulator (Salminen et al. 2005; Hynynen et al. 2015). At the end of the 30-year simulation, the differences in the species-specific volumes were checked. The relative difference (RF) in the total volume (Vtot) between the applied methods and the validation data along the simulation in 10-year steps was calculated as follows:

RF = (Vtot(validation) - Vtot(predicted))/Vtot(validation) (2)

3.1 Plot-level results with varying k

It was found that 1-NN provided the least biased estimates for the basal-area-weighted mean diameter (DG), 3-NN provided the least biased estimates for the arithmetic mean diameter (D), while 4-NN provided the least biased estimates for sum characteristics, i.e., number of stems (N) and basal area per hectare (BA; G in Table 2). Further, the smallest RMSE for D and DG was obtained using 3-NN, for BA using 5-NN, and for N using 4-NN (Table 2). In conclusion, there was no clear best k value according to all the characteristics considered. Therefore, results for k values of 1, 3, 4 and 5 are presented.

Table 2

Plot-level (plot size 32 m × 32 m) accuracy (bias and RMSE) in stand characteristics with varying k in the k-NN method. The smallest and largest value in each column is in **bold** and *Italics*, respectively.
	G, m² ha⁻¹	N, ha⁻¹	DG, cm	D, cm		G, m² ha⁻¹	N, ha⁻¹	DG, cm	D, cm
1-NN
bias	2.47	904.66	-3.75	-1.99	RMSE	8.83	1368.41	5.26	3.81
bias%	10.67	41.11	-24.8	-17.00	RMSE%	43.11	110.76	27.52	27.48
stdev	8.32	983.54	3.50	3.15	RMSE%_OBS	38.20	62.19	34.8	32.47
2-NN
bias	0.85	857.87	-4.08	-1.80	RMSE	8.04	1323.23	5.30	3.36
bias%	3.69	38.99	-26.97	-15.37	RMSE%	36.19	102.95	27.21	24.60
stdev	7.86	966.84	3.16	2.75	RMSE%_OBS	34.76	60.14	35.03	28.63
3-NN
bias	1.14	862.34	-3.93	-1.59	RMSE	8.05	1314.35	5.09	3.05
bias%	4.93	39.19	-25.98	-13.57	RMSE%	36.74	102.64	26.34	22.72
stdev	7.83	950.88	3.02	2.53	RMSE%_OBS	34.8	59.73	33.64	26.01
4-NN
bias	0.36	842.2	-4.01	-1.77	RMSE	7.71	1303.7	5.26	3.30
bias%	1.57	38.27	-26.53	-15.09	RMSE%	33.91	100.12	27.1	24.24
stdev	7.57	955.31	3.19	2.70	RMSE%_OBS	33.34	59.25	34.77	28.14
5-NN
bias	0.49	846.88	-4.10	-1.79	RMSE	7.36	1313.64	5.32	3.41
bias%	2.12	38.49	-27.10	-15.27	RMSE%	32.55	101.28	27.3	25.02
stdev	7.22	964.18	3.17	2.82	RMSE%_OBS	31.81	59.70	35.19	29.10

The diameter distributions predicted by the alternative methods were first compared based on the 32 m × 32 m plots measured in 2014. The Kolmogorov-Smirnov (KS) test results are given as a KS quotient for each plot (Table 3). In general, the predicted diameter distributions were not satisfactory, because at a 10% risk level, 24%–48% of the predictions of the methods were rejected (KS quotient > 1). The best results in terms of the fewest rejected cases (7), the number of best fits (10), and the smallest average KS quotient (0.883) were found with the 5-NN_stand method. The worst average results were found with the 3-NN_trees method (14 rejected cases, 2 best fits, average KS quotient of 1.0). The 1-NN_stand method also gave the smallest number of rejected cases (7), but its number of best fits was low (3 cases). In general, the methods predicted the number of trees in the larger diameter classes satisfactorily, but the predictions for the smaller diameter classes, which contribute less to the stand volume, were less accurate (Fig. 2).

Table 3

The average Kolmogorov-Smirnov quotients at risk level of 0.1 (α = 0.1), the number of rejected cases and its proportion, as well as the number of the **best** and *worst* fits in each stand, using the prediction methods.
Stand	1-NN_trees	1-NN_stand	3-NN_trees	3-NN_stand	4-NN_stand	5-NN_stand
1	1.062	1.294	0.726	1.316	1.195	1.152
3	0.637	0.767	1.052	0.739	0.810	0.668
4	0.704	1.084	1.072	1.110	0.791	1.013
5	1.569	0.946	1.266	0.998	1.115	0.901
6	1.445	0.946	1.247	1.015	1.130	0.837
8	0.703	0.419	0.719	0.644	0.661	0.708
9	0.594	0.458	0.526	0.372	0.400	0.385
10	1.246	1.307	1.476	1.668	1.213	1.543
12	0.865	0.763	0.592	0.627	0.539	0.617
13	0.733	0.759	0.930	0.921	1.036	1.003
15	0.422	0.452	0.462	0.715	0.508	0.493
16	0.606	0.783	0.496	0.595	0.645	0.353
17	1.464	0.653	1.151	0.630	0.639	0.572
19	0.854	0.591	0.955	0.604	0.614	0.486
20	0.972	0.815	1.013	0.745	0.558	0.669
21	0.779	0.676	0.757	0.664	0.717	0.656
22	0.940	0.708	1.225	0.621	0.660	0.677
23	3.409	3.535	3.774	2.948	2.972	3.254
26	0.766	1.282	0.803	1.100	0.993	0.974
27	0.635	0.678	0.859	0.549	0.665	0.889
28	0.779	0.422	0.486	0.444	0.607	0.516
31	1.591	1.941	1.165	1.755	1.438	1.874
32	0.493	0.457	0.621	0.490	0.343	0.378
33	0.832	0.558	0.706	0.602	0.699	0.418
34	0.838	0.412	0.588	0.633	0.629	0.542
35	1.049	0.960	1.177	0.891	1.088	0.557
36	0.995	0.919	1.226	0.779	0.936	0.983
38	1.141	1.765	1.165	1.693	1.747	1.911
39	0.709	0.637	0.770	0.715	0.610	0.589
average	0.994	0.930	1.000	0.917	0.895	0.883
rejected	9	7	14	8	9	7
proportion	0.310	0.241	0.483	0.276	0.310	0.241
best fit	6	3	2	3	5	10
worst fit	9	3	10	4	1	2

3.2 Ranking the methods according to the accuracy in the initial volume

In the k-NN_trees method, a stand was generated based on the measured trees of the NFI plots, whereas in the k-NN_stand method a diameter distribution was generated based on the stand characteristics of the same plots. In practice, the same trees are used several times in the k-NN_trees method when generating trees to the grid cells of a stand. In the model-based k-NN_stand method, the random selection of trees from the recovered Weibull distribution smooths the final dbh distribution (Fig. 2). Despite this fundamental difference, these methods resulted in no major differences in the total stand volumes (Fig. 3).

With regard to ranking the methods, the smallest difference (validation – prediction) in the total stand volume relative to the validation data (7.8%), as well as the smallest RMSE (24.7%), was given by the 5-NN_stand method (Table 4). The biggest difference in total volume (14.3%) was given by the 1-NN_trees method (Table 4). According to the RMSE in total and species-specific volumes, the 1-NN_stand method was the worst method for generating the initial stand structure (Table 4). The relative RMSEs in the species-specific stem volumes were high (61%–69%), roughly two to three times the RMSE% for the whole stand (25%–31%). Furthermore, one should note that the differences in the RMSE% between the species were only minor.

Table 4

The differences between the validation data and the prediction methods in the total volume (observed-predicted). The smallest and largest value in each row is in **bold** and *Italics*, respectively.
	1-NN_trees	1-NN_stand	3-NN_trees	3-NN_stand	4-NN_stand	5-NN_stand
Total
difference, m³ ha⁻¹	30.6	30.0	25.2	22.3	19.0	16.7
difference, %	14.3	14.1	11.8	10.4	8.9	7.8
RMSE	63.0	65.9	58.9	62.0	59.2	52.8
RMSE%	29.5	30.8	27.6	29.0	27.7	24.7
Std	56.1	59.7	54.3	52.4	50.9	51.0
Scots pine
difference, m³ ha⁻¹	-7.9	-7.5	-12.4	-7.9	-11.5	-12.5
difference, %	-10.6	-10.0	-16.6	-10.5	-15.4	-16.7
RMSE	50.4	51.7	48.8	49.3	49.3	48.9
RMSE%	67.4	69.1	65.3	65.9	65.8	65.4
Std	50.8	52.3	48.1	49.6	48.8	48.2
Norway spruce
difference, m³ ha⁻¹	32.9	34.2	34.4	30.5	30.3	29.1
difference, %	35.3	36.7	37.0	32.8	32.5	31.3
RMSE	61.9	64.4	63.7	63.2	60.5	60.8
RMSE%	66.5	69.3	68.5	67.9	65.0	65.3
Std	53.4	55.8	54.7	56.4	53.3	54.4
Broadleaves
difference, m³ ha⁻¹	5.7	3.7	3.2	-0.4	0.3	0.1
difference, %	12.4	8.0	7.0	-0.9	0.7	0.1
RMSE	29.6	30.1	27.9	28.3	27.9	27.9
RMSE%	64.8	65.8	60.9	62.0	61.0	60.9
Std	29.6	30.5	28.2	28.9	28.4	28.4

All methods predicted higher stem volumes for pine compared with the validation data (Table 4). The largest difference, an overestimation of 12.5 m³ ha⁻¹ (17%), was given by the 5-NN_stand method, and the smallest, 10%, by the 1-NN_stand method. In contrast, the methods estimated lower volumes for spruce and, on the whole, slightly lower volumes for broadleaved species compared with the validation data (Table 4). The largest difference in the spruce volume, 34.4 m³ ha⁻¹ (37%), was found using the 3-NN_trees method, and the smallest (31%) using the 5-NN_stand method. For the stem volume of broadleaved species, the 3-NN_stand, 4-NN_stand and 5-NN_stand methods gave clearly smaller differences to the observed volume (0.1–0.4 m³ ha⁻¹ in absolute terms) than the 1-NN methods and 3-NN_trees (3.2–5.7 m³ ha⁻¹) (Table 4). While the largest difference, given by the 1-NN_trees method, amounted to 12.4%, the smallest difference, given by the 5-NN_stand method, was only 0.1%.

The deviations (Std) in the differences between the observed and predicted volume were the smallest when k was 3 or 4. The 4-NN_stand method provided the smallest Std for total and spruce volumes, while 3-NN_trees provided the smallest Std for pine and broadleaves (Table 4). The largest Std was given by the 1-NN_stand method for the total, pine and broadleaved volumes, and by the 3-NN_stand method for the spruce volume (Table 4).

The 5-NN_stand method most often provided the best fit in the total stand volume (Table 5). The worst fit was most often (11 times) found using the 1-NN_stand method. The lowest rank sum of 53 was given by the 4-NN_stand method, even though it was ranked best only six times (Table 5). This was because the 4-NN_stand method was often the second or third best method, and only once the worst. A low rank sum was also given by the 5-NN_stand method.

Table 5

Ranking of the prediction methods by the number of cases in which the method provided the best or the worst result in the total stand volume, and by the rank sum obtained when each method was given a rank number for each stand. The smaller the rank sum, the better the result. The best and worst results in each row are in **bold** and *italics*, respectively.
Method	1-NN_trees	1-NN_stand	3-NN_trees	3-NN_stand	4-NN_stand	5-NN_stand
Best	3	2	2	2	6	12
Worst	4	11	5	2	1	4
Rank sum	87	102	87	74	53	59
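
The ranking summarised in Table 5 can be illustrated with a minimal Python sketch. It assumes that methods are ranked within each stand by the absolute difference between observed and predicted total volume and that ties are broken by method order; both are assumptions, since the tie rule is not stated here. The function name `rank_methods` and the three-stand input are hypothetical and are not the study data.

```python
def rank_methods(abs_diff_by_stand):
    """Count best/worst cases and rank sums per method.

    abs_diff_by_stand: list of dicts, one per stand, mapping a method name
    to |observed - predicted| total volume (m3/ha) for that stand.
    """
    methods = list(abs_diff_by_stand[0])
    best = {m: 0 for m in methods}
    worst = {m: 0 for m in methods}
    rank_sum = {m: 0 for m in methods}
    for diffs in abs_diff_by_stand:
        # Rank 1 = smallest absolute difference in this stand.
        ordered = sorted(methods, key=lambda m: diffs[m])
        for rank, method in enumerate(ordered, start=1):
            rank_sum[method] += rank
        best[ordered[0]] += 1
        worst[ordered[-1]] += 1
    return best, worst, rank_sum


# Made-up absolute differences for three stands (illustration only).
toy_stands = [
    {"1-NN_stand": 30.0, "3-NN_stand": 12.0, "5-NN_stand": 5.0},
    {"1-NN_stand": 8.0, "3-NN_stand": 10.0, "5-NN_stand": 9.0},
    {"1-NN_stand": 40.0, "3-NN_stand": 15.0, "5-NN_stand": 20.0},
]
best, worst, rank_sum = rank_methods(toy_stands)
```

In this toy input every method is best exactly once, yet the hypothetical 5-NN_stand entry has the lowest rank sum because it is never the worst. This is the same effect that gives the 4-NN_stand method the lowest rank sum in Table 5.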

The differences between the observed and predicted total volumes for the different methods were often in the same direction in the individual stands, especially when the differences were large (Fig. 4). The differences mostly became smaller with increasing k (13 stands showed a clear trend), while the opposite was true for only four stands (stands 10, 23, 27 and 31).

3.3 Species proportion

The main tree species was defined from the proportion of initial volume by species. The main tree species was correctly predicted for 19 to 20 out of 27 cases (70%–74% of cases). Thus, the difference between the best and worst result was only one case. The error matrix of the main tree species is shown in Table 6 for the best result, obtained using 3-NN_trees. The error matrix showed that all seven stands predicted to be spruce dominated were observed spruce stands (Table 6). Also, both observed broadleaved stands were predicted to be broadleaved dominated. However, four more stands were predicted to be broadleaved dominated, and thus the prediction accuracy for broadleaves was only 33% (Table 6). For pine, 92% of the observed pine stands were predicted to be pine stands, and 79% of the predicted pine stands were observed to be pine dominated (Table 6).

Table 6

Example of error matrix for the main tree species groups using *3-NN_trees*.
Predicted	Observed main tree species
main species	Pine	Spruce	Broadleaves	Total	Accuracy
Pine	11	3	0	14	0.79
Spruce	0	7	0	7	1.00
Broadleaves	1	3	2	6	0.33
Total	12	13	2	27
Accuracy	0.92	0.54	1.00
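
The accuracy figures in Table 6 follow directly from the cell counts. The short Python sketch below recomputes them from the table, with rows as predicted and columns as observed main species. The variable names are illustrative only.

```python
classes = ["Pine", "Spruce", "Broadleaves"]

# Table 6 counts: rows = predicted main species, columns = observed main species.
matrix = [
    [11, 3, 0],  # predicted pine
    [0, 7, 0],   # predicted spruce
    [1, 3, 2],   # predicted broadleaves
]

row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]

# Share of the stands predicted as a class that were observed to be that class.
row_accuracy = [matrix[i][i] / row_totals[i] for i in range(len(classes))]
# Share of the stands observed as a class that were predicted to be that class.
col_accuracy = [matrix[i][i] / col_totals[i] for i in range(len(classes))]
# Share of all stands with a correctly predicted main species.
overall_accuracy = sum(matrix[i][i] for i in range(len(classes))) / sum(row_totals)
```

The diagonal sums to 20 correctly classified stands out of 27, which corresponds to the 74% reported for the best result.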

As an example, Fig. 5 shows the species proportions in the validation data and in the best-performing 3-NN_stand prediction method. Regardless of the value of k in the k-NN prediction method, the proportions of tree species resembled those of the example with the 3-NN_stand method in Fig. 5. Generally, the prediction methods provided smaller spruce proportions for the spruce-dominated stands than the validation data. For stand no. 13, the observed data did not include pine, but the predicted pine proportion was as high as 53%.

3.4 Performance of the methods during 30-year simulation

Over the simulation period, the relative differences between the prediction methods in the total volume decreased. As an example, the absolute and relative differences of the stand volume between the 5-NN_stand method and the volume of the validation data set over time are shown in Fig. 6. A similar decrease occurred in most stands regardless of the prediction method, especially when the initial difference was large. In some stands in which the difference between the predicted and measured initial state was small, the difference slightly increased over time (stands 8, 15, 16, 20 and 34, Fig. 6). These stands had nothing in common (varying site types, including mineral soils and peatlands, planted or naturally regenerated, as well as thinned and unthinned stands). Similarly, there were no systematic differences among the stands with the highest initial differences, except that the high initial overestimations were mostly on VT sites.

After the 30-year simulation, the total volume was closest to the simulation result based on the validation data for the 5-NN_stand method, with an underestimation of 8.5 m³ ha⁻¹ (2.3%) (Table 7). The second and third smallest differences, given by the 4-NN_stand and 3-NN_stand methods, underestimated the total volume by 10.3 m³ ha⁻¹ (2.8%) and 12.1 m³ ha⁻¹ (3.3%), respectively. The highest mean difference, 21.6 m³ ha⁻¹ (5.8%), was found using the 1-NN_trees method (Table 7). During the 30-year simulation, the initial differences (Table 4) of 17–31 m³ ha⁻¹ (8%–14%) decreased to 9–22 m³ ha⁻¹ (2%–6%).

Table 7

The differences, RMSEs, and standard deviations of prediction errors (Std) in the volume characteristics between the prediction methods after the 30-year simulation compared to the validation data-based simulation. The smallest and largest values are in **bold** and *italics*, respectively.
Method	1-NN_trees	1-NN_stand	3-NN_trees	3-NN_stand	4-NN_stand	5-NN_stand
Total
difference, m³ ha⁻¹	21.56	20.35	15.73	12.12	10.35	8.53
difference, %	5.8	5.5	4.2	3.3	2.8	2.3
RMSE	42.4	41.4	38.2	37.5	35.5	35.1
RMSE%	11.4	11.1	10.2	10.1	9.5	9.4
Std	37.2	36.8	35.4	36.1	34.6	34.7
Scots pine
difference, m³ ha⁻¹	-4.5	-3.3	-10.6	-2.0	-8.2	-8.8
difference, %	-2.9	-2.1	-6.8	-1.3	-5.2	-5.6
RMSE	96.6	101.8	90.1	94.9	93.2	93.0
RMSE%	61.4	64.7	57.3	60.3	59.3	59.1
Std	98.7	104.3	91.5	97.0	94.9	94.6
Norway spruce
difference, m³ ha⁻¹	36.4	36.4	37.0	33.1	30.1	28.7
difference, %	21.8	21.8	22.1	19.8	18.0	17.2
RMSE	84.2	86.1	84.8	81.6	82.4	82.7
RMSE%	50.4	51.5	50.7	48.8	49.3	49.5
Std	77.4	79.7	77.8	76.1	78.3	79.2
Broadleaves
difference, m³ ha⁻¹	8.3	5.4	8.5	3.7	6.6	6.4
difference, %	9.1	5.9	9.3	4.1	7.2	7.0
RMSE	37.1	38.1	34.4	35.3	35.6	35.5
RMSE%	40.7	41.9	37.8	38.8	39.1	39.0
Std	39.3	41.2	36.3	38.0	37.9	37.9

The differences by tree species were closest to the validation data for pine (1%–7% difference) and broadleaves (4%–9% difference), but the volume for spruce was still considerably underestimated (17%–22%). The best-performing methods varied depending on the characteristic considered. With the 5-NN_stand method, the total volume and the volume for spruce were closest to those of the validation data. The 3-NN_stand method provided the smallest differences for pine and broadleaves (Table 7). The smallest RMSE for spruce was also provided by 3-NN_stand, while 3-NN_trees provided the smallest RMSE for pine and broadleaves (Table 7). With k = 1 there were no best results; instead, the 1-NN_trees and 1-NN_stand methods gave the largest RMSEs for total, pine and broadleaves volumes (Table 7).
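
As a reading aid for Table 7, the following Python sketch shows one way the reported statistics could be computed and how they relate to each other. It rests on assumptions that are not confirmed in this section: the difference is taken as observed minus predicted (so positive values mean underestimation), Std is the sample standard deviation of the stand-level differences, and RMSE% is relative to the mean observed volume. The function names are hypothetical.

```python
import math


def error_stats(observed, predicted):
    """Mean difference, RMSE, RMSE% and Std of stand-level differences."""
    diffs = [o - p for o, p in zip(observed, predicted)]  # assumed sign convention
    n = len(diffs)
    mean_diff = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    std = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    mean_obs = sum(observed) / n
    return {
        "difference": mean_diff,
        "difference_pct": 100.0 * mean_diff / mean_obs,
        "rmse": rmse,
        "rmse_pct": 100.0 * rmse / mean_obs,
        "std": std,
    }


def rmse_from_bias_and_std(mean_diff, std, n):
    """Identity: RMSE^2 = mean_diff^2 + (n - 1) / n * Std^2 (sample Std)."""
    return math.sqrt(mean_diff ** 2 + (n - 1) / n * std ** 2)


# Consistency check against the total-volume rows of Table 7 (n = 27 stands).
rmse_5nn_stand = rmse_from_bias_and_std(8.53, 34.7, 27)
rmse_1nn_trees = rmse_from_bias_and_std(21.56, 37.2, 27)
```

Under these assumptions the identity reproduces the tabulated total-volume RMSEs of 35.1 and 42.4 to within rounding, which suggests that the Std values are sample standard deviations over the 27 stands.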

The order in the magnitude of prediction difference between the prediction methods was partly the same over time as for the initial state of stands (see Table 4). Yet, some changes could be found, e.g., the 1-NN_stand and 3-NN_stand methods performed better after the 30-year simulation compared with the initial state.

4 Discussion

This study evaluated whether the MS-NFI pixel-level data can be used in generating the tree characteristics of individual stands and compared alternative prediction methods for diameter distribution. In general, the MS-NFI based methods (k-NN_trees, k-NN_stand) performed well for the total stand volume, and the average difference between the predicted and validation volumes ranged from 8% to 14%. The RMSE% using the satellite-based MS-NFI data varied between 25% and 30%, which was much smaller than in previous studies using Landsat imagery. For example, Mäkelä and Pekkarinen (2004) reported a considerably larger RMSE of 48% in total volume at stand level using Landsat imagery. However, in Mäkelä and Pekkarinen (2004) the standwise results were predicted directly, whereas in our study the standwise results were computed from predictions for pixels in the stand. Other previous studies have also reported RMSE% values of 42%–50% for stand-level total volume using satellite-based data (Hyyppä et al. 2000; Hyvönen 2002; Muukkonen and Heiskanen 2005).

Satellite-based predictions have usually resulted in larger RMSE% than those based on ALS. In the preliminary study, the plot-level (32 m × 32 m) accuracy in stand characteristics was validated for DG, G and N with varying k (1 to 5) in the k-NN method. At their best, the RMSEs were 5.1 cm (26%), 7.4 m² ha⁻¹ (32%) and 1304 ha⁻¹ (100%), respectively. In a previous study using the same field data, Tomppo et al. (2017) reported respective best RMSEs of 2.0 cm (13%), 4.2 m² ha⁻¹ (18%) and 802 ha⁻¹ (36%) using a fixed k = 5 and ALS instead of satellite imagery. Thus, the reliability in stand characteristics was considerably better, especially for N, when more accurate ALS was used instead of satellite-based data.

The Kolmogorov-Smirnov goodness-of-fit test at a 10% risk level rejected as many as 24%–48% of the predicted dbh distributions for the 32 m × 32 m ground truth plots. The main reason for this rejection was the overestimated proportion of small trees (see Fig. 2), which, on the other hand, had a minor effect on the stand volume. Indeed, the 3-NN_trees method was rejected most often, but it was simultaneously one of the best methods when validating the volume characteristics (lowest RMSE for pine and broadleaves). The difficulty in predicting the smaller trees may arise because this feature could not be detected, e.g., in spectral channel radiances. On the other hand, no published results exist on the accuracy of the Trestima approach in identifying the smallest trees of a stand. Vastaranta et al. (2015) studied the Trestima approach for sample plot measurements (basal area, mean diameter, mean height), but unfortunately not the accuracy in stem number. According to Siipilehto et al. (2016) and Ruusunen (2020), the bias in the stem number was between 2% and 5%, while the RMSE% was between 32% and 34% using the Trestima approach. According to Dunaeva (2017), the stem number estimate by the Trestima approach was much more accurate than the estimates by forest experts in a preharvest field inventory. Also, the species composition was accurately estimated by Trestima compared to the harvester-based validation data (Dunaeva 2017).
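
A goodness-of-fit comparison of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the two-sample form of the Kolmogorov-Smirnov test with the large-sample critical value, whereas the exact variant applied in the study is not described in this section. The function name `ks_two_sample` and the dbh lists are hypothetical, not study data.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b, alpha=0.10):
    """Two-sample KS statistic with an asymptotic critical value.

    Returns (D, critical value, rejected). The critical value uses
    c(alpha) = sqrt(-0.5 * ln(alpha / 2)), which is about 1.22 at
    alpha = 0.10 and is only approximate for small samples.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d_max = 0.0
    for x in sorted(set(a) | set(b)):
        # Empirical cumulative distribution functions evaluated at x.
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d_max = max(d_max, abs(cdf_a - cdf_b))
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d_max, critical, d_max > critical


# Made-up dbh values (cm): the "predicted" list has too many small trees.
observed_dbh = [12, 14, 15, 17, 18, 20, 21, 23, 24, 26, 27, 29]
predicted_dbh = [5, 6, 6, 7, 8, 9, 14, 17, 20, 23, 26, 29]
d_stat, d_crit, rejected = ks_two_sample(observed_dbh, predicted_dbh)
```

Because the statistic is the largest gap between the two cumulative distributions, an excess of small stems can lead to rejection even when the large trees, which carry most of the volume, are predicted well.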

In addition to the number and size of trees, information on tree species composition is a key parameter describing the stand structure for a wide variety of applications in forest management and conservation. The classification of stands by tree species composition is a complex task using satellite- or ALS-based data (Holopainen et al. 2008; Hovi et al. 2017), and the results of the present study were considered satisfactory. In this study, spruce was the most abundant tree species, pine was slightly less frequent, and broadleaves clearly a minority species. The main tree species was satisfactorily predicted, and the species proportions were generally quite well estimated, especially for broadleaves. Relatively good results for tree species dominance using Landsat TM and Sentinel-2 with NFI plots have been reported by Tomppo et al. (2009) and Breidenbach et al. (2020), especially for conifer species.

The validation data set in the present study was small (27 stands), and the differences in the total and species-specific volume estimates were larger than in previous studies based on ALS (e.g., Maltamo et al. 2009; Tuominen et al. 2014). Indeed, the average differences in species-specific volume were 13% for Scots pine and 35% for Norway spruce. Nevertheless, the stem volume for broadleaves in this study was at its best almost unbiased (0.1%–1%) and otherwise at the same level (7%–16%) as the 11% bias reported by Packalén and Maltamo (2007). Presumably, the smaller number of stands and the different data source (satellite imagery vs. ALS) are the main reasons for the relatively high bias in this study. In addition, the stands of the present study were selected from stands with an irregular stand structure, for which estimation with remote sensing methods usually leads to large RMSEs.

Maltamo et al. (2009) and Tomppo et al. (2017) reported increased accuracy in stand characteristics when fixed-area plots instead of relascope plots were used as training data. Note that only the latest fixed-area NFI plots were applied in this study. Also, one reason for the relatively similar results with the prediction methods was the update of the data set from 2015 to the inventory year 2020. Even though the update period was quite short, updating has an averaging effect. For example, Fig. 6 shows how much the first 10-year period decreased the initial differences.

When Holopainen et al. (2010) studied uncertainty in timber assortment estimates predicted from forest inventory data, the main source of error was forest inventory, either stand-wise field inventory or ALS based inventory, while the effects of generated stem distribution errors were minor. In Finland, when the stand-wise management-oriented field inventory was changed to the ALS based inventory, the mean stand characteristics were also changed from basal area-median tree dimensions (DGM, HGM) to weighted means (DG, HG). Recent results showed that DG is more stable in parameter recovery than DGM (Lee et al. 2021). Thus, the parameter recovery method makes the distribution errors negligible.

In the k-NN methods, increasing k decreased the level of error in most of the evaluated variables. On the other hand, each value of k (1, 3, 4, 5) was best for at least one validated feature. However, after the 30-year simulation, k = 1 never provided the best result but instead most frequently provided the worst volume estimates. In Holopainen et al. (2009), the best results were generally given by k = 4, but k = 3 and k = 5 also provided the best results for some characteristics. According to the review by Chirici et al. (2016), the most frequently applied values of k were 1, 5 and 10. Maltamo et al. (2009) and Tuominen et al. (2014) ended up using k = 6. Nevertheless, increasing k also considerably increased the required computational capacity and simulation time with the k-NN_trees methods. In the present study, it proved too time consuming to simulate more than three times (k > 3) the number of trees (N) when generating a dbh distribution: with k = 3, a maximum of 21,360 trees were simulated for a stand of 6.35 ha using the 3-NN_trees method. Because the results with the k-NN_trees and k-NN_stand methods were close to each other, the practical solution was to use only the k-NN_stand method when k was greater than 3 and to generate only N/k trees from each of the k distributions. By doing so, the number of trees in the simulation was restricted to the number of trees per hectare multiplied by the area of the stand compartment. In the present study, the maximum number of simulated trees using the k-NN_stand method was 6,780. Finally, each of the k sets of species-specific stand characteristics was used for the parameter recovery, and this successfully mimicked the realization of measured trees. Previously, Maltamo and Kangas (1998) found clearly better results with the k-NN empirical distribution (i.e., k-NN_trees) than with the prediction of the Weibull function. However, the applied prediction model for the Weibull distribution (Maltamo 1997) was not as flexible as the parameter recovery method used in this study (see Siipilehto 2022).
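
The idea of generating only N/k trees from each of the k distributions can be sketched in Python as below. This is a simplified illustration, not the study's implementation: it assumes each neighbour is described by a two-parameter Weibull dbh distribution with already known parameters, ignores tree species, and skips the parameter recovery step itself. The function name `generate_trees` and all numeric values are hypothetical.

```python
import random


def generate_trees(n_trees, neighbour_params, seed=0):
    """Draw n_trees dbh values, about n_trees / k from each of k distributions.

    neighbour_params: list of (scale, shape) pairs, one hypothetical
    two-parameter Weibull distribution per nearest neighbour.
    """
    rng = random.Random(seed)
    k = len(neighbour_params)
    base, extra = divmod(n_trees, k)
    trees = []
    for i, (scale, shape) in enumerate(neighbour_params):
        # Spread any remainder over the first neighbours so the total stays n_trees.
        n_i = base + (1 if i < extra else 0)
        trees.extend(rng.weibullvariate(scale, shape) for _ in range(n_i))
    return trees


# Hypothetical stand: 1000 stems per hectare on a 2.0 ha compartment, k = 4.
stems_per_ha, area_ha = 1000, 2.0
n_total = round(stems_per_ha * area_ha)
params = [(18.0, 2.5), (20.0, 3.0), (16.0, 2.2), (22.0, 3.5)]  # (scale cm, shape)
dbh_list = generate_trees(n_total, params)
```

With this allocation the tree list has N entries regardless of k, whereas picking all measured trees from each of the k neighbours lets the list grow roughly in proportion to k.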

The stand development was simulated over time, and the effects of the biases in the initial stand structure produced by the different prediction methods were compared. The 30-year simulation showed that differences between the prediction methods persist as long as the initial state has some influence on stand development, though the choice of prediction method for the initial stand became less significant over time. Even though the rank between the methods did not change much over time, the differences between the methods decreased. Similar results have been reported by Siipilehto (1999), Kangas and Maltamo (2003) and Mäkinen et al. (2010). The most likely reason is that individual tree models predict higher diameter growth and less mortality for stands in which the stocking level was initially underestimated, and vice versa. If the initial state was accurate, the differences could even increase during simulation. Accordingly, Kangas and Maltamo (2003) noticed that a calibrated, more accurate initial state did not improve the accuracy of the predicted future volumes after simulation. The simulations of this study were carried out without intermediate disturbances, e.g., thinnings or damage, which would further reduce the effect of the predicted initial state on stand development. Thus, the prediction method can be selected according to how convenient it is for handling the data.

5 Conclusions

The MS-NFI pixel-level data proved feasible for generating tree characteristics of individual stands, although the difference in the smallest diameter classes was large. The advantage of using satellite-based data instead of ALS is the up-to-date availability. Moreover, the alternative methods for producing a set of trees to describe the stands resulted in no major differences in the stand volumes, and they predicted the main tree species of the stands accurately. However, the volume estimates by species were less accurate, which calls for further development in the distinction between tree species in the MS-NFI data. With higher k values, the method in which the trees were picked directly from the measured trees was computationally more demanding than generating the trees from a dbh distribution. As the effects of the initial stand characteristics decreased over time, the choice of method for generating input data for a long-term simulation appeared to be less influential.

Declaration of openness of research materials, data, and code

The data used and code are available on request from the corresponding author.

Authors’ contributions

Data acquisition and processing: Jouni Siipilehto, Matti Katila, Helena Henttonen, Kai Mäkisara; project administration: Harri Mäkinen. All authors contributed to the scientific writing and the revisions of the manuscript and have agreed to the published version of the manuscript.

Acknowledgements, funding and availability of research materials and data

We are grateful to the field staff of Luke, in particular to Jukka Lehtimäki. This study was financially supported by the Academy of Finland (grant no 315495).

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Ahtikoski A, Siipilehto J, Salminen H, Lehtonen M, Hynynen J (2018) Effect of stand structure and number of sample trees on optimal management for Scots pine: a model-based study. Forests 9(12), 750, 15 p. https://doi.org/10.3390/f9120750.
Bailey RL, Dell TR (1973) Quantifying diameter distributions with the Weibull function. For Sci 19: 97–104.
Barrett F, McRoberts RE, Tomppo E, Cienciala E, Waser LT (2016) A questionnaire-based review of the operational use of remotely sensed data by national forest inventories. Remote Sens Environ 174: 279–289. https://doi.org/10.1016/j.rse.2015.08.029.
Breidenbach J, Waser LT, Debella-Gilo M, Schumacher J, Rahlf J, Hauglin M, Puliti S, Astrup R (2020) National mapping and estimation of forest area by dominant tree species using Sentinel-2 data. Can J For Res 51: 365–379. http://dx.doi.org/10.1139/cjfr-2020-0170.
Cajander AK (1949) Forest types and their significance. Acta For Fenn 56: 1–69.
Chirici G, Mura M, McInerney D, Py N, Tomppo E, Waser LT, Travaglini D, McRoberts RE (2016) A meta-analysis and review of the literature on the k-nearest neighbors technique for forestry applications that use remotely sensed data. Remote Sens Environ 176: 282–94. https://doi.org/10.1016/j.rse.2016.02.001.
Dunaeva T (2017) Preharvest efficiency of Trestima, airborne laser scanning and forest management plan data validated by actual harvesting results and forest engineer preharvest estimation. Yrkehögskolan NOVIA, Raseborg. Bachelor’s thesis. 98 p.
Fix E, Hodges JL (1951) Discriminatory analysis: nonparametric discrimination: consistency properties. USAF School of Aviation Medicine.
Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4: 1–58. https://doi.org/10.1162/neco.1992.4.1.1.
Hardy GH, Littlewood JE, Pólya G (1988) Inequalities. Third edition. Cambridge University Press. 340 p. ISBN 0-521-35880-9.
Holopainen M, Haapanen R, Tuominen S, Viitala R (2008) Performance of airborne laser scanning- and aerial photograph-based statistical and textural features in forest variable estimation. SilviLaser 2008, 17–19 September 2008. Edinburgh, UK.
Holopainen M, Tuominen S, Karjalainen M, Hyyppä J, Vastaranta M, Hyyppä H (2009) Accuracy of high-resolution radar images in the estimation of plot-level forest variables. In: Sester M, Bernard L, Paelke V (eds) Advances in GIScience. Proceedings of the 12th AGILE Conference. Springer, pp 67–82.
Holopainen M, Vastaranta M, Rasinmäki J, Kalliovirta J, Mäkinen A, Haapanen R, Melkas T, Yu X, Hyyppä J (2010) Uncertainty in timber assortment estimates predicted from forest inventory data. Eur J For Res 129: 1131–1142. https://doi.org/10.1007/s10342-010-0401-4
Holopainen M, Vastaranta M, Hyyppä J (2014) Outlook for the next generation’s precision forestry in Finland. Forests 5: 1682–1694. https://doi.org/10.3390/f5071682
Hovi A, Raitio P, Rautiainen M (2017) A spectral analysis of 25 boreal tree species. Silva Fenn 51, article id 7753. 16 p. https://doi.org/10.14214/sf.7753.
Hynynen J, Salminen H, Ahtikoski A, Huuskonen S, Ojansuu R, Siipilehto J, Lehtonen M, Eerikäinen K (2015) Long-term impacts of forest management on biomass supply and forest resource development: a scenario analysis for Finland. Eur J For Res 134: 415–431. https://doi.org/10.1007/s10342-014-0860-0
Kangas A, Maltamo M (2003) Calibrating predicted diameter distribution with additional information in growth and yield predictions. Can J For Res 33: 430–434. https://doi.org/10.1139/x02-121
Kilkki P, Päivinen R (1986) Weibull-function in the estimation of basal area dbh-distribution. Silva Fenn 20: 149–156. https://doi.org/10.14214/sf.a15449
Laasasenaho J (1982) Taper curve and volume functions for pine, spruce and birch. Commun Inst For Fenn 108: 1–74.
Lee D, Siipilehto J, Hynynen J (2021) Models for diameter distribution and tree height in hybrid aspen plantations in southern Finland. Silva Fenn 55(5) id 10612. https://doi.org/10.14214/sf.10612
Mäkinen A, Holopainen M, Kangas A, Rasinmäki J (2010) Propagating the errors of initial forest variables through stand- and tree-level growth simulators. Eur J For Res 129: 887–897. https://doi.org/10.1007/s10342-009-0288-0
Mäkisara K, Katila M, Peräsaari J (2019) The multi-source national forest inventory of Finland – methods and results 2015. Natural Resources and Bioeconomy Studies 8/2019. Natural Resources Institute Finland. 57 p. http://urn.fi/URN:ISBN:978-952-326-711-4.
Maltamo M (1997) Comparing basal area diameter distributions estimated by tree species and for the entire growing stock in a mixed stand. Silva Fenn 31: 53–65. https://doi.org/10.14214/sf.a8510
Maltamo M (1998) Basal area diameter distribution in estimating the quantity and structure of growing stock. Doctoral Dissertation, University of Joensuu, Faculty of Forestry.
Maltamo M, Kangas A (1998) Methods based on k-nearest neighbor regression in the prediction of basal area diameter distribution. Can J For Res 28: 1107–1115. https://doi.org/10.1139/x98-085
Maltamo M, Malinen J, Kangas A, Härkönen S, Pasanen A-M (2003). Most similar neighbor-based stand variable estimation for use in inventory by compartments in Finland. Forestry 76: 449–463. https://doi.org/10.1093/forestry/76.4.449
Maltamo M, Packalén P, Peuhkurinen J, Suvanto A, Pesonen A, Hyyppä J (2007) Experiences and possibilities of ALS based forest inventory in Finland. ISPRS Workshop on Laser Scanning 2007 and SilviLaser 2007, Espoo, 12-14 September 2007, Finland. pp 270–279.
Maltamo M, Packalén P, Suvanto A, Korhonen KT, Mehtätalo L, Hyvönen P (2009) Combining ALS and NFI training data for forest management planning: a case study in Kuortane, Western Finland. Eur J For Res 128: 305–317. https://doi.org/10.1007/s10342-009-0266-6
McRoberts RE, Cohen WB, Næsset E, Stehman SV, Tomppo EO (2010) Using remotely sensed data to construct and assess forest attribute maps and related spatial products. Scand J For Res 25: 340–367. https://doi.org/10.1080/02827581.2010.497496.
Mehtätalo L (2004) Predicting stand characteristics using limited measurements. Finnish Forest Research Institute, Research Papers 929. 39 p. http://urn.fi/URN:ISBN:951-40-1934-2
Moeur M, Stage AR (1995) Most similar neighbor: an improved sampling inference procedure for natural resource planning. For Sci 41: 337–59. https://doi.org/10.1093/forestscience/41.2.337.
Mäkelä H, Hirvelä H, Nuutinen T, Kärkkäinen L (2011) Estimating forest data for analyses of forest production and utilization possibilities at local-level by means of multi-source National Forest Inventory. For Ecol Manage 262: 1245–1359. https://doi.org/10.1016/j.foreco.2011.06.027
Næsset E, Gobakken T, Holmgren J, Hyyppä H, Hyyppä J, Maltamo M, Nilsson M, Olsson H, Persson Å, Söderman U (2004) Laser scanning of forest resources: the Nordic experience. Scand J For Res 19: 482–499. https://doi.org/10.1080/02827580410019553
Packalén P (2009) Using airborne laser scanning data and digital aerial photographs to estimate growing stock by tree species. Dissertationes Forestales 77. 41 p. https://doi.org/10.14214/df.77
Packalén P, Maltamo M (2007) The k-MSN method for the prediction of species-specific stand attributes using airborne laser scanning and aerial photographs. Remote Sens Environ 109: 328–341. https://doi.org/10.1016/j.rse.2007.01.005
Packalén P, Maltamo M (2008) Estimation of species-specific diameter distributions using airborne laser scanning and aerial photographs. Can J For Res 38: 1750–1760. https://doi.org/10.1139/X08-037
Peuhkurinen J, Maltamo M, Malinen J (2008) Estimating species-specific diameter distributions and saw log recoveries of boreal forests from airborne laser scanning data and aerial photographs: a distribution-based approach. Silva Fenn 42: 625–641. https://doi.org/10.14214/sf.237
Peuhkurinen J (2011) Estimating tree size distributions and timber assortment recoveries for wood procurement planning using airborne laser scanning. Dissertationes Forestales 126. 43 p. https://doi.org/10.14214/df.126
Poudel KP (2011) Evaluation of methods to predict Weibull parameters for characterizing diameter distributions. Electronic Thesis & Dissertation Collection. Louisiana State University. Available at: http://etd.lsu.edu/docs/available/etd-06302011-133644/unrestricted/Poudel_Thesis.pdf.
Räty J (2020) Prediction of diameter distributions in boreal forests using remotely sensed data. Dissertationes Forestales 294. 47 p. https://doi.org/10.14214/df.294
Rogers WH (1978) Some convergence properties of k-nearest neighbor estimates. Department of Statistics, Stanford University.
Rouvinen T (2014) Kuvia metsästä. [Photos from forest]. Metsätieteen aikakauskirja 2/2014: 119–122. https://doi.org/10.14214/ma.6899
Ruusunen P (2020) Trestiman puustotulkinnan tarkkuus tarkkaan mitatuilla puukarttakoealoilla. [Trestima’s accuracy in accurately measured tree map plots]. Häme University of Applied Sciences. 41 p. https://urn.fi/URN:NBN:fi:amk-202004165202
Salminen H, Lehtonen M, Hynynen J (2005) Reusing legacy FORTRAN in MOTTI growth and yield simulator. Comput Electron Agr 49: 105–113. https://doi.org/10.1016/j.compag.2005.02.005
Siipilehto J (1999) Improving the accuracy of predicted basal-area diameter distribution in advanced stands by determining stem number. Silva Fenn 33: 281–301. https://doi.org/10.14214/sf.650
Siipilehto J (2011a) Local prediction of stand structure using linear prediction theory in Scots pine-dominated stands in Finland. Silva Fenn 45: 669–692. https://doi.org/10.14214/sf.99
Siipilehto J (2011b) Methods and applications for improving parameter prediction models for stand structures in Finland. Dissertationes Forestales 124. https://doi.org/10.14214/df.124
Siipilehto J (2022) Runkolukusarjamallit ovat aikansa lapsia – onko eri-ikäisrakenteisiin metsiin soveltuvia parametrisia malleja tai menetelmiä? Metsätieteen aikakauskirja 2022-10703. Tiedonanto. https://doi.org/10.14214/ma.10703
Siipilehto J, Mehtätalo L (2013) Parameter recovery vs. parameter prediction for the Weibull distribution validated for Scots pine stands in Finland. Silva Fenn 47: 1–22. https://doi.org/10.14214/sf.1057
Siipilehto J, Kangas A (2015) Näslundin pituuskäyrä ja siihen perustuvia malleja läpimitta-pituus riippuvuudesta suomalaisissa talousmetsissä. [Näslund’s height curve models for the dbh-height relationship in Finnish commercial forests]. Metsätieteen aikakauskirja 4/2015: 215–236. https://doi.org/10.14214/ma.6584
Siipilehto J, Lindeman H, Vastaranta M, Yu X, Uusitalo J (2016) Reliability of the predicted stand structure for clear-cut stands using optional methods: airborne laser scanning-based methods, smartphone-based forest inventory application Trestima and pre-harvest measurement tool EMO. Silva Fenn 50, article id 1568, 24 p. https://doi.org/10.14214/sf.1568
Siitonen M (1993) Experiences in the use of forest management planning models. Silva Fenn 27: 167–178. https://doi.org/10.14214/sf.a15670
Tokola T, Kangas A, Kalliovirta J, Mäkinen A, Rasinmäki J (2006) SIMO – SIMulointi ja Optimointi uuteen metsäsuunnitteluun. [SIMO – Simulation and optimisation to new forest planning]. Metsätieteen aikakauskirja 1/2006: 60–65. https://doi.org/10.14214/ma.5726
Tomppo E (1990) Satellite image-based National Forest Inventory of Finland. The Photogrammetric Journal of Finland 12: 115–20.
Tomppo E, Halme M (2004) Using coarse scale forest variables as ancillary information and weighting of variables in k-NN estimation: a genetic algorithm approach. Remote Sens Environ 92: 1–20. https://doi.org/10.1016/j.rse.2004.04.003
Tomppo E, Haakana M, Katila M, Peräsaari J (2008) Multi-source National Forest Inventory: methods and applications. Springer Science & Business Media.
Tomppo E, Gagliano C, De Natale F, Katila M, McRoberts RE (2009) Predicting categorical forest variables using an improved k-Nearest Neighbour estimator and Landsat imagery. Remote Sens Environ 113(3): 500–517. https://doi.org/10.1016/j.rse.2008.05.021
Tomppo E, Kuusinen N, Mäkisara K, Katila M, McRoberts RE (2017) Effect of field plot configuration on the uncertainties of ALS-assisted forest resource estimates. Scand J For Res 32: 488–500. https://doi.org/10.1080/02827581.2016.1259425
Tuominen S, Haapanen R (2013) Estimation of forest biomass by means of genetic algorithm-based optimization of airborne laser scanning and digital aerial photograph features. Silva Fenn 47, article id 902, 20 p. https://doi.org/10.14214/sf.902
Tuominen S, Pitkänen J, Balazs A, Korhonen KT, Hyvönen P, Muinonen E (2014) NFI plots as complementary reference data in forest inventory based on airborne laser scanning and aerial photography in Finland. Silva Fenn 48, article id 983, 20 p. https://doi.org/10.14214/sf.983
Vastaranta M, Gonzalez Latorre E, Luoma V, Saarinen N, Holopainen M, Hyyppä J (2015) Evaluation of a smartphone app for forest sample plot measurements. Forests 6: 1179–1194. https://doi.org/10.3390/f6041179
Veltheim T (1987) Puumallit männylle, kuuselle ja koivulle. [Tree-level models for pine, spruce and birch]. University of Helsinki. 60 p. (In Finnish).
Wikström P (2001) Effect of decision variable definition and data aggregation on search process applied to a single-tree simulator. Can J For Res 31: 1057–1066.
