Plot-level wood-leaf separation of trees using terrestrial LiDAR data based on a segmentwise geometric feature classification method

Background: Detailed and accurate information about tree crown structures is crucial for researching tree physiology. Terrestrial laser scanning (TLS) is a promising technique for retrieving crown structure parameters but still faces bottlenecks in terms of data processing such as wood-leaf separation. Currently, most wood-leaf separation methods use pointwise and supervised classification strategies at the forest plot level, which is very time consuming for large-volume TLS datasets.
Methods: In this study, we proposed a novel classification method to separate wood and leaf points based on connected component segmentation and segmentwise classification by geometric features. We tested the proposed method on both needleleaf and broadleaf forest plots and compared the results to those from a widely used pointwise classification method, CANUPO.
Results: The results showed that the proposed segmentwise classification method is superior to CANUPO in terms of both classification accuracy and time efficiency, which were improved by 1.5 times and over 10 times, respectively.
Conclusions: This study indicated that the segmentwise classification strategy is applicable to dense TLS data and can enhance the accuracy and efficiency of wood-leaf separation, which can in turn facilitate the retrieval of crown structure parameters.


Introduction
Tree architecture, e.g., the stem form, branching pattern, and spatial distribution of leaves, directly influences tree photosynthesis and evapotranspiration and ultimately influences forest carbon and water storage (Lau et al., 2018). Quantifying the variation in tree architecture across species is important in understanding how tree architecture is related to the physiological function of trees (Disney et al., 2018). Additionally, tree architectural characteristics can be used to explore the allometric relationship among, for example, tree height, diameter at breast height (DBH), and distribution of biomass within trees (Abd Rahman et al., 2017). However, it is difficult and time consuming to accurately quantify tree architecture with manual measurement (Quammen, 2012).
Terrestrial laser scanning (TLS) is an efficient technology for collecting dense, highly detailed three-dimensional (3D) point clouds of trees. The point clouds can be used for the reconstruction of quantitative 3D tree models and the quantitative analysis of tree architecture features, such as branching structures (Pyörälä et al., 2018) and stem curves (Liang et al., 2013). In the data processing procedures, an important prerequisite step in reconstructing the quantitative tree models from TLS data is wood-leaf separation, since the wood structural reconstruction is based on the wood points (Raumonen et al., 2013). Moreover, the leaf points can be used for the estimation of the canopy gap fraction and the leaf area index, which are very important indicators in vegetation science (Chen et al., 2018).
However, separating the wood and leaf points of trees in TLS data is challenging due to the complicated radiometric and geometric characteristics of the LiDAR points from trees, the occlusions within the canopy, the roughness of the wood surface, and the influence of site conditions, such as the terrain slope, stem density and the presence of understory vegetation. Over the past decade, various methods have been proposed to solve this problem, and it has become a research focus in recent years. In general, the types of features used in wood-leaf separation include radiometric features, waveform features, and geometric features.
Radiometric features refer to the intensity and reflectivity of laser returns at specific wavelengths. Côté et al. (2009) differentiated wood and leaf points before the reconstruction of branch structure based on the intensity of laser returns. The threshold values of the point intensity used to discriminate wood from leaves were chosen manually. Then, manual classification was performed to further remove unwanted parts from each component. However, the accuracy of the point classification was not reported in Côté's study. It is known that the intensity of laser returns is related not only to the spectral properties of the target but also to the incidence angle, the traveling distance of the laser beam (Kukko et al., 2008; Pfeifer et al., 2007), and the roughness of the reflecting surface (Pesci and Teza, 2008). Therefore, the radiometric calibration of point intensity is also a challenging problem.
In recent years, some advanced TLS equipment has been developed and used. Yao et al. (2011) and Yang et al. (2013) used the relative widths of returned waveforms to separate wood and leaf points. Danson et al. (2014) tested a dual-wavelength full-waveform TLS, and Li et al. (2013) explored the feasibility of multiwavelength LiDAR for wood-leaf separation.
However, there were no quantitative assessments or comparisons of the classification accuracy in these studies. In addition, the use of dual-wavelength and full-waveform information is limited by the availability of equipment.
Therefore, most recently developed methods depend on geometric features, such as eigenvalue-based features and density-based features. Some methods have been proposed for individual trees, and others have been developed for forest plots. For example, Tao et al. (2015) developed a geometric method based on circle detection in point slices and the shortest-path algorithm and tested it using two broad-leaved trees and a virtual tree produced by an L-system. Since the shortest-path algorithm is used to extract the skeletal structures from the point cloud of a single tree, Tao's method is applicable only to individual trees. Similarly, the shortest-path algorithm has also been used for wood-leaf separation in the methods proposed by Xu et al. (2007) and Livny et al. (2010). By contrast, Ferrara et al. (2018) proposed an approach using the density-based clustering algorithm DBSCAN. Ferrara's method was also tested on individual trees; however, it is potentially applicable at the plot level and thus requires plot-level tests. Raumonen et al. (2013) proposed a segment-based approach for wood-leaf separation in individual trees. The point cloud was first segmented into small nonoverlapping cover sets, and then the cover sets were classified as belonging to leaves and wood components based on their geometric features, such as the tangent and normal vectors. However, the size of the cover sets was obtained by pointwise computation. Wang et al. (2017) compared four machine learning classifiers for the wood-leaf separation of individual trees based on both geometric and radiometric features.
The results indicated that the machine learning classifiers can obtain high accuracies for wood-leaf separation on individual trees.
For plot-level classification, methods based on machine learning algorithms are more frequently used. Lalonde et al. (2006) introduced a method that characterizes point clouds in terms of three salient features, namely, "scatter", "linear" and "surface". The features of the target objects are learned by fitting a Gaussian mixture model to manually labeled training data. Bayesian classification is then used to label the entire point cloud. Ma et al. (2016) improved Lalonde's method by adding two additional filters based on geometric information, and the overall classification accuracy reportedly improved from 84.28% to 97.8%. Brodu and Lague (2012) extended the local geometric features to multiple scales. The classifiers, which include linear discriminant analysis and a support vector machine (SVM), were trained on labeled samples. Zhu et al. (2018) combined various local geometric features and radiometric features to separate foliar and woody materials by using a random forest classifier. The reported overall classification accuracies in their study ranged from 80% to 90%. In addition, deep learning has also been introduced for the wood-leaf separation problem (Xi et al., 2018).
As seen from the existing algorithms listed above (Table 1), most algorithms have adopted local geometric features for the wood-leaf separation of TLS data. This occurs mainly for two reasons. 1) The spatial distribution characteristics of wood points and leaf points are very different. The distribution of wood points is regular and continuous, and the density is relatively high. The leaf points are scattered due to canopy gaps, and the overall point cloud density is relatively lower. 2) The 3D coordinates are the fundamental information provided in LiDAR point clouds.
Algorithms based on geometric features have broader application scenarios and work well with the data provided by LiDAR equipment. Although some methods based on geometric features and machine learning have achieved good wood-leaf separation accuracy, the main disadvantages of these methods are their high computing demand and slow computing speed (Liang et al., 2012) in plot-level classification tasks. There are three main reasons for these disadvantages: 1) the geometric features are calculated point by point, which is time consuming due to the large volume of TLS datasets; 2) the geometric features vary with the radius of the neighborhood, so the latest algorithms usually extract two-scale or even multiscale geometric features by using different neighborhood sizes to achieve better classification results (Brodu and Lague, 2012; Xia et al., 2015), which greatly increases the computational burden; and 3) a supervised classification method contributes to the improvement of classification accuracy but requires high-quality training samples (Wang et al., 2017).
Here, we propose a segmentwise classification method for the rapid wood-leaf separation of standing trees from TLS data at the plot level. In contrast to the pointwise method, the segment-based method first divides a point cloud into segments and then classifies the segments based on the features of each segment. This strategy can reduce the computational burden and the uncertainties of point cloud classification.
The CST plot is 30 m × 30 m and includes 37 trees on flat terrain. Figure 1 shows the topography and the forest condition of each test plot. The mean DBH and tree height, stem density and the density of ground vegetation of each sample plot are summarized in Table 2.

TLS data
The TLS datasets were collected in July 2018 by using a Riegl VZ-1000 (Riegl GmbH, Horn, Austria) terrestrial laser scanner in multiscan mode. The scan angle resolution was 0.03°, and the vertical and horizontal scanning ranges were 30°-130° and 0°-360°, respectively. In each sample plot, five to seven scans were implemented at the center and the periphery of the plot. The scans were registered using the RiSCAN Pro software package (Riegl GmbH, Horn, Austria). The registered point clouds were clipped to exclude trees outside each rectangular plot.
The proposed method adopted a segmentwise classification strategy by using a connected component segmentation algorithm. The wood and leaf points were separated based on the geometric features of each segment formed by connected points. It is not possible to directly segment the point cloud into small segments that each consist of a single classification target, i.e., wood points or leaf points. This is mainly because of the very high density of the TLS point clouds and the tight connections between wood and leaf points. Therefore, we introduced a point cloud decomposing step to divide the point cloud into three parts in order to reduce the point density and break the connections between wood and leaf points. Figure 3 shows the overall workflow of the proposed method.
Figure 3. The overall workflow of the proposed method. The left column shows the main steps of the method, and the right column shows the annotation of each main step.

Methods
The following subsections detail the methods used in the main steps shown in the flowchart.

Ground filtering
Ground filtering was a prerequisite step in the proposed method for reducing the data volume and breaking the ground connection between stems. In this study, we adopted an open source ground filtering algorithm, Cloth Simulation Filtering (CSF), for its fast processing speed and reliable results. When a simulated cloth is dropped over an inverted point cloud, the cloth sticks to the ground points and drapes over the object points due to a certain degree of cloth hardness. The simulated cloth is a set of regular particles with mass. The particles are interconnected by "virtual springs". In each iteration, the particles drop down under the force of gravity and then spring back under the spring force. Finally, the simulated cloth settles to form a digital terrain model of the point cloud.
We implemented the CSF with the following parameters. The cloth resolution and distance threshold (classification threshold) of the CSF were both set to 0.1 m, and the maximum number of iterations was set to 50 for all sample datasets. The off-ground point clouds from the CSF were used as the inputs in the next step.

Point cloud decomposing
The method of point cloud decomposing was based on the local curvature feature of each point. Local curvature features can be used to roughly separate tree components (Zhang et al., 2019). For a point whose neighboring points form the set $P = \{p_1, \dots, p_k\}$, the local curvature was expressed as the surface variation (SV):

$$SV = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3} \tag{1}$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the eigenvalues of a covariance matrix C sorted in descending order as $\lambda_1 \ge \lambda_2 \ge \lambda_3$. The covariance matrix C is defined as

$$C = \frac{1}{k} \sum_{i=1}^{k} (p_i - \bar{p})(p_i - \bar{p})^{T} \tag{2}$$

where $\bar{p}$ is the centroid of P.
The SV has a limited range from 0 to 1/3 for any point. To divide the point cloud into three parts, two thresholds (T1, T2) of the SV were chosen according to experiments in previous studies (Zhang et al., 2019), i.e., T1 was set to 0.1, and T2 was set to 0.2. The same set of thresholds was adopted for all sample plots. For example, Figure 4 shows the three parts of the divided point cloud of an individual tree. The first part, in which the SV ranged from 0 to 0.1, comprised almost all the stem points, most of the branch points and nearly half of the leaf points. The second part, in which the SV ranged from 0.1 to 0.2, was mainly composed of leaf points and a fraction of branch points. The third part, in which the SV ranged from 0.2 to 1/3, was composed of leaf points. The first and second parts, which contained wood points, were used in the next step for segmentation.

Connected component segmentation
Connected component segmentation was used to group the points into segments based on the proximity of points. As shown in Figure 5, the point clouds were first voxelized using 3D grids by building an octree. Then, the 3D grids that contained at least one point were assigned a value of 1. The vacant 3D grids were assigned a value of 0. Finally, the points in adjacent grids with a value of 1 were merged into the same segment. The vacant grids with a value of 0 became the gaps between the segments. The size of the 3D grids determined how many segments were produced and how small the segments were. Smaller 3D grids tend to produce more and smaller segments, which are more likely to contain a single class of tree components; however, these segments will increase the computational burden.
Therefore, we set the grid size to a compromise value (0.01 m) in this study. In addition, we set a lower limit on the number of points within the segments at 100 to reduce the number of segments and filter out noise points.
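The voxel-based grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it replaces the octree with a hash map of occupied voxels, and it assumes that two voxels are adjacent when they share a face, an edge or a corner (26-connectivity), which the text does not specify. The function name `connected_segments` and its parameter names are hypothetical; the defaults mirror the 0.01 m grid size and the 100-point lower limit reported above.

```python
import math
from collections import deque
from itertools import product


def connected_segments(points, voxel_size=0.01, min_points=100):
    """Group points into segments of mutually adjacent occupied voxels.

    points: sequence of (x, y, z) tuples in metres.
    Returns a list of segments, each a sorted list of point indices.
    """
    # Occupied voxels (grids with value 1): voxel index -> indices of its points.
    voxels = {}
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels.setdefault(key, []).append(idx)

    # Assumption: 26-connectivity (face, edge or corner neighbours).
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

    visited = set()
    segments = []
    for start in voxels:
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        members = []
        while queue:  # breadth-first flood fill over occupied voxels
            vx, vy, vz = queue.popleft()
            members.extend(voxels[(vx, vy, vz)])
            for dx, dy, dz in offsets:
                neighbour = (vx + dx, vy + dy, vz + dz)
                if neighbour in voxels and neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
        # Segments below the point-count limit are discarded as noise.
        if len(members) >= min_points:
            segments.append(sorted(members))
    return segments
```

Calling `connected_segments(points)` on the off-ground points of one decomposed part would return one index list per segment. Lowering `voxel_size` widens the gaps that split segments, at the cost of more voxels to visit.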

Geometric features
We introduced two geometric features to identify the wood segments, i.e., the salient features and the distance between the centroid of each segment and the ground. The salient features were the main features used for the wood-leaf separation, and the distance feature was used to remove the ground vegetation.

Salient features
The salient features describe the dominant geometry and distribution pattern of a set of points as linear, planar or scattered. The salient features have been used in most wood-leaf separation algorithms for TLS data. However, their calculation in existing algorithms is pointwise and based on neighboring points. This causes two problems: a large amount of computation, and inconsistency in the salient features of a point when different neighborhood sizes are used. Therefore, we computed the salient features segmentwise, treating the points of each segment from the previous step as a single neighborhood. This can greatly reduce the computation load, and the salient features of the points within a segment are consistent and stable. For the points in each segment, we first calculated their covariance matrix C by using Formula 2. Then, the eigenvalues of C were obtained by singular value decomposition and sorted in descending order as $\lambda_1 \ge \lambda_2 \ge \lambda_3$. Based on the eigenvalues, the salient features of a segment can be defined as (Demantke et al., 2011)

$$L = \frac{\lambda_1 - \lambda_2}{\lambda_1}; \quad P = \frac{\lambda_2 - \lambda_3}{\lambda_1}; \quad S = \frac{\lambda_3}{\lambda_1} \tag{3}$$

where L, P, and S refer to the linear feature, planar feature and scattered feature, respectively, and $L + P + S = 1$.
Figure 6 shows an example of connected component segmentation and the salient features of the segments. Both the stem segment and the branch segment were dominated by the linear feature, whose value was significantly higher than those of the other two features. In contrast, the leaf segment had no dominant feature. Therefore, the segments dominated by the linear feature can be labeled as wood segments. In this study, we introduced an index, the Significance of Difference of the linear feature, SoD(L), which is computed from the salient features L, P, and S. The SoD(L) ranges from -1 to 1. The larger the value of the SoD(L), the more likely it is that a segment is dominated by a linear feature.
In this study, segments with a SoD(L) greater than 0.7 were considered wood segments. The value of 0.7 was a conservative threshold chosen to retain as many wood segments as possible.
Ground vegetation, such as grass and small shrubs, is hard to remove from a dataset based on salient features, because some ground vegetation also has stems that may be wrongly recognized as tree branches. Although ground vegetation can be entirely removed by setting a threshold on the distance to the DTM, true stem points may also be removed in this procedure. This problem is difficult to solve for pointwise wood-leaf separation algorithms. With the segmentwise method, however, ground vegetation segments can be easily recognized by their centroids, because the centroids of tree segments lie farther from the ground surface. For example, points near the ground surface are still classified as wood points if they belong to a segment that has a higher centroid. The distance to the ground surface is an input threshold of the proposed method and was set to 1 m in this study. The centroid of the segments can be calculated by using the formulas shown in section 2.3.3.
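The two segment-level rules above (a linear-dominance test and a centroid-height test) can be illustrated with the following sketch. Several parts of it are assumptions rather than the authors' implementation: the salient features are taken in the eigenvalue-ratio form L = (λ1 - λ2)/λ1, P = (λ2 - λ3)/λ1, S = λ3/λ1; the index SoD(L) is stood in for by the hypothetical expression L - max(P, S), chosen only because it also ranges from -1 to 1; the ground is simplified to a horizontal plane at `ground_z` instead of a DTM; and the eigenvalues come from a closed-form solution for symmetric 3 × 3 matrices rather than from singular value decomposition. All function names are hypothetical.

```python
import math


def covariance(points):
    """Return the 3x3 covariance matrix and the centroid of (x, y, z) points."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - centroid[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov, centroid


def eigenvalues_sym3(a):
    """Eigenvalues of a symmetric 3x3 matrix in descending order
    (trigonometric closed form)."""
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    if p2 == 0.0:  # matrix is a multiple of the identity
        return (q, q, q)
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2.0))) / 3.0
    lam1 = q + 2.0 * p * math.cos(phi)
    lam3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    lam2 = 3.0 * q - lam1 - lam3
    return (lam1, lam2, lam3)


def salient_features(eigs):
    """Linear, planar and scattered features (assumed eigenvalue-ratio form)."""
    lam1, lam2, lam3 = eigs
    if lam1 <= 0.0:
        raise ValueError("degenerate segment: all points coincide")
    lam2, lam3 = max(lam2, 0.0), max(lam3, 0.0)  # guard against rounding
    return ((lam1 - lam2) / lam1, (lam2 - lam3) / lam1, lam3 / lam1)


def is_wood_segment(points, ground_z=0.0, sod_threshold=0.7, min_height=1.0):
    """Label a segment as wood if it is linear and its centroid is high enough."""
    cov, centroid = covariance(points)
    lin, pla, sca = salient_features(eigenvalues_sym3(cov))
    sod_linear = lin - max(pla, sca)  # hypothetical stand-in for SoD(L)
    return sod_linear > sod_threshold and centroid[2] - ground_z > min_height
```

Because the test is applied once per segment, every point of a stem segment receives the same label, including points close to the ground, as long as the centroid of the segment lies above the height threshold.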

Merging the wood segments
Following the procedures shown in Figure 3, we identified the wood segments by their geometric features. The input parameters have been introduced in the sections above. As mentioned in section 3.1.3, the point cloud was decomposed into three parts, and the third part was composed of leaf points, which were directly removed. Therefore, we applied the proposed wood-leaf separation algorithm separately to the first and second parts of the point cloud. After connected component segmentation, the segments identified in the first and second parts were in separate segment groups, i.e., segments 1 and segments 2. Then, we merged the extracted wood segments from the two groups of segments.

Classification accuracy and processing time
The accuracy of the wood-leaf separation method was assessed by the percentage of misclassified points relative to the reference classification results. The references were acquired by visual inspection for each tree in each plot. Three types of error (type I error, type II error and total error) and the kappa coefficient were quantified. The type I error (T.I), which is also known as the omission error, was defined as the number of wood points that were wrongly classified as leaf points divided by the number of reference wood points. The type II error (T.II), which is also known as the commission error, was the number of leaf points that were wrongly classified as wood points divided by the number of reference leaf points. The total error (T.E) was the number of wrongly classified points divided by the total number of points. The kappa coefficient was calculated from the statistics of the wrongly classified points. The calculation of the accuracy indexes is shown in Table 3.
Note: a to d refer to numbers of points.
In addition, the processing times were recorded to assess the time efficiency of the wood-leaf separation algorithms. All tests were conducted on a typical desktop computer with an Intel Core i7-6700 CPU (3.40 GHz) and 16 GB of RAM.
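The error definitions above translate directly into a small function over the four cells of a wood/leaf confusion matrix. The sketch below is illustrative only. The assignment of the letters is an assumption, since Table 3 is not reproduced here: a is taken as reference wood classified as wood, b as reference wood classified as leaf, c as reference leaf classified as wood, and d as reference leaf classified as leaf. The kappa coefficient is computed with the standard Cohen's kappa formula, which may differ in form from the expression in Table 3. The function name `separation_errors` is hypothetical.

```python
def separation_errors(a, b, c, d):
    """Accuracy indexes from point counts of a 2x2 confusion matrix.

    Assumed cell layout (not taken from Table 3):
      a: wood -> wood    b: wood -> leaf (omission)
      c: leaf -> wood (commission)    d: leaf -> leaf
    """
    n = a + b + c + d
    type_i = b / (a + b)        # omission error of the wood class
    type_ii = c / (c + d)       # commission error into the wood class
    total_error = (b + c) / n   # all misclassified points
    observed = (a + d) / n      # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return {"T.I": type_i, "T.II": type_ii, "T.E": total_error, "kappa": kappa}
```

For instance, with 50 reference wood points of which 10 are labeled leaf, and 50 reference leaf points of which 5 are labeled wood, `separation_errors(40, 10, 5, 45)` gives a type I error of 20%, a type II error of 10% and a total error of 15%.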

Method comparison
To compare the performance of the proposed segment-based method with that of a point-based method, we processed the sample datasets using CANUPO, a point-based method proposed by Brodu and Lague (2012) for the classification of natural scene point clouds. Brodu's method was adopted for comparison for four reasons: 1) it is a point-based method using only geometric features that has reported good classification accuracy (98%) in the literature; 2) it adopts multiscale geometric features to improve the results, which are in theory more robust than those from algorithms using single-scale and double-scale geometric features; 3) it is based on machine learning classification algorithms and thus is representative of many supervised classification methods; and 4) it has free and open source codes that are easy to access. Therefore, CANUPO was a representative comparison algorithm for the performance evaluation of the method proposed in this study.
Table 4 shows the comparison of the accuracy of the two methods in the three sample plots. The classification accuracy of the proposed segmentwise method was higher than that of the pointwise method in all sample plots. The total errors of the proposed method in the three sample plots were 5.39%, 4.74%, and 5.93%, respectively, while they were 8.89%, 9.32%, and 10.05% for the pointwise method. On average, the errors of the pointwise method were 1.76 times those of the proposed method. The kappa coefficients showed a similar trend to that of the total errors. In other words, the proposed segmentwise method is superior to the CANUPO pointwise method based on our analysis.
T.I, T.II and T.E stand for the type I error, type II error and total error, respectively.
Figure 7 shows the results of the two methods in the three different plots. Many wrongly classified leaf points are visible in the results of the pointwise method.
Fewer leaf points were wrongly classified as wood points in the results of the proposed method compared with the CANUPO results. There were some differences between the two methods in the spatial distribution of the classification errors. The residual leaf points were distributed mainly around the branches in the results of the proposed method; however, the residual leaf points were uniformly distributed in the CANUPO results. The spatial distribution of the omission and commission errors is of utmost importance when the results of this study are used to extrapolate models of plant physiology. Modeling the light transmission of these canopies from the segmentwise results, and extrapolating that to plant photosynthetic resource allocation, would give a much different result than using the pointwise results. However, the main topic of this study is how to separate the wood and leaf components of trees, so we did not take the subsequent application into consideration. Automatic methods for wood-leaf separation are not yet perfect, so further processing such as visual inspection is needed if the leaf distribution is to be modeled accurately.
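The factor of 1.76 quoted above for the total errors can be checked from the six reported percentages. The short sketch below is only an arithmetic check under the assumption that the factor is the ratio of the mean total errors; the text does not state how the average was formed, and averaging the three per-plot ratios instead gives a slightly different value. The variable names are hypothetical, and the values are listed in the order in which the text reports them.

```python
# Total errors (%) in the three sample plots, in the order reported in the text.
proposed_total_error = [5.39, 4.74, 5.93]
pointwise_total_error = [8.89, 9.32, 10.05]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Option 1: ratio of the mean errors of the two methods.
ratio_of_means = mean(pointwise_total_error) / mean(proposed_total_error)

# Option 2: mean of the per-plot error ratios.
mean_of_ratios = mean(
    [pw / pr for pw, pr in zip(pointwise_total_error, proposed_total_error)]
)

print(f"ratio of means: {ratio_of_means:.2f}")
print(f"mean of ratios: {mean_of_ratios:.2f}")
```

The ratio of the mean errors rounds to 1.76, which matches the reported factor, whereas the mean of the per-plot ratios rounds to 1.77.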

Results and Discussion
It should be noted that ground points were removed in the ground filtering step, so the blue points under the canopies shown in Figure 7 are points on ground vegetation, such as grass and shrubs. It is clear that the CANUPO method cannot separate the ground vegetation points from wood points. In contrast, the proposed method can filter out the ground vegetation points according to the distance between the centroid of the segment and the ground, so few ground vegetation points remained in the wood class. In conclusion, the proposed method achieved better classification results in this study.
The CANUPO method showed an error of around 2% in the literature (Brodu and Lague, 2012). However, its error reached 10% and even 20% in our tests. This is mainly because the datasets tested in Brodu and Lague's study were collected in a small non-forest natural scene that includes rocks and low vegetation. Rock and vegetation points are easier to separate than wood and leaf points.
In this study, we compared our method against the CANUPO method rather than other methods that also reported good classification accuracies, such as the methods proposed by Lalonde et al. (2006) and Ma et al. (2016). This is because the classification accuracies of wood-leaf separation are highly dependent on the characteristics of the test data, so the methods cannot be evaluated directly from literature reports alone. However, we believe that the CANUPO method is more representative in principle, as it uses multiscale geometric features and machine learning classifiers. The CANUPO method is also easier to access because of its open source codes. A comprehensive comparison of different wood-leaf separation methods based on benchmark datasets is therefore needed and would best be carried out collaboratively.
Figure 7. Visualization results of the classification from the two methods. WB, DL and CST represent the white birch, Dahurian larch and Chinese scholar tree plots, respectively.

Time efficiency
In this study, both methods were run on the same desktop computer. The processing time of the two methods is shown in Table 5. This time does not include the time spent on data reading and writing or parameter tuning, or the time for sample selection and training in the pointwise method. Table 5 shows a substantial difference in processing time between the two methods. The proposed method required only 2.57 to 14.41 minutes, while the pointwise method required 28.67 to 235.67 minutes. The proposed method was approximately 11 times faster than the pointwise method in the CST plot, 16 times faster in the DL plot, and 41 times faster in the WB plot. These results demonstrate the significant advantage of the proposed method in terms of time efficiency.

Conclusion
Wood-leaf separation is an important data processing step in the application of TLS to forest scenes. It is required for extracting tree crown parameter data and is also a challenging topic that has attracted the attention of many researchers. Because many existing methods require a large amount of calculation, we proposed a novel segmentwise classification method for wood-leaf separation that is fast, accurate and simple. The proposed method required less than 15 minutes per plot to achieve good classification accuracy. Compared with the CANUPO pointwise classification method, in this study, the total classification error was lower by a factor of more than 1.5, and the processing time was reduced by a factor of 11 to 41. In addition, the proposed method was robust on sample plots of different species and forest conditions. This study verified the feasibility of the segmentwise classification strategy for the classification of dense terrestrial laser scanning point clouds.

Availability of data and materials
Not applicable.