Colour sorting of red oak, yellow poplar and maple veneers using self-organizing map: comparisons between different camera genres

Colour sorting is a vital process in the manufacturing of high-quality wood products. It is, however, a manual process in a large majority of manufacturing facilities in Malaysia. Automation is an ideal solution; however, costs are prohibitive for small and medium industries (SMI). This project aims to produce a flexible solution that can cater for manufacturers of different scales. Three cameras of different price ranges were used: (i) Hikrobot® MV-CE200-10UC (CE200), (ii) Logitech® C920 HD Pro (C920), and (iii) Sony® RX0 II (RX0 II). After setting up a veneer imaging prototype, human-sorted images of American red oak (Quercus rubra), yellow poplar (Liriodendron tulipifera), and maple (Acer spp.) were acquired. After performing image preparations and calibrations, 26 features were extracted from each image. The features were based on the average and standard deviation of the wood basal colour and wood grain colour. Salient features were obtained using Sequential Forward Selection (SFS), which were then used to train a Self-Organizing Map (SOM). The results affirmed that the basal colour is highly correlated with human-sorted colour groups. As expected, CE200, being of industrial grade, performed the best. Interestingly, C920 exhibited comparable performance to CE200. RX0 II performed the worst due to its interface software limitations. This proposed system achieved accuracies of 89.0% for red oak, 94.3% for poplar and 96.4% for maple. This research will assist the SMI to develop affordable vision systems for colour sorting.


Introduction
Wood is a natural material. Therefore, manmade wood products inherit the natural variations present in the timber they are made from, including variations in colour. Colour sorting of wood is one of the most labour-intensive operations in most modern woodworking mills in Malaysia.
Moreover, maintaining the consistency of the sorting output is challenging due to human factors such as differing judgements of colour similarity between workers, fatigue, and high workforce turnover.
There is a recent shift towards using computerized vision systems to replace manual colour sorting. Existing industrial equipment in the marketplace is prohibitively expensive. There is therefore a need for more affordable solutions, especially for the small and medium woodworking industries that may not be able to justify the purchase of solutions from established players such as WoodEye (Microtec Linköping, Sweden), Wood Inspector (Poland) and Weinig AG (Germany).
The two most critical elements in any vision system are the sensor hardware (camera system) and the software. The camera system (sensor and lighting) determines the quality and consistency of the image obtained, while the software does all the necessary image enhancement, feature extraction and classification of the images.
In previous research studies, area scan cameras were by far the most common (Bianconi et al. 2013; Kurdthongmee 2008; Nurthohari et al. 2019) as they are inherently easier to set up. However, obtaining evenly lit images requires the use of dome lights (Bianconi et al. 2013). Alternatively, images had to be cropped to where the light intensity is the most visually even. Line scan setups (Zhuang et al. 2021) are far superior in terms of obtaining even illumination, with the added advantage of being able to acquire long lengths of timber in a single frame without the additional effort of stitching images together. Line scan setups are also the most common setups in industrial applications due to their high-speed capabilities.
The species studied in past research correlate closely with the woods frequently used in woodworking mills. Images of wood in most scientific literature are typically of solid timber. In colour studies, oak (Quercus spp.) appears to be the most studied species, which reflects its standing as the most popular hardwood in the Western hemisphere, particularly Europe (Krackler et al. 2011) and the United States (Barbu and Tudor 2021; Tan and Ng 2019). Several others focus on high-value species such as teak (Tectona grandis) and cherry (Prunus spp.), and on European beech, which is a popular species in Western Europe (Krackler et al. 2011).
Earlier studies have used typical image processing and classification methods. In general, the features from the image can be separated into colour features and texture features. Among the different techniques that utilize colour features are (i) the use of raw colour histograms (Kurdthongmee 2008), (ii) computation of histograms such as soft colour descriptors, colour percentiles, marginal histograms and 3D colour histograms (Bianconi et al. 2013), and (iii) homogeneity, slope and entropy (Shivashankar and Madhuri 2018). For those employing texture features, Histogram of Oriented Gradient (HOG) (Nurthohari et al. 2019) and colour moments were used. In terms of classifiers, distance-based methods such as k-Nearest Neighbour (kNN) (Bianconi et al. 2013; Shivashankar and Madhuri 2018; Wang et al. 2021) and Self-Organizing Map (SOM) (Kurdthongmee 2008) are popular. Other common classifiers such as Fuzzy Logic (Faria et al. 2008) and Support Vector Machine (SVM) (Nurthohari et al. 2019) have also been used.
Moreover, from anecdotal exchanges of ideas with woodworking personnel, we strongly believe that the basal wood colour plays a large role in the overall colour of the wood, while the grains (amount of it, and how dark it is) play only a secondary role. Two pieces of wood with similar basal colour but different amounts of grain are deemed more similar than two pieces with similar grains but different basal colour.
Based on our review of past research, no attempts have been made to compare the ability of different cameras to perform colour classification of wood, and none have been performed on raw veneers. The aim of this paper is to compare the colour sorting performance of different camera genres, namely an industrial camera, a prosumer action camera, and a webcam. The performance of each camera is determined by its ability to produce images that can be classified correctly as per human assessors. Images of raw veneers from three species are used: (i) American red oak (Quercus rubra), (ii) yellow poplar (Liriodendron tulipifera), and (iii) maple (Acer spp.). Features extracted are soft colour descriptors (mean and standard deviation) of the basal wood colour and grain colour. From these descriptors, salient features are found using Sequential Forward Selection (SFS). Finally, an SOM classifier is used to gauge and compare the performance of the different cameras.
Methods used in this paper can also be replicated to test other cameras such as mobile phone cameras, etc. Moreover, the performance characteristics of cameras from different genres particularly the cheaper ones will be beneficial in the development of affordable solutions for smaller workshops. The hardware setup used in this study and sorting algorithm can also be used to colour-sort veneers of other species.

Materials and methods
This section presents an overview of the research methodology, which is summarized in Fig. 1. There are two major components in this study, namely: (i) image acquisition (the hardware), and (ii) image processing (the software).

Hardware setup
An image acquisition rig was constructed as per Fig. 2. Since raw veneers are rarely flat, a flat dome light (CCS LFXV-300SW, colour temperature of 5500 K, 300 × 300 mm aperture) was placed 28 mm above the veneer surface to suppress any shadows that may arise due to undulations or raised grains on the veneer surface.
Three cameras were selected for this study, each representing one of three common genres of cameras, namely Logitech C920 HD Pro (C920) (3.0 MP webcam, cheapest unit in this study), Sony RX0 II (RX0 II) (15.3 MP prosumer action camera, mid-tier price range), and Hikrobot MV-CE200-10UC (CE200) (20.0 MP industrial camera). For expedience, the three cameras were arranged in line along the axis of the conveyor. The industrial camera was positioned in the middle, while the other two, having wider fields of view, were located on either side of the industrial camera, spaced 80 mm apart. All three cameras were affixed 280 mm from the front tip of the lens to the conveyor.

Capture of veneer images
Red oak, yellow poplar, and maple veneer samples were obtained, generously provided by Weng Meng Industries Sdn Bhd. Images were captured over six different dates (Table 1) to obtain images from different batches of incoming veneers. The veneers had been pre-sorted by human workers (factory staff), and the colour groups of each image were noted. A total of 1289 distinct images of veneers was obtained for each camera (totalling 3867 images for three cameras).
Prior to each image acquisition session, images of the X-rite Colorchecker Passport Photo 2 (Colorchecker) colour reference target were taken by each camera for the purpose of calibrating the session's images. Use of a colour reference for calibration is vital in colour research using cameras (Bianconi et al. 2013) as different cameras and lighting produce different image response curves.

Image processing
All analyses were performed using MATLAB ver 9.12 (R2022a) on two PCs with the following specifications: (i) Intel® Core™ i7-9700 CPU with 16 GB RAM, and (ii) Intel® Core™ i7-11700 CPU with 16 GB RAM.

Image preparation
The preparation step is a vital one that enhances all the images so that features can be better extracted. Image preparation involves two steps: (i) extraction of wood image (remove the background), and (ii) colour calibration using the Colorchecker.
For extraction, the images were first standardized by locating the common boundaries, where they overlap, in images of the same wood taken by the three cameras. This was done by taking photos of the same measuring tape and using these images to determine the images' positions relative to each other. Boundaries were also adjusted to exclude any visible 'feathering effect' caused by the edges of the dome light. Images were then cropped to those boundaries.
To remove unwanted backgrounds from narrow veneers (where the conveyor is still visible in the frame), the image was analysed slice by slice; the gradient of the moving average of the red channel was used to locate the edges of the veneers, as summarized in Fig. 3.
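The slice-wise edge search can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the window length of 5, the trailing (rather than centred) moving average, and the assumption that the veneer is brighter in the red channel than the conveyor are all hypothetical choices made for illustration.

```python
def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    total = sum(values[:window])
    averaged = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        averaged.append(total / window)
    return averaged


def find_veneer_edges(red_profile, window=5):
    """Estimate (left, right) veneer edges within one image slice.

    red_profile holds red-channel intensities across the slice. The veneer is
    assumed (hypothetically) to be brighter in red than the conveyor, so the
    steepest rise marks the left edge and the steepest fall the right edge.
    """
    smoothed = moving_average(red_profile, window)
    gradient = [b - a for a, b in zip(smoothed, smoothed[1:])]
    rising = max(range(len(gradient)), key=gradient.__getitem__)
    falling = min(range(len(gradient)), key=gradient.__getitem__)
    # Smoothing spreads each edge over about one window, so the estimate is
    # only accurate to within roughly half a window length.
    shift = window // 2 + 1
    return rising + shift, falling + shift


# Synthetic slice: dark conveyor (20) with a bright veneer (180) on pixels 10..39.
profile = [20] * 10 + [180] * 30 + [20] * 10
left, right = find_veneer_edges(profile)
```

On real images, the per-slice estimates would presumably be combined across slices before cropping; that step is not shown here.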
Using images of the Colorchecker, all images taken during each day's session were colour calibrated using the chart from the first day as reference. The colour values of each reference point on the Colorchecker on each day were plotted against the values of the reference image, and the graph was examined for any deviations from parity. These deviation values were then used to perform corrections on all images taken on that day.
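As an illustration of this kind of per-channel correction, the sketch below fits a straight line that maps one day's Colorchecker readings onto the reference-day readings and then applies it to pixel values. The least-squares linear model, the function names and the sample patch values are assumptions made for this example; the study's actual correction parameters are those reported in its tables.

```python
def fit_line(measured, reference):
    """Least-squares slope and intercept mapping measured values to reference values."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct(value, slope, intercept):
    """Apply the fitted correction to one channel value, clipped to 8-bit range."""
    return min(255.0, max(0.0, slope * value + intercept))


# Hypothetical red-channel readings of four chart patches.
reference_day = [50.0, 100.0, 150.0, 200.0]
# Simulated drift on a later day: 10% darker with an offset of 5.
later_day = [0.9 * v + 5.0 for v in reference_day]

slope, intercept = fit_line(later_day, reference_day)
corrected = [correct(v, slope, intercept) for v in later_day]
```

A slope of 1 and an intercept of 0 correspond to parity, so the fitted values express the drift of that day's session directly.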
Images acquired from the CE200 camera underwent an additional gamma correction process (γ = 1/2.2) to convert them from linear encoding to gamma encoding, which better emulates the human visual response to light intensity. Both RX0 II and C920 cameras, as with most consumer camera products, produce JPEG files, which by convention are already gamma encoded. Next, using the CE200 calibration image as reference, all images taken with C920 and RX0 II were adjusted to best fit the reference CE200 plot, with the assumption that the plots are linear.
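The gamma encoding step can be written out as a short Python sketch. It assumes 8-bit values and a pure power law with exponent 1/2.2 as stated above; the function name is hypothetical, and the piecewise sRGB transfer curve is deliberately not modelled.

```python
def gamma_encode(linear_value, gamma=1 / 2.2, max_value=255.0):
    """Convert a linearly encoded intensity to a gamma encoded one.

    The value is normalised to [0, 1], raised to the power gamma, and
    rescaled, so black and white are unchanged while mid-tones are lifted.
    """
    normalised = linear_value / max_value
    return max_value * normalised ** gamma


black = gamma_encode(0.0)
white = gamma_encode(255.0)
mid_grey = gamma_encode(127.5)  # a linear 50% intensity
```

Because the exponent is below 1, every value strictly between black and white is brightened, which is why linear sensor output looks dark next to JPEG output until it is encoded.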

Feature extraction
The technique employed for feature extraction is termed Otsu Soft Colour Descriptors (OSCD). Figure 4 shows the steps for this method. The raw image (Fig. 4a) is converted into grayscale. Then the image's grayscale cumulative histogram is split into two intensity strata using Otsu thresholding (Fig. 4b). It is assumed (based on the assertion that there are two colour groups in the image: 'basal' and 'grain') that the overall histogram consists of two overlapping normal distributions: one of basal wood colours (higher intensity), and one of grains or any darker streaks on the wood (lower intensity). Depending on the species, the basal colour may largely overwhelm the grains, and their mean intensities may be close together, but the bimodality of the plot is assumed to be present. Since Otsu's method searches for the threshold that bisects the histogram where the interclass variance is the largest, the threshold that it returns should be indicative of where the basal wood intensity and the wood grains bifurcate. It must be emphasized that we were not strictly looking for the bottom of the valley between two peaks, but for the separation of basal wood from its grains in a combined distribution of intensities from two unequal normal plots.
Based on the separation threshold found using the Otsu method (Fig. 4b), the wood pixels can be classified into a lower colour stratum (Fig. 4c, i) and an upper colour stratum (Fig. 4c, ii). Their corresponding pixels in the original image were then extracted into lower and upper strata, as can be seen in Fig. 4d. Based on Fig. 4d, the colour histograms for R, G and B were extracted. Their respective means and standard deviations were then calculated and used as features. The images were then halved in resolution, and their Otsu thresholds and soft colour descriptors were obtained again. Indices of each feature are indicated in Table 2.
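A minimal Python sketch of the descriptor computation at one resolution follows. It is an illustration under stated assumptions rather than the authors' code: the grayscale weights (Rec. 601 luma), the function names and the ordering of the returned features are hypothetical and do not reproduce the indexing of Table 2. Under these assumptions each resolution yields one threshold plus twelve descriptors (2 strata × 3 channels × mean and standard deviation), and two resolutions would give 26 values, which is consistent with the feature count reported.

```python
from statistics import mean, pstdev


def otsu_threshold(gray, levels=256):
    """Return the gray level t that maximises the between-class variance.

    Pixels with value <= t form the lower stratum, the rest the upper stratum.
    """
    hist = [0] * levels
    for g in gray:
        hist[g] += 1
    total = len(gray)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0
    sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split at this level
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def soft_colour_descriptors(pixels):
    """pixels: list of (R, G, B) tuples. Returns [threshold, 12 descriptors].

    Assumes the image is not uniform, so that both strata are non-empty.
    """
    gray = [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]
    t = otsu_threshold(gray)
    lower = [p for p, g in zip(pixels, gray) if g <= t]   # grain (darker)
    upper = [p for p, g in zip(pixels, gray) if g > t]    # basal (lighter)
    features = [t]
    for stratum in (lower, upper):
        for channel in range(3):
            values = [p[channel] for p in stratum]
            features += [mean(values), pstdev(values)]
    return features


# Synthetic veneer: 80 basal pixels in two light shades, 20 darker grain pixels.
basal = [(200, 170, 120), (210, 180, 130)] * 40
grain = [(120, 90, 60)] * 20
features = soft_colour_descriptors(basal + grain)
```

In this layout, `features[1]` and `features[7]` are the mean red values of the lower and upper strata respectively.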

Feature selection
Reducing the number of features can facilitate data understanding and reduce the computational load (and thus processing time) for practical applications. The Sequential Forward Selection (SFS) method is a well-known method for feature selection, and since SFS is a wrapper method, a classifier must be implemented together with it. A Linear Discriminant Analysis (LDA) classifier is used as it produces acceptable results most of the time. SFS first finds the single feature that gives the best performance. This winning feature is then paired with each of the other features in turn to find the best scoring combination. This process is repeated until the stopping criterion is met, which is typically either (1) a fixed number N of features, (2) reaching a certain accuracy level, or (3) no further improvement in accuracy when more features are added. For this study, the first ten combinations of features were found and used (stopping criterion 1).
A five-fold cross-validation approach was used to find the winning features. The data set was first split into five sets of approximately equal size. Four of the sets were used for training the LDA while the remaining set was used for testing to gauge the performance of different feature combinations. This was repeated until each of the five sets had served as the testing set once. The final performance was the average performance over the five testing sets.
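The selection loop and its cross-validated scoring can be sketched in Python as follows. To keep the example self-contained, a nearest-centroid classifier stands in for LDA; this substitution, the function names, the interleaved fold assignment and the synthetic data are all assumptions made for illustration.

```python
import random


def nearest_centroid_accuracy(train, test, features):
    """Score a nearest-centroid classifier (a simple stand-in for LDA)."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        return min(centroids, key=lambda lab: sum(
            (x[f] - m) ** 2 for f, m in zip(features, centroids[lab])))

    return sum(predict(x) == label for x, label in test) / len(test)


def cross_val_score(data, features, folds=5):
    """Average accuracy over folds; each sample is tested exactly once."""
    scores = []
    for k in range(folds):
        test = data[k::folds]
        train = [d for i, d in enumerate(data) if i % folds != k]
        scores.append(nearest_centroid_accuracy(train, test, features))
    return sum(scores) / folds


def sequential_forward_selection(data, n_features, n_select):
    """Greedily add the feature that most improves the cross-validated score."""
    selected, history = [], []
    while len(selected) < n_select:
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: cross_val_score(data, selected + [f]))
        selected.append(best)
        history.append((list(selected), cross_val_score(data, selected)))
    return history


# Synthetic data: feature 0 separates the groups, features 1 and 2 are noise.
rng = random.Random(0)
data = []
for i in range(40):
    label = "white" if i % 2 == 0 else "dark"
    basal = (200 if label == "white" else 150) + rng.uniform(-5, 5)
    data.append(([basal, rng.uniform(0, 255), rng.uniform(0, 255)], label))
history = sequential_forward_selection(data, n_features=3, n_select=2)
```

Each entry of `history` pairs a feature subset with its cross-validated score, so the subset size with the best score can be read off afterwards, in the spirit of the comparison across subset sizes described in this study.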

Classifier
The Self-Organizing Map (SOM) is a very popular unsupervised classifier first described in the 1980s (Kohonen 1982). It is an artificial neural network that can be used to reduce the dimensionality of a large dataset into an easily visualizable two-dimensional map. It works by first finding the node closest to a training sample, and then updating the weights of that node and its surrounding nodes (called the neighbourhood) to approach that sample at a certain training rate. This is iterated across the entire dataset, and then reiterated multiple times with a decaying neighbourhood size and training rate. This is sometimes termed competitive learning, as nodes that have more training samples associated with them or within their neighbourhood will form clusters. Hence, nodes representing similar samples tend to cluster together, while dissimilar ones settle further away, forming chasms between these clusters.
There are several ways to determine where each group is in the trained map. One is to have the training samples manually labelled (for instance, by wood colour group) and fed into the trained map grouped by label (the same colour group). Where the majority of these similarly labelled training samples appear in the map is where the clusters are most closely associated with that label. To test an unknown sample, its feature vector just needs to be compared with each node's weights. The closest node, also known as the Best Matching Unit (BMU), is where the sample resides, and if the map has already been labelled, the sample belongs to that labelled group. Therefore, classification with a SOM is computationally very fast: for a fixed map size, the cost per sample is constant (complexity O(1)) regardless of the size of the training set, which makes it suitable for deployments where speed is necessary.
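The training loop and BMU lookup described above can be sketched in plain Python. This is a simplified illustration, not the MATLAB configuration used in the study: it assumes a rectangular rather than hexagonal grid, a Gaussian neighbourhood, and a single linear decay of learning rate and radius instead of separate ordering and tuning phases. All names, default values and sample data are hypothetical.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Return the grid position of the node whose weights are closest to x."""
    return min(nodes, key=lambda pos: sum((w - v) ** 2 for w, v in zip(nodes[pos], x)))


def train_som(samples, rows, cols, epochs,
              lr_start=0.9, lr_end=0.02, radius_start=3.0, seed=0):
    """Train a SOM online; returns a dict mapping (row, col) to a weight list."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise every node from a randomly chosen training sample.
    nodes = {(r, c): list(rng.choice(samples))
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        progress = epoch / max(1, epochs - 1)
        lr = lr_start + (lr_end - lr_start) * progress      # decaying rate
        radius = max(radius_start * (1 - progress), 0.5)    # shrinking neighbourhood
        for x in rng.sample(samples, len(samples)):         # shuffled pass
            br, bc = best_matching_unit(nodes, x)
            for (r, c), weights in nodes.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                for j in range(dim):
                    weights[j] += lr * influence * (x[j] - weights[j])
    return nodes


# Two synthetic colour groups described by two features each.
rng = random.Random(1)
light = [(200 + rng.uniform(-3, 3), 170 + rng.uniform(-3, 3)) for _ in range(20)]
dark = [(120 + rng.uniform(-3, 3), 90 + rng.uniform(-3, 3)) for _ in range(20)]
som = train_som(light + dark, rows=6, cols=9, epochs=20)

bmu_light = best_matching_unit(som, (200.0, 170.0))
bmu_dark = best_matching_unit(som, (120.0, 90.0))
```

Classifying a new sample costs one distance computation per node, which is why the lookup time depends on the map size and the number of features but not on how many samples were used for training.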
In this study, salient features derived from SFS are used to train the SOM, the results of which are compared with the results using the full feature dataset. Performance is measured by observing the map of each human-categorized colour group and counting the misclassifications. A naïve winner-takes-all approach is employed: the colour group with the most members in a node occupies that node (the node is classified as that colour), and any members of other colour groups falling in that node are counted as misclassifications. The conformance rates (the complement of the misclassification rates) are observed, and this is repeated for different numbers of SOM epochs.
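The winner-takes-all count can be expressed compactly. The sketch below is a minimal Python illustration in which the function name, the node coordinates and the labels are hypothetical; ties between colour groups within a node are resolved in favour of the group encountered first, which is an arbitrary choice.

```python
from collections import Counter, defaultdict


def conformance_rate(bmus, labels):
    """Winner-takes-all scoring of a labelled SOM.

    bmus:   best matching unit (node position) of each sample
    labels: human-assigned colour group of each sample
    Returns the conformance rate and the colour assigned to each node.
    """
    per_node = defaultdict(Counter)
    for node, label in zip(bmus, labels):
        per_node[node][label] += 1
    # Each node takes the colour of its largest group.
    node_colour = {node: counts.most_common(1)[0][0]
                   for node, counts in per_node.items()}
    # Members of the winning group conform; all others are misclassified.
    conforming = sum(max(counts.values()) for counts in per_node.values())
    return conforming / len(labels), node_colour


# Five samples falling on two nodes; one 'yellow' sample lands in a 'white' node.
bmus = [(0, 0), (0, 0), (0, 0), (1, 2), (1, 2)]
labels = ["white", "white", "yellow", "dark", "dark"]
rate, node_colour = conformance_rate(bmus, labels)
```

Here four of the five samples agree with their node's colour, so the conformance rate is 0.8 and the misclassification rate is its complement, 0.2.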
A 6 × 9 hexagonal grid map was used in this study, with an initial neighbourhood size of 3, learning rates of 0.9 and 0.02 for the ordering and tuning phases respectively, and Euclidean distance as the metric for determining the winning nodes. Different numbers of epochs were tested (half of which are for ordering, and half for tuning).

Calibration
Table 3 shows the calibration parameters to correct for the drift that was found on each of the equipment's colour channels. C920 performed reasonably well, within 1% of the reference values. After adjustments, C920 showed very close conformance to CE200's reference points.
However, RX0 II's plot was very scattered, and there were signs that the image may have been overexposed. Because parts of the camera's field of view include dark areas of the rig, this suggests that some form of high dynamic range processing, exposure compensation or white balancing was performed on the image. Therefore, adjustment parameters derived from the reference image worked poorly with the actual wood images for RX0 II, while those for CE200 and C920 worked extremely well and produced images that looked very similar to each other, as can be seen in Table 4.
There appeared to be slight drifts in the calibration images for all the cameras on different days of image acquisition. This could be because of the use of a laptop and a PC on different occasions (both CE200 and C920 were powered via USB, while RX0 II was powered via an internal rechargeable lithium-ion battery). Within the same day, minimal drift was observed. Moreover, quite surprisingly, C920 offered the highest consistency in terms of colour stability. This also means the drift is unlikely to be due to the lighting, which would affect all three cameras similarly. CE200 experienced an unexplained large dip on the second day of data collection, which underscores the importance of calibration, even when using an industrial grade camera. It is also well known that LED lights degrade over time, so frequent calibration is crucial to maintain the integrity of the trained SOM.

Feature selection
Figure 5 shows the performance of the 26 individual features for the different cameras and species. Performances of C920 and CE200 are similar, while RX0 II exhibited the worst results of the three cameras, with erratic performance; the reasons for this are discussed later. Among the features, particularly for C920 and CE200, we see dips in performance in features 9 to 11. These correspond to the lower strata [R, G, B] values, which is in strong agreement with the assessment that grain colour is poorly correlated with human perceived colour similarity. Moreover, elevated performance is noted around features 3-8 (full resolution [R, G, B] values, both strata), and features 12-14 (half resolution [R, G, B] for upper strata), with most plots peaking at feature 7 for CE200 and C920. This shows that feature 7, which is the green intensity value of the upper strata (basal colour), is the most discriminatory feature for humans. This is consistent with the fact that in human psychophysiology, our eyes are most sensitive to green light (Dowling 2012).
Individual features are not necessarily ideal features to use (in isolation), and combinations of features may produce better results. Figure 6 shows the best performance values for the different numbers of features selected by SFS, evaluated using five-fold cross validation for the different cameras on the different types of wood. It is evident that CE200 performed the best for red oak and maple, while C920 performed the best for yellow poplar. Only four features were required to reach the optimal performance for CE200 and C920. RX0 II, on the other hand, required up to eight features to reach its optimal performance. It is interesting to note that, in close agreement with conventional wisdom, smaller subsets of SFS selected features performed better than using all 10 features.
Based on ranked features (Table 5), similar to observations of individual feature performance results in Fig. 5, feature 7 is the dominant feature to use, followed by feature 6 (which is red intensity of the upper strata at full resolution). Red is the most common colour component in brown (pure brown defined as [R, G, B] = [128, 64, 0]), which is the dominant colour in wood.
It is evident from the results in Table 6 that for CE200, red oak's yellow and white categories are easily confused with each other, as are the dark and orange categories. For yellow poplar, likewise, white and yellow veneers are easily confused, but there is also some overlap in categorization between yellow and dark categories. This is a perfect illustration of the problems associated with trying to segregate the entire spectrum of wood colour available in nature into extremely limited bins, where specimens with boundary characteristics present a wide band of ambiguity.
Additionally, we can see that white and dark groups are almost always mutually exclusive (no confusion) except for two outlier cases with yellow poplar, which lends us confidence in the viability of the features used in this study. The two outliers could be miscategorization on the part of human assessors.
The performance obtained using this technique of feature selection shows that the features used can reproduce very closely the colour segregation of veneers done by humans. This also shows that Otsu thresholding can reliably split the wood histogram into two colour groups that represent the basal wood colour and the grain colour. It must be acknowledged that there are many thresholding techniques available in the image processing toolkit, and there is room for future studies to test these different variations. Moreover, this technique has only been tested on red oak, yellow poplar, and maple, and may not be a universal feature of all species of timber. Having said that, most timber species have similar characteristics, so this may well work in a large majority of situations.

Self-organizing map
Figure 7 shows the performance, measured by conformance rate, for various numbers of SOM epochs, as well as for the naïve feature set versus the SFS selected feature set. Increasing the number of training epochs does not improve overall performance; in fact, in most cases it impairs it. Moreover, the SFS feature set clearly outperforms the naïve feature set, which shows that a targeted set of features outperforms the full feature set (as suggested by the LDA results). Overall, this suggested setup can achieve conformance rates of 89.0% for red oak (C920), 94.5% for yellow poplar (CE200) and 96.4% for maple (CE200).
The map for the highest scoring configuration, which is maple using the SFS feature set on the CE200 camera, is shown in Fig. 8. It can be seen in the map that the colour groups are very clearly demarcated (using the winner-takes-all model of selecting the node's colour group association), with very few outliers. There is a curious schism in the middle of the map which warrants further discussion. SFS is seen to improve some of the results of the SOM, with the additional bonus of requiring fewer features to train and test the neural network. This technique can be used in high-speed applications to boost decision making speeds.
There appear to be some outlier clusters in the resulting SOM maps. This may be due to a few factors, one being misclassification by the human worker. The samples were taken on different days, and classification of the veneers' colours was done by different workers; therein lies the difficulty in standardizing colour sorting among humans. A follow-up study utilizing images of these outlier veneers to determine which colour category they belong to is being planned.
Since SOM performs clustering without prior knowledge of the colour groups, the appearance of schisms within the colour clusters on the map does indicate there may be different ways the colour of the veneers can be sorted using the feature set used. This shall be the scope of future studies.

Sony RX0 II performance deficit
It was discovered that, when using the Sony Imaging Remote software, the auto-white balancing feature of the Sony RX0 II cannot be turned off, hence the poor results in this study. Further, there may be some additional internal processing applied to the captured image, something that is very common with modern consumer cameras that attempt to automatically beautify the scene. It may be necessary to utilize the Sony Camera Remote SDK to enable finer manipulation of the sensor's parameters. Despite calibration using …
It is vital that camera settings such as aperture and shutter time are regulated and standardized, and that auto-white balance and any other post-capture processing features of consumer cameras are turned off to ensure standardized image capture. Alternatively, a standard colour chart should be present within the frame at every image capture so that the resulting captured image can be calibrated to a standard reference.
Therefore, the standard Sony Imaging Remote software may not be suitable for research use. It is suggested that future studies utilizing this equipment use custom-made connectors to the camera via the Camera Remote SDK to gain greater control of the camera settings. This applies to any other camera that will be used for any colour related study. Naturally, industrial cameras offer the highest degree of control over the settings. Fortunately, the webcam's Windows driver also allows auto white balance to be turned off and does not apply any additional processing to the resulting image. Accordingly, results from the RX0 II camera in this study are given lower prominence.

Comparisons with previous research
The performance of this algorithm for the three different types of wood ranged from 89% (red oak) to 96% (maple). This is comparable to other studies, such as rubberwood (95%) (Kurdthongmee 2008), oak (95%) (Bianconi et al. 2013; Faria et al. 2015; Shivashankar and Madhuri 2018), teak (90%) (Bianconi et al. 2013), and cedar (90%) (Nurthohari et al. 2019). It should be noted that most of the datasets in these studies are small, ranging from just one sample (Faria et al. 2008) to 30 samples per colour (Kurdthongmee 2008). Some researchers generated more data by extracting overlapping segments from original images. In this research, over 80 distinct images per colour group per species were used. Having more images acquired on different days creates larger sample variability, which makes classification more complex. The results from the present study should therefore be more robust in actual industrial applications.
Fig. 7 Performance comparison based on human-SOM classification rate of conformance
Furthermore, soft colour descriptors (Lòpez et al. 2005) were used as features for the SOM in this study, particularly means and standard deviations, as they were proven to perform well in RGB colour space with no appreciable difference to using other colour spaces such as HSV or CIEL*a*b* (Bianconi et al. 2013). The approach by Bianconi et al. (2013) was to use a 1-NN classifier as opposed to SOM. However, both approaches use very similar parametric distance comparators (namely Euclidean distance), hence their conclusion on RGB vs HSV and CIEL*a*b* applies to the present study. Furthermore, static lighting conditions and calibration of our images to an index date are analogous to the transformation of RGB to CIEL*a*b*, whose colour values are adjusted to some specified reference white; this renders conversions from RGB to any other colour space moot.
Nonetheless, the present technique differs from that of Bianconi et al. (2013): whereas they took the average and standard deviation of the entire image (as well as other higher order statistics), we split the image into two intensity levels and took the means and standard deviations of each stratum. This enables us to characterize the chromatic distribution of basal wood colours and grain colours as separate features for more robust sorting results.

Conclusion
This research confirms the viability of using SOM for colour sorting of red oak, yellow poplar, and maple veneers, especially using features selected using the SFS method.
Thresholding using Otsu appears to provide reasonable results, which affirms the initial belief that wood colour may be split into two strata, with the upper stratum (wood basal colour) weighing heavily in colour sorting applications. Moreover, Hikrobot MV-CE200-10UC performed the best among the three cameras, with Logitech C920 HD Pro a close second. Sony RX0 II performed the worst due to software limitations. Therefore, it is reasonable to use a high-quality webcam for colour sorting in place of an industrial camera in situations where budget is of concern, with the caveat that proper calibration is performed. However, other considerations such as the durability of the equipment were not tested.