Background: In recent years, representation learning techniques have proved valuable in many research fields for mapping abstract categories onto numeric scales. In this study, we developed a model for learning tree species representations using deep neural networks and data from the Spanish National Forest Inventory. Our approach was based on a supervised classification of the species of every tree in the dataset, using as input a set of phytocentric and geocentric variables derived from forest inventory data and environmental cartography. From the trained classifier, we produced two sets of representations: 1) tree-level, using the activations of the embedding layer, and 2) species-level, using the output-layer weights associated with each category.
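The two kinds of representation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer order (two hidden layers, then the embedding layer, then a softmax output), the ReLU activations, the linear embedding, the input size, and the number of species are all assumptions made for illustration, and random weights stand in for trained parameters. Only the layer widths (1024 and 16) come from the Results.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 12            # hypothetical number of input variables
N_SPECIES = 5              # hypothetical number of species classes
HIDDEN, EMBED = 1024, 16   # layer widths reported in the Results

def init_layer(n_in, n_out):
    """Random weights and zero biases; a stand-in for trained parameters."""
    weights = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
    return weights, np.zeros(n_out)

params = {
    "h1": init_layer(N_FEATURES, HIDDEN),
    "h2": init_layer(HIDDEN, HIDDEN),
    "emb": init_layer(HIDDEN, EMBED),
    "out": init_layer(EMBED, N_SPECIES),
}

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, params):
    """Return (tree-level embeddings, class probabilities) for a batch x."""
    h = relu(x @ params["h1"][0] + params["h1"][1])
    h = relu(h @ params["h2"][0] + params["h2"][1])
    # Embedding layer kept linear here (an assumption).
    tree_emb = h @ params["emb"][0] + params["emb"][1]
    logits = tree_emb @ params["out"][0] + params["out"][1]
    # Numerically stable softmax over species.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return tree_emb, probs

# Four hypothetical trees, each described by N_FEATURES input variables.
x = rng.normal(size=(4, N_FEATURES))

# 1) Tree-level representations: one 16-dimensional vector per tree.
tree_emb, probs = forward(x, params)

# 2) Species-level representations: the output-layer weights of each class,
#    i.e. one 16-dimensional vector per species.
species_emb = params["out"][0].T
```

Under these assumptions, `tree_emb` has one row per tree and `species_emb` one row per species, and both share the same 16-dimensional space.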
Results: Among the tested architectures, a model with two hidden layers of 1024 units and an embedding layer of 16 units provided the best apparent and test performances (Matthews correlation coefficient = 0.89). The resulting embeddings were evaluated intrinsically by estimating their correlations with supplementary species descriptors not included in the training dataset. This analysis revealed several significant associations that support the generality of the embedding model. For instance, some latent dimensions could be used to differentiate conifer from broadleaved species. Other dimensions were correlated with forest species diversity indices.
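For readers unfamiliar with the reported metric, the sketch below computes a multiclass Matthews correlation coefficient in plain Python. The abstract does not say which multiclass variant was used, so the common generalization based on the confusion-matrix covariances is assumed here. The function name and the species labels are hypothetical.

```python
from collections import Counter
from math import sqrt

def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient for two label sequences."""
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correct predictions
    t_counts = Counter(y_true)                         # true count per class
    p_counts = Counter(y_pred)                         # predicted count per class
    labels = set(t_counts) | set(p_counts)

    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())

    denom = sqrt(cov_tt * cov_pp)
    # Convention: return 0 when the score is undefined (e.g. a single class).
    return cov_tp / denom if denom else 0.0

# Hypothetical labels for six trees; one oak is misclassified as beech.
y_true = ["pine", "pine", "oak", "oak", "beech", "beech"]
y_pred = ["pine", "pine", "oak", "beech", "beech", "beech"]
score = multiclass_mcc(y_true, y_pred)
```

A value of 1 indicates perfect agreement and a value near 0 indicates chance-level predictions. Because the score accounts for how often each class occurs, it is often preferred over plain accuracy when class frequencies are imbalanced.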
Conclusions: We conclude that the developed embeddings provide accurate and generalizable numeric representations of Spanish tree species that may be useful as a starting point for future research in forest modeling.