1. In recent years, Representation Learning (RL), a subdiscipline of artificial intelligence, has proved a valuable resource in many research fields for mapping abstract categories onto numeric scales, which in turn supports a wide range of quantitative modeling tasks. Despite the potential advantages that RL could offer for handling categorical data in ecological modeling, applications in ecology are still lacking. In this study, we propose a new method for applying RL to forest ecology, named TreeSp2Vec, which develops numeric representations (embeddings) of tree species.
2. Our approach entailed a supervised species classification of individual trees using as input a set of phytocentric (morphometrics and composition) and geocentric (climate, soil, and physiography) variables derived from National Forest Inventory data and environmental cartography. Species classification was carried out using deep neural networks with several fully connected layers, an intermediate embedding layer of up to 32 dimensions, and an output layer with softmax activations.
3. Among the tested neural network architectures, a multi-layer perceptron with two hidden layers of 1024 units and an embedding layer of 16 units provided the best apparent and test classification performances (Matthews Correlation Coefficient = 0.89). Additionally, the developed latent representations (W), or embeddings, were evaluated intrinsically by estimating their correlations with supplementary species descriptors that were not included in the training dataset. This evaluation revealed several significant associations that support the generality of the embedding model. For instance, some latent dimensions (e.g., W6 and W16) helped differentiate general species features, such as conifers vs. broad-leaved species, while other dimensions (e.g., W2 and W5) were related to forest ecosystem characteristics such as competition intensity (relative spacing index) and biodiversity (Simpson index).
4. We concluded that the developed embeddings provided accurate and generalizable numeric representations of the considered tree species, which can serve as a basis for further forest ecology modeling approaches. Moreover, our approach can be readily extended to other ecological research areas, opening a new range of artificial intelligence applications in ecology.
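
For readers unfamiliar with the classification metric reported in item 3, the following is a minimal sketch of how a multiclass Matthews Correlation Coefficient can be computed from true and predicted species labels. It assumes the common multiclass generalization based on confusion-matrix counts; the abstract does not state which variant was used. The function name `multiclass_mcc` and the species labels are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient (assumed variant).

    Uses the count-based generalization:
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k the number of samples truly in class k, and p_k the number of
    samples predicted as class k.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)   # t_k: true occurrences per class
    p_counts = Counter(y_pred)   # p_k: predicted occurrences per class
    classes = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_true = s * s - sum(n * n for n in t_counts.values())
    cov_pred = s * s - sum(n * n for n in p_counts.values())
    denominator = sqrt(cov_true * cov_pred)

    # By convention, return 0.0 when the coefficient is undefined
    # (e.g., every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


# Hypothetical labels for six trees; one oak is misclassified as beech.
observed = ["pine", "pine", "oak", "oak", "beech", "beech"]
predicted = ["pine", "pine", "oak", "beech", "beech", "beech"]
score = multiclass_mcc(observed, predicted)
print(round(score, 3))
```

A value of 1 indicates perfect agreement between observed and predicted species, and a value near 0 indicates performance no better than chance. Unlike plain accuracy, the coefficient accounts for how often each class occurs, which matters when some species are much more abundant in the inventory data than others.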