Collection of Pollen
Pollen was collected from all five species of Urticaceae found in the Netherlands. In the genus Urtica, the native species U. dioica L. (common nettle) and U. urens L. (small nettle) are ubiquitous in nitrogen-rich moist areas, ditches, woodlands, disturbed sites and roadsides. The exotic Mediterranean species U. membranacea is rarely encountered, though it is included in this study since its range is expected to increase due to the effects of global warming. The genus Parietaria is represented in the Netherlands by the species P. judaica L. (pellitory-of-the-wall) and P. officinalis L. (upright pellitory), which both occupy rocky substrates, mainly in the urban environment15. Moreover, P. judaica has shown a marked increase in abundance over the past decades, e.g. in the Netherlands (Supplementary Figure S1), but also in many other parts of the world.
Pollen from all Urticaceae species was either freshly obtained or collected from herbarium specimens (Naturalis Biodiversity Center; see Supplementary Table S1). Fresh material was collected with the help of an experienced botanist in the direct surroundings of Leiden and The Hague during the nettle flowering seasons of 2018 and 2019. Original taxonomic assignments for the herbarium specimens were verified using identification keys and descriptions35. A minimum of four different plants were sampled per species, from different geographical locations to cover as much of the phenotypic plasticity in the pollen grains as possible and reflect the diversity found on aerobiological slides.
To produce palynological reference slides, thecae of open flowers were carefully opened on a microscope slide using tweezers. A stereo microscope was mounted in a fume hood to avoid inhalation of the highly allergenic pollen of Parietaria species. Non-pollen material was manually removed to obtain a clean slide. The pollen was mounted in a glycerin:water:gelatin (7:6:1) solution with 2% phenol and stained with safranin (0.002% w/v). These are the same mounting conditions used for airborne pollen collected by a Hirst-type sampler. Cover slips were secured with paraffin.
Pollen Image Capture
A total of 6,472 individual pollen grains were scanned, with a minimum of 1,000 individual pollen grains per species, though numbers varied (Supplementary Table S1). The system used for the imaging was a Zeiss Observer Z1 (inverted microscope) linked to a Hamamatsu EM-CCD Digital Camera (C9100), located at the Institute of Biology Leiden (IBL). Grayscale images were used, since the pollen were stained to increase contrast; the colour itself carries no information for species recognition.
The imaging procedure was as follows: on each microscope reference slide containing only pollen of one species of Urticaceae, an area rich in pollen was identified by eye and this area was automatically scanned using multidimensional acquisition with the Zeiss software Zen BLUE. For areas that were very rich in pollen, a user-defined mosaic was created consisting of all the tiles to be scanned (e.g. 20x20 tiles), while a list of XY positions was used for microscope slides less rich in pollen. Because pollen grains are 3-D shapes, capturing all important features requires imaging at different focal levels, so-called 'Z-stacks'. A total of 20 Z-stack slices were used in this study with a step size of 1.8 µm. Scanning was performed with a Plan Apochromat 100x (oil) objective with numerical aperture 0.55 and a brightfield contrast manager. To maintain constant conditions during image collection, the condenser was always set to 3.3 V with an exposure time of 28 ms.
Reference Pollen Image Library
All images were post-processed in ImageJ (Fiji)36 using the script Pollen_Projector (https://github.com/pollingmarcel/Pollen_Projector). The input for this script is a folder containing all raw pollen images (including all Z-stacks), and the output is a set of projections for each individual pollen grain that are subsequently used as input for the VGG16 deep learning model.
Pollen_Projector identifies all complete, non-overlapping pollen grains and extracts them as stacks from the raw Z-stack. This is achieved by binarizing the raw images to detect only rounded objects with a circularity >0.3 and a size larger than 5 µm. Out-of-focus images within each group of 20 Z-stack slices were removed using a threshold on the minimum and maximum pixel values. The conventional input of a convolutional neural network is a three-channel image. In colour images these are commonly the RGB channels, but since we use grayscale images, three different Z-stack projections were chosen to fill the three channels. The projections used are Standard Deviation, Minimum Intensity and Extended Focus. Standard Deviation creates an image containing the standard deviation of the pixel intensities through the stack, so positions with large differences appear brighter in the final projection. Minimum Intensity takes the minimum pixel value through the stack and uses that for the projection. Finally, the Extended Focus projection was created using the 'Extended_Depth_of_Field' ImageJ macro of Richard Wheeler (www.richardwheeler.net)37. This macro takes a stack of images with a range of focal depths and builds a 2-D image from it using only the in-focus regions of the images. A schematic overview of the processes behind the Pollen_Projector script is shown in Supplementary Figure S2.
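The three-channel construction can be sketched in NumPy as below. This is an illustration, not the Pollen_Projector script itself: the function name is hypothetical, and the extended-focus channel is a crude per-pixel sharpest-slice pick rather than Wheeler's Extended_Depth_of_Field macro.

```python
import numpy as np

def stack_to_channels(zstack):
    """Combine a grayscale Z-stack of shape (slices, H, W) into one
    3-channel image using the three projections described in the text.

    The extended-focus channel is a simplified stand-in: per pixel, the
    slice with the highest local gradient magnitude (a focus measure).
    """
    zstack = zstack.astype(np.float32)
    std_proj = zstack.std(axis=0)   # Standard Deviation projection
    min_proj = zstack.min(axis=0)   # Minimum Intensity projection

    # Crude extended focus: keep, for each pixel, the sharpest slice.
    gy, gx = np.gradient(zstack, axis=(1, 2))
    sharpest = (gy**2 + gx**2).argmax(axis=0)            # (H, W) indices
    focus_proj = np.take_along_axis(zstack, sharpest[None], axis=0)[0]

    # Channel order: Standard Deviation, Minimum Intensity, Extended Focus
    return np.stack([std_proj, min_proj, focus_proj], axis=-1)
```

The result has the (H, W, 3) shape a VGG16-style network expects for its input.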
Convolutional Neural Network
Convolutional Neural Networks (CNNs) are widely used in the field of computer vision for image classification, object detection, facial recognition and autonomous driving. For our work we used the pre-trained VGG16 network38 in Keras39. Compared with traditional neural networks and shallow convolutional neural networks, VGG16 has deeper layers that extract more representative features from images. A feature extractor and a classifier are the two key structural parts of a CNN that perform the classification task. To train a CNN, a large number of labelled images is fed to the model so that it learns the features needed to distinguish the classes.
The VGG16 network contains 13 convolutional layers grouped into five blocks, which generate features from the images in the feature extraction phase. During training, the parameters of the convolutional layers were taken from the network pre-trained on the ImageNet dataset. Subsequently, three fully connected layers were added on top of the convolutional layers to classify the different classes (Supplementary Figure S3). To improve the effectiveness, robustness and generalization ability of the VGG16 model, and to guard against overfitting, 10-fold cross-validation was applied during training. The complete pollen image dataset was split into a training set (90%) and a test set (10%). The training set was split into ten subsets for 10-fold cross-validation: for each fold, a model was trained on nine subsets and validated on the remaining one. The number of epochs per fold was set to 30; model accuracy had converged at that point, indicating that the models were not overfitting. The ten models obtained from cross-validation were then tested on the test set. To try to improve accuracy further, we compared hard voting (majority voting) with the standard averaging of the ten models' outputs. Hard voting sums the class-label votes from each model and predicts the class with the most votes.
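A minimal Keras sketch of this setup is given below. The widths of the added fully connected layers and the class count are assumptions (the actual architecture is shown in Supplementary Figure S3); the hard-voting helper operates on the per-model predicted labels.

```python
import numpy as np

NUM_CLASSES = 5  # assumption: one class per Urticaceae species studied

def build_model(input_shape=(224, 224, 3)):
    """VGG16 feature extractor with frozen ImageNet weights plus three
    newly added fully connected layers (widths 256/128 are assumptions).
    """
    # TensorFlow is imported lazily so that hard_vote() below can be
    # used without it.
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=input_shape)
    base.trainable = False                  # keep pre-trained conv weights
    x = layers.Flatten()(base.output)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = models.Model(base.input, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def hard_vote(label_preds):
    """Majority vote over models.

    label_preds: integer array of shape (n_models, n_samples) holding
    each model's predicted class per sample; returns the class with the
    most votes per sample.
    """
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(),
                               0, label_preds)
```

In the cross-validation loop, `build_model()` would be called once per fold; the ten trained models' predicted labels on the test set are then stacked and passed to `hard_vote`.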
In order to quantify model accuracy, several commonly used performance measures were used:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
CCR = (TP + TN) / (TP + TN + FP + FN)

where TP refers to true positives, TN to true negatives, FP to false positives and FN to false negatives. Recall is the number of true positives divided by the total number of elements that belong to the correct class, i.e. the sum of the true positives and false negatives. The F1-score is the harmonic mean of precision and recall. The correct classification rate (CCR) reflects the overall accuracy of the model.
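These measures follow directly from the four confusion-matrix counts; a small illustrative helper (the function name is hypothetical):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F1-score and CCR from the
    true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    ccr = (tp + tn) / (tp + tn + fp + fn)               # overall accuracy
    return precision, recall, f1, ccr
```

For example, with 8 true positives, 85 true negatives, 2 false positives and 5 false negatives, precision is 0.8, recall is 8/13 and CCR is 0.93.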
Data augmentation
A large number of images per class is required to train a deep learning model, as performance increases when more variation is fed to the model. Due to the nature of the images investigated in this study, the model was sensitive to small changes, since the differences between the genera are very subtle. Therefore, data augmentation was tested to increase the variety of pollen images used as input. We selected the augmentation options brightness and flip. The brightness range was set from 0.1 to 2, with values <1 corresponding to a darker image and values >1 to a brighter image. Horizontal and vertical flips were also applied randomly (Supplementary Figure S4). The results were compared to running the model without data augmentation.
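The augmentation described above can be sketched as a standalone NumPy function; this is an illustration of the same two options (random brightness in [0.1, 2] plus random horizontal/vertical flips), not the exact Keras configuration used in the study.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def augment(image):
    """Randomly adjust brightness and flip a grayscale image with
    pixel values in [0, 1]."""
    factor = rng.uniform(0.1, 2.0)        # <1 darkens, >1 brightens
    out = np.clip(image * factor, 0.0, 1.0)
    if rng.random() < 0.5:
        out = out[:, ::-1]                # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                # vertical flip
    return out
```

Applying `augment` to each training image per epoch yields a different brightness/orientation variant every time, increasing the effective variety of the training set.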
Test Cases
For each aerobiological sample, an area representing 10% of the total deposition area was scanned manually for Urticaceae pollen grains (i.e. 8 full transects at 100x magnification), resulting in 112 pollen grains from the Leiden sample (LUMC), 63 from Lleida and 26 from Vielha (both ICTA-UAB). A notable aspect of the Catalonian aerobiological samples was the presence of pollen from families that produce pollen similar to that of Urticaceae but are rarely encountered in the Netherlands: Humulus lupulus L. (Cannabaceae) and Morus sp. (Moraceae), neither of which was included in our training dataset. These can nevertheless be distinguished from Urticaceae, H. lupulus by its much larger size (up to 35 µm) and very large onci, and Morus by its more ellipsoidal shape. These pollen grains were removed from the dataset before the images were fed to the CNN for classification.