Specimen sampling and depository.
The research was carried out on 95 individuals of D. chrysitis (43) and D. stenochrysis (52). Male and female imagines were caught in light traps between 1995 and 2018 in Poland. Legislative and determinative features, including the location and trapping date of D. chrysitis and D. stenochrysis individuals, are presented in Supplementary Table S1. All sampled specimens were deposited in Department of Entomology and Environmental Protection of the Poznań University of Life Sciences.
In Poland, the populations of eastern and western moths occur sympatrically, constantly mixing, and therefore, we can exclude that the moths came from a few homogeneous populations. This is often a problem in studies involving relatively small areas for species that extend over much of the Palearctic. In addition, moths have been collected over a period of more than 20 years, which further diversifies the risk of "selective choice”. For these reasons we examined the potential influence of the age of specimens in collection on the quality of discrimination of species. Similarly, we assessed whether male and female individuals differ spectrally. For this purpose, we used the same discriminant analyzes described later in the paper. We notice that currently we cannot train successful learners predicting neither age-based fading nor sex of specimen. Also, the full effectiveness of the RF classifier (described in section Results) in separating species indicates that those two properties does not affect the features favored by the classification models.
The species of each individual was determined on the basis of following features: morphology of genitalia and colorization of the front pair of wings. In a laboratory, the body parts and the external genitalia were dissected in a standard way for each individual. The abdomen was first removed and dipped for 24–36 hours in 10% caustic potash (KOH). Genitalia were then removed from the softened surrounding tissues. The aedeagus was removed, and the external genitalia was partially dehydrated with ethanol and mounted on glycerine between the microscope slides and cover slips. Because of their three-dimensional shape and fragile structure, the endophalli were stored in liquid glycerine. The species determination in the collection described above was carried out by entomologist, professionally dealing mainly with moths from the Noctuidae family. Latest available publication of Ronkay et al.2, and also the drawings of the genitalia shown in the several papers served as a key to species determination3,4,45.
The coloring of the wings is an ambiguous feature. D. chrysitis differs externally from D. stenochrysis by its unbroken brown median area and more indistinct subterminal line2. Depending on the individual, species traits may be very strong, strong or weak. The individuals with the typically marked species features on the forewings of D. chrysitis (Fig. 1a) and D. stenochrysis were classified as having very strong features. D. chrysitis (Fig. 1a) and D. stenochrysis (Fig. 1b) individuals with typically marked species features on the wings of the first pair were classified as having very strong features. The strong features of D. chrysitis butterflies typically had a species-specific not divided central field, but a sinuous subterminal line at which the metallic area ends, resembling the subterminal line characteristic of D. stenochrysis. Strongly featured D. stenochrysis individuals had a clearly divided brown median area by a band of golden iridescent scales over 1 mm wide. Individuals with weak features were marked on the basis of the structure of their genitals.
Expert analysis is always falsifiable. For this reason, the results of this part served as the ground truth reference for the spectral and chemometric procedures.
Spectra were measured with a system consisting of the ASD FieldSpec 3 spectrophotometer (FieldSpec Analytical Spectral Devices, Inc., Boulder, Colorado, USA) attached by optical fiber to microscope NU 2 (VEB Carl Zeiss Jena, Jena, Germany). The spectrophotometer recorded the reflected electromagnetic radiation in the wavelength range from 350 to 2500 nm with a spectral sampling of 1.4 nm from 350 to 1000 nm and 2 nm from 1000 to 2500 nm. The spectral resolution in VIS was 3 nm and at 1400 and 2100 nm was 10 nm. The spectrophotometer was calibrated with a Spectralon (Labsphere) white standard before each measurement series. A plan apochromat objective (25x) and a coaxial illumination with a halogen lamp were used in the microscope.
Spectral measurements were carried out on the two dominant color areas on the wings, brown and golden iridescent on the dorsal side of forewing, which are typical of D. chrysitis and D. stenochrysis. In the case of the brown area the reflectance was measured from the median area at the front edge of the forewing while golden iridescent included subterminal area. In measurement location wings are covered with scales of two types. Cover scales are visible from the outside and brown ground scales below them (Supplementary Fig. S1). Brown area is covered with brown melanin-pigmented cover scales. While on the shimmering area, cover scales are actually colorless melanin-deprived and referred to as glass scales. As a result of light interference and diffraction, they form a physical color ranging from bluish to dominant gold to copper. Glass scales differ significantly in structure from brown scales, which are perforated and higher in cross-section (Supplementary Fig. S1 and S2). The base brown scales are also perforated. The scales are arranged in many layers on the wing.
The spectra at wavelengths below 400 nm and above 2100 nm exhibited high levels of noise, and they were not used in further analysis. The measurements were made in triplicate on the dorsal forewing of each of the 95 individuals and then the values have been averaged for each item. Results of spectra measurements are presented in Fig. 2.
Spectral measurements provided 95 cases, each with what means 3402 potential predictors (two types of scales times 1701 bands). The collected dataset is small from a data science point of view, especially compared to the number of estimators. Such a situation is typical for most life science projects, where the costs limit the number of cases. In the first step, to highlight the spectra shape descriptors, data was transformed using Savitzky–Golay filter46. The filter removes unnecessary effects like baseline shifts and noise resulting from the non-ideal sampling process. The SG filter requires three parameters and by experiments we found that configuration: differentiation order = 2, polynomial order = 2, and window size = 5 provides the best results. Data after SG transformation are input features for all classifiers. The entire procedure is summarized in Fig. 3.
Typical machine learning procedures require dividing the collected data into at least two sets: the training - used to train the model, and the testing set - used for an independent evaluation of the model accuracy. With a small number of cases and many features, the role of features in the trained model may vary significantly for various training and testing sets. The consequence of a small number of cases is the model's potential overfitting to the provided data47. In our work, we attempt to reduce overfitting and detect essential predictors by two independent methods. The first is fully stochastic both at the sampling and training level and works with a complete spectrum. The second is fully deterministic, and it first selects a limited subset of potentially useful spectral features and then discriminates species by an exhaustive search inside this subset only. Both approaches have their own disadvantages, especially on small datasets. Stochastic approach, despite its utility may result in random, non-optimal configuration, difficult to reproduce. Deterministic approach, through dimensionality reduction processes may make it difficult to find less obvious but more optimal solutions. For this reason, the species discrimination process will be controlled by two independent algorithms.
The first discriminating procedure uses Random Forest which is a stochastic algorithm and its results may change between each recurrence. Moreover, the standard procedure of machine learning training requires to hold part of the cases for the testing set. It means that the model is trained based on 60–80% of the entire dataset. Machine learning models gain their highest performance when the distributions of selected predictors (features) in the training and validation sets are close to each other. With large datasets this is not the case, but when the dataset is small, such an assumption is difficult to fulfill. It means that there is a risk that depending on the composition of the training set, the selected predictors will not be relevant for the cases included in the testing set. It is mainly related to high efficiency on the training set and low efficiency on the test set.
To determine the risk of overfitting, we applied a bootstrapping procedure, including 300 iterations. If such overfitting risk is high, we can expect that the testing set's performance will vary from low to high between iterations. If such a risk is low, each iteration shall return with similar performance. Moreover, if the trained model will select the same predictors at each iteration, we can expect that correct and incorrect classification will include the same cases each time. At each iteration, the entire dataset is randomly divided into the training set including 70% of cases and the testing set with remaining 30%. It means that we have 300 different training sets assessed by 300 different testing sets and each case had a chance to be included in the testing set 60 times on average. It is a large enough number to assess the stability of the process of discrimination. The RF algorithm also provides "feature importance" a Gini index describing how each variable decreases the impurity of RF internal splits. The index varies between 0 and 1, where 0 denotes that the variable cannot increase the purity, while 1 means that variable allows to split the dataset into pure subgroups.
In the second procedure we first attempted to define a group of the most significant spectrum bands for the distinction of species. As “the most important”, we define a limited, possibly small number of predictors that successfully separate the species under investigation. First, all bands were sorted in descending importance order. We used the value of two-sample Kolmogorov-Smirnov (K-S) test as a variable importance index48. The usefulness of this statistic to feature selection results from the absence of the assumption on the form of the distributions of compared sets. For each wavelength D-statistic was calculated using Eq. (1):
where: Fdc and Fds empirical distribution of D. chrysitis and D. stenochrysis subsets, and sup is supremum function i.e. the choice of the largest operand value. D-statistic is 0 when two distributions fully overlap, values between 0 and 1 when partially overlap, and 1 if two distributions are completely disjoint. All wavelengths were sorted from the highest to lowest D-statistics, it means from the most to least separating features. Next, we searched for the minimal combination of predictors providing perfect separation between species. To evaluate the separation we used an accuracy of Linear Discriminant Analysis (LDA), which is considered, as one of the best tools to find a minimal effective combination of spectral features49.
We used the R programming language50 for the analysis; the Savitzky-Golay filter from prospectr51 package to transform spectral curves; the ranger52 package to train the random forest models; linear discrimination from the MASS53 package.