How Spectral Properties and Machine Learning Can Categorize Twin Species - Based on Diachrysia Genus


 Confirmation of distinctiveness or taxonomic affinity awaits new evidence in many twin species. In our work, we used noninvasive point reflectance spectroscopy in the range from 400 to 2100 nm coupled with machine learning to study scales on the brown and golden iridescent areas on the dorsal side of the forewing of Diachrysia chrysitis and D. stenochrysis. The study was based on a statistically significant collection of 95 specimens gathered over 23 years in Poland. The numerical part of the experiment included two independent discriminant analyses: stochastic and deterministic. The more sensitive stochastic approach achieved an average agreement of 99-100% with the species classification made by entomologists. It showed high stability across different configurations of training and validation sets, indicating strong predictors of the distinctiveness of the Diachrysia siblings. Both methods yielded the same small set of relevant features; the minimal fully discriminating subsets comprised three wavelengths for the glass scales on the golden area and four for the brown scales. The differences between the species' scales primarily concern their major components and ultrastructure. In melanin-absent glass scales, this is mainly the configuration of chitin, while in melanin-present brown scales, melanin emerges as an additional factor.


Introduction
In recent decades, numerous attempts have been made to revise insect taxonomy. We became interested in the family Noctuidae, which, with about 25 000 known species 1 , is one of the most species-rich families of Lepidoptera. The noctuid subfamily Plusiinae (Boisduval, 1828), with Plusia (Ochsenheimer, 1816) as its type genus, is distributed worldwide except in the Antarctic 2 . This popular and easily recognizable noctuid group is, as one might expect, monophyletic, with abundant morphological, bionomical and biogeographical evidence for its monophyly. However, the situation is still not clear, despite the intensive studies of the Plusiinae over the last two decades 2,3 . The taxonomic interpretation of the twin species Diachrysia chrysitis (Linnaeus, 1758) and Diachrysia stenochrysis (Warren, 1913) (Fig. 1) is one of the biggest problems in this subfamily. In 1961, Kostrowicki distinguished Plusia tutti from D. chrysitis as a new species on the basis of certain external and male genital features 4 . It turned out, however, that P. tutti and D. stenochrysis are the same species. This gave rise to a series of studies, chronologically by Lempke 5 , Urbahn 6,7 and Rezbanyai-Reser 8 . These studies, which involved breeding and morphological examination of the genitalia, concluded that some individual moths could not be attributed to either species because of excessive variation in genitalia, wing pattern and caterpillar coloration. Consequently, the taxonomic separateness of the two species was questioned, which generated a number of further studies. With the use of specific sex pheromones by Priesner 9 , taxonomic research on Diachrysia spp. entered a new path. This line of research was continued by, among others, Bruun 10 , Svensson et al. 11 , Löfstedt et al. 12 and Inomata et al. 13 ; however, it did not solve the problems of species classification.
Soon, research on the reproductive isolation of pheromone-trapped male Diachrysia using allozyme analysis, initiated by Svensson et al. 11 , provided new insight into the taxonomy of these twin species. Further research used the cytochrome oxidase subunit I gene 14,15 . However, the problem of taxonomy within the genus Diachrysia remained unresolved, and Hausmann et al. 16 indicated that further research at a broad geographical scale is needed.
The declining number of taxonomists experienced in the morphological evaluation of insects 17 has led to increased use of automatic identification methods 18,19 . Reflectance spectroscopy (RS) methods are complementary to expert or genetic classification 20 . In contrast to visual evaluation, RS provides broader spectra, including the ultraviolet, near-infrared and mid-infrared ranges. The advantage of RS is the potential to quickly obtain a large amount of information about the studied object in a single, possibly non-invasive measurement.
In recent years, RS has been applied in various disciplines of biology 21,22 . Several papers in entomology involving visible (VIS) and near-infrared (NIR) spectroscopy applications coupled with numerical methods have been recently reported (see: Johnson and Naiker 19 for a comprehensive list).
Many studies refer to the spectral properties of butterflies and moths, but most of them have considered only reflectance at human-visible wavelengths (380-780 nm), in which coloration is perceivable. Wing color can be pigment-dependent; a separate phenomenon is the structural coloration of the wing, which has been the subject of in-depth research, including on the family Noctuidae [23][24][25] .
Coloration enables expert classification and can also be suitable for automatic species identification.
Kaya and Kayci 26 trained neural networks on RGB images and their texture to create a classifier that distinguishes 14 butterfly species with over 90% accuracy. Their collection covered only species with relatively distinct wing color patterns. However, there are many lepidopteran species that cause identification problems because wing coloration is not a diagnostic feature.
Few studies have considered the reflectance properties of butterfly and moth wings beyond the visible range of the spectrum 22,27 . This region of the spectrum is useful in the detection of stored-grain insects 28,29 . The NIR and SWIR (short-wave infrared) ranges comprise approximately 50% of the total incident radiation 30 . Information on the structure, water and fat content, and the presence of chitin in the studied objects is encoded in this region of the spectrum 31,32 . Nevertheless, only a few chemical compounds can be identified, mainly via the bonds occurring in their structure. Gebru et al. 33 showed three premises that can be useful in the detection and identification of insects: low absorption of radiation by melanin over the entire SWIR range, strong absorption by water in the 1470-1550 nm band, and the 1320 nm wavelength, which has been indicated as unaffected by both melanin and water. Combining information from SWIR and NIR was useful for insect species classification as well as for assessing physiological status, age and sex [33][34][35] . However, studies to date do not provide information on species determination in the order Lepidoptera based on the wing reflectance spectrum in the NIR and SWIR regions.
Microscopic studies provide detailed information about the optical properties and structure of butterfly wing scales. For many years, such research has focused on light scattering and diffraction on butterfly scales 24,36 . The wings of butterflies and moths have various structures, color patterns and types of scales, which requires studies to focus on relatively small parts of the wing, e.g. eyespots 37 . There are scientific reports that the ultrastructure of wing scales shapes the pattern of the reflectance spectrum even in closely related species 38 . We suppose that the microscopic spectral characteristics of selected areas of the D. chrysitis and D. stenochrysis wing are worth exploring over a wide spectral range and may provide information useful in determining these species.
Remote sensing methods produce many features (spectral wavelengths), which makes rigorous numerical analysis an indispensable step of the analytical procedure. Numerical analysis is necessary with all data sources, including genetics, RGB photos, and detailed hyperspectral measurements of various morphological characteristics such as shape, color, and wing pattern [39][40][41] . Such methods have also been applied to determine sex, age, color diversity or taxonomic differentiation, but only a few studies report truly successful classification 42,43 .
A common feature of "sibling species" is that the visible differences between individuals assigned to one species may be more distinct than the differences between the species themselves 44 . Surprisingly, no attempt has yet been made to distinguish species complexes of Lepidoptera by studying parts of a wing using the full spectral range and advanced numerical techniques. In our work, we present the results of an experiment comparing the detailed expert classification of D. chrysitis and D. stenochrysis with the results of a chemometric analysis of the full reflectance spectrum between 400 and 2100 nm, obtained under the microscope from two distinct groups of forewing scales with completely different melanin content.

Methods
Specimen sampling and depository.
The research was carried out on 95 individuals of D. chrysitis (43) and D. stenochrysis (52). Male and female imagines were caught in light traps between 1995 and 2018 in Poland. Legislative and determinative features, including the location and trapping date of the D. chrysitis and D. stenochrysis individuals, are presented in Supplementary Table S1. All sampled specimens were deposited in the Department of Entomology and Environmental Protection of the Poznań University of Life Sciences.
In Poland, the populations of eastern and western moths occur sympatrically and mix constantly; therefore, we can rule out that the moths came from a few homogeneous populations. This is often a problem in studies covering relatively small areas for species that extend over much of the Palearctic. In addition, the moths were collected over a period of more than 20 years, which further reduces the risk of "selective choice". For these reasons, we examined the potential influence of the age of specimens in the collection on the quality of species discrimination. Similarly, we assessed whether male and female individuals differ spectrally. For this purpose, we used the same discriminant analyses described later in the paper. We found that we could not train successful learners predicting either age-related fading or the sex of a specimen. Also, the full effectiveness of the RF classifier (described in section Results) in separating species indicates that these two properties do not affect the features favored by the classification models.
The species of each individual was determined on the basis of the following features: morphology of the genitalia and coloration of the forewings. In the laboratory, the body parts and external genitalia of each individual were dissected in a standard way. The abdomen was first removed and soaked for 24-36 hours in 10% caustic potash (KOH). The genitalia were then removed from the softened surrounding tissues. The aedeagus was removed, and the external genitalia were partially dehydrated with ethanol and mounted in glycerine between microscope slides and cover slips. Because of their three-dimensional shape and fragile structure, the endophalli were stored in liquid glycerine. Species determination in the collection described above was carried out by an entomologist working mainly on moths of the family Noctuidae. The latest available publication of Ronkay et al. 2 [...]
Spectra were measured with a system consisting of an ASD FieldSpec 3 spectrophotometer (Analytical Spectral Devices, Inc., Boulder, Colorado, USA) attached by optical fiber to an NU 2 microscope (VEB Carl Zeiss Jena, Jena, Germany). The spectrophotometer recorded reflected electromagnetic radiation in the wavelength range from 350 to 2500 nm, with a spectral sampling interval of 1.4 nm from 350 to 1000 nm and 2 nm from 1000 to 2500 nm. The spectral resolution was 3 nm in the VIS and 10 nm at 1400 and 2100 nm. The spectrophotometer was calibrated with a Spectralon (Labsphere) white standard before each measurement series. A plan apochromat objective (25x) and coaxial illumination with a halogen lamp were used in the microscope.
Spectral measurements were carried out on the two dominant color areas on the dorsal side of the forewing, the brown and the golden iridescent areas, which are typical of D. chrysitis and D. stenochrysis. For the brown area, reflectance was measured in the median area at the front edge of the forewing, while for the golden iridescent area it was measured in the subterminal area. At the measurement locations, the wings are covered with two types of scales: cover scales, visible from the outside, and brown ground scales below them (Supplementary Fig. S1). The brown area is covered with brown, melanin-pigmented cover scales, whereas on the shimmering area the cover scales are colorless and melanin-deprived and are referred to as glass scales. As a result of light interference and diffraction, they produce a structural color ranging from bluish to the dominant gold to copper. Glass scales differ significantly in structure from brown scales, which are perforated and higher in cross-section (Supplementary Figs. S1 and S2). The brown ground scales are also perforated. The scales are arranged in many layers on the wing.
Spectra at wavelengths below 400 nm and above 2100 nm exhibited high levels of noise and were not used in further analysis. The measurements were made in triplicate on the dorsal forewing of each of the 95 individuals, and the values were then averaged for each specimen. The measured spectra are presented in Fig. 2.
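The cropping and averaging steps can be sketched as follows. This is a minimal Python illustration, not the authors' R code. It assumes (hypothetically) that each measurement was exported on a 1 nm grid from 350 to 2500 nm, which is consistent with 1701 bands per scale type remaining after cropping to 400-2100 nm; all function names are illustrative.

```python
from statistics import mean

FIRST_NM, LAST_NM = 350, 2500   # assumed 1 nm export grid of the spectrometer
KEEP_FROM, KEEP_TO = 400, 2100  # noisy ends below 400 nm and above 2100 nm are dropped


def average_replicates(replicates):
    """Band-wise mean of the triplicate measurements of one wing area."""
    return [mean(values) for values in zip(*replicates)]


def crop(spectrum):
    """Keep only 400-2100 nm from a spectrum sampled every 1 nm from 350 nm."""
    if len(spectrum) != LAST_NM - FIRST_NM + 1:
        raise ValueError("expected one value per nm from 350 to 2500 nm")
    return spectrum[KEEP_FROM - FIRST_NM: KEEP_TO - FIRST_NM + 1]


def specimen_features(glass_reps, brown_reps):
    """Concatenate the cropped, averaged glass-scale and brown-scale spectra."""
    glass = crop(average_replicates(glass_reps))
    brown = crop(average_replicates(brown_reps))
    return glass + brown  # 1701 + 1701 = 3402 predictors per specimen
```

Concatenating the two scale types into one vector matches the stochastic analysis, which uses both spectra simultaneously; the deterministic analysis can instead slice out either half.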
Numerical analysis.
Spectral measurements provided 95 cases, each with 3402 potential predictors (two types of scales times 1701 bands). The collected dataset is small from a data science point of view, especially compared to the number of predictors. Such a situation is typical of most life science projects, where costs limit the number of cases. In the first step, to highlight the spectral shape descriptors, the data were transformed using the Savitzky-Golay (SG) filter 46 . The filter removes unwanted effects such as baseline shifts and noise resulting from the non-ideal sampling process. The SG filter requires three parameters, and experimentally we found that the configuration differentiation order = 2, polynomial order = 2, and window size = 5 provides the best results. The SG-transformed data are the input features for all classifiers. The entire procedure is summarized in Fig. 3.
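A stdlib-only Python sketch of a Savitzky-Golay second-derivative filter with the configuration reported above (polynomial order 2, window 5) is shown below. The authors used the prospectr package in R; this is an independent illustration, not that package's implementation. It derives the convolution weights by least squares with exact fractions, assumes unit spacing between bands (1 nm), and simply drops the (window - 1)/2 points at each end instead of padding. For a window of 5 and a quadratic, the second-derivative weights reduce to (2, -1, -2, -1, 2)/7.

```python
from fractions import Fraction
from math import factorial


def sg_weights(window, polyorder, deriv):
    """Savitzky-Golay weights estimating the deriv-th derivative at the window centre."""
    if window % 2 == 0 or window <= polyorder or deriv > polyorder:
        raise ValueError("need an odd window > polyorder and deriv <= polyorder")
    half = window // 2
    n = polyorder + 1
    # Design matrix A[i][j] = k_i ** j for offsets k_i = -half..half (exact fractions)
    A = [[Fraction(k) ** j for j in range(n)] for k in range(-half, half + 1)]
    # Augmented [A^T A | I], reduced by Gauss-Jordan elimination to [I | (A^T A)^-1]
    M = [[sum(A[r][i] * A[r][j] for r in range(window)) for j in range(n)]
         + [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    inv = [row[n:] for row in M]
    # Polynomial coefficient c_d is row d of (A^T A)^-1 A^T applied to the window;
    # the d-th derivative at the centre is d! * c_d.
    return [factorial(deriv) * sum(inv[deriv][j] * A[r][j] for j in range(n))
            for r in range(window)]


def savitzky_golay(spectrum, window=5, polyorder=2, deriv=2):
    """Filter one spectrum; (window - 1) / 2 points are dropped at each end."""
    w = sg_weights(window, polyorder, deriv)
    half = window // 2
    return [float(sum(c * spectrum[i + k - half] for k, c in enumerate(w)))
            for i in range(half, len(spectrum) - half)]
```

Applied to y = x², the filter returns 2 at every interior point, and applied to a straight line it returns 0, which is a quick check that the weights estimate a second derivative and remove baseline offsets and slopes. In R, the same configuration would correspond to prospectr's `savitzkyGolay()` with `m = 2`, `p = 2`, `w = 5`.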
Typical machine learning procedures require dividing the collected data into at least two sets: the training set, used to train the model, and the testing set, used for an independent evaluation of model accuracy. With a small number of cases and many features, the role of features in the trained model may vary significantly between different training and testing sets. A consequence of a small number of cases is potential overfitting of the model to the provided data 47 . In our work, we attempt to reduce overfitting and detect essential predictors with two independent methods. The first is fully stochastic, at both the sampling and the training level, and works with the complete spectrum. The second is fully deterministic: it first selects a limited subset of potentially useful spectral features and then discriminates species by an exhaustive search within this subset only. Both approaches have their own disadvantages, especially on small datasets. The stochastic approach, despite its utility, may result in a random, non-optimal configuration that is difficult to reproduce. The deterministic approach, through dimensionality reduction, may make it difficult to find less obvious but more optimal solutions. For this reason, the species discrimination process is controlled by two independent algorithms.
The first discriminating procedure uses Random Forest (RF), a stochastic algorithm whose results may change between runs. Moreover, the standard machine learning training procedure requires holding out part of the cases as the testing set, which means that the model is trained on 60-80% of the entire dataset. Machine learning models achieve their highest performance when the distributions of the selected predictors (features) in the training and validation sets are close to each other. With large datasets this condition is usually met, but when the dataset is small, it is difficult to fulfil. There is therefore a risk that, depending on the composition of the training set, the selected predictors will not be relevant for the cases included in the testing set. This manifests mainly as high accuracy on the training set and low accuracy on the testing set.
To determine the risk of overfitting, we applied a bootstrapping procedure with 300 iterations. If the overfitting risk is high, we can expect performance on the testing set to vary from low to high between iterations. If the risk is low, each iteration should return similar performance. Moreover, if the trained model selects the same predictors at each iteration, we can expect correct and incorrect classifications to involve the same cases each time. At each iteration, the entire dataset is randomly divided into a training set including 70% of cases and a testing set with the remaining 30%. This gives 300 different training sets assessed by 300 different testing sets, and each case falls into the testing set about 90 times on average (300 × 0.3). This is a large enough number to assess the stability of the discrimination process. The RF algorithm also provides "feature importance", a Gini index describing how much each variable decreases the impurity of the internal RF splits. The index varies between 0 and 1, where 0 denotes that the variable cannot increase purity, while 1 means that the variable allows the dataset to be split into pure subgroups.
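The resampling scheme can be sketched as repeated random 70/30 splits. Strictly speaking, this is repeated random subsampling without replacement (sometimes called Monte Carlo cross-validation) rather than a classical bootstrap. The `predict` callback is hypothetical and stands in for training a Random Forest (ranger, in the authors' R workflow) on the training indices. With 95 cases, `round(95 * 0.3)` gives 28 test cases per split, so each case appears in roughly 300 × 28 / 95 ≈ 88 testing sets.

```python
import random
from collections import Counter


def monte_carlo_splits(n_cases, n_iter=300, test_frac=0.3, seed=1):
    """Repeated random train/test partitions (each split samples without replacement)."""
    rng = random.Random(seed)
    n_test = round(n_cases * test_frac)
    splits = []
    for _ in range(n_iter):
        order = list(range(n_cases))
        rng.shuffle(order)
        splits.append((sorted(order[n_test:]), sorted(order[:n_test])))
    return splits


def per_case_agreement(splits, predict, labels):
    """Share of test appearances in which each case received its expert label.

    predict(train_idx, test_idx) is a hypothetical callback that fits a model on
    the training cases and returns one predicted label per test case.
    """
    hits, seen = Counter(), Counter()
    for train_idx, test_idx in splits:
        for i, pred in zip(test_idx, predict(train_idx, test_idx)):
            seen[i] += 1
            hits[i] += pred == labels[i]
    return {i: hits[i] / seen[i] for i in seen}
```

Cases with agreement below 1.0 are those that are occasionally misclassified; a stable model concentrates such errors on the same few specimens rather than spreading them across the collection.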
In the second procedure, we first attempted to define a group of the spectral bands most significant for distinguishing the species. As "the most important", we define a limited, preferably small number of predictors that successfully separate the species under investigation. First, all bands were sorted in descending order of importance. We used the two-sample Kolmogorov-Smirnov (K-S) test statistic as the variable importance index 48 . Its usefulness for feature selection stems from the absence of any assumption about the form of the distributions of the compared sets. For each wavelength, the D-statistic was calculated using Eq. (1):

$$D = \sup_{x} \left| F_{dc}(x) - F_{ds}(x) \right| \qquad (1)$$

where $F_{dc}$ and $F_{ds}$ are the empirical distribution functions of the D. chrysitis and D. stenochrysis subsets, and sup is the supremum function, i.e. the choice of the largest operand value. The D-statistic is 0 when the two distributions fully overlap, between 0 and 1 when they partially overlap, and 1 when they are completely disjoint. All wavelengths were sorted from the highest to the lowest D-statistic, i.e. from the most to the least separating features. Next, we searched for the minimal combination of predictors providing perfect separation between species. To evaluate the separation, we used the accuracy of Linear Discriminant Analysis (LDA), which is considered one of the best tools for finding a minimal effective combination of spectral features 49 .
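Eq. (1) and the band ranking can be sketched in plain Python as below (in R, `ks.test(x, y)$statistic` returns the same D). The helper `ks_critical_d` is an illustrative addition using the standard large-sample approximation c(α)·sqrt((n + m)/(n·m)) with c(α) = sqrt(-ln(α/2)/2); the text does not state how its 0.4 significance threshold was obtained. For the present sample sizes (n = 43, m = 52), this approximation gives about 0.28 at α = 0.05 and about 0.40 at α = 0.001. Function names are hypothetical.

```python
import math


def ks_d_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    # Step through the pooled sorted values; after passing each distinct x,
    # i / na and j / nb are the empirical CDFs F_a(x) and F_b(x).
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def rank_bands(spectra_dc, spectra_ds, bands):
    """Return (D, band) pairs sorted from the most to the least separating band."""
    scores = [(ks_d_statistic([s[k] for s in spectra_dc], [s[k] for s in spectra_ds]), band)
              for k, band in enumerate(bands)]
    return sorted(scores, key=lambda t: t[0], reverse=True)


def ks_critical_d(alpha, n, m):
    """Large-sample critical value of D for a two-sided test at level alpha."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
```

Once one sample is exhausted its CDF equals 1 and the other CDF can only rise towards 1, so the supremum is always reached inside the loop. Ties between the two samples are handled by advancing both pointers past the same value.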

Software.
We used the R programming language 50 for the analysis: the Savitzky-Golay filter from the prospectr 51 package to transform spectral curves, the ranger 52 package to train the random forest models, and linear discriminant analysis from the MASS 53 package.

Results
Stochastic discriminant analysis.
In this procedure, all spectra from both the golden iridescent and the brown parts of the wing surface were used simultaneously for classification. The classification achieved an average agreement of 99% ± 1% with the species classification made by entomologists. Of the 95 individuals, 91 were correctly classified every time they appeared in the testing set over the 300 iterations, and the remaining four specimens were classified correctly in over 98% of their test appearances. This demonstrates the high stability of the models trained on different training sets. Of the four cases where agreement with the expert classification was below 100%, two were D. chrysitis, one identified by the expert with weak and one with very strong confidence, and two were D. stenochrysis, one identified with weak and one with strong confidence. This indicates that the expert's confidence does not translate into the performance of the RF classifiers, because expert analysis and our method use different sets of characteristics, and thus spectrometric properties are more stable than phenotypic features.
We also calculated Kendall's rank correlation between expert confidence (Supplementary Table S1) [...] show that only D-statistics larger than 0.4 allow us to reject the null hypothesis that the distributions of the two species for a given band are the same. This condition was met for no more than 500 bands. Only nine bands, all measured from the glass scales, have a D-statistic greater than 0.7, with a maximum slightly exceeding 0.82 for the 1378 nm band. The twenty wavelengths with the highest discriminating potential are also listed in Table 1. It is worth pointing out that the 10 best wavelengths indicated by the D-statistic are the same as the 10 best resulting from the stochastic procedure, albeit in a slightly different order. All 10 of these wavelengths were measured from glass scales.
Table 1 List of the 20 best wavelengths indicated by Random Forest feature importance and the Kolmogorov-Smirnov D-statistic.
No single spectral feature fully distinguishes the species. We must therefore search for a minimal number of bands from the entire spectrum whose linear combination allows such a distinction. The discrimination process was undertaken by sorting features, starting from the most important. We found that a linear combination of the first nine most important wavelengths (all from glass scales) perfectly differentiates the species in LDA space, thus limiting the number of possible band combinations (Fig. 5). We also tested the potential of spectra derived from the brown scales only. The discriminatory potential of the individual wavelengths is smaller than that of those sensed from glass scales, but their linear combination already shows performance similar to that of the bands from glass scales (Fig. 5).

Minimal discriminatory combination.
In the last step of the analysis, we searched, separately for each type of scale, for the smallest possible group of wavelengths that allows full separation of both species. It should be emphasized that this is not the performance of an LDA classifier measured on an independent testing set, but only a group of bands for which there exists a hyperplane in the multidimensional space that completely separates D. chrysitis and D. stenochrysis in the collected dataset. There are four such subsets for glass scales (one with three and three with four wavelengths) and two for brown scales (one with four and one with five wavelengths). All are presented in Table 2. Other, higher-dimensional configurations are merely supersets of those mentioned above. In this way, we can indicate the most important wavelengths: for glass scales, 1378, 1767, 716, 1385, 1117 and 1318 nm, and for brown scales, 1637, 1367, 894, 1942, 716 and 1767 nm. The 1535 nm band is omitted from the brown scale list because it does not contribute to any subset that fully separates the species. Isolating the wavelengths of greatest diagnostic importance prompts us to look into the biophysical and biochemical reasons for their relevance. This is all the more important because both the stochastic and the deterministic analyses identified the same 10 most important bands for glass scales. Moreover, the LDA revealed the significance of two wavelengths (716 and 1767 nm) in both brown and glass scales.
Table 2 The minimal combinations of wavelengths achieving full discrimination of species in LDA space. *Denotes the D-statistic, which is equivalent for a single band. Wavelengths marked in bold appear for the first time in the given subset.
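The search for minimal separating subsets can be sketched as follows. This is a simplified, stdlib-only illustration rather than the authors' MASS::lda workflow: a subset counts as separating when the two classes do not overlap after projection onto the Fisher discriminant direction w = S_w^-1 (mu_1 - mu_0), where S_w is the pooled within-class scatter matrix. This checks for a separating hyperplane along that direction but ignores the prior-based cut-off used by an LDA classifier. `shortest_separating_prefix` mirrors adding bands in ranked order; `minimal_separating_subsets` skips supersets of subsets that already separate, mirroring the observation that higher-dimensional configurations are merely supersets. All function names are hypothetical. The exhaustive search grows quickly: among the top 20 bands there are already 4845 four-band combinations.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular within-class scatter matrix")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fisher_separates(X0, X1):
    """True if the classes do not overlap along the Fisher LDA direction."""
    p = len(X0[0])
    mu0 = [sum(col) / len(X0) for col in zip(*X0)]
    mu1 = [sum(col) / len(X1) for col in zip(*X1)]
    S = [[0.0] * p for _ in range(p)]  # pooled within-class scatter S_w
    for X, mu in ((X0, mu0), (X1, mu1)):
        for x in X:
            d = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    try:
        w = solve(S, [m1 - m0 for m0, m1 in zip(mu0, mu1)])
    except ValueError:
        return False  # degenerate subset (e.g. collinear bands): treat as non-separating
    z0 = [sum(wi * xi for wi, xi in zip(w, x)) for x in X0]
    z1 = [sum(wi * xi for wi, xi in zip(w, x)) for x in X1]
    # w points from class 0 towards class 1, so class 1 has the higher projected mean
    return max(z0) < min(z1)


def project(X, cols):
    return [[x[c] for c in cols] for x in X]


def shortest_separating_prefix(X0, X1, ranked_cols):
    """Smallest k such that the k top-ranked features separate the classes."""
    for k in range(1, len(ranked_cols) + 1):
        cols = ranked_cols[:k]
        if fisher_separates(project(X0, cols), project(X1, cols)):
            return k
    return None


def minimal_separating_subsets(X0, X1, candidate_cols, max_size=5):
    """Exhaustive search for separating subsets with no separating proper subset."""
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidate_cols, size):
            if any(set(f) <= set(combo) for f in found):
                continue  # superset of an already found subset
            if fisher_separates(project(X0, combo), project(X1, combo)):
                found.append(combo)
    return found
```

In a toy dataset where neither of two features separates the classes alone but their difference does, the search returns that pair as a minimal subset, while a third feature that separates on its own is returned as a one-element subset.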

Discussion
Previous studies on the spectral features of the wings of moths of the genus Diachrysia focused on explaining the phenomenon of golden iridescence in D. chrysitis 54,55 . This iridescence arises from interference, scattering and absorption in the structures of the glass scales. Our results suggest that the same applies to D. stenochrysis, whose wings have not been the subject of spectral studies so far. It should be noted that in the visible range there is large variation in the iridescent color formed on colorless scales in populations of both species, from pale blue to pale copper. These color forms were distinguished at the beginning of the 20th century 56 . Using the reflected light spectrum recorded in the VIS-SWIR range, we collected additional, potentially more useful information than that provided by the visible range alone. Light has a relatively low ability to penetrate the inner wing structures beneath the layer of scales, and the radiation is strongly reflected. On this basis, we concluded that the spectrum recorded by the spectrometer was shaped by the scale structures on the upper side of the wing.
Our approach to species identification, based on the spectral characteristics of spot-defined areas of the wing and machine learning, is original and new. So far, microscopic spectral studies have focused on identifying phenomena and on comparative characteristics of wing scales 24,36 . The analysis of the spectra is so complicated that even a trained expert may have problems distinguishing between these species without chemometric methods. In our case, machine learning was crucial for decoding the information contained in the reflectance spectrum and for isolating the information of significant importance for the species determination of the Diachrysia siblings. A comprehensive comparison between the expert classification and the iteratively trained stochastic RF classifiers revealed that the models very rarely assign incorrect labels to validation data, regardless of changes in the composition of the training sets. This indicates that the diagnostic features favored by RF recur in all individuals of the collection. This was also confirmed by the Kolmogorov-Smirnov D-statistic, which indicated the same group of predictors. This means that, given a training set of adequate size and quality, spectral measurements coupled with machine learning can be an effective tool not only for mass species differentiation but also for providing insight into the reasons for this variation.
Since the identification of insects based on chemometric analysis is a new field, no applicable library of spectral measurements has yet been developed. The only source of knowledge about the relationship between spectral bands and the properties of scales is previous studies. However, no in-depth study of Lepidoptera wing scales that would explain the phenomena behind the spectral features yet exists. In the developed model, the SWIR range turned out to be the most important for species diagnostics. This is also seen in Fig. 6, where the differences between the species' spectra are clearly visible. In the optimal model, we have unique subsets of three or four bands for glass scales and of four or five bands for brown scales (Table 2). A few bands are of particular importance: 1767 nm, 716 nm, and the closely spaced 1378/1367 nm appear in the unique sets for both glass and brown scales. These three bands suggest the presence of features that are important for separating the moth species and independent of the type of scales. The remaining bands are important mainly for one type of scale. The amplitude of the spectra near these bands is small, indicating small differences between the two species.
The skeleton of butterfly scales is made of a composite material. There are three main components of Lepidoptera wing scales: chitin, proteins and melanin 57 . The overall character of the spectrum of Diachrysia scales, especially glass scales, is very similar to the general spectrum of chitin presented by Apetroaei et al. 58 . Glass scales are colorless and melanin-deprived 59 . Thus, chitin in glass scales seems to be the most important factor differentiating the Diachrysia siblings, although the light reflected from the surface covered with glass scales may also carry a weak signal from the brown ground scales below. We should also note that in the spectral range we analyze, the protein components of the scales are not identified, although they may mask other signals. There are studies on the spectral characteristics of insect cuticular chitin and its role in insect detection 28,60,61 . Two significant wavelengths for glass scales, 1767 and 716 nm, were repeated for brown scales, so we consider them to be chitin-dependent.
Based on Schröder-Turk et al. 62 , we assume that one of the factors differentiating the Diachrysia siblings may be the qualitative composition and the chiral structure of the chitin molecules. However, the main differentiating factor between the two Diachrysia species is the ultrastructure of the scales, which significantly influences the shape of the reflectance spectrum between butterfly species 36,38 . Studies on structural color have shown that the chitinous structure shapes the reflectance spectra, which can be estimated from it 63,64 . Azofeifa et al. 64 observed that with increasing complexity of the analyzed chitin structures, the simulated spectra showed a weakening of the wave pattern. Thus, not only the overall shape of the curve but also subtle changes in its monotonicity depend on the complexity of the chitin layers. Figure 6 presents the locations of the wavelengths selected in the "Minimal discriminatory combination" section of "Results" against the medians of the raw spectra. All values correspond to locations of changes in monotonicity of the spectral curves. This means that the wavelengths relate to subtle physicochemical differences between the scales of D. chrysitis and D. stenochrysis, rather than to the qualitatively identified presence or absence of selected chemical components.
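The observation that the selected wavelengths coincide with changes in monotonicity could be checked with a sketch like the one below, which computes band-wise medians across specimens and reports where the median curve switches between rising and falling. This is an illustrative Python sketch with hypothetical names, not the procedure behind Fig. 6; on real spectra, noise creates many spurious turning points, so some smoothing would normally precede this step.

```python
from statistics import median


def band_medians(spectra):
    """Median reflectance per band across the specimens of one species."""
    return [median(values) for values in zip(*spectra)]


def monotonicity_changes(curve, bands):
    """Bands at which the curve switches between rising and falling.

    The last point before the change of direction is reported; flat steps
    are ignored, so a plateau is reported at its right-hand end.
    """
    turning = []
    prev_sign = 0
    for k in range(1, len(curve)):
        step = curve[k] - curve[k - 1]
        sign = (step > 0) - (step < 0)
        if sign == 0:
            continue  # flat step: direction unchanged
        if prev_sign and sign != prev_sign:
            turning.append(bands[k - 1])
        prev_sign = sign
    return turning


def nearest_turning_point(band, turning):
    """Turning point closest to a selected wavelength."""
    return min(turning, key=lambda t: abs(t - band))
```

Comparing each selected wavelength with `nearest_turning_point` on the median curves of both species gives a simple numerical counterpart to the visual comparison.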
Recently, scale ultrastructure can be described numerically on the basis of scanning electron microscope studies, as shown by Day et al. 37 [...] Fig. S1 and S2). Perforated and pigmented brown scales scatter and absorb light more intensively than non-perforated ones 66 . Thus melanin, being one of the biochemical determinants of scale ultrastructure formation, may indirectly influence its differentiation between the sibling moths. Although we cannot clearly explain the role of all the indicated wavelengths, we consider that the features denoted as important for species separation relate to the scale ultrastructure and its major components, chitin and melanin. Moreover, it should be emphasized that two wavelengths, 1767 and 716 nm, were identified in both types of scales, which minimizes the risk that the indicated wavelengths are random.

Conclusions
Biologists dealing with the systematics and evolution of Lepidoptera are constantly looking for new sources of characters and methods to distinguish taxa. Our approach is the opposite of expert analysis, in which the first step is to identify important features and only then is a decision made on species affiliation. The novelty of our approach is that the combination of reflectance spectroscopy and machine learning has been used to separate hardly distinguishable species: D. chrysitis and D. stenochrysis. We used microscopic spectroscopy to obtain pure spectra of two distinct groups of scales on the forewing. The applied methods of species discrimination proved accurate and time-saving, especially compared to the expert approach. The advantages of RF are the simplicity of the procedure and the absence of preliminary assumptions about the importance and distribution of individual predictors. The latter in particular means that none of the essential features is omitted from the decision-making process. The feature selection procedure identifies a relatively small set (three to six) of predictors with the highest discriminatory potential.
This paper has shown that spectral investigation of scales can be a good, low-cost means of discriminating selected moth species. Spectral differences between the Diachrysia siblings are mainly the result of the ultrastructure of the scales, influenced by chitin and melanin, the nature of which is not accessible to spectral studies. The differences are subtle and quantitative, yet statistically significant, and allow the two species to be fully discriminated using just a few predictors. The role of chitin in discriminating the Diachrysia siblings is highlighted by the wavelengths selected for melanin-deprived glass scales. Two of these wavelengths, 1767 and 716 nm, appear in unique sets for both glass and brown scales. Since the chitin-melanin cuticular skeleton occurs in Lepidoptera scales and is common throughout the insect world, the proposed method may be useful and effective for determining other Lepidoptera species.

Declarations
Data availability
The input data (legislative data and spectral measurements) with a description, and the results of this study, are available in the GitHub repository (https://github.com/kadyb/diachrysia-classification) under the MIT license.

Code availability
The fully functional and reproducible code has been published in the GitHub repository (https://github.com/kadyb/diachrysia-classification) under the MIT license.