Despite intense interest in the results produced by advanced, deep-learning ML methods, these systems remain quite difficult to query in order to understand exactly what aspects of the patterns they are used to analyse (e.g., organismal morphologies) are being keyed upon to produce the (often amazing) group identification/discrimination results of which they are capable. The standard response most computer scientists and data analysts give when asked how advanced ML algorithms achieve these outstanding group-identification results tends to be some variant, or combination, of “It’s complex”, “It doesn’t matter so long as the correct answer is produced reliably”, and/or “You just have to trust the machine”. In many instances, this range of responses is genuinely the best that can be provided, at least at present. Interpretation of ML results is a very active area of current computer-science research and much progress has been made. At the heart of this difficult issue are the number of components used to construct the ML system (e.g., the largest neural net systems at present are composed of literally tens of billions of artificial neurons each of which plays a role in all class assignment decisions, see ), the design complexity of the algorithms employed in advanced ML applications such as CNNs, and the non-linear mathematical spaces in which many of these systems operate.
Permutation feature importance (PFI) , saliency maps , local interpretable model-agnostic explanations (LIME) , Shapley values [83,84], scoped rules or anchors [85,86] and neural-backed decision trees (NBDT)  are among the current procedures available to evaluate the feature-based targets of inter-group discriminations of images. However, all of these approaches were designed originally to undertake evaluations of numerical datasets. When they are applied to large morphological datasets — and especially to image datasets — they test the contributions made by relatively large (and often somewhat randomly defined) segments of the data/images under consideration. While these approaches may be sufficient to identify the general regions in which group-diagnostic features are located, to date they have not been able to provide levels of spatial detail commensurate with those described routinely in taxonomic diagnoses. This is not to say that the ML algorithms themselves are incapable of basic discriminations at levels commensurate with (or even much finer than) those of taxonomic experts; only that, owing to the level of inter-pixel connectivity required to represent a diagnostic feature, coupled with spatial uncertainty as to where the boundaries of such features might lie, the relative sizes of the regions used by these and other algorithms in their attempts to identify group-diagnostic morphological or image features are, generally speaking, too large to be of much use to taxonomists (see [88,89] for examples). These approaches can be useful for ensuring group-diagnostic image features belong to parts of the image that pertain directly to the specimens being imaged (e.g., as opposed to some aspect of the background). 
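To make the logic of one of these procedures concrete, the following minimal Python sketch illustrates permutation feature importance on an invented numerical dataset. The classifier (`predict`), the toy data and all function names are hypothetical stand-ins chosen for illustration; they do not correspond to any of the systems or datasets discussed here. The importance of a feature is taken to be the mean drop in classification accuracy observed when that feature's values are shuffled among specimens.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-group link
            drops[j] += baseline - np.mean(predict(Xp) == y)
    return drops / n_repeats

# Invented data: only feature 0 separates the two groups.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

def predict(A):
    """Hypothetical stand-in for a trained classifier."""
    return (A[:, 0] > 0).astype(int)

importance = permutation_importance(predict, X, y)
```

In this toy case only the first feature receives a non-zero importance. When the "features" are image segments rather than numerical variables, the same logic applies, but the spatial resolution of the result is limited by the size of the segments that are shuffled.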
But beyond this it appears we must await further developments in the field of ML interpretation before such algorithms can make a substantial contribution to revealing the morphological features they are sensing in order to deliver their superior group-identification capabilities. One way this interpretability issue might be addressed while we await further technical developments in this area is to employ a “middle range” strategy that bases exploratory morphometric investigations on the direct analysis of images using the linear, eigenanalysis-based procedures that lie at the heart of multivariate analysis in general and GM in particular (e.g., PCA, canonical variates analysis or CVA; see [34,44,63,64]).
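A minimal Python sketch of what such a linear, eigenanalysis-based treatment of images might look like is given below. It assumes equally sized greyscale images held in a NumPy array; the toy images and the function name `image_pca` are invented for illustration. Because each principal component is a linear combination of pixels, its loadings can be reshaped into an image and inspected directly, which is the interpretability advantage this strategy offers.

```python
import numpy as np

def image_pca(images, n_components=2):
    """PCA of a stack of images; loadings are returned in image layout."""
    n, h, w = images.shape
    flat = images.reshape(n, h * w).astype(float)
    mean = flat.mean(axis=0)
    # SVD of the mean-centred pixel matrix yields the principal axes.
    u, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].reshape(n_components, h, w)
    return scores, loadings, mean.reshape(h, w)

# Invented images: the top-left 3 x 3 patch varies among specimens,
# everything else is low-level noise.
rng = np.random.default_rng(0)
images = rng.normal(scale=0.01, size=(30, 8, 8))
images[:, :3, :3] += rng.normal(size=(30, 1, 1))

scores, loadings, mean_image = image_pca(images)
# Share of the first component's loading mass that lies in the patch.
patch_share = np.abs(loadings[0])[:3, :3].sum() / np.abs(loadings[0]).sum()
```

Here the first component's loadings concentrate on the patch that actually varies, so the region driving the ordination can be read directly from the loading image.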
Damm et al.  reconstructed ancestral character states for Trithemis landscape and water-body habitat characters using the stochastic analysis method of Bollback . While it was not possible to infer all ancestral character states with complete certainty, these results were able to confirm that the transition from inhabiting open landscapes to forested landscapes, and from hunting above temporary/standing waters to flowing waters, arose multiple times in this clade. Our results, coupled with those of Damm et al.  suggest that, on each of those occasions, morphological changes in the phenotype were much more extensive than had been realized previously, involving not only those characteristics important for species identification, but also features – such as wing morphology – whose innate complexity had prevented their detailed ecomorphological analysis to date. In addition, our results suggest that, despite the independent origins of individual species (e.g., T. persephone, T. purinata) and sublineages (T. grouti-T. stictica-T. nuptialis-T. aenea-T. aequalis), common sets of morphological modifications characterize species that occupy both ancestral and derived ecological zones. Of course, it may also be the case that each of these derived ecological radiations also incorporates morphological variations unique to that species and/or to that radiation. The testing of this hypothesis must await more detailed analysis and, most likely, the collection of larger samples. Nonetheless, any species-specific or radiation-specific morphological traits in wing morphology were not of sufficient scope and/or consistency to obscure the common similarities within, and differences between, the wing forms that characterize these ecological guilds.
In 2013 Outomuro et al.  published an independent analysis of Trithemis ecomorphology that employed an approach to the analysis of wing morphology that differed from the ones we chose. That study concluded that (1) forewing and hindwing shape exhibited a statistically significant “phylogenetic structure”, (2) no significant association existed between wing shape and water body habitat, (3) male forewings and female hindwings differed with regard to the contrast between open and forest-dwelling species with the latter exhibiting characteristically broader wings, (4) hindwings of both sexes with greater coloration exhibited a broader base with this effect being more pronounced in males and (5) wing shape exhibited sexual dimorphism across all species considered. Obviously, the results obtained in our investigation contradict those of Outomuro et al.  in many respects even allowing for the different species compositions and foci of the two studies. Quite aside from the issue of what relations between wing morphology and ecological habitat actually characterize Trithemis, the different conclusions reached by these two analyses highlight the critical roles data collection and data-analysis strategy play in determining the results of all morphometric investigations.
The Outomuro et al.  study employed 32 species of which 25 (78%) were also included in our species list. Only single representatives of male and female forms for each species were employed and three of the species considered by Outomuro et al.  lacked females. In contrast, our dataset included multiple representatives of each species (see Table 1). The Outomuro et al.  investigation quantified wing morphology using 11 “biologically constant” forewing landmark points with a single semilandmark used to quantify “wing curvature” along the posterior mid-wing margin. For the hindwings seven landmarks along the boundary outline were used to quantify wing form along the anterior margin (including two marking the position of the pterostigma) and distal half of the posterior margin. Owing to the lack of landmarks along the proximal half of the posterior wing margin, a somewhat arbitrary set of five semilandmarks was used. These were defined by an inter-landmark angular spacing of 9°, 18°, 36°, 63° and 90° from the chord joining the anterior wing attachment landmark to the nodus landmark (see , Fig. 1). In practice, this sampling scheme, which is reminiscent of that used in radial Fourier analysis, has the deficiency that the geometric resolution it offers depends on the length of the chord joining the specified semilandmarks to the wing nodus, which defined the angular vertex . In any event, only these 11 (forewings) and 12 (hindwings) landmark/semilandmark points located along the wing periphery were used to quantify wing form. In contrast, our GM-style wing sampling scheme employed six landmarks (five of which were also employed by Outomuro et al. ), 25 semilandmarks (with constant inter-semilandmark spacing within landmark-defined wing-periphery regions) and seven landmarks located at topologically homologous positions in the wing interior. In addition to these data we also employed direct representation of the total wing morphology in the form of digital images.
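The dependence of an angular sampling scheme's resolution on distance from the angular vertex can be illustrated with a short Python sketch. It makes the simplifying assumption, introduced here purely for illustration, that the sampled margin is a circular arc centred on the vertex; the radii and function names are hypothetical. Under that assumption the spacing between adjacent sampled points scales linearly with the radius, so a margin twice as far from the vertex is sampled at half the spatial density.

```python
import numpy as np

# Angles (degrees) from the reference chord, as listed in the text.
angles_deg = np.array([9.0, 18.0, 36.0, 63.0, 90.0])

def margin_points(radius, angles_deg):
    """Points where rays from the vertex meet an idealized circular margin."""
    t = np.deg2rad(angles_deg)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t)])

def spacings(points):
    """Straight-line distances between consecutive sampled points."""
    return np.linalg.norm(np.diff(points, axis=0), axis=1)

near = spacings(margin_points(1.0, angles_deg))  # margin close to the vertex
far = spacings(margin_points(2.0, angles_deg))   # margin twice as distant
```

The angular steps themselves are also unequal (9°, 18°, 27° and 27°), so even at a fixed radius the sampled points are not evenly spaced along the margin.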
Accordingly, both our landmark-semilandmark and digital-image datasets incorporated more information related to wing morphology than the Outomuro et al.  study.
In terms of data-analysis strategy, Outomuro et al.  employed the method described by Klingenberg and Gidaszewski  to assess the strength of the phylogenetic signal on their wing landmark-semilandmark data whereas we employed the multivariate extension of the K-statistic method described by Adams (, see also ). The Klingenberg-Gidaszewski method fits phenotypic data to the tree using squared change parsimony and then obtains an estimate of the phylogenetic signal by summing the squared trait changes across all branches. Under this scenario the smaller the sum the greater the conformance with phylogeny.
However, as Adams  points out, the Klingenberg-Gidaszewski method (i) relies on ancestral-state reconstruction, which usually involves high levels of uncertainty , and (ii) is unsuitable for use in evaluating phenotypic traits owing to the fact that geometric scaling is not taken into consideration, so the resulting value changes systematically as trait variation among species and/or the number of traits increases. The Klingenberg-Gidaszewski method also incorporates a matrix inversion that limits its utility with datasets composed of a large number of phenotypic characteristics and a comparatively small number of species. Adams’  multivariate K-statistic approach circumvents these limitations and is designed specifically for use with high-dimensional phenotypic data.
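The general logic of a K-type statistic and its permutation test can be sketched in Python as follows. This is an illustrative reading of the multivariate K-statistic family, not the implementation used in either study: the phylogenetic covariance matrix `C`, the trait matrix `Y` and the function names are all invented, and the statistic is computed as the ratio of observed trait variation about the phylogenetic mean to the variation remaining after accounting for the tree, scaled by the ratio expected under Brownian motion.

```python
import numpy as np

def k_mult(Y, C):
    """K-type phylogenetic signal for an n-species x p-trait matrix Y."""
    n = Y.shape[0]
    ones = np.ones(n)
    Cinv = np.linalg.inv(C)
    denom = ones @ Cinv @ ones
    root = (ones @ Cinv @ Y) / denom        # GLS estimate of the root state
    R = Y - root                            # deviations from the root
    mse0 = np.sum(R ** 2)                   # variation ignoring the tree
    mse = np.trace(R.T @ Cinv @ R)          # variation given the tree
    expected = (np.trace(C) - n / denom) / (n - 1)
    return (mse0 / mse) / expected

def k_mult_permutation_test(Y, C, n_perm=999, seed=0):
    """Shuffle species among tree tips to build a null distribution."""
    rng = np.random.default_rng(seed)
    observed = k_mult(Y, C)
    count = 1                               # the observed value counts once
    for _ in range(n_perm):
        shuffled = Y[rng.permutation(Y.shape[0])]
        if k_mult(shuffled, C) >= observed:
            count += 1
    return observed, count / (n_perm + 1)

# Invented tree: four sister pairs sharing 60% of their history.
C = np.kron(np.eye(4), np.full((2, 2), 0.6)) + 0.4 * np.eye(8)
rng = np.random.default_rng(3)
Y = rng.normal(size=(8, 5))                 # traits with no built-in signal
k_obs, p_value = k_mult_permutation_test(Y, C)
```

Note that no ancestral states are reconstructed at any point, and that the statistic is unchanged when all trait values are multiplied by a constant.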
Finally, in terms of statistical testing, Outomuro et al.  relied on standard parametric tests whose accuracies rely on assumptions regarding the form of data distributions and equivalence of variances among variables, all of which are rarely met by morphometric data. In contrast, our study employed bootstrapping and jackknife variants of standard statistical and data-analysis tests to ensure the results of our hypothesis tests were robust to violations of distributional assumptions.
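As a minimal sketch of the resampling logic involved, the Python example below computes a percentile bootstrap confidence interval for a difference in group means and a jackknife standard error, neither of which assumes normally distributed data. The skewed toy samples, group labels and function names are invented for illustration and do not reproduce the specific tests used in our analyses.

```python
import numpy as np

def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        ra = rng.choice(a, size=a.size, replace=True)  # resample group a
        rb = rng.choice(b, size=b.size, replace=True)  # resample group b
        diffs[i] = ra.mean() - rb.mean()
    low, high = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return low, high

def jackknife_se(x):
    """Leave-one-out (jackknife) standard error of the mean."""
    n = x.size
    loo = np.array([np.delete(x, i).mean() for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

# Invented, deliberately skewed samples for two hypothetical habitat groups.
rng = np.random.default_rng(42)
open_group = rng.exponential(scale=1.0, size=40) + 1.0
forest_group = rng.exponential(scale=1.0, size=40)

ci_low, ci_high = bootstrap_mean_diff_ci(open_group, forest_group)
se_open = jackknife_se(open_group)
```

If the resulting interval excludes zero, the difference between groups would be judged significant at the chosen level without any appeal to distributional assumptions.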
With regard to the test of phylogenetic signal strength in Trithemis forewing and hindwing morphometric data, our observed Kmult-statistic value fell well within the range of values expected from 1000 random permutations of the pruned Damm et al.  ultrametric tree, suggesting that our more completely realized representations of species-specific Trithemis wing morphology have a very low, and statistically non-significant, ratio of the range of phenotypic variation observed in both forewing and hindwing morphology to the range expected under a Brownian motion model. Indeed, for the Trithemis forewing dataset the observed Kmult value trends toward that which would be considered significant statistically for exhibiting a less than expected range of phenotypic covariation. These forewing and hindwing results are supported further by our calculation of the PC-based phylomorphospaces for both Trithemis species and their reconstructed ancestors based on the pruned Damm et al.  ultrametric tree. As illustrated by Adams , phylomorphospaces calculated from morphometric data that exhibit a strong covariance with phylogeny exhibit an organization in which closely-related species, along with their hypothetical ancestors, are grouped together in different regions of the Procrustes PC space with few tree branches that cross one another. Our Trithemis phylomorphospace results (Fig. 6) exhibit an overall pattern of morphospace distribution that is the opposite of this expectation, with closely related species projecting to positions in vastly different parts of the space and many crossed tree branches.
Moreover, given the constraint (observed in our results) that estimated ancestral wing morphologies tend to occupy regions closer to the origin of the Procrustes PC space than the terminal taxa, the range of shape variation displayed by the former group is such that it is difficult for us to imagine any configuration of inferred hypothetical ancestral wing shapes that would be consistent with the expectations of a strongly supported phylogenetic covariation pattern. This ordination geometry indicates that our results, and our interpretations, are robust to the imprecision of ancestral node inferences, as noted by Losos .
The finding that many biological datasets do not exhibit significant patterns of phylogenetic covariation is well established – including for morphometric datasets – and can arise for many different reasons (see  and references therein). Notwithstanding the results reported by Outomuro et al. , Trithemis forewing and hindwing morphology appears to fall into this broad category. Our phylogenetic-signal results are also consistent with our finding of substantial ecomorphological covariation in Trithemis forewing and hindwing morphology as well as being somewhat inconsistent with the more limited findings in this area reported by Outomuro et al.  given the fact that the mappings of both landscape and water body characteristics of these species are distributed across the available Trithemis cladogram.
With regard to our finding of statistically significant wing shape differences for both Trithemis open versus forested landscape habitat guilds, and for temporary/standing versus running water body habitat guilds, the fact that our results differ from those reported by Outomuro et al.  may be the result of either a single-factor, or interactions between multiple-factor, differences in the two investigations. Owing to our failure to document a significant phylogenetic signal in either our Trithemis forewing or hindwing datasets, we did not follow Outomuro et al.  and subject any of our datasets to phylogenetic least-squares “correction”. This operation removes substantial amounts of information from the data and can only be justified when there is a clear data analysis-based concern with overall data independence. It may be that Outomuro et al.  were misled in their interpretation of the degree of phylogenetic signal present in their data by their use of the Klingenberg-Gidaszewski phylogenetic signal test and so performed a data-standardization operation where none was actually required. Possibly this explains their rather unusual finding of significant differences among forest and open landscape-dwelling species for males, but not females. Sexual dimorphism was not a target of our study, but our dataset was, on the whole, balanced in terms of the representation of male and female morphologies (44% females, 38% males, 14% uncertain) with all species being represented by individuals from both sexes. Accordingly, our findings of significant wing-morphology differences between landscape and water body groups imply that these differences pertain equally to both males and females, which is the more usual and expected pattern.
Alternatively, for our GM-style dataset, the difference between our habitat group findings and those of Outomuro et al.  may be a simple function of the degree to which, and the manner in which, we sampled the wing morphologies. In the Outomuro et al.  sampling scheme the spatially densest information was collected from the wing apex region and spatially sparsest from the anterior margin. Our sampling scheme achieved a much more even coverage of all parts of the peripheral outline. More importantly, the extended eigenshape sampling protocol we used to determine how the wing periphery should be sampled automatically places more semilandmark points in those regions that exhibit the greatest shape variation across the dataset as a whole. This had the effect of weighting our morphometric analysis toward those regions that exhibit the greatest amount of shape variation, thus ensuring appropriate advantage is taken of the information contained in those regions. No equivalent effort to focus the landmark-semilandmark data collected from Trithemis wings was used in the Outomuro et al.  study.
Our summary of the distribution of shape differences among the different habitat groups (Fig. 8) indicated that the wing periphery regions with the largest landmark/semilandmark displacements differed for the forewings and hindwings. In the case of the former the largest sampled-point displacements occurred mid-wing along the posterior, or trailing, margin, along the distal anterior margin, and along the proximal posterior margin, especially very close to the posterior wing attachment. While the same patterns are present in both the landscape and water body habitat-contrast results, displacements are much more pronounced for the landscape contrast. Yet, these regions were very weakly and unevenly sampled by the Outomuro et al.  wing morphology sampling scheme. Similarly, in the case of the hindwings, the largest displacements occurred along the proximal posterior margin, especially close to the point of maximum wing-periphery curvature (= the prominent proximate posterior “corner”), followed by the proximate anterior wing margin and the posterior mid-wing margin. As with the forewings, landmark-semilandmark displacements are much more pronounced for the landscape contrast. But again, these are areas where the Outomuro et al.  scheme obtained few, and unevenly distributed, samples of morphological variation.
At this point it is worth noting (again) that, in calling attention to the discrepancies between our findings and those of Outomuro et al. , and in attempting to offer explanations for those discrepancies, we are in no way leveling any criticisms at the authors of that study or at their interpretations. Indeed, we are perfectly happy to state that the results and interpretations offered by Outomuro et al.  are correct, accurate, and fully justified for the data they obtained and data-analysis results they produced. Rather, the points we wish to make are that (1) representations of biological morphologies, especially those sampled by sparse sets of landmark-semilandmark points, cannot, should not, and must not be mistaken for the morphologies themselves and (2) different data-analysis procedures differ in the assumptions they make about the data that have been collected, the mathematical models applied to those data, and the power those models have to reveal patterns of similarities and differences within datasets, especially in the case where the point of the analysis is to compare groups defined a priori.
There are many ways to sample or represent any complex morphological structure. But once sampled, the results obtained from the analysis of those data pertain only to the samples that have been collected, not necessarily to the far more complex structure itself. Of course, all systematists and all morphometricians strive to obtain adequate and accurate representations of the morphologies or structures they investigate. In some cases, and for some structures, this is straightforward. In others it is exceedingly difficult. If the question under examination is specific and tied intrinsically to an explicit aspect of the morphology in question (e.g., Are the forewings of male Trithemis annulata longer than those of females of the same species?) the data relevant to the hypothesis test can be obvious. But if the question under examination is non-specific and not tied intrinsically to any particular aspect of the morphology in question (e.g., Do the forewings of Trithemis species that inhabit forested landscapes differ in some way from those that inhabit open landscapes?) it often is difficult to know what to compare, what data to collect, and how to interpret the results of data-analysis procedures in terms of the original question of interest.
In attempting to address this more difficult type of question, Outomuro et al.  chose a wing morphology sampling scheme that achieved a representation of Trithemis forewing and hindwing morphology, but did so in quite an approximate manner. To some extent, their approach reflected, and was possibly encouraged by, conventions that have grown up around geometric morphometrics, which prioritize the representation of complex structures through the digitization of small sets of independently defined landmark points. Originally, GM even objected to the collection and use of boundary outline semilandmarks [14,94] though this prohibition has now been relaxed to some extent, largely for practical reasons (see ). But even given the belated acceptance of semilandmark points as useful means of sampling complex morphologies, few systematists would be comfortable with the proposition that the sampling scheme devised and employed by Outomuro et al.  was either an accurate, or entirely satisfactory, representation of a Trithemis dragonfly wing. That scheme quantifies some aspects of the wing morphology, but ignores the vast majority of the information available.
Any set of data can be collected from any insect wing, subjected to analysis and used to produce a result. That result is guaranteed to reflect patterns present in the data submitted for analysis. But can the interpretation of those results be extended beyond the data through which they were generated? The answer to this question must depend on the character of the result, the representativeness of the sample of individuals (drawn from a much larger population) and the fidelity with which those aspects of the morphology that were sampled truly represent the morphology in question. In precisely the same way that a biased or unrepresentative sample drawn from a population of individuals may compromise the ability of numerical data analysis to tell us anything interesting or useful about a population of interest, irrespective of the fact that a data-analysis result will always be generated, a biased or unrepresentative sample of the morphologies in question may compromise the ability of numerical data analysis to tell us anything interesting or useful about patterns of morphological variation existing in a sample, irrespective of the fact that a morphological data-analysis result will always be generated. The simple act of collecting morphological data is insufficient, by itself, to ensure that a reasonable and/or accurate answer to generalized questions about any set of morphologies has been, or can be, obtained. This prescription is especially pertinent when negative hypothesis-test results are obtained as they were in the case of the Outomuro et al.  study.
Outomuro et al.  found that (1) no significant association existed between wing shape and water body habitat for Trithemis species, (2) male, but not female, Trithemis forewings differed with regard to the contrast between open and forest-dwelling species, and (3) female, but not male, Trithemis hindwings differed with regard to the contrast between open and forest-dwelling species. These conclusions were presented as though they pertained to the forewing and hindwing morphologies themselves rather than to the shape-coordinate locations of 11 forewing landmarks and 7 hindwing landmarks augmented by 5 hindwing semilandmarks, both sets of which were confined exclusively to the wing peripheral margin. We contend that the differences between these results and the results obtained by our investigation are largely the result of differences in the way the two research teams chose to quantify Trithemis wing morphology, augmented by differences in sample composition, sample size, and the manner in which the data were analyzed. As they pertain to the data used to represent Trithemis wing morphology, there is no disagreement between these two sets of results. Both are correct summaries of patterns in the data collected by each research team. But, in terms of the larger question involving Trithemis ecomorphological wing-shape variation among binary parsings of landscape and water-body guild states, we believe our results are the more fully representative because they are based on more complete assessments of Trithemis forewing and hindwing morphology and because they produced broadly consistent, as well as progressively more fully realized results, under two different data-collection regimes and two different data-analysis strategies.
As a final discussion topic it is interesting to note the implications our study has for the practice of morphometrics. Nowadays it is common to read and hear reference made to the “morphometrics revolution”, which is to say the advent of geometric morphometrics which took place more than 30 years ago [11,12,13,14,16,95] (see also [17,18,96]). That revolution was actually a synthesis between three aspects of morphometric practice that had been pursued more-or-less separately until the mid-1980s: the representation of form through the use of sparse sets of topologically corresponding landmark-points (that served as the end-definitions of linear distances originally), the alignment of these geometric point-locations through use of a least-squares Procrustes fitting algorithm as sets of deviations from the mean configuration, and the representation of patterns of morphological variation via linear multivariate analysis. While advances in addition to these did figure in the development of geometric morphometrics (e.g., centroid size, bending energy-based shape decomposition, graphic representation of shape deformation via use of thin-plate splines), and acknowledging that the GM synthesis has grown since its original formulation (e.g., admission of semilandmarks as useful morphology-sampling devices), these three core aspects are those most often used and referred to in GM investigations. This synthesis is powerful, enabling morphological analysis to be pursued quantitatively and at levels of detail, coherence and interpretability unprecedented by the formerly separate schools of morphometric practice. Owing to that power, the geometric morphometric synthesis has proven to be highly effective in addressing a wide range of problems in systematic and comparative morphology, as well as being quite popular among communities of biological, paleontological, systematics and evolutionary researchers.
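The three core components of this synthesis can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not a substitute for established GM software: the two-dimensional landmark configurations are invented, the function names are hypothetical, the alignment permits reflections, and semilandmark sliding is not considered. Landmark configurations are centred, scaled to unit centroid size, rotated iteratively onto their mean by least-squares (Procrustes) fitting, and the aligned coordinates are then summarized by PCA.

```python
import numpy as np

def align(shape, ref):
    """Least-squares rotation of a centred shape onto a reference."""
    u, _, vt = np.linalg.svd(shape.T @ ref)
    return shape @ (u @ vt)

def gpa(shapes, n_iter=10):
    """Generalized Procrustes alignment of (specimens, landmarks, 2) data."""
    X = shapes - shapes.mean(axis=1, keepdims=True)        # remove location
    X = X / np.linalg.norm(X, axis=(1, 2), keepdims=True)  # unit centroid size
    mean = X[0]
    for _ in range(n_iter):
        X = np.array([align(s, mean) for s in X])          # remove rotation
        mean = X.mean(axis=0)
        mean = mean / np.linalg.norm(mean)
    return X, mean

def pca(aligned):
    """PCA of aligned coordinates: scores and variance per component."""
    flat = aligned.reshape(len(aligned), -1)
    centred = flat - flat.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u * s, s ** 2 / (len(flat) - 1)

# Invented specimens: one base form, slightly perturbed, then arbitrarily
# rotated, scaled and translated.
rng = np.random.default_rng(7)
base = rng.normal(size=(6, 2))
specimens = []
for _ in range(10):
    a = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    form = base + rng.normal(scale=0.01, size=base.shape)
    specimens.append(rng.uniform(0.5, 2.0) * form @ rot + rng.normal(size=2))

aligned, mean_shape = gpa(np.array(specimens))
scores, variances = pca(aligned)
```

After alignment, the differences in position, size and orientation introduced above have been removed, and the small residual variance summarized by the PCA reflects only the shape perturbations.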
However, the geometric morphometric approach, like all data-analysis approaches, has its weaknesses as well as its strengths. Perhaps even more importantly the field of data analysis rarely remains static for long.
Over the last 20-25 years an alternative (some might say a rival) to GM has appeared in the form of ML. Unlike GM, ML approaches were not developed by researchers whose primary interest was in the analysis of biological morphology. Nevertheless, one of the primary, and most popular, uses of ML approaches, as well as spurs to ongoing research into the general topic of ML, has been the ability of these algorithms to find previously unsuspected patterns in all sorts of data, but especially in morphological data.
In many ways, ML represents a natural complement to GM. Whereas GM was designed to operate on a specific type of morphological data (= configurations of landmark point locations), ML can be used to analyze any sort of morphological data, including configurations of landmark point locations. Thus, whereas the application of GM is limited to those situations in which forms can, reasonably, be represented by configurations of point coordinates, use of ML approaches opens the door to the consideration of a much wider range of morphological data and morphological problems. To date the overwhelming majority of GM analyses published in the biological, paleontological, systematic and evolutionary literature have been based on linear data-analysis models. However, ML approaches can be applied readily to situations in which the optimal models are non-linear, even if that is not known to be the case at the outset of an investigation.
At present, ML models are inferior to their GM counterparts in terms of their ability to be queried and so used to identify which aspects of a set of morphologies are contributing disproportionately to overall sample variance, which are useful for group discrimination and/or the structure of variable correlations. To be sure, attempts have been made to improve the interpretability of ML models (see  for a review). But as we have outlined above, much more remains to be done in this area.
Inevitably some will claim that GM is their preferred option for generalized morphological data analysis, either because, despite its obvious limitations, they regard their study group(s) and research questions well-served by this approach and/or because they wish to retain a “geometric focus” in their analysis. In response we can only point out that all analyses of morphological data are “geometric” in character because morphology is composed entirely of the sizes and shapes of various parts, characters and characteristics as well as their spatial arrangements relative to other parts, characters and characteristics. From the results we have presented above it is unquestionably clear that the strictly GM approach to the analysis of Trithemis wing-shape data was the one that performed least well in finding, summarizing, testing, and assembling sets of characteristics that could be used to answer the generalized question of whether shape variance was distributed among Trithemis landscape and water-body ecological guilds in a continuous or disjunct manner. What is also clear is that this comparative finding is not an unusual or exceptional result [44,45,67–69,98–107].