We used computational models and direct measures of song acoustics to determine which features accurately classify vocal communication signals by species, and to identify features that correlate with relatedness. By comparing the results of multiple approaches, we developed a framework for understanding how vocal acoustics convey social information. Results showed that: 1) song syllables can be described and compared by their spectral, spectral shape, and spectrotemporal features; 2) simple spectral features map onto phylogeny; 3) complex features vary within species and overlap between species; and 4) syllables can be accurately classified by species using six acoustic features. Previous neurophysiological and behavioral studies suggest that songbirds are sensitive to complex acoustics and to manipulations of the relationships between acoustic features [38, 62, 65]. Generalization of these findings across species, however, has been criticized given the limited number of species and acoustic features typically studied, and the limited consideration of interactions between acoustic features in previous research [66]. Our paper aims to provide a standardized approach for future researchers to quantify acoustic similarities between species, identify the acoustic features that optimize species-level classification, and determine which acoustic relationships may be associated with phylogeny. This song-evolutionary framework may be used to test the role of acoustic and phylogenetic relationships in species recognition in comparative studies.
Overall, our results show that estrildid songs can be meaningfully characterized in reduced acoustic dimensions derived from their spectral, spectral shape, and spectrotemporal components (Fig. 3). Hypervolume analyses showed that species occupy specific regions of acoustic feature space and overlap with each other to varying degrees, reflecting the acoustic similarity of their song syllable PC features (Fig. 4) [3, 67]. In comparing species relationships along these components, we found that lower overlap across the three dimensions correlates with greater differentiability of syllables by their raw acoustic features (Fig. 5d). Phylogenetic analysis showed that clustering in spectral features follows cladistic groupings: Australian species cluster in the lower range of PC1 (i.e., these species generally produce lower-pitched songs), African species cluster in the higher range, and the Bengalese finch spans the range between these groups (Fig. 3a; Supplementary Fig. S1). Clustering in PCs 2 and 3, however, does not follow cladistic groupings.
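To make the dimensionality reduction and overlap comparison concrete, the sketch below reduces a syllable-by-feature matrix to three principal components and estimates pairwise overlap between the regions occupied by two species using a Monte Carlo convex-hull proxy. The inputs X (syllables × acoustic features) and y (species labels) are assumed, and the convex-hull proxy is an illustrative stand-in rather than the kernel-density hypervolume method used in the analyses.

```python
# Minimal sketch: PCA reduction plus a rough overlap estimate in PC space.
# X (n_syllables x n_features) and y (species labels) are assumed inputs.
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial import Delaunay

def pc_scores(X, n_components=3):
    """Project the syllable-by-feature matrix onto its leading principal components."""
    return PCA(n_components=n_components).fit_transform(X)

def hull_overlap(a, b, n_samples=20000, seed=0):
    """Fraction of points sampled in the joint bounding box that fall inside
    both species' convex hulls, normalized by the union (an overlap proxy)."""
    rng = np.random.default_rng(seed)
    hull_a, hull_b = Delaunay(a), Delaunay(b)
    lo = np.minimum(a.min(axis=0), b.min(axis=0))
    hi = np.maximum(a.max(axis=0), b.max(axis=0))
    pts = rng.uniform(lo, hi, size=(n_samples, a.shape[1]))
    in_a = hull_a.find_simplex(pts) >= 0
    in_b = hull_b.find_simplex(pts) >= 0
    union = (in_a | in_b).sum()
    return (in_a & in_b).sum() / union if union else 0.0

# Example usage (X, y assumed; species names illustrative):
# scores = pc_scores(X)
# overlap = hull_overlap(scores[y == "zebra_finch"], scores[y == "Bengalese_finch"])
```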
The ensemble tree procedure for feature selection and model optimization showed that fundamental frequency (minimum, maximum, and mean), mean frequency, spectral flatness, and syllable duration are the features most important for classifying song by species. Most of these features (i.e., fundamental frequency; “entropy,” but see the note on spectral flatness below) have previously been established for calculating similarity between songs in measurements of song imitation, variability, and change over time and across experimental manipulations [48]. Spectral flatness is a measure of noisiness in the frequency domain of a signal, similar to spectral entropy [68] and Wiener entropy [48]. Mean frequency has previously been found to vary along genetic lines (i.e., across species [38] and strains [59]), but not with rearing conditions in cross-fostering experiments. These findings suggest that mean frequency could support classification by accommodating within-species song variability while still preserving species identity [69]. Previous studies with estrildids have not used duration as a diagnostic feature for species identity, despite its usefulness in categorizing vocalizations in song-learning [70] and non-song-learning [71] birds. Estrildid studies, however, have found that duration can be used in conjunction with other features to differentiate between syllables that otherwise overlap in measures of entropy and fundamental frequency [72]. Fundamental frequency has also been shown to support categorical discrimination of vocalizations in estrildids [70]. Previous studies have also attributed variation in neural [52] and behavioral [70] responses to these acoustic features in songbirds. Overall, the features identified as most important for song classification in our study parallel those previously identified in acoustic discrimination and preference research. Machine learning methods may help to further identify the acoustic features that elicit species recognition and neural tuning [73].
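The logic of this procedure can be sketched with a random forest classifier that ranks acoustic features by their contribution to species classification; the column names, cross-validation scheme, and classifier settings below are placeholders rather than the exact pipeline used in our analyses.

```python
# Sketch of an ensemble-tree feature ranking, assuming a per-syllable feature
# table with illustrative column names and a species label per row.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURES = ["fund_freq_min", "fund_freq_max", "fund_freq_mean",
            "mean_freq", "spectral_flatness", "duration"]

def rank_features(df: pd.DataFrame, label_col: str = "species"):
    X, y = df[FEATURES], df[label_col]
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    # cross-validated species-classification accuracy with these six features
    accuracy = cross_val_score(clf, X, y, cv=5).mean()
    clf.fit(X, y)
    importances = pd.Series(clf.feature_importances_, index=FEATURES)
    return accuracy, importances.sort_values(ascending=False)
```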
Phylogenetic signal analysis provides a measure of the amount of trait variation that can be explained by evolutionary relatedness [37], and phylogenetic signal has been detected in species-level analyses of vocalizations across taxa, including birds [74–78], anurans [79], and mammals [80]. Acoustic features that identify species from field recordings also enable measurements of biodiversity, migration, and habitat use [17–20]. Our analyses detected phylogenetic signal in spectral, but not spectral shape or spectrotemporal, components. Although song is a learned individual behavior, previous studies suggest that spectral features of song are strongly heritable and evolve in populations via natural selection. Spectrotemporal characteristics are proposed to be less constrained because the vocal production organ (i.e., the syrinx) may provide a ubiquitous and flexible mechanism on which processes of cultural transmission (e.g., song learning) and sexual selection impose demands for faster rates of change [3, 81, 82]. Our results match previous findings in Regulus, a songbird genus that expresses phylogenetic signal in frequency features (PC1 in our study) and for which frequency bandwidth and modulation (related to PCs 2 and 3 in our study, respectively) were previously established as learned acoustic parameters [83]. Changes in song frequency features may be constrained by pleiotropic or polygenic traits that affect vocal production organs (e.g., body size affecting fundamental and dominant frequencies through syrinx size) [78] and other morphological features implicated in sound modification (e.g., the beak) [84]. These constraints would impose evolutionary rates on song frequency changes that correlate with time and reflect phylogenetic relationships (as predicted under Brownian motion), in contrast to the faster rates of change, both between and within species, typically found in song features that shift freely under cultural transmission [85]. Although significant differences in syringeal morphology exist across birds [78], song learning in oscine birds, including estrildids, appears to have been facilitated by a conserved, uniform syringeal morphology that provides a flexible vocal production capacity able to meet the demands of culturally transmitted song [81] (but see [61, 78] for discussions of constraints).
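The Brownian-motion expectation underlying this prediction, namely that trait divergence between two lineages accumulates in proportion to the time since their common ancestor, can be illustrated with a minimal simulation; the rate and divergence times below are arbitrary illustrative values, not estimates from our data.

```python
# Sketch: under Brownian motion, the expected squared difference between two
# lineages' trait values grows linearly with time since their common ancestor.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0                          # Brownian rate (trait variance per unit time)
times = np.array([1.0, 5.0, 10.0])    # illustrative divergence times

for t in times:
    # each lineage drifts independently from the ancestral value for time t
    a = rng.normal(0.0, np.sqrt(sigma2 * t), size=100_000)
    b = rng.normal(0.0, np.sqrt(sigma2 * t), size=100_000)
    print(t, np.mean((a - b) ** 2))   # approaches 2 * sigma2 * t
```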
Previous electrophysiological studies have found neural tuning to spectral shape (PC2) and spectrotemporal (PC3) modulations, but not to basic frequencies (PC1), in regions critical for song learning (e.g., the caudomedial nidopallium [38] and L3 [62]). Neurophysiological studies have characterized spectrotemporal tuning in auditory neurons and its role in song learning and perception [38, 65, 86, 87]. Moore and Woolley [38] found that neural selectivity for spectrotemporal modulations is shaped by vocal learning during development, whereas basic spectral tuning measured from tone receptive fields is not. This study supports the hypothesis that basic spectral features are more conserved across individuals of the same species than are complex features [3, 61]. The vocal acoustics driving PC1 were more similar between closely related species than between distantly related species (Fig. 3a; Supplementary Fig. S1), suggesting that spectral features can be used to identify species and predict relatedness. This relationship between spectral features and phylogeny may support species recognition even without visual or direct contact between signalers and receivers [88].
Significant phylogenetic signal detected with the EM-Mantel procedure suggests that divergence in those features (i.e., PC1) may be accounted for by a simple “random walk” of traits as a function of the evolutionary time between species (i.e., Brownian motion). The absence of phylogenetic signal, however, does not necessarily mean that traits evolve independently of evolutionary constraints. Spectral energy distribution (PC2) and spectrotemporal (PC3) features, which did not express significant phylogenetic signal under Brownian motion, may instead be evolutionarily labile and change under alternative selection pressures and processes within species [3, 61, 81, 82], particularly those important for vocal communication (e.g., individual recognition, mate attraction, intrasexual competition, and cultural evolution) [13, 69, 78, 89]. In this case, we would expect species to cluster along PCs 2 and 3 independently of their phylogenetic relatedness; this result can be seen in the polyphyletic clustering of the species as well as in the divergence within the Australian (and to a lesser extent the African) species along PCs 2 and 3 (Fig. 3c; Supplementary Fig. S1).
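For intuition, the sketch below implements a plain permutation Mantel test relating a matrix of pairwise acoustic distances (e.g., between species means along PC1) to a matrix of phylogenetic distances. It conveys only the general logic of the test; the EM-Mantel procedure used in our analyses additionally builds its null distribution from an explicit Brownian-motion model of trait evolution.

```python
# Sketch: permutation Mantel test between two square distance matrices.
# dist_acoustic and dist_phylo are assumed to be aligned (same species order).
import numpy as np

def mantel(dist_acoustic, dist_phylo, n_perm=9999, seed=0):
    """Return the matrix correlation and a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(dist_acoustic, k=1)       # upper-triangle entries
    r_obs = np.corrcoef(dist_acoustic[iu], dist_phylo[iu])[0, 1]
    n = dist_acoustic.shape[0]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)                        # shuffle species labels
        r = np.corrcoef(dist_acoustic[perm][:, perm][iu], dist_phylo[iu])[0, 1]
        count += r >= r_obs
    return r_obs, (count + 1) / (n_perm + 1)
```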
Songbird vocalizations offer a diverse system for studying the effects of learning and evolutionary processes on behavior. Communication signals allow for the coding, transmission, and interpretation of social information between senders and receivers, and behavioral and neural studies in various animal taxa support this framework [1, 13, 31, 65, 90–92]. We therefore expect acoustic differences between species’ songs to scale with differences in perceptual mechanisms and, in turn, to affect behavioral responses, reproductive success, and ultimately speciation in estrildid finches [3, 6, 9, 13, 30, 31, 61, 93, 94]. The machine learning and statistical models used here identified the acoustic dimensions that best characterize and distinguish species’ songs. We also found that spectral features such as fundamental frequency correlated with phylogenetic distance, whereas spectral shape and spectrotemporal features did not. These findings suggest that basic spectral characteristics and more complex characteristics are generated by different mechanisms. Selection may conserve spectral features that identify the species of a signaler [84], while features such as spectral shape and spectrotemporal modulations may vary across conspecific individuals and overlap between species [3]. Such complex, variable features within a species may aid individual recognition and competition for mates [95–97]. Studies of vocal acoustics across multiple suites of closely related species will further disentangle the effects of genetics and learning on vocal communication and guide our understanding of the mechanisms driving communication production and perception. Future studies should also incorporate measures of syntax (the temporal sequencing of syllables) into quantitative analyses of species differences in song acoustics.