
In: PLOS Computational Biology, Public Library of Science (PLoS), Vol. 19, No. 4 (2023-04-13), p. e1010325

Abstract: Despite the accumulation of data and studies, deciphering animal vocal communication remains challenging. In most cases, researchers must deal with the sparse recordings composing Small, Unbalanced, Noisy, but Genuine (SUNG) datasets. SUNG datasets are characterized by a limited number of recordings, most often noisy, and unbalanced in number between the individuals or categories of vocalizations. SUNG datasets therefore offer a valuable but inevitably distorted vision of communication systems. Adopting the best practices in their analysis is essential to effectively extract the available information and draw reliable conclusions. Here we show that the most recent advances in machine learning applied to a SUNG dataset succeed in unraveling the complex vocal repertoire of the bonobo, and we propose a workflow that can be effective with other animal species. We implement acoustic parameterization in three feature spaces and run a Supervised Uniform Manifold Approximation and Projection (S-UMAP) to evaluate how call types and individual signatures cluster in the bonobo acoustic space. We then implement three classification algorithms (Support Vector Machine, xgboost, neural networks) and their combination to explore the structure and variability of bonobo calls, as well as the robustness of the individual signature they encode. We underscore how classification performance is affected by the feature set and identify the most informative features. In addition, we highlight the need to address data leakage in the evaluation of classification performance to avoid misleading interpretations. Our results lead to identifying several practical approaches that are generalizable to any other animal communication system. 
To improve the reliability and replicability of vocal communication studies with SUNG datasets, we thus recommend: i) comparing several acoustic parameterizations; ii) visualizing the dataset with supervised UMAP to examine the species' acoustic space; iii) adopting Support Vector Machines as the baseline classification approach; iv) explicitly evaluating data leakage and possibly implementing a mitigation strategy.
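
Recommendation iv) can be illustrated with a small sketch. The abstract does not say how the authors evaluate or mitigate data leakage, so the code below is only one possible approach under stated assumptions: calls that share a hypothetical group label (for example a recording-session ID) are kept together in either the training or the test set, and a helper reports any group that ends up on both sides. The function names `grouped_folds` and `shared_groups` and the toy session labels are illustrative, not taken from the article.

```python
from collections import defaultdict


def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs in which no group is split across train and test.

    `groups` holds one hypothetical group label per call (for example a
    recording-session ID). Keeping whole groups together is an assumed
    leakage-mitigation strategy, not necessarily the one used in the article.
    """
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    if n_folds < 2 or n_folds > len(members):
        raise ValueError("n_folds must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first, each into the currently smallest fold.
    folds = [[] for _ in range(n_folds)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[g])

    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j in range(n_folds) if j != k for i in folds[j])
        yield train_idx, test_idx


def shared_groups(groups, train_idx, test_idx):
    """Return the group labels present in both train and test (empty set: no leakage by this criterion)."""
    return {groups[i] for i in train_idx} & {groups[i] for i in test_idx}


# Toy example: 8 calls from 4 hypothetical recording sessions.
sessions = ["s1", "s1", "s1", "s2", "s2", "s3", "s4", "s4"]
splits = list(grouped_folds(sessions, n_folds=2))

# A naive split that ignores sessions, for comparison.
naive_train, naive_test = [0, 1, 3, 5, 6], [2, 4, 7]
leaky = shared_groups(sessions, naive_train, naive_test)
```

In this toy case every grouped split has no session in common between training and test, whereas the naive split shares three sessions. Comparing classifier scores under both kinds of split is one way to gauge how much leakage inflates the apparent performance.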

Type of Medium: Online Resource

ISSN: 1553-7358


DOI: 10.1371/journal.pcbi.1010325

DOI: 10.1371/journal.pcbi.1010325.g001

DOI: 10.1371/journal.pcbi.1010325.g002

DOI: 10.1371/journal.pcbi.1010325.g003

DOI: 10.1371/journal.pcbi.1010325.g004

DOI: 10.1371/journal.pcbi.1010325.g005

DOI: 10.1371/journal.pcbi.1010325.g006

DOI: 10.1371/journal.pcbi.1010325.g007

DOI: 10.1371/journal.pcbi.1010325.g008

DOI: 10.1371/journal.pcbi.1010325.g009

DOI: 10.1371/journal.pcbi.1010325.g010

DOI: 10.1371/journal.pcbi.1010325.g011

DOI: 10.1371/journal.pcbi.1010325.g012

DOI: 10.1371/journal.pcbi.1010325.g013

DOI: 10.1371/journal.pcbi.1010325.g014

DOI: 10.1371/journal.pcbi.1010325.g015

DOI: 10.1371/journal.pcbi.1010325.t001

DOI: 10.1371/journal.pcbi.1010325.t002

DOI: 10.1371/journal.pcbi.1010325.t003

DOI: 10.1371/journal.pcbi.1010325.s001

DOI: 10.1371/journal.pcbi.1010325.s002

DOI: 10.1371/journal.pcbi.1010325.s003

DOI: 10.1371/journal.pcbi.1010325.s004

DOI: 10.1371/journal.pcbi.1010325.s005

DOI: 10.1371/journal.pcbi.1010325.s006

DOI: 10.1371/journal.pcbi.1010325.s007

DOI: 10.1371/journal.pcbi.1010325.s008

DOI: 10.1371/journal.pcbi.1010325.s009

DOI: 10.1371/journal.pcbi.1010325.s010

DOI: 10.1371/journal.pcbi.1010325.r001

DOI: 10.1371/journal.pcbi.1010325.r002

DOI: 10.1371/journal.pcbi.1010325.r003

DOI: 10.1371/journal.pcbi.1010325.r004

DOI: 10.1371/journal.pcbi.1010325.r005

DOI: 10.1371/journal.pcbi.1010325.r006

Language: English

Publisher: Public Library of Science (PLoS)

Publication Date: 2023

ZDB-ID: 2193340-6
