MIT/CETI Team Uses Machine Learning to Uncover a Sperm Whale Phonetic Alphabet! Highly Similar to Human Language, With Greater Information-Carrying Capacity!

In marine ecology research, bioacoustics is an important way to obtain information about marine organisms. As the name suggests, bioacoustics studies the production, propagation and reception of animal sounds. With advances in technology, researchers can now decode animal sounds to determine an animal's species, sex, individual identity or health status.
However, population monitoring with traditional bioacoustics requires extensive manual processing and analysis of field recordings, which is time-consuming and costly. AI breakthroughs in sound recognition offer a promising solution to this challenge. Thanks to its ability to automate processing and learn from data, machine learning has already made great strides in bioacoustics.
Machine learning analysis of marine vocalizations is now well established. Among marine animals, whales, dolphins and other cetaceans show complex social and cooperative behaviors that parallel human society in many ways, which makes them especially valuable to study.
Among them, sperm whales have become a research focus because their communication system shows striking similarities to human language.
As highly social mammals, sperm whales live in family units with a complex social structure. To make group decisions, they communicate mostly through sequences of "clicks"; these exchanges can last from as little as 10 seconds to more than half an hour. Although their communication system appears simple, it supports a range of complex coordinated behaviors, and this contrast is a puzzle researchers want to solve. Many previous studies have shown that sperm whale vocalizations are complex, but the specific features and internal structure of their codas (short, patterned sequences of clicks) remained unknown.
To this end, Pratyusha Sharma of MIT and researchers from Project CETI used machine learning to analyze recordings of sperm whales. They confirmed that sperm whale codas are structured and built from combinations of distinct features. Using machine learning, they also derived a sperm whale "phonetic alphabet" and found that the whales' communication system resembles human language in its combinatorial structure and carries more information than previously thought.
The related research was published in Nature Communications under the title "Contextual and combinatorial structure in sperm whale vocalisations".
Research highlights:
* This study used data from the Dominica Sperm Whale Project (DSWP), the largest sperm whale dataset available, to analyze 8,719 codas from approximately 60 sperm whales in the Eastern Caribbean sperm whale community, and defined a "sperm whale phonetic alphabet."
* Sperm whale communication is combinatorial: whales combine and modulate different click and rhythm features to build complex vocalizations, a property they share with human language.

Paper address:
https://www.nature.com/articles/s41467-024-47221-8
The open source project "awesome-ai4s" brings together more than 100 AI4S paper interpretations and provides a large collection of datasets and tools:
https://github.com/hyperai/awesome-ai4s
Dataset: large volume of data spanning many years
The dataset used in this study comes from the Dominica Sperm Whale Project (DSWP), currently the largest repository of sperm whale data. The researchers analyzed recordings of about 60 sperm whales from Eastern Caribbean vocal clan 1 (EC-1), comprising a total of 8,719 codas.
Notably, the dataset contains not only manually annotated codas from various platforms and recording systems collected between 2005 and 2018, but also data recorded by sensors (DTags) attached to sperm whales between 2014 and 2018.
Sperm whale codas have rich combinatorial features
To observe how sperm whale codas change during an exchange and over longer time spans, the researchers visualized them, as shown in the following figure. Figure A shows a 2-minute exchange between two whales in the DSWP dataset, with the codas of each whale shown in blue and orange, respectively.

Next, the researchers projected these codas onto a time-time plot to follow how they change over the 2-minute exchange. In Figures B and C, the horizontal axis is the time since the start of the exchange, and the vertical axis is the time since the onset of each coda. In Figure C, the researchers also connected matching clicks in adjacent codas. The plots show that over the course of the exchange, coda duration changes smoothly and extra clicks appear, revealing complex, context-dependent variation in coda structure. This suggests that sperm whale codas carry more information than previous studies reported.
It was previously thought that sperm whales used 21 discrete coda types. The researchers describe codas in terms of two context-independent features (tempo and rhythm) and two context-dependent features (rubato and ornamentation).
As shown in the figure below, the researchers use the term tempo for the overall duration of a coda, which falls into a limited set of discrete modes. The left panel shows that a coda's total duration is the sum of its inter-click intervals; the right panel shows how duration varies across codas of different rhythm types.
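The time-time plot described above can be sketched as a simple coordinate transform. This is a minimal illustration, assuming each coda is given as a list of absolute click times in seconds; the function names are hypothetical, and matching clicks by their index within the coda is a simplifying assumption, not necessarily how the study matched clicks.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def time_time_points(codas: List[List[float]]) -> List[Point]:
    """Map every click to (time since exchange start, time since coda onset).

    Each coda is a list of absolute click times in seconds, in order.
    """
    if not codas:
        return []
    exchange_start = min(coda[0] for coda in codas)
    points = []
    for coda in codas:
        onset = coda[0]
        for click in coda:
            points.append((onset - exchange_start, click - onset))
    return points


def matching_click_links(codas: List[List[float]]) -> List[Tuple[Point, Point]]:
    """Link the i-th click of each coda to the i-th click of the next coda.

    Index-based matching is an illustrative assumption; a click with no
    partner in the neighbouring coda (e.g. an added final click) stays unlinked.
    """
    if not codas:
        return []
    exchange_start = min(coda[0] for coda in codas)
    links = []
    for prev, nxt in zip(codas, codas[1:]):
        for a, b in zip(prev, nxt):
            links.append(
                ((prev[0] - exchange_start, a - prev[0]),
                 (nxt[0] - exchange_start, b - nxt[0]))
            )
    return links


# Two hypothetical codas: the second adds an extra final click.
codas = [[10.0, 10.25, 10.5], [12.0, 12.25, 12.5, 12.75]]
print(time_time_points(codas))
# [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5), (2.0, 0.0), (2.0, 0.25), (2.0, 0.5), (2.0, 0.75)]
print(len(matching_click_links(codas)))  # 3: the fourth click has no partner
```

Plotting these points as a scatter, with the links drawn as line segments, gives a chart in the spirit of Figures B and C: smooth drift in the links reflects gradual duration changes, and unlinked points mark extra clicks.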
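The relationship between click intervals, duration and tempo can be made concrete with a short sketch. It assumes click times in seconds, and it assigns a tempo type by snapping the duration to the nearest of a list of mode centres; both the `tempo_modes` values and this nearest-mode rule are hypothetical stand-ins, since the article does not give the study's actual modes or assignment method.

```python
from typing import List, Sequence


def inter_click_intervals(click_times: Sequence[float]) -> List[float]:
    """Intervals between consecutive clicks of one coda (seconds)."""
    return [b - a for a, b in zip(click_times, click_times[1:])]


def coda_duration(click_times: Sequence[float]) -> float:
    """Total coda duration: the sum of its inter-click intervals."""
    return sum(inter_click_intervals(click_times))


def tempo_type(duration: float, tempo_modes: Sequence[float]) -> int:
    """Index of the (hypothetical) tempo mode closest to this coda's duration."""
    return min(range(len(tempo_modes)), key=lambda i: abs(duration - tempo_modes[i]))


clicks = [0.0, 0.25, 0.5, 1.0]
print(inter_click_intervals(clicks))     # [0.25, 0.25, 0.5]
print(coda_duration(clicks))             # 1.0
print(tempo_type(1.0, [0.5, 0.9, 1.3]))  # 1
```

Because the intervals telescope, the duration always equals the time from the first click to the last, which is why tempo can be read directly off the summed intervals.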

In Figure B, the researchers normalized each coda's vector of inter-click intervals (ICIs) by its total duration to obtain a duration-independent representation of the coda, which they call rhythm.
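The rhythm normalization described above can be sketched in a few lines. This minimal example assumes click times in seconds; the template names and values are hypothetical, and assigning a rhythm type by nearest template (Euclidean distance) is a simplified stand-in for whatever clustering the study actually used.

```python
import math
from typing import Dict, List, Optional, Sequence


def rhythm_vector(click_times: Sequence[float]) -> List[float]:
    """ICI vector divided by total duration, so its entries sum to 1."""
    icis = [b - a for a, b in zip(click_times, click_times[1:])]
    total = sum(icis)
    if total <= 0:
        raise ValueError("a coda needs at least two clicks spanning a positive duration")
    return [ici / total for ici in icis]


def nearest_rhythm(vec: Sequence[float], templates: Dict[str, List[float]]) -> Optional[str]:
    """Closest hypothetical template with the same number of ICIs, or None."""
    same_length = {name: t for name, t in templates.items() if len(t) == len(vec)}
    if not same_length:
        return None
    return min(same_length, key=lambda name: math.dist(vec, same_length[name]))


fast = rhythm_vector([0.0, 0.25, 0.5, 1.0])  # 1.0 s coda
slow = rhythm_vector([0.0, 0.5, 1.0, 2.0])   # same pattern, 2.0 s coda
print(fast == slow)                          # True: rhythm ignores duration
templates = {"template_A": [1 / 3, 1 / 3, 1 / 3], "template_B": [0.25, 0.25, 0.5]}
print(nearest_rhythm(fast, templates))       # template_B
```

The two example codas differ only in tempo, so they collapse to the same rhythm vector; this is what lets rhythm and tempo be treated as independent features.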

In Figure C, the researchers use the term rubato for the slow, gradual adjustment of coda duration across a sequence of codas: adjacent codas in an exchange are closer in duration than codas of the same type found elsewhere.
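The rubato property described above (adjacent codas being closer in duration than other codas of the same type) suggests a simple check. This is a hypothetical simplification: it compares the mean duration difference of adjacent codas against the mean over all pairs within the same exchange, whereas the study compared against similar codas elsewhere in the dataset; the durations are invented for illustration.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence, Tuple


def rubato_contrast(durations: Sequence[float]) -> Tuple[float, float]:
    """Mean duration difference of adjacent codas vs. of all coda pairs.

    `durations` holds durations (s) of successive codas of one type within an
    exchange. Under rubato, the adjacent value should be the smaller one.
    """
    if len(durations) < 2:
        raise ValueError("need at least two codas")
    adjacent = mean(abs(b - a) for a, b in zip(durations, durations[1:]))
    all_pairs = mean(abs(b - a) for a, b in combinations(durations, 2))
    return adjacent, all_pairs


# Hypothetical codas that slowly lengthen across an exchange.
adjacent, baseline = rubato_contrast([1.0, 1.1, 1.2, 1.3, 1.4])
print(round(adjacent, 3), round(baseline, 3))  # 0.1 0.2
```

A gradual drift keeps neighbouring codas similar even though the first and last codas of the exchange differ noticeably, which is exactly the pattern the adjacent-versus-baseline comparison picks up.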

In Figure D, the researchers define ornamentation as an extra click at the end of a coda, relative to neighboring codas of the same type. Ornaments are not randomly distributed but occur at specific positions within longer exchanges.
The study found that, within a single whale's call sequence, (1) ornamented codas appear at the beginning of the sequence significantly more often than non-ornamented codas, and (2) ornamented codas also appear at the end of the sequence significantly more often than non-ornamented codas.
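One way to operationalize ornamentation is sketched below. It assumes each exchange is summarized as the click counts of successive codas of the same base type, and flags a coda when it has exactly one more click than each of its neighbours; this rule and the function name are hypothetical simplifications, not the study's published detection procedure.

```python
from typing import List, Sequence


def flag_ornamented(click_counts: Sequence[int]) -> List[bool]:
    """Flag codas with exactly one more click than each neighbouring coda.

    `click_counts` lists the click counts of successive codas in one exchange.
    A coda with no neighbours is never flagged.
    """
    flags = []
    n = len(click_counts)
    for i, count in enumerate(click_counts):
        neighbours = []
        if i > 0:
            neighbours.append(click_counts[i - 1])
        if i < n - 1:
            neighbours.append(click_counts[i + 1])
        flags.append(bool(neighbours) and all(count == nb + 1 for nb in neighbours))
    return flags


print(flag_ornamented([5, 4, 4, 4, 5]))  # [True, False, False, False, True]
```

In this toy sequence the extra click shows up on the first and last codas, the same positional pattern the study reports for ornaments.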
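The positional comparison in these findings can be tallied as follows. This is a minimal sketch, assuming each call sequence is a time-ordered list of per-coda flags (True means ornamented); it only computes proportions and performs no significance test, which the study's conclusions rely on. The example data are invented.

```python
from typing import Dict, List, Tuple


def position_proportions(calls: List[List[bool]]) -> Dict[str, Tuple[float, float]]:
    """Share of ornamented vs. plain codas that open or close a call sequence.

    Returns {"ornamented": (share_first, share_last), "plain": (...)}.
    A single-coda call counts as both first and last.
    """
    tallies = {"ornamented": [0, 0, 0], "plain": [0, 0, 0]}  # total, first, last
    for call in calls:
        for i, ornamented in enumerate(call):
            tally = tallies["ornamented" if ornamented else "plain"]
            tally[0] += 1
            if i == 0:
                tally[1] += 1
            if i == len(call) - 1:
                tally[2] += 1
    return {
        key: ((first / total, last / total) if total else (0.0, 0.0))
        for key, (total, first, last) in tallies.items()
    }


calls = [[True, False, False, True], [True, False, False]]
print(position_proportions(calls))
# ornamented: 2/3 open a call, 1/3 close one; plain: none open a call, 1/4 close one
```

A real analysis would compare these proportions against a null model, for example by shuffling coda order within each call, before calling a difference significant.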

The researchers note that whales engaged in vocal exchanges can perceive and respond to all four of these features, so the features form a functional part of the whales' communication system. Rhythm, tempo, rubato and ornamentation can be combined freely, allowing whales to systematically produce a large number of distinguishable codas.
Research results: a sperm whale phonetic alphabet highly similar to human language
Building on this visual analysis, the researchers used machine learning to derive the sperm whale's phonetic alphabet, which they liken to the sound inventories of human languages, as shown in the following figure:
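The combinatorial claim above can be illustrated by enumerating feature combinations. In this sketch the rhythm and tempo labels (and their counts) are hypothetical, and rubato is reduced to present/absent, although in the study it is a continuous, gradual change; the point is only that the number of distinct codas grows as the product of the feature inventories.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class CodaFeatures:
    rhythm: str       # context-independent
    tempo: str        # context-independent
    rubato: bool      # context-dependent, simplified to present/absent
    ornamented: bool  # context-dependent


def all_codas(rhythms: Sequence[str], tempos: Sequence[str]) -> List[CodaFeatures]:
    """Every coda obtainable by freely combining the four features."""
    return [
        CodaFeatures(r, t, rub, orn)
        for r, t, rub, orn in product(rhythms, tempos, (False, True), (False, True))
    ]


# Hypothetical inventories: 3 rhythm types x 2 tempo types.
codas = all_codas(["R1", "R2", "R3"], ["T1", "T2"])
print(len(codas))  # 3 * 2 * 2 * 2 = 24
```

Adding one more rhythm or tempo type multiplies rather than adds to the total, which is why a small set of features can yield far more distinguishable codas than a flat list of types.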

The horizontal axis shows the coda's rhythm type, the vertical axis shows its tempo type, and the color of each cell indicates how often that rhythm/tempo combination occurs in the DSWP dataset. The pie charts in each cell show how rubato and ornamentation combine with that rhythm/tempo combination: the left pie chart shows the proportion of codas with and without rubato, and the right pie chart shows the proportion of codas in that combination that carry an ornament.
The researchers noted that although not every possible combination of features occurs, sperm whale codas have a rich combinatorial structure with both discrete and continuous parameters: at least 143 combinations occur frequently, far more than the 21 discrete coda types previously identified.
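The per-cell statistics in the figure and the count of frequent combinations can be sketched with standard counters. This assumes each coda has already been labeled as a (rhythm, tempo, has_rubato, has_ornament) tuple; the labels, the example data and the `min_count` threshold are hypothetical, since the article does not state what frequency the study used to call a combination "frequent".

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Coda = Tuple[str, str, bool, bool]  # (rhythm, tempo, has_rubato, has_ornament)


def cell_summary(codas: Sequence[Coda]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Per rhythm/tempo cell: coda count plus rubato and ornament fractions."""
    totals: Counter = Counter()
    rubato: Counter = Counter()
    ornament: Counter = Counter()
    for rhythm, tempo, has_rubato, has_ornament in codas:
        cell = (rhythm, tempo)
        totals[cell] += 1
        rubato[cell] += int(has_rubato)
        ornament[cell] += int(has_ornament)
    return {
        cell: {"count": n, "rubato_frac": rubato[cell] / n, "ornament_frac": ornament[cell] / n}
        for cell, n in totals.items()
    }


def frequent_combinations(codas: Sequence[Coda], min_count: int) -> List[Coda]:
    """Full feature combinations seen at least `min_count` times."""
    counts = Counter(codas)
    return sorted(combo for combo, n in counts.items() if n >= min_count)


codas = [
    ("R1", "T1", True, False),
    ("R1", "T1", True, False),
    ("R1", "T1", False, False),
    ("R1", "T1", True, True),
    ("R2", "T1", False, False),
]
print(cell_summary(codas)[("R1", "T1")])  # count 4, rubato_frac 0.75, ornament_frac 0.25
print(frequent_combinations(codas, 2))    # [('R1', 'T1', True, False)]
```

The cell summary corresponds to the colored grid and its pie charts, while the frequency filter over full feature tuples mirrors the idea of counting combinations that recur often enough to be considered part of the repertoire.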
Project CETI: Dedicated to using machine learning to enable cross-species conversations
Project CETI, MIT's collaborator on this study, is a leading voice in sperm whale coda research. CETI is a nonprofit organization that applies advanced machine learning and robotics to listen to and translate sperm whale communication. It was founded in 2020 with the aim of protecting sperm whale populations by understanding and translating their communication system.
The CETI team brings together leading experts in artificial intelligence and natural language processing, cryptographers, linguists, marine biologists, roboticists and underwater acousticians from universities around the world. The team's fieldwork focuses mainly on Dominica in the Eastern Caribbean, and all of its research and findings will be open source.
In addition to the sperm whale phonetic alphabet described above, the team has published several other studies on sperm whale vocalizations.
On August 29, 2019, CETI researchers published "Deep Machine Learning Techniques for the Detection and Classification of Sperm Whale Bioacoustics" in Scientific Reports. The study demonstrated the feasibility of applying machine learning (ML) techniques to sperm whale bioacoustics and showed that neural networks can learn meaningful representations of whale vocalizations.
Paper address:
https://www.nature.com/articles/s41598-019-48909-4
On June 17, 2022, CETI published "Toward understanding the communication in sperm whales" in iScience, which lays out an approach to recording and analyzing sperm whale communication in the following key steps:
Recording: collecting a large-scale, longitudinal, multimodal dataset of whale communication and behavioral data from a variety of sensors;
Processing: aligning and processing the multi-sensor data;
Decoding: using machine learning to build models of whale communication, characterize its structure and link it to behavior;
Encoding and playback: conducting interactive playback experiments and refining the whale language model.

Paper address:
https://www.sciencedirect.com/science/article/pii/S2589004222006642
On December 4, 2023, CETI reported using machine learning to identify vowel-like and diphthong-like patterns in sperm whale codas, and found that both can appear across different traditional coda types.
On March 24, 2024, the team published work on the series of impulsive clicks, known as echolocation clicks, that sperm whales emit while moving underwater, and showed that these clicks can be detected even in noisy environments.
As mammals with highly developed intelligence, sperm whales have a communication system that shows striking structural similarities to human language. With machine learning advancing rapidly, more and more specialists are joining sperm whale vocalization research, and as this work deepens, dialogue between humans and whales may one day become a reality.