A dolphin handler makes the signal for “together” with her hands, followed by “create”. The two trained dolphins disappear underwater, exchange sounds and then emerge, flip on to their backs and lift their tails. They have devised a new trick of their own and performed it in tandem, just as requested. “It doesn’t prove that there’s language,” says Aza Raskin. “But it certainly makes a lot of sense that, if they had access to a rich, symbolic way of communicating, that would make this task much easier.”
Raskin is the co-founder and president of Earth Species Project (ESP), a California non-profit group with a bold ambition: to decode non-human communication using a form of artificial intelligence (AI) called machine learning, and make all the knowhow publicly available, thereby deepening our connection with other living species and helping to protect them. A 1970 album of whale song galvanised the movement that led to commercial whaling being banned. What could a Google Translate for the animal kingdom spawn?
The organisation, founded in 2017 with the help of major donors such as LinkedIn co-founder Reid Hoffman, published its first scientific paper last December. The goal is to unlock communication within our lifetimes. “The end we are working towards is, can we decode animal communication, discover non-human language,” says Raskin. “Along the way and equally important is that we are developing technology that supports biologists and conservation now.”
Understanding animal vocalisations has long been the subject of human fascination and study. Various primates give alarm calls that differ according to predator; dolphins address one another with signature whistles; and some songbirds can take elements of their calls and rearrange them to communicate different messages. But most experts stop short of calling it a language, as no animal communication meets all the criteria.
Until recently, decoding has mostly relied on painstaking observation. But interest has burgeoned in applying machine learning to deal with the huge amounts of data that can now be collected by modern animal-borne sensors. “People are starting to use it,” says Elodie Briefer, an associate professor at the University of Copenhagen who studies vocal communication in mammals and birds. “But we don’t really understand yet how much we can do.”
Briefer co-developed an algorithm that analyses pig grunts to tell whether the animal is experiencing a positive or negative emotion. Another, called DeepSqueak, judges whether rodents are in a stressed state based on their ultrasonic calls. A further initiative – Project CETI (which stands for the Cetacean Translation Initiative) – plans to use machine learning to translate the communication of sperm whales.
Yet ESP says its approach is different, because it is not focused on decoding the communication of one species, but all of them. While Raskin acknowledges there will be a higher likelihood of rich, symbolic communication among social animals – for example primates, whales and dolphins – the goal is to develop tools that could be applied to the entire animal kingdom. “We’re species agnostic,” says Raskin. “The tools we develop… can work across all of biology, from worms to whales.”
* * *
The “motivating intuition” for ESP, says Raskin, is work that has shown that machine learning can be used to translate between different, sometimes distant human languages – without the need for any prior knowledge.
This process starts with the development of an algorithm to represent words in a physical space. In this many-dimensional geometric representation, the distance and direction between points (words) describe how they meaningfully relate to each other (their semantic relationship). For example, “king” has a relationship to “man” with the same distance and direction that “queen” has to “woman”. (The mapping is not done by knowing what the words mean but by looking, for example, at how often they occur near each other.)
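The geometry behind the king and queen example can be illustrated with a toy sketch. The vectors below are hand-made for illustration only, and the three axes and the helper `nearest` are assumptions of this sketch. Real embeddings are learned from co-occurrence statistics and typically have hundreds of dimensions.

```python
import numpy as np

# Hand-made toy vectors (an assumption for illustration, not learned embeddings).
# Axes, loosely: [royalty, gender, fruit-ness].
vectors = {
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def nearest(point, exclude=()):
    """Return the word whose vector is closest (Euclidean distance) to `point`."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - point))

# The step from "man" to "king" is a direction and a distance in the space.
offset = vectors["king"] - vectors["man"]

# Taking the same step from "woman" should land on, or near, "queen".
guess = nearest(vectors["woman"] + offset, exclude=("king", "man", "woman"))
print(guess)
```

In this toy space the step from “man” to “king” is identical to the step from “woman” to “queen”, which is the sense in which the two relationships share a distance and direction.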
It was later noticed that these “shapes” are similar for different languages. And then, in 2017, two groups of researchers working independently found a technique that made it possible to achieve translation by aligning the shapes. To get from English to Urdu, align their shapes and find the point in Urdu closest to the word’s point in English. “You can translate most words decently well,” says Raskin.
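A minimal sketch of the “align the shapes, then find the closest point” idea follows. It rests on several assumptions. The two “languages” are random stand-in points, the second being an exact rotated copy of the first. The alignment is fitted from a few known word pairs using orthogonal Procrustes, a simpler supervised stand-in for the 2017 techniques, which needed no such pairs. The names `fit_rotation` and `translate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "language A": six words as random points in three dimensions.
words_a = ["a0", "a1", "a2", "a3", "a4", "a5"]
emb_a = rng.normal(size=(6, 3))

# Toy "language B": the same shape, turned by an unknown orthogonal matrix
# (a rotation or reflection). Real languages match only approximately.
true_rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
words_b = ["b0", "b1", "b2", "b3", "b4", "b5"]
emb_b = emb_a @ true_rotation

def fit_rotation(src, tgt):
    """Orthogonal Procrustes: the orthogonal W minimising ||src @ W - tgt||."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def translate(index, rotation):
    """Map word `index` of A into B's space and return B's nearest word."""
    mapped = emb_a[index] @ rotation
    distances = np.linalg.norm(emb_b - mapped, axis=1)
    return words_b[int(np.argmin(distances))]

# Fit the alignment on four seed pairs, then translate the two held-out words.
seed = [0, 1, 2, 3]
rotation = fit_rotation(emb_a[seed], emb_b[seed])
held_out = [translate(i, rotation) for i in (4, 5)]
print(held_out)
```

Because the second space here is an exact rotated copy of the first, the held-out words map back perfectly. With real languages the shapes match only roughly, which is why Raskin speaks of translating “most words decently well” rather than all of them.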
ESP’s aspiration is to create these kinds of representations of animal communication – working on both individual species and many species at once – and then explore questions such as whether there is overlap with the universal human shape. We don’t know how animals experience the world, says Raskin, but there are emotions, for example grief and joy, that some seem to share with us and may well communicate about with others of their species. “I don’t know which will be the more incredible – the parts where the shapes overlap and we can directly communicate or translate, or the parts where we can’t.”
He adds that animals don’t only communicate vocally. Bees, for example, let others know of a flower’s location via a “waggle dance”. There will be a need to translate across different modes of communication too.
The goal is “like going to the moon”, acknowledges Raskin, but the idea also isn’t to get there all at once. Rather, ESP’s roadmap involves solving a series of smaller problems necessary for the bigger picture to be realised. This should see the development of general tools that can help researchers trying to apply AI to unlock the secrets of species under study.
For example, ESP recently published a paper (and shared its code) on the so-called “cocktail party problem” in animal communication, in which it is difficult to discern which individual in a group of the same animals is vocalising in a noisy social environment.
“To our knowledge, no one has done this end-to-end detangling [of animal sound] before,” says Raskin. The AI-based model developed by ESP, which was tried on dolphin signature whistles, macaque coo calls and bat vocalisations, worked best when the calls came from individuals that the model had been trained on; but with larger datasets it was able to disentangle mixtures of calls from animals not in the training cohort.
Another project involves using AI to generate novel animal calls, with humpback whales as a test species. The novel calls – made by splitting vocalisations into micro-phonemes (distinct units of sound lasting a hundredth of a second) and using a language model to “speak” something whale-like – can then be played back to the animals to see how they respond. If the AI can identify what makes a random change versus a semantically meaningful one, it brings us closer to meaningful communication, explains Raskin. “It is having the AI speak the language, even though we don’t know what it means yet.”
A further project aims to develop an algorithm that ascertains how many call types a species has at its command by applying self-supervised machine learning, which does not require any labelling of data by human experts to learn patterns. In an early test case, it will mine audio recordings made by a team led by Christian Rutz, a professor of biology at the University of St Andrews, to produce an inventory of the vocal repertoire of the Hawaiian crow – a species that, Rutz discovered, has the ability to make and use tools for foraging and is believed to have a significantly more complex set of vocalisations than other crow species.
Rutz is particularly excited about the project’s conservation value. The Hawaiian crow is critically endangered and only exists in captivity, where it is being bred for reintroduction to the wild. It is hoped that, by taking recordings made at different times, it will be possible to track whether the species’s call repertoire is being eroded in captivity – specific alarm calls may have been lost, for example – which could have consequences for its reintroduction; that loss might be addressed with intervention. “It could produce a step change in our ability to help these birds come back from the brink,” says Rutz, adding that detecting and classifying the calls manually would be labour intensive and error prone.
Meanwhile, another project seeks to understand automatically the functional meanings of vocalisations. It is being pursued with the laboratory of Ari Friedlaender, a professor of ocean sciences at the University of California, Santa Cruz. The lab studies how wild marine mammals, which are difficult to observe directly, behave underwater and runs one of the world’s largest tagging programmes. Small electronic “biologging” devices attached to the animals capture their location, type of motion and even what they see (the devices can incorporate video cameras). The lab also has data from strategically placed sound recorders in the ocean.
ESP aims to first apply self-supervised machine learning to the tag data to automatically gauge what an animal is doing (for example whether it is feeding, resting, travelling or socialising) and then add the audio data to see whether functional meaning can be given to calls tied to that behaviour. (Playback experiments could then be used to validate any findings, along with calls that have been decoded previously.) This technique will be applied to humpback whale data initially – the lab has tagged several animals in the same group so it is possible to see how signals are given and received. Friedlaender says he was “hitting the ceiling” in terms of what currently available tools could tease out of the data. “Our hope is that the work ESP can do will provide new insights,” he says.
* * *
But not everyone is as gung ho about the power of AI to achieve such grand aims. Robert Seyfarth is a professor emeritus of psychology at the University of Pennsylvania who has studied social behaviour and vocal communication in primates in their natural habitat for more than 40 years. While he believes machine learning can be useful for some problems, such as identifying an animal’s vocal repertoire, there are other areas, including the discovery of the meaning and function of vocalisations, where he is sceptical it will add much.
The problem, he explains, is that while many animals can have sophisticated, complex societies, they have a much smaller repertoire of sounds than humans. The result is that the exact same sound can be used to mean different things in different contexts and it is only by studying the context – who the individual calling is, how they are related to others, where they fall in the hierarchy, who they have interacted with – that meaning can hope to be established. “I just think these AI methods are insufficient,” says Seyfarth. “You’ve got to go out there and watch the animals.”
There is also doubt about the concept – that the shape of animal communication will overlap in a meaningful way with human communication. Applying computer-based analyses to human language, with which we are so intimately familiar, is one thing, says Seyfarth. But it can be “quite different” doing it to other species. “It is an exciting idea, but it is a big stretch,” says Kevin Coffey, a neuroscientist at the University of Washington who co-created the DeepSqueak algorithm.
Raskin acknowledges that AI alone may not be enough to unlock communication with other species. But he refers to research that has shown many species communicate in ways “more complex than humans have ever imagined”. The stumbling blocks have been our ability to gather sufficient data and analyse it at scale, and our own limited perception. “These are the tools that let us take off the human glasses and understand entire communication systems,” he says.