Scientists have developed an AI-powered modeling method that mimics how birds produce songs, similar to how large language models like ChatGPT generate human sentences.
By training on birdsong recordings, the model successfully recreates sequences of Bengalese finch songs, revealing important insights into how birds structure their vocalizations.
The research, published in The Journal of Neuroscience, could enhance understanding of the neurobiology behind language processing and how humans and animals form complex vocal patterns.
Both birds and humans arrange their vocal expressions in structured ways. In human speech, words follow specific grammatical rules, while birds sing in organized syllabic patterns.
“Although much simpler, the sequences of a bird’s song syllables are organized in a similar way to human language, so birds provide a good model to explore the neurobiology of language,” said Dezhe Jin, an associate professor of physics at Penn State’s Eberly College of Science and lead author of the study.
One key aspect of both birdsong and human language is context dependence – where previous sounds influence what comes next. In English, for example, the phrase “flies like” can have different meanings depending on what follows.
For instance, while “time flies like an arrow” is a statement about time, “fruit flies like bananas” is one about insects and food. If we mix these up – “time flies like bananas” – the meaning becomes nonsensical.
The same principle applies to birds: the order of their syllables is not random but follows specific patterns that make their songs recognizable.
“We know from our previous work that the songs of Bengalese finches also have context dependence,” Jin said. “In this study, we developed a new statistical method to better quantify context dependence in individual birds and start to understand how it is wired in the brain.”
The research team analyzed previously recorded songs from six Bengalese finches, each of which sings 7 to 15 syllables per sequence. Like a large language model, their model learns the probabilities of different syllables appearing in sequence – akin to how ChatGPT predicts the next word from the preceding text.
The model is built using Markov models, a method that maps sequences of events by tracking what comes next based on past occurrences. Imagine a flowchart where each syllable leads to a range of possible next syllables, with probabilities assigned to each transition.
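As a rough illustration of the flowchart idea, the sketch below builds a basic first-order Markov model from toy syllable sequences and samples new ones. The syllable labels and songs are invented for illustration; they are not the annotations or data used in the study.

```python
import random
from collections import defaultdict

# Toy sequences standing in for annotated birdsong (labels a-d are
# illustrative, not real syllable annotations from the study).
songs = [list("abcd"), list("abcbcd"), list("abd"), list("abcbd")]

# Count how often each syllable follows another.
counts = defaultdict(lambda: defaultdict(int))
for song in songs:
    for cur, nxt in zip(song, song[1:]):
        counts[cur][nxt] += 1

# Normalize counts into transition probabilities.
probs = {
    cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
    for cur, nxts in counts.items()
}

def generate(start="a", max_len=10):
    """Sample a syllable sequence by walking the transition graph."""
    seq = [start]
    while seq[-1] in probs and len(seq) < max_len:
        nxts = probs[seq[-1]]
        seq.append(random.choices(list(nxts), weights=list(nxts.values()))[0])
    return "".join(seq)
```

Here every syllable is a node in the flowchart, and `probs` holds the weighted arrows between them; `generate` simply follows arrows at random until it reaches a syllable with no outgoing transitions.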
“Basic Markov models are quite simple, but they tend to overgeneralize, meaning they might result in sequences that don’t actually exist,” Jin explained.
“Here, we used a specific type of model called a Partially Observable Markov Model that allowed us to incorporate context dependence, adding more connections to what syllables typically go together. The added complexity allows for more accurate models.”
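The overgeneralization problem, and the POMM-style fix of splitting one syllable into several context-carrying states, can be sketched as follows. The two-song dataset and state names (`b1`, `b2`) are invented for illustration and simplify the actual model, which learns its states and probabilities from data.

```python
import random

# Toy data: the syllable 'b' occurs in two different contexts.
songs = {"abc", "dbe"}

# A basic Markov model merges both contexts of 'b', so it can also
# produce "abe" and "dbc" - sequences no bird actually sang.
markov = {"a": ["b"], "d": ["b"], "b": ["c", "e"]}

# POMM-style fix: two hidden states, b1 and b2, both emit the
# syllable 'b' but remember which context they came from.
emits = {"a": "a", "d": "d", "b1": "b", "b2": "b", "c": "c", "e": "e"}
transitions = {"a": ["b1"], "d": ["b2"], "b1": ["c"], "b2": ["e"]}

def sample(model, start):
    """Walk a plain Markov model from a starting syllable."""
    seq, cur = [start], start
    while cur in model:
        cur = random.choice(model[cur])
        seq.append(cur)
    return "".join(seq)

def sample_pomm(start):
    """Walk the hidden states and record the syllables they emit."""
    seq, cur = [emits[start]], start
    while cur in transitions:
        cur = random.choice(transitions[cur])
        seq.append(emits[cur])
    return "".join(seq)
```

Because `b1` and `b2` keep separate outgoing transitions, the POMM-style walk only ever reproduces the two observed songs, while the merged Markov model happily invents unheard ones – the overgeneralization Jin describes.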
Using their approach, the team generated a range of possible models for each bird’s song, gradually refining them to ensure they only produced sequences that actually exist in the bird’s natural repertoire.
The final models accurately reflected each bird’s real singing patterns while identifying which syllables were context-dependent.
The results showed that all six birds displayed context dependence in their songs, meaning they carefully structured their syllable sequences rather than singing randomly. However, the degree of context dependence varied between individuals.
“This could be due to several factors, including aspects of the birds’ brains,” Jin explained. “Or, because these songs are learned, this could be related to the amount of context dependence in their tutor’s songs.”
To explore this further, the researchers studied birds that could not hear their own songs. The results were striking. These birds showed a dramatic reduction in context-dependent syllables, suggesting that auditory feedback is crucial in shaping how birds refine their songs.
This finding supports the idea that birds listen to themselves and adjust their vocal patterns based on what they hear, much like how humans refine speech and grammar while learning a language.
“The birds are listening to themselves and adjusting their song based on what they hear, and the related machinery in the brain likely plays a role in context dependence,” Jin said.
“In the future, we would also like to map neuron states to specific syllables. Our study suggests that, even when a bird is singing the same syllable, different sets of neurons might be active.”
The researchers emphasize that this new AI-based modeling technique could be applied not just to birdsong, but to other animal vocalizations and behavioral patterns.
“We actually used this method with the English language and were able to generate text that is mostly grammatical,” Jin said. “Of course, we’re not trying to create a new generative language model, but it is interesting that the same kind of model can handle both birdsong and human language.”
This suggests that the neural mechanisms behind birdsong and human language may share deeper similarities than previously thought.
Some philosophers argue that human grammar is a uniquely complex feature of cognition, but this study raises questions about how truly unique human language is if birds follow similar rules.
This research opens up new possibilities for understanding how brains – both bird and human – encode language and communication. In the future, scientists hope to map neuronal activity during song production to see which brain circuits control context dependence.
The researchers also hope to apply similar models to other vocal species – such as whales, dolphins, and primates – to identify common patterns in communication.
Furthermore, the goal is to develop better speech-related AI systems based on how real biological brains organize vocalization sequences.
Ultimately, this work bridges the gap between neuroscience, AI, and linguistics, offering new clues about how brains generate structured communication across species.
As AI technology continues to evolve, models trained on birdsongs could help us better understand how our own brains process language and adapt speech over time.