AI model creates new protein that simulates 500 million years of biological evolution
01-25-2025

Artificial intelligence (AI) experts are abuzz about a new multimodal generative language model that may change how we think about proteins.

Scientists have spent years searching for better ways to predict and design these tiny building blocks of life.

Yet progress has been slow compared with the vast number of proteins that could exist. Now a tool known as ESM3 has arrived, and it is generating fresh excitement among researchers who want to explore protein varieties that nature has not produced.

ESM3 and protein science

ESM3 can learn from protein sequences, structural information, and functional annotations at the same time, which broadens what it can “understand.” Previous models mostly looked at sequences alone.

This more robust approach gave the developers a chance to encode intricate biological knowledge in a single model with an impressive 98 billion parameters.

Thomas Hayes from EvolutionaryScale and colleagues used this model to produce a fluorescent protein with an amino acid sequence that is far removed from any known fluorescent protein.

Leaping through time

“We found a bright fluorescent protein at a far distance (58% sequence identity) from known fluorescent proteins, which we estimate is equivalent to simulating five hundred million years of evolution,” said Thomas Hayes, who led the study.
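
For readers unfamiliar with the "58% sequence identity" figure, the sketch below shows one common way such a percentage can be computed from two sequences that have already been aligned. It is only an illustration: the function name `percent_identity`, the convention of skipping gapped columns, and the two short toy strings are assumptions made here, not the procedure or data used in the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of matching residues over aligned columns with no gap.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps. Other definitions divide by
    the full alignment length or by the shorter sequence instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # compared columns where the residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns under this convention
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, hypothetical strings for illustration only (not data from the study).
reference = "MSKGEELFTG"
candidate = "MSKGAELF-G"
print(f"{percent_identity(reference, candidate):.1f}% identity")
```

Under this convention, a protein at 58% identity to its nearest known relative differs from it at roughly four out of every ten compared positions.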

This method explores sequence space far beyond the scope of typical lab experiments. While labs traditionally introduce mutations step by step, ESM3 can jump directly to distant regions of protein space that natural evolution would need an enormous span of time to reach.

AI analyzing billions of proteins

The training data for ESM3 is unusually large. Its developers compiled 771 billion unique tokens from around 3.15 billion protein sequences, 236 million three-dimensional structures, and 539 million proteins with function annotations.

This variety helps the model see patterns that might otherwise remain hidden. It also offers a view of how different proteins fit together and carry out biological tasks.

This depth of insight has led to speculation that ESM3 may help us fill gaps in our knowledge of how proteins fold and what structural elements matter most.

Why fluorescent proteins matter

Fluorescent proteins light up under specific conditions, making them indispensable in microscopy and related imaging. They help scientists see inside cells and observe processes that used to be invisible.

Many labs rely on natural fluorescent proteins, often modified through traditional protein engineering. AI-driven methods like ESM3 may speed up that work, offering faster alternatives for researchers who want to customize these proteins or discover entirely new ones.

Some investigators predict that such AI tools will be a boon for projects in which large, complex data sets are needed for rational protein design.

AI-designed proteins and health

The same reasoning that leads ESM3 to create an unexpected fluorescent protein can apply elsewhere. AI models can suggest ways to make enzymes more stable in challenging environments or build proteins that capture and break down pollutants.

Drugs might also be developed more quickly by zeroing in on promising protein targets. In many cases, success hinges on spotting crucial patterns across countless variations.

That is where a broadly trained model like ESM3 stands out. Researchers say it is poised to support innovation in medicine and environmental technology.

ESM3 AI: New way to program proteins

ESM3’s release in public beta means any curious scientist can start experimenting with it through an API. There is also a browser-based interface that lets users explore protein options without writing code.

This broader access may spark unexpected collaborations. People working on renewable energy, biomedical devices, or synthetic biology might all find themselves using this AI to craft proteins that were never seen in nature.

Despite the excitement, experts remind us that real-world protein applications need careful validation. AI models like ESM3 may propose structures that look good on paper but require detailed lab checks.

Still, many researchers argue that these tools will save time and money, fueling experiments that push beyond the usual boundaries of what proteins can do. The high capacity of ESM3 and similar models suggests that this field will keep growing.

What happens next?

ESM3’s method of handling sequence, structure, and function as a unified puzzle has opened doors that were locked by the limitations of older technology. This is more than a clever trick.

It hints at future AI-driven pipelines where scientists quickly generate ideas for new proteins and test them with robotic labs. The approach points to a time when evolving proteins by design is as natural as editing text on a screen.

It may be a sign that we have only scratched the surface of what AI can accomplish in biology.

The study is published in the journal Science.
