Meta Alumni Unveil Groundbreaking AI Biology Model Simulating 500 Million Years of Evolution

While general-purpose models such as GPT-4o continue to advance, EvolutionaryScale, an AI research lab founded by former Meta engineers from the company's now-disbanded protein-folding team, is venturing into a different area: making biology programmable.

Despite being only about a year old, the company is already making significant strides. Today, it unveiled ESM3, a multimodal generative language model capable of following prompts and designing novel proteins. In tests, ESM3 generated a new green fluorescent protein (esmGFP), a result the company says would take nature hundreds of millions of years of evolution to reach.

Revolutionizing Protein Design

The sequence of the generated esmGFP is only 58% similar to that of the closest known fluorescent protein. The company estimates that this gap is equivalent to over 500 million years of natural protein diversification.
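To make a figure like "58% similar" concrete, here is a minimal Python sketch of one common way to score how alike two protein sequences are: percent identity over an existing alignment. The article does not say how EvolutionaryScale measured similarity, so the function name `percent_identity`, the gap symbol `-`, and the toy sequences are all assumptions made for illustration, not the company's method or data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumes the two sequences are ALREADY aligned (equal length, '-' for gaps).
    This is an illustrative metric, not necessarily the one used for esmGFP.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable columns")

    # A column counts as a match only if the residues are equal and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy, made-up fragments (not real GFP sequences).
identical = percent_identity("MSKGE", "MSKGE")
partial = percent_identity("MSKG-E", "MTKGAE")
```

In this toy case the second pair matches in four of six columns. A real comparison would first need an alignment step, which this sketch leaves out.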

Alongside ESM3's launch, EvolutionaryScale has raised $142 million in a seed funding round led by Nat Friedman, Daniel Gross, and Lux Capital. Amazon and the venture capital arm of Nvidia also participated. The smallest ESM3 model has been open-sourced to accelerate research in this field.

The Challenge Ahead

Creating ESM3 is merely the first step; its real-world impact remains to be fully explored.

EvolutionaryScale aims to harness the power of generative AI models to decode the fundamental language of life, focusing on core biological molecules—RNA, proteins, and DNA—that have evolved over 3.5 billion years. By programming biology and designing new molecules, the company hopes to tackle significant challenges such as climate change, plastic pollution, and disease, including cancer.

Competitive Landscape

Numerous organizations, including Google DeepMind and Isomorphic Labs, are developing similar technologies. Founded in 2023, EvolutionaryScale has developed several protein language models, culminating in ESM3, which stands out due to its size and capabilities.

ESM3 was trained at very large scale: about 1 trillion teraflops of compute (roughly 10^24 floating-point operations) spent on a dataset of 2.78 billion natural proteins and 771 billion unique tokens. The model can reason across three essential biological properties of proteins: sequence, structure, and function. Users can supply partial information on any of these tracks, and ESM3 generates predictions for all three, ultimately producing novel proteins.
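The scale figures are easier to judge once written out. The short Python sketch below does the arithmetic, assuming "1 trillion teraflops" means a total count of floating-point operations rather than a per-second rate, and treating tokens per protein as a rough average only, since the article does not say how tokens are split across the sequence, structure, and function tracks. The variable names are illustrative.

```python
# Figures quoted in the article.
TRILLION = 10**12
TERA = 10**12
num_proteins = 2_780_000_000      # 2.78 billion natural proteins
num_tokens = 771_000_000_000      # 771 billion unique tokens

# Assumption: "1 trillion teraflops" is a total operation count, not a rate.
total_flops = TRILLION * TERA     # 10**24 floating-point operations

# Rough average only; the per-track token breakdown is not given in the article.
avg_tokens_per_protein = num_tokens / num_proteins

print(f"Total training compute: {total_flops:.0e} FLOPs")
print(f"Average tokens per protein: {avg_tokens_per_protein:.1f}")
```

Under these assumptions the training run comes to 10^24 operations, and the dataset averages a little under 280 tokens per protein.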

Enhanced Control for Scientists

“ESM3’s multimodal reasoning empowers scientists to engineer new proteins with exceptional control. For example, it can integrate structure, sequence, and function to propose scaffolds for enzymes like PETase, which breaks down plastic waste,” the company stated.

In one instance, ESM3 was used to design a novel green fluorescent protein, a type of protein that scientists use as a marker to visualize specific proteins within cells. According to the company, the generated protein matches the brightness of natural fluorescent variants and would have taken evolution 500 million years to develop.

An Adaptive Model

The ESM3 model also features self-improvement capabilities, allowing it to refine its outputs based on feedback from laboratory experiments or existing data.

Availability and Future Applications

Currently, ESM3 is available in three sizes: small, medium, and large. The smallest model, with 1.4 billion parameters, is open-sourced on GitHub under a non-commercial license, while medium and large versions (up to 98 billion parameters) are accessible for commercial use through EvolutionaryScale's API and partnerships with Nvidia and AWS.

EvolutionaryScale aims for this technology to address global challenges and enhance human health. Its most promising applications may lie in the pharmaceutical sector, where companies can leverage ESM3 to develop innovative treatments for life-threatening conditions. Previous models from EvolutionaryScale have already shown success in enhancing antibody characteristics and detecting COVID-19 variants, underscoring the potential impact of this groundbreaking AI in biology.
