Alphabet-owned DeepMind is best known for the AI that defeated a top Go player, but it has now unveiled another significant advance: AlphaFold 2. This AI predicts how proteins fold with remarkable accuracy, and some of its predictions rival structures determined by experiment.
For nearly 50 years, understanding protein folding has posed a challenge. As Professor John Moult, CASP chair and co-founder, noted, “To see DeepMind produce a solution for this long-standing problem is a very special moment.” Researchers and enthusiasts have responded with excitement, suggesting that AlphaFold may have cracked the "protein folding problem." But what does this mean, and how will it impact us?
To grasp its significance, we must first understand proteins. These essential biomolecules perform a wide range of tasks in the body. Each one begins as a chain of amino acids that folds into a complex three-dimensional structure built from elements such as helices and sheets, and that structure is what lets the protein carry out functions like oxygen transport and bone reinforcement. Consequently, accurately predicting a protein’s folded structure from its amino acid sequence alone has been a vital pursuit in biology.
The Critical Assessment of Protein Structure Prediction (CASP) has been a significant player in this field since 1994. This biennial competition brings together teams from around the world that try to solve the protein folding challenge with computational methods. Participants receive the amino acid sequences of selected target proteins and submit predicted structures without seeing the experimental results. Independent assessors then compare the submissions against the experimentally determined structures, which lets the field measure its progress from one round to the next.
Recent years have seen a surge in computational power and machine learning applications, allowing for rapid progress. DeepMind’s AlphaFold 2 was trained on approximately 170,000 known protein structures, along with a large number of protein sequences whose 3D structures are unknown. This foundation resembles that of the original AlphaFold, which excelled in CASP13. However, AlphaFold 2 introduced important changes to its machine learning techniques and was trained on approximately 128 of Google’s cloud-based TPUv3 cores. The resulting system can predict a protein structure in days, and sometimes in hours.
Although AlphaFold 2 has made significant strides, challenges remain. While many of its predictions are highly accurate, some did not reach the level considered competitive with experimental results. Accuracy in CASP is measured with the Global Distance Test (GDT), a score from 0 to 100. On the most difficult protein targets, AlphaFold 2 achieved a median score of 87.0 GDT, just below the 90 GDT threshold that CASP co-founder Moult treats as competitive with experiment. The protein folding problem is therefore not yet fully resolved, but DeepMind is closer than many anticipated.
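For readers curious about what a GDT score measures, here is a rough sketch in Python. It is a simplified illustration, not CASP's official scoring code: it assumes the predicted and experimental structures have already been superimposed and compares one representative atom per residue, whereas the real GDT procedure searches for the best superposition at each cutoff. The function name `gdt_ts` and the toy coordinates are made up for this example. The score is the average, over distance cutoffs of 1, 2, 4 and 8 angstroms, of the percentage of residues that land within that cutoff of their experimental position.

```python
import math

# Distance cutoffs in angstroms used by the common GDT_TS variant of the score.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, experimental, cutoffs=CUTOFFS):
    """Simplified GDT_TS for two lists of (x, y, z) residue coordinates.

    Assumption: the two structures are already superimposed. The real
    score also optimizes the superposition, which is omitted here.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")

    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues within each cutoff, averaged over all cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example (hypothetical coordinates, in angstroms).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(predicted, experimental))  # 56.25
```

In the toy example the four residues are off by 0.5, 1.5, 3 and 10 angstroms, so one, two, three and three of them fall within the four cutoffs, which averages to 56.25. A score near 90 means that almost every residue sits within a very small distance of where experiment places it.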
As DeepMind continues its research, practical applications of accurate protein structure prediction may begin to emerge. There are hints of progress in sustainability and drug design, although specific details are yet to be revealed. Structural biologist Janet Thornton expressed hope that this improved accuracy might illuminate the functions of thousands of unsolved proteins in the human body. Regardless, the influx of new protein structure data presents exciting opportunities for further exploration in the field. That is a development worth celebrating, even as we await its practical implications.