NEWS FEATURE
11 July 2023

AI tools are designing entirely new proteins that could transform medicine

Digital art techniques can now devise custom, working biomolecules on demand.

Ewen Callaway

An artificial-intelligence tool called RFdiffusion designed a protein that binds to the parathyroid hormone, shown in pink. Credit: Ian C. Haydon/UW Institute for Protein Design

“OK. Here we go.” David Juergens, a computational chemist at the University of Washington (UW) in Seattle, is about to design a protein that, in 3-billion-plus years of tinkering, evolution has never produced.

On a video call, Juergens opens a cloud-based version of an artificial intelligence (AI) tool he helped to develop, called RFdiffusion. This neural network, and others like it, are helping to bring the creation of custom proteins — until recently a highly technical and often unsuccessful pursuit — to mainstream science.

These proteins could form the basis for vaccines, therapeutics and biomaterials. “It’s been a completely transformative moment,” says Gevorg Grigoryan, the co-founder and chief technical officer of Generate Biomedicines in Somerville, Massachusetts, a biotechnology company applying protein design to drug development.

The tools are inspired by AI software that synthesizes realistic images, such as the Midjourney software that, this year, was famously used to produce a viral image of Pope Francis wearing a designer white puffer jacket. A similar conceptual approach, researchers have found, can churn out realistic protein shapes to criteria that designers specify — meaning, for instance, that it’s possible to speedily draw up new proteins that should bind tightly to another biomolecule. And early experiments show that when researchers manufacture these proteins, a useful fraction do perform as the software suggests.

The tools have revolutionized the process of designing proteins in the past year, researchers say. “It is an explosion in capabilities,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City, whose team has developed one such tool for protein design. “You can now create designs that have sought-after qualities.”

“You’re building a protein structure customized for a problem,” says David Baker, a computational biophysicist at UW whose group, which includes Juergens, developed RFdiffusion. The team released the software in March 2023, and a paper describing the neural network appears this week in Nature¹. (A preprint version was released in late 2022, at around the same time that several other teams, including AlQuraishi’s² and Grigoryan’s³, reported similar neural networks.)

For the first time, protein designers now have the kinds of reproducible and robust tools around which a new industry can be created, Grigoryan adds. “The next challenge becomes, what do you do with it?”

Grand designs

Juergens inputs a few specifications for the protein he wants into a web form resembling an online tax calculator. It must be 100 amino acids long and form a symmetrical two-protein complex called a homodimer. Many cell receptors adopt this configuration, and a new homodimer could be a synthetic cell-signalling molecule, chimes in Joe Watson, a UW computational biochemist who co-developed RFdiffusion, and is also on the video call. But this morning’s design isn’t meant to do anything except resemble a realistic protein.

Researchers have struggled for decades to build new proteins. At first, they tried to cobble together useful parts of existing proteins, such as a pocket of an enzyme in which a chemical reaction is catalysed. This approach relied on understanding how proteins fold up and work, as well as intuition and a lot of trial and error. Scientists sometimes screened thousands of designs to identify one that worked as hoped.

A light-bulb moment came with AlphaFold (developed by the London-based AI firm DeepMind, now Google DeepMind) and other AI-based models that could accurately predict protein structures from amino-acid sequences, says Baker. Designers realized that these neural networks, trained on real protein sequences and structures, could also help to create proteins from scratch.

In the past few years, Baker’s team and others in the field have released a slew of AI-based protein-design tools. One approach these tools use, called hallucination, involves creating a random string of amino acids that is then optimized by AlphaFold, or a similar tool called RoseTTAFold, until it resembles something that the neural network suggests is likely to fold into a specific structure. Another, called inpainting, takes a specified snippet of a protein sequence or structure and builds the rest of the molecule around it using RoseTTAFold.
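The hallucination loop described above can be sketched in a few lines. This is only a toy, not the method any of the teams used: the scoring function `toy_score` is a hypothetical stand-in for a structure predictor such as AlphaFold or RoseTTAFold, the set of helix-favouring residues is illustrative, and a simple accept-if-not-worse search stands in for the optimization step.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def toy_score(sequence):
    """Hypothetical stand-in for a structure predictor's confidence.

    Here: the fraction of residues from a small helix-favouring set.
    A real hallucination run would query a structure-prediction network.
    """
    helix_formers = set("AELMQKR")  # illustrative choice, an assumption
    return sum(res in helix_formers for res in sequence) / len(sequence)


def hallucinate(length, steps, rng, score=toy_score):
    """Start from a random sequence and keep mutations that do not lower the score."""
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best = score(sequence)
    for _ in range(steps):
        position = rng.randrange(length)
        old_residue = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)  # propose one mutation
        candidate = score(sequence)
        if candidate >= best:
            best = candidate                  # accept the mutation
        else:
            sequence[position] = old_residue  # revert it
    return "".join(sequence), best


rng = random.Random(0)
designed, final_score = hallucinate(length=100, steps=2000, rng=rng)
print(designed, round(final_score, 2))
```

The point of the sketch is the shape of the loop: the sequence is never designed directly, it is nudged until the scoring model likes it, which is one reason the results depend so heavily on how trustworthy that model's judgement is.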

But these tools are far from perfect. Experiments tended to show that structures designed by hallucination methods didn’t always form well-folded proteins when they were made in the laboratory, and ended up as gunk at the bottom of a test tube, for instance. Hallucination methods also struggled to make anything but small proteins (although other researchers showed, in a February preprint, how the technique could be used to design longer molecules⁴). Inpainting also did a poor job of forming proteins when given shorter snippets. Even when the approach did produce a theoretical protein structure, it wasn’t able to come up with diverse solutions to a problem that would increase the odds of success.

That is where RFdiffusion and similar protein-designing AIs, released in recent months, come in. They are based on the same principles as neural networks that generate realistic images, such as Stable Diffusion, DALL-E and Midjourney. These ‘diffusion’ networks are trained on data, be they images or protein structures, which are then made progressively noisier, eventually bearing no resemblance to the starting image or structure. The network then learns to ‘denoise’ the data, performing the task in reverse.
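The 'progressively noisier' half of that training recipe can be illustrated with a generic noising schedule of the kind used in image-diffusion models. The sketch below is an assumption-laden toy: the linear schedule, the function names and the made-up helix-like coordinates are all illustrative, and RFdiffusion's own noising scheme for protein structures is not reproduced here.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance kept at each step (linear schedule).

    Assumes num_steps >= 2; the beta values are conventional defaults, not
    taken from any protein-design tool.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(coords, alpha_bar, rng):
    """Blend clean 3D coordinates with Gaussian noise at one noise level."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in point)
        for point in coords
    ]


# A made-up helix-like trace standing in for a real protein backbone.
backbone = [(math.cos(i), math.sin(i), 0.5 * i) for i in range(10)]
alpha_bars = make_alpha_bars(1000)
rng = random.Random(0)

for step in (0, 499, 999):
    noisy = add_noise(backbone, alpha_bars[step], rng)
    print(step, round(alpha_bars[step], 5), noisy[0])
```

By the final step almost none of the original signal is left, which matches the article's description of data that end up bearing no resemblance to the starting structure. Training then consists of teaching a network to undo these steps.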

Networks such as RFdiffusion are trained on tens of thousands of real protein structures stored in a repository called the Protein Data Bank (PDB). When the network makes a new protein, it begins with total noise: a random assortment of amino acids. “You’re asking what is the protein that gave rise to the noise,” explains Watson. After rounds of denoising, it produces something resembling a real — but new — protein.

When Baker’s team tested RFdiffusion without providing any guidance except the length of the protein, the network generated diverse, realistic-looking proteins, different from anything it had been trained on in the PDB.

But the researchers are also able to direct the program to make proteins according to specific design constraints during the denoising process, a process called conditioning.
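To make the idea of conditioning concrete, here is a deliberately simplified sketch of a denoising loop that starts from noise and re-imposes a fixed 'motif' after every step. Everything in it is an assumption for illustration: the one-dimensional 'structure', the smoothing function `toy_denoiser` (a stand-in for a trained network) and the motif positions are invented, and pinning values in this way is just one simple way to apply a constraint, not necessarily how RFdiffusion does it.

```python
import random


def toy_denoiser(values):
    """Hypothetical stand-in for a trained network: averages each value with its neighbours."""
    smoothed = []
    last = len(values) - 1
    for i, value in enumerate(values):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, last)]
        smoothed.append((left + value + right) / 3.0)
    return smoothed


def generate(length, motif, steps, rng, denoiser=toy_denoiser):
    """Start from pure noise, denoise repeatedly, and pin the motif after each step."""
    values = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for _ in range(steps):
        values = denoiser(values)
        for index, target in motif.items():
            values[index] = target  # conditioning: keep the specified positions fixed
    return values


rng = random.Random(1)
motif = {10: 2.0, 11: 2.5, 12: 2.0}  # made-up positions and values
design = generate(length=30, motif=motif, steps=50, rng=rng)
print([round(v, 2) for v in design])
```

Because the pinned positions are restored at every step, the rest of the toy structure is repeatedly reshaped around them, which is the intuition behind asking a network to build a new protein around a required fold or binding surface.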

For instance, Baker’s team conditioned RFdiffusion to make proteins that include a specific fold, or that can nestle against the surface of another molecule (an interaction that underlies binding). Grigoryan’s team even developed a diffusion network called Chroma and then conditioned it to make proteins shaped to resemble the 26 capital letters used in English, as well as the Arabic numerals³.

Proteins designed by AI to resemble letters in the English alphabet. Credit: John Ingraham, Wujie Wang, Max Baranov, Gevorg Grigoryan

Signal from noise

Juergens’ computer screen initially shows noise, the random assortment of amino acids that the AI system starts with. They are represented as red, smudgy squiggles that resemble a toddler’s fingerpainting. They morph, frame by frame, into ever-more-complex shapes, with protein-like features such as tight spirals known as α-helices and ribbony shapes that double back on themselves, called β-sheets. “It’s a nice mixed alpha–beta topology,” says Juergens, smiling as he admires a creation that took only a few minutes to make. “This is looking good.”

The tool has gained widespread use in Baker’s laboratory. “The design process is almost unrecognizable compared to a year ago,” he says. The neural network has excelled in design challenges that have been inefficient, difficult or impossible using other approaches.

In one analysis reported in their study¹, the researchers started with a snippet from another protein, such as a portion of a viral protein recognized by immune cells, and tasked AI-based tools with churning out 100 different new proteins, to see how many would incorporate the desired motif. The team carried out this challenge for 25 different initial shapes. The results didn’t always incorporate the starting snippet, but RFdiffusion produced at least one protein that did for 23 of the motifs, compared with 15 for hallucination and 12 for inpainting.

RFdiffusion has also proved adept at making proteins that self-assemble into complex nanoparticles that might be able to deliver drugs or vaccine components. Previous AI approaches⁵ can also make these kinds of protein, but Watson says RFdiffusion’s designs are much more sophisticated.

Neural networks such as RFdiffusion seem to really shine when tasked with designing proteins that can stick to another specified protein. Baker’s team has used the network to create proteins that bind strongly to proteins implicated in cancers, autoimmune diseases and other conditions. One as-yet unpublished success, he says, was to design strong binders for a hard-to-target immune-signalling molecule called the tumour necrosis factor receptor — the target for antibody drugs that generate billions of dollars in revenue each year. “It is broadening the space of proteins we can make binders to and make meaningful therapies” for, Watson says.

Real-world testing

Baker’s team is cranking out so many designs that testing whether they work as intended has become a serious bottleneck. “One machine-learning person can generate enough designs to keep 100 biologists busy for months,” says Kevin Yang, a biomedical machine-learning researcher at Microsoft Research in Cambridge, Massachusetts, whose team has developed its own diffusion-based protein design tool⁶.

But early signs suggest that RFdiffusion’s creations are the real deal. In another challenge described in their study, Baker’s team tasked the tool with designing proteins containing a key stretch of p53, a signalling molecule that is overactive in many cancers (and a sought-after drug target). When the researchers made 95 of the software’s designs (by engineering bacteria to express the proteins), more than half maintained p53’s ability to bind to its natural target, MDM2. The best designs did so around 1,000 times more strongly than did natural p53. When the researchers attempted this task with hallucination, the designs — although predicted to work — did not pan out in the test tube, says Watson.

Overall, Baker says his team has found that 10–20% of RFdiffusion’s designs bind to their intended target strongly enough to be useful, compared with less than 1% for earlier, pre-AI methods. (Previous machine-learning approaches were not able to reliably design binders, Watson says.) Biochemist Matthias Gloegl, a colleague at UW, says that lately he has been hitting success rates approaching 50%, which means it can take just a week or two to come up with working designs, as opposed to months. “It’s really insane,” he says.
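A little arithmetic shows why those percentages matter at the bench. The sketch below assumes, purely for illustration, that each design succeeds independently at the quoted rate, and asks how many designs a lab would need to test for a 95% chance of at least one useful binder. The function names and the 95% target are assumptions; the 1% figure is used as an upper bound for the pre-AI rate, so the true number for earlier methods would be at least as large.

```python
import math


def chance_of_at_least_one_hit(hit_rate, designs_tested):
    """Probability that one or more of the tested designs works (independence assumed)."""
    return 1.0 - (1.0 - hit_rate) ** designs_tested


def designs_needed(hit_rate, target_chance=0.95):
    """Smallest number of designs giving at least the target chance of one hit."""
    return math.ceil(math.log(1.0 - target_chance) / math.log(1.0 - hit_rate))


rates = [
    ("earlier methods (1%, upper bound)", 0.01),
    ("RFdiffusion, low end (10%)", 0.10),
    ("RFdiffusion, high end (20%)", 0.20),
]
for label, rate in rates:
    print(label, designs_needed(rate))
# prints 299, 29 and 14 designs respectively
```

Under these assumptions, a jump from about 1% to 10-20% cuts the number of proteins that must be made and tested by a factor of roughly 10 to 20, which is consistent with the shift from months to weeks that the researchers describe.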

A funnel-shaped protein assembly (top) and a ring-like structure with six protein chains (bottom), designed from noise using diffusion-based AI art generators. Credit: Ian C. Haydon/UW Institute for Protein Design

The cloud-based version of RFdiffusion had around 100 users each day by late June, according to Sergey Ovchinnikov, an evolutionary biologist at Harvard University in Cambridge, Massachusetts. Joel Mackay, a biochemist at the University of Sydney in Australia, has been dabbling with RFdiffusion to design proteins capable of binding to other proteins that his lab studies, which include molecules called transcription factors that control gene activity in cells. He found the design process simple, and used computer modelling to validate that, in theory, the proteins should bind to the transcription factors.

Mackay is now testing whether the proteins can alter gene expression as intended when they are produced in cells. He has his fingers crossed, because such a finding would amount to a simple way to switch specific transcription factors on and off within cells, instead of using drugs that can take years to identify, if they can be discovered at all. “If this method works reliably for our types of proteins, it would be a total game-changer,” he says.

Nature 619, 236-238 (2023)

doi: https://doi.org/10.1038/d41586-023-02227-y

References

1. Watson, J. L. et al. Nature https://doi.org/10.1038/s41586-023-06415-8 (2023).
2. Lin, Y. & AlQuraishi, M. Preprint at https://arxiv.org/abs/2301.12485 (2023).
3. Ingraham, J. et al. Preprint at bioRxiv https://doi.org/10.1101/2022.12.01.518682 (2022).
4. Frank, C. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.02.24.529906 (2023).
5. Wicky, B. I. M. et al. Science 378, 56–61 (2022).
6. Wu, K. E. Preprint at https://arxiv.org/abs/2209.15611 (2022).
