An artificial-intelligence tool called RFdiffusion designed a protein that binds to the parathyroid hormone, shown in pink. Credit: Ian C. Haydon/UW Institute for Protein Design
“OK. Here we go.” David Juergens, a computational chemist at the University of Washington (UW) in Seattle, is about to design a protein that, in 3-billion-plus years of tinkering, evolution has never produced.
On a video call, Juergens opens a cloud-based version of an artificial intelligence (AI) tool he helped to develop, called RFdiffusion. This neural network, and others like it, are helping to bring the creation of custom proteins — until recently a highly technical and often unsuccessful pursuit — to mainstream science.
These proteins could form the basis for vaccines, therapeutics and biomaterials. “It’s been a completely transformative moment,” says Gevorg Grigoryan, the co-founder and chief technical officer of Generate Biomedicines in Somerville, Massachusetts, a biotechnology company applying protein design to drug development.
The tools are inspired by AI software that synthesizes realistic images, such as the Midjourney software that, this year, was famously used to produce a viral image of Pope Francis wearing a designer white puffer jacket. A similar conceptual approach, researchers have found, can churn out realistic protein shapes to criteria that designers specify — meaning, for instance, that it’s possible to speedily draw up new proteins that should bind tightly to another biomolecule. And early experiments show that when researchers manufacture these proteins, a useful fraction do perform as the software suggests.
The tools have revolutionized the process of designing proteins in the past year, researchers say. “It is an explosion in capabilities,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City, whose team has developed one such tool for protein design. “You can now create designs that have sought-after qualities.”
“You’re building a protein structure customized for a problem,” says David Baker, a computational biophysicist at UW whose group, which includes Juergens, developed RFdiffusion. The team released the software in March 2023, and a paper describing the neural network appears this week in Nature1. (A preprint version was released in late 2022, at around the same time that several other teams, including AlQuraishi’s2 and Grigoryan’s3, reported similar neural networks.)
For the first time, protein designers now have the kinds of reproducible and robust tools around which a new industry can be created, Grigoryan adds. “The next challenge becomes, what do you do with it?”
Grand designs
Juergens inputs a few specifications for the protein he wants into a web form resembling an online tax calculator. It must be 100 amino acids long and form a symmetrical two-protein complex called a homodimer. Many cell receptors adopt this configuration, and a new homodimer could be a synthetic cell-signalling molecule, chimes in Joe Watson, a UW computational biochemist who co-developed RFdiffusion and is also on the video call. But this morning’s design isn’t meant to do anything except resemble a realistic protein.
Researchers have struggled for decades to build new proteins. At first, they tried to cobble together useful parts of existing proteins, such as a pocket of an enzyme in which a chemical reaction is catalysed. This approach relied on understanding how proteins fold up and work, as well as intuition and a lot of trial and error. Scientists sometimes screened thousands of designs to identify one that worked as hoped.
A light-bulb moment came with AlphaFold (developed by the London-based AI firm DeepMind, now Google DeepMind) and other AI-based models that could accurately predict protein structures from amino-acid sequences, says Baker. Designers realized that these neural networks, trained on real protein sequences and structures, could also help to create proteins from scratch.
In the past few years, Baker’s team and others in the field have released a slew of AI-based protein-design tools. One approach these tools use, called hallucination, involves creating a random string of amino acids that is then optimized by AlphaFold, or a similar tool called RoseTTAFold, until it resembles something that the neural network suggests is likely to fold into a specific structure. Another, called inpainting, takes a specified snippet of a protein sequence or structure and builds the rest of the molecule around it using RoseTTAFold.
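The hallucination loop described above can be sketched as a simple Monte Carlo search: start from a random sequence, propose mutations and keep the ones that a scoring function favours. This is a toy illustration under loud assumptions. Real tools score sequences with AlphaFold or RoseTTAFold; here `toy_fold_score` is a hypothetical stand-in that merely rewards a target pattern of hydrophobic (H) and polar (P) residues, the hydrophobic set is a rough assumption, and the Metropolis acceptance rule is one common choice, not necessarily the one any published tool uses.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
HYDROPHOBIC = set("AILMFVW")           # rough hydrophobic set (an assumption)

def toy_fold_score(seq, pattern):
    """Hypothetical stand-in for a structure predictor's confidence (0 to 1).

    Scores how well the sequence's hydrophobic (H) / polar (P) pattern matches
    a target pattern, a crude proxy for 'likely to fold into this shape'.
    """
    matches = sum((aa in HYDROPHOBIC) == (p == "H") for aa, p in zip(seq, pattern))
    return matches / len(pattern)

def hallucinate(pattern, steps=5000, temperature=0.01, seed=0):
    """Monte Carlo search: mutate a random sequence, keep changes the score favours."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in pattern]  # start from a random string
    score = toy_fold_score(seq, pattern)
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)             # propose a point mutation
        new_score = toy_fold_score(seq, pattern)
        # Metropolis rule: accept improvements, occasionally accept worse moves
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            score = new_score
        else:
            seq[pos] = old                             # reject: undo the mutation
    return "".join(seq)

target = "HPPHHPPHHPPHHPPH"
design = hallucinate(target)
print(design, toy_fold_score(design, target))
```

Swapping `toy_fold_score` for a real predictor's confidence is what makes the approach useful in practice, and also what makes it slow: every proposed mutation needs a fresh structure prediction.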
But these tools are far from perfect. Experiments tended to show that structures designed by hallucination methods didn’t always form well-folded proteins when they were made in the laboratory; some ended up as gunk at the bottom of a test tube, for instance. Hallucination methods also struggled to make anything but small proteins (although other researchers showed, in a February preprint, how the technique could be used to design longer molecules4). Inpainting also did a poor job of forming proteins when given shorter snippets. Even when the approach did produce a theoretical protein structure, it wasn’t able to come up with diverse solutions to a problem that would increase the odds of success.
That is where RFdiffusion and similar protein-designing AIs, released in recent months, come in. They are based on the same principles as neural networks that generate realistic images, such as Stable Diffusion, DALL-E and Midjourney. These ‘diffusion’ networks are trained on data, be they images or protein structures, that are made progressively noisier until they bear no resemblance to the starting image or structure. The network then learns to ‘denoise’ the data, performing the task in reverse.
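The noising half of that process can be written down in a few lines. The sketch below uses the generic Gaussian noising of image-style diffusion models (a linear noise schedule, with x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps) applied to a toy helix of 3D points; the schedule values and the helix are illustrative assumptions, and the published RFdiffusion scheme is more involved. It also shows why denoising is learnable: if a network could predict the added noise `eps`, the original structure would follow directly, which is what `recover_x0` does here with the true noise standing in for a network's prediction.

```python
import math
import random

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(coords, alpha_bar, rng):
    """Forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [[rng.gauss(0.0, 1.0) for _ in p] for p in coords]
    noisy = [[a * x + s * e for x, e in zip(p, ep)] for p, ep in zip(coords, eps)]
    return noisy, eps

def recover_x0(noisy, eps_pred, alpha_bar):
    """Invert the forward step given a noise estimate (what a trained network supplies)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[(xt - s * e) / a for xt, e in zip(p, ep)] for p, ep in zip(noisy, eps_pred)]

# Toy 'backbone': 20 points on a helix, about 100 degrees per residue (units arbitrary).
helix = [[math.cos(i * 1.745), math.sin(i * 1.745), 0.15 * i] for i in range(20)]
abar = alpha_bar_schedule()
rng = random.Random(0)
noisy_mid, eps_mid = add_noise(helix, abar[500], rng)
restored = recover_x0(noisy_mid, eps_mid, abar[500])  # exact here, because eps is known
print(f"fraction of signal left at the last step: {math.sqrt(abar[-1]):.4f}")
```

By the final step almost none of the original signal remains, which is why generation can start from pure noise.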
Networks such as RFdiffusion are trained on tens of thousands of real protein structures stored in a repository called the Protein Data Bank (PDB). When the network makes a new protein, it begins with total noise: a random assortment of amino acids. “You’re asking what is the protein that gave rise to the noise,” explains Watson. After rounds of denoising, it produces something resembling a real — but new — protein.
When Baker’s team tested RFdiffusion without providing any guidance except the length of the protein, the network generated diverse, realistic-looking proteins, different from anything it had been trained on in the PDB.
But the researchers can also steer the program towards proteins that meet specific design constraints during denoising, a technique called conditioning.
For instance, Baker’s team conditioned RFdiffusion to make proteins that include a specific fold, or that can nestle against the surface of another molecule (an interaction that underlies binding). Grigoryan’s team even developed a diffusion network called Chroma and then conditioned it to make proteins shaped to resemble the 26 capital letters used in English, as well as the Arabic numerals3.
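One simple way to picture conditioning is to run the denoising loop as usual but pin the required piece, such as a motif, in place at every step, so the rest of the structure is generated around it. The sketch below does this with a deterministic DDIM-style reverse loop. It is an illustration under stated assumptions, not how RFdiffusion or Chroma actually condition: `placeholder_denoiser` is a hypothetical stand-in for a trained network (it just smooths each point with its chain neighbours), and the motif coordinates are made up.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * t / (num_steps - 1))
        out.append(prod)
    return out

def placeholder_denoiser(x_t, t):
    """Hypothetical stand-in for a trained network that predicts the clean structure.

    It just averages each point with its chain neighbours (t is unused here);
    a real model would be learned from structures in the PDB.
    """
    n = len(x_t)
    x0_hat = []
    for i in range(n):
        nbrs = [x_t[j] for j in (i - 1, i, i + 1) if 0 <= j < n]
        x0_hat.append([sum(c) / len(nbrs) for c in zip(*nbrs)])
    return x0_hat

def sample_with_motif(length, motif, num_steps=1000, seed=0):
    """DDIM-style reverse loop with the motif pinned in every clean-structure estimate.

    motif maps residue index -> fixed [x, y, z] coordinates (the design constraint).
    """
    rng = random.Random(seed)
    abar = alpha_bar_schedule(num_steps)
    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(length)]  # pure noise
    for t in reversed(range(num_steps)):
        x0_hat = placeholder_denoiser(x, t)
        for i, xyz in motif.items():  # conditioning: overwrite the motif residues
            x0_hat[i] = list(xyz)
        a_t = abar[t]
        a_prev = abar[t - 1] if t > 0 else 1.0
        new_x = []
        for xt_i, x0_i in zip(x, x0_hat):
            # Noise implied by the current estimate, then one step towards a cleaner structure
            eps = [(xt - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t) for xt, x0 in zip(xt_i, x0_i)]
            new_x.append([math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
                          for x0, e in zip(x0_i, eps)])
        x = new_x
    return x

scaffold = sample_with_motif(20, {5: [1.0, 2.0, 3.0], 6: [1.5, 2.5, 3.0]})
```

At the last step the loop returns the clean-structure estimate itself, so the pinned residues come out exactly where they were specified while everything else is left to the (here, placeholder) denoiser.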
Proteins designed by AI to resemble letters in the English alphabet. Credit: John Ingraham, Wujie Wang, Max Baranov, Gevorg Grigoryan
Signal from noise
Juergens’ computer screen initially shows noise, the random assortment of amino acids that the AI system starts with. They are represented as red, smudgy squiggles that resemble a toddler’s fingerpainting. They morph, frame by frame, into ever-more-complex shapes, with protein-like features such as tight spirals known as α-helices and ribbony shapes that double back on themselves, called β-sheets. “It’s a nice mixed alpha–beta topology,” says Juergens, smiling as he admires a creation that took only a few minutes to make. “This is looking good.”
The tool has gained widespread use in Baker’s laboratory. “The design process is almost unrecognizable compared to a year ago,” he says. The neural network has excelled in design challenges that have been inefficient, difficult or impossible using other approaches.
In one analysis reported in their study1, the researchers started with a snippet from another protein, such as a portion of a viral protein recognized by immune cells, and tasked AI-based tools with churning out 100 different new proteins, to see how many would incorporate the desired motif. The team carried out this challenge for 25 different initial shapes. The results didn’t always incorporate the starting snippet, but RFdiffusion produced at least one protein that did for 23 of the motifs, compared with 15 for hallucination and 12 for inpainting.
RFdiffusion has also proved adept at making proteins that self-assemble into complex nanoparticles that might be able to deliver drugs or vaccine components. Previous AI approaches5 can also make these kinds of protein, but Watson says RFdiffusion’s designs are much more sophisticated.
Neural networks such as RFdiffusion seem to really shine when tasked with designing proteins that can stick to another specified protein. Baker’s team has used the network to create proteins that bind strongly to proteins implicated in cancers, autoimmune diseases and other conditions. One as-yet unpublished success, he says, was to design strong binders for a hard-to-target immune-signalling molecule called the tumour necrosis factor receptor — the target for antibody drugs that generate billions of dollars in revenue each year. “It is broadening the space of proteins we can make binders to and make meaningful therapies” for, Watson says.
Real-world testing
Baker’s team is cranking out so many designs that testing whether they work as intended has become a serious bottleneck. “One machine-learning person can generate enough designs to keep 100 biologists busy for months,” says Kevin Yang, a biomedical machine-learning researcher at Microsoft Research in Cambridge, Massachusetts, whose team has developed its own diffusion-based protein design tool6.
But early signs suggest that RFdiffusion’s creations are the real deal. In another challenge described in their study, Baker’s team tasked the tool with designing proteins containing a key stretch of p53, a tumour-suppressing signalling molecule that is disabled in many cancers (and a sought-after drug target). When the researchers made 95 of the software’s designs (by engineering bacteria to express the proteins), more than half maintained p53’s ability to bind to its natural target, MDM2. The best designs did so around 1,000 times more strongly than did natural p53. When the researchers attempted this task with hallucination, the designs — although predicted to work — did not pan out in the test tube, says Watson.
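To put “1,000 times more strongly” on an energy scale: if it means a roughly 1,000-fold lower dissociation constant (an assumption, since the article does not say how binding was measured), the standard relation ΔΔG = RT ln(K_d ratio) converts it into a binding free-energy gain. At room temperature that comes to roughly 4 kcal per mole.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_delta_g(fold_tighter, temperature_k=298.15):
    """Binding free-energy gain (kcal/mol) for a dissociation constant fold_tighter times lower."""
    return R_KCAL * temperature_k * math.log(fold_tighter)

print(f"{delta_delta_g(1000):.2f} kcal/mol")  # → 4.09 kcal/mol
```

Because the relation is logarithmic, each further tenfold gain in affinity adds only about 1.4 kcal/mol.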
Overall, Baker says his team has found that 10–20% of RFdiffusion’s designs bind to their intended target strongly enough to be useful, compared with less than 1% for earlier, pre-AI methods. (Previous machine-learning approaches were not able to reliably design binders, Watson says.) Biochemist Matthias Gloegl, a colleague of Baker’s at UW, says that lately he has been hitting success rates approaching 50%, which means it can take just a week or two to come up with working designs, as opposed to months. “It’s really insane,” he says.
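A back-of-the-envelope calculation shows why these success rates change timelines so much. Assuming, as a simplification, that each design works independently with the same probability p, the chance that at least one of n designs works is 1 − (1 − p)^n, and the number of designs needed for a 95% chance of at least one hit follows from solving for n. The rates below are the ones quoted in the article; treating “less than 1%” as exactly 1% makes that figure a lower bound.

```python
import math

def p_at_least_one_hit(p, n):
    """Chance that at least one of n designs works, if each works independently with probability p."""
    return 1.0 - (1.0 - p) ** n

def designs_for_confidence(p, confidence=0.95):
    """Smallest n with P(at least one hit) >= confidence, under the same assumption."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

rates = [("pre-AI (<1%, lower bound)", 0.01), ("RFdiffusion, low end (10%)", 0.10),
         ("RFdiffusion, high end (20%)", 0.20), ("recent UW runs (~50%)", 0.50)]
for label, p in rates:
    print(f"{label}: {designs_for_confidence(p)} designs for a 95% chance of a hit")
```

Under this simple model, going from 1% to 10–20% cuts the number of designs to make and test from roughly 300 to a few dozen or fewer, and at 50% a handful suffices.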
A funnel-shaped protein assembly (top) and a ring-like structure with six protein chains (bottom), designed from noise using diffusion-based AI tools similar to those behind AI art generators. Credit: Ian C. Haydon/UW Institute for Protein Design
The cloud-based version of RFdiffusion had around 100 users each day by late June, according to Sergey Ovchinnikov, an evolutionary biologist at Harvard University in Cambridge, Massachusetts. Joel Mackay, a biochemist at the University of Sydney in Australia, has been dabbling with RFdiffusion to design proteins capable of binding to other proteins that his lab studies, which include molecules called transcription factors that control gene activity in cells. He found the design process simple, and used computer modelling to validate that, in theory, the proteins should bind to the transcription factors.
Mackay is now testing whether the proteins can alter gene expression as intended when they are produced in cells. He has his fingers crossed, because such a finding would amount to a simple way to switch specific transcription factors on and off within cells, instead of using drugs that can take years to identify, if they can be discovered at all. “If this method works reliably for our types of proteins, it would be a total game-changer,” he says.