When DeepMind released its AlphaFold code, Ovchinnikov wanted to better understand how the tool worked. Within days, he and computational-biology colleagues, including Steinegger, set up a website called ColabFold that allowed anyone to submit a protein sequence to AlphaFold or RoseTTAFold and get a structure prediction. Ovchinnikov imagined that he and other scientists would use ColabFold to try to ‘break’ AlphaFold, for instance, by supplying false information about a target protein sequence’s evolutionary relatives. By doing this, Ovchinnikov hoped he could determine how the network had learnt to predict structures so well.
As it turned out, most researchers who used ColabFold just wanted to get a protein structure. But others used it as a platform to modify the inputs to AlphaFold to tackle new applications. “I didn’t expect the number of hacks of various types,” says Jumper.
By far the most popular hack has been to wield the tool on protein complexes composed of multiple, interacting (and often intertwined) peptide chains. Just as with the nuclear pore complex, many proteins in cells gain their function only when they form complexes with other protein subunits.
AlphaFold was designed to predict the shape of single peptide chains, and its training consisted entirely of such proteins. But the network seems to have learnt something about how complexes fold together. Several days after AlphaFold’s code was released, Yoshitaka Moriwaki, a protein bioinformatician at the University of Tokyo, tweeted that it could accurately predict interactions between two protein sequences if they were stitched together with a long linker sequence. Baek soon shared another hack to predict complexes, gleaned from developing RoseTTAFold.
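To make Moriwaki's linker trick concrete, the Python sketch below builds the single "stitched" input sequence from two chains. The article does not say which linker was used, so the poly-glycine composition and the default length of 25 residues are assumptions, and `link_chains` is a hypothetical helper rather than part of AlphaFold or ColabFold. It only prepares the input; it does not run a prediction. It also does not reproduce Baek's separate hack, which the article does not describe.

```python
# Hypothetical helper sketching the "linker" trick: join two protein
# sequences into one chain so a single-chain predictor can model them together.
# Assumption (not from the article): a poly-glycine linker of 25 residues.

# The 20 standard amino acids, in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def link_chains(
    chain_a: str,
    chain_b: str,
    linker_length: int = 25,
    linker_residue: str = "G",
) -> tuple[str, range, range]:
    """Join chain_a and chain_b with a flexible linker.

    Returns the combined sequence and the index ranges of the two original
    chains, so per-residue outputs can later be mapped back to each chain
    and the linker residues ignored.
    """
    chain_a = chain_a.strip().upper()
    chain_b = chain_b.strip().upper()
    for name, seq in (("chain_a", chain_a), ("chain_b", chain_b)):
        if not seq:
            raise ValueError(f"{name} is empty")
        unknown = set(seq) - VALID_RESIDUES
        if unknown:
            raise ValueError(f"{name} has non-standard residues: {sorted(unknown)}")
    if linker_length < 1:
        raise ValueError("linker_length must be at least 1")
    if len(linker_residue) != 1 or linker_residue not in VALID_RESIDUES:
        raise ValueError("linker_residue must be one standard amino acid")

    # Chain A, then the flexible linker, then chain B, as one sequence.
    combined = chain_a + linker_residue * linker_length + chain_b
    range_a = range(0, len(chain_a))
    start_b = len(chain_a) + linker_length
    range_b = range(start_b, start_b + len(chain_b))
    return combined, range_a, range_b


if __name__ == "__main__":
    combined, idx_a, idx_b = link_chains("MKTAYIAK", "QVQLV", linker_length=5)
    print(combined)  # MKTAYIAKGGGGGQVQLV
    print(idx_a, idx_b)  # range(0, 8) range(13, 18)
```

Returning the index ranges alongside the sequence is a deliberate choice. They let you drop the artificial linker from the predicted model and assign the remaining residues back to their original chains.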
ColabFold later incorporated the ability to predict complexes. And in October 2021, DeepMind released an update called AlphaFold-Multimer⁸ that, unlike its predecessor, was specifically trained on protein complexes. Jumper’s team applied it to thousands of complexes in the PDB, and found that it predicted around 70% of the known protein–protein interactions.
These tools are already helping researchers to spot potential new protein partners. Elofsson’s team used AlphaFold to predict the structures of 65,000 human protein pairs that were suspected to interact on the basis of experimental data⁹. And a team led by Baker used AlphaFold and RoseTTAFold to model interactions between nearly every pair of proteins encoded by yeast, identifying more than 100 previously unknown complexes¹⁰. Such screens are just starting points, says Elofsson. They do a good job of predicting some protein pairings, particularly those that are stable, but struggle to identify more transient interactions. “Because it looks nice doesn’t mean it is correct,” says Elofsson. “You need some experimental data that show you’re right.”
The nuclear pore complex work is a good example of how predictions and experimental data can work together, says Kosinski (see ‘Genome gateway’). “It’s not like we take all the 30 proteins, throw them into AlphaFold and get the structure out.” To put the predicted protein structures together, the team used 3D images of the nuclear pore complex, captured using a form of cryo-EM called cryo-electron tomography. In one instance, experiments that can determine the proximity of proteins turned up a surprising interaction between two components of the complex, which AlphaFold’s models then confirmed.
Images adapted from ref. 3/Agnieszka Obarska-Kosinska
Kosinski sees the team’s current map of the nuclear pore complex as a starting point for experiments and simulations that examine how the pore complex functions — and how it malfunctions in disease.
AlphaFold’s limits
For all the progress made with AlphaFold, scientists say that it is important to be clear about its limitations, particularly because it is now widely used by researchers who don’t specialize in predicting protein structures.
Attempts to apply AlphaFold to various mutations that disrupt a protein’s natural structure, including one linked to early breast cancer, have confirmed that the software is not equipped to predict the consequences of new mutations in proteins, because a new mutation has no evolutionarily related sequences for the network to examine¹¹.
The AlphaFold team is now thinking about how a neural network could be designed to deal with new mutations. Jumper expects this would require the network to better predict how a protein goes from its unfolded to its folded state. That would probably need software that relies only on what it has learnt about protein physics to predict structures, says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. “One thing we are interested in is making predictions from single sequences without using evolutionary information,” he says. “That’s a key problem that does remain open.”
AlphaFold is also designed to predict a single structure, although it has been hacked to spit out more than one. But many proteins take on multiple conformations, which can be important to their function. “AlphaFold can’t really deal with proteins that can adopt different structures in different conformations,” says Schueler-Furman. And the predictions are for structures in isolation, whereas many proteins function alongside ligands such as DNA and RNA, fat molecules and metal ions such as iron. “We are still missing ligands, we are missing everything else about proteins,” says Elofsson.