Reading Assignment Protein Structures For All

2021 Breakthrough of the Year

Protein structures for all


AI-powered predictions show proteins finding their shapes
by Robert Service

In his 1972 Nobel Prize acceptance speech, American biochemist Christian Anfinsen laid out a vision: One
day it would be possible, he said, to predict the 3D structure of any protein merely from its sequence of
amino acid building blocks. With hundreds of thousands of proteins in the human body alone, such an
advance would have vast applications, offering insights into basic biology and revealing promising new
drug targets. Now, after nearly 50 years, researchers have shown that artificial intelligence (AI)-driven
software can churn out accurate protein structures by the thousands—an advance that realizes Anfinsen’s
dream and is Science’s 2021 Breakthrough of the Year.
Protein structures could once be determined only through painstaking lab analyses. But they can now be
calculated, quickly, for tens of thousands of proteins, and for complexes of interacting proteins. “This is a
sea change for structural biology,” says Gaetano Montelione, a structural biologist at Rensselaer
Polytechnic Institute. David Baker, a University of Washington, Seattle, computational biochemist who led
one of the prediction projects, adds that with the bounty of readily available structures, “All areas of
computational and molecular biology will be transformed.”
Proteins are biology’s workhorses. They contract our muscles, convert food into cellular energy, ferry
oxygen in our blood, and fight microbial invaders. Yet despite their varied talents, all proteins start out
with the same basic form: a linear chain of up to 20 different kinds of amino acids, strung together in a
sequence encoded in our DNA. After being assembled in cellular factories called ribosomes, each chain
folds into a unique, exquisitely complex 3D shape. Those shapes, which determine how proteins interact
with other molecules, define their roles in the cell.

Artificial intelligence predicted how two proteins form a complex involved in DNA repair in yeast.
(Illustration) V. Altounian/Science; (Data) I. R. Humphreys et al., Science 374, eabm4805 (2021);
DOI: 10.1126/science.abm4805
Work by Anfinsen and others suggested interactions between amino acids pull proteins into their final
shapes. But given the sheer number of possible interactions between each individual link in the chain and
all the others, even modest-size proteins could assume an astronomical number of possible shapes. In 1969,
American molecular biologist Cyrus Levinthal calculated that it would take longer than the age of the
universe for a protein chain to cycle through them one by one, even at a furious pace. But in nature, each
protein reliably folds up into just one distinctive shape, usually in the blink of an eye.
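Levinthal's argument can be reproduced as a back-of-the-envelope calculation. The sketch below is only illustrative: the number of shapes per amino acid, the chain length, and the sampling rate are assumptions chosen for this example, not figures from the article or from Levinthal's own work.

```python
# Rough version of Levinthal's estimate.
# ASSUMPTIONS (illustrative only, not from the article):
STATES_PER_RESIDUE = 3        # hypothetical number of shapes each amino acid can adopt
CHAIN_LENGTH = 100            # hypothetical modest-size protein
SAMPLES_PER_SECOND = 1e13     # hypothetical "furious pace" of trying shapes

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def exhaustive_search_years(states_per_residue, chain_length, samples_per_second):
    """Years needed to visit every conformation once, one after another."""
    conformations = states_per_residue ** chain_length
    return conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(STATES_PER_RESIDUE, CHAIN_LENGTH, SAMPLES_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"Exhaustive search: {years:.2e} years ({ratio:.2e} times the age of the universe)")
```

Under these assumed inputs the search takes on the order of 10^27 years. Changing any of the three inputs shifts the number, but not the conclusion that a random search cannot explain folding in the blink of an eye.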
In the 1950s, researchers started to map proteins’ 3D structures by analyzing how x-rays ricocheted off the
molecules’ atoms. This technique, known as x-ray crystallography, soon became the leading approach;
today, the field’s central repository, the Protein Data Bank, contains some 185,000 experimentally solved
structures. But mapping structures can take years—and cost hundreds of thousands of dollars per protein.
To speed the process, scientists started to create computer models in the 1970s to predict how a given
protein would fold.
At first, that was possible only for small proteins or short segments of larger ones. By 1994, however,
computer models had grown sophisticated enough to launch the biennial Critical Assessment of protein
Structure Prediction (CASP) competition. Organizers gave modelers the amino acid sequences of dozens
of proteins. At the end of the event, the modelers’ results were compared with the latest experimental data
from x-ray crystallography and emerging techniques such as nuclear magnetic resonance spectroscopy and
cryo–electron microscopy (cryo-EM). Scores above 90 were considered on par with experimentally solved
structures.
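The article does not name the score, but CASP's headline metric is GDT_TS (global distance test, total score), which runs from 0 to 100. A simplified sketch of the idea follows. It assumes the predicted and experimental structures have already been superimposed and that the distance between each predicted atom and its experimental counterpart is known; the real procedure also searches over superpositions, which is omitted here. The function name and the toy distances are hypothetical.

```python
def gdt_ts_like(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score on a 0-100 scale.

    distances: per-residue distances (in angstroms) between predicted and
    experimental positions after a single, already-done superposition.
    ASSUMPTION: the full GDT procedure optimizes the superposition for each
    cutoff; this sketch skips that step.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    # Fraction of residues falling within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    # Average the fractions and express the result as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical per-residue errors for a five-residue toy protein.
toy_distances = [0.5, 0.9, 1.5, 3.0, 9.0]
print(f"Score: {gdt_ts_like(toy_distances):.1f}")
```

A prediction that places every residue within 1 angstrom of its experimental position scores 100 under this scheme, which gives a sense of how demanding a score above 90 is.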
Early results were humbling, with median scores below 60. But over time, the modelers learned tricks to
improve their calculations. For example, stretches of amino acids shared by two proteins often fold
similarly. If a protein with an unknown structure shares, say, 50% of its amino acid sequence with a protein
that does have a known structure, the latter can serve as a “template” to guide the computer models.
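How much of a sequence two proteins share can be put into numbers once the sequences are aligned. The sketch below assumes the alignment has already been done by some other tool and simply counts matching positions; the function name and the two toy sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned, non-gap positions where both sequences match.

    ASSUMPTION: the two strings are already aligned (same length, gaps
    marked with '-'); producing the alignment itself is out of scope here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only positions where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy sequences in one-letter amino acid code.
target = "MKTAYIAKQR"     # protein with unknown structure
template = "MKSAYLGKHR"   # protein with known structure

identity = percent_identity(target, template)
print(f"Sequence identity: {identity:.0f}%")
```

With these toy sequences, 6 of 10 positions match, so the second protein would clear the article's example threshold of 50% and could serve as a template.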
Another major insight came from evolution. Investigators realized that if one amino acid changed in a
protein shared by closely related organisms, like chimpanzees and humans, amino acids located nearby in
the folded molecule would have to change, too, to preserve the protein’s shape and function. That means
investigators can narrow down a protein’s shape by looking for amino acids that coevolve: Even if they are
far apart on the unfolded chain, they are likely neighbors in the final 3D structure.
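One simple way to look for coevolving positions is to line up the same protein from many species and measure how strongly two columns of that alignment vary together. The sketch below uses mutual information for that purpose. It is a textbook statistic chosen for illustration, not the method of any particular prediction program, and the four-sequence alignment is invented.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment: list of equal-length sequences, one per organism.
    A high value means the amino acids at the two positions change together.
    """
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: positions 1 and 3 always change together
# (K with D, R with E), while position 2 never changes.
toy_alignment = ["AKLDE", "AKLDE", "ARLEE", "ARLEE"]

print(column_mutual_information(toy_alignment, 1, 3))  # covarying pair
print(column_mutual_information(toy_alignment, 1, 2))  # no covariation
```

In the toy alignment the covarying pair carries 1 bit of mutual information and the other pair carries none. Real analyses need many more sequences and corrections for shared ancestry, which this sketch leaves out.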

By 2018, the modelers were often scoring in the mid-70s. Then, AlphaFold, an AI-driven software
program, entered the scene. The program, developed by Google sister company DeepMind, trains itself on
databases of experimentally solved structures. In its first competition, its median score was close to 80, and
it won 43 of 90 matches against other algorithms. In 2020, its successor, AlphaFold2, shone even brighter.
Powered by a network of 182 processors optimized for machine learning, AlphaFold2 rang up a median
score of 92.4—on par with experimental techniques.
“I never thought I’d see this in my lifetime,” John Moult, a structural biologist at the University of
Maryland, Shady Grove, and CASP co-founder, said at the time.
This year, AI predictions shifted into overdrive. In mid-July, Baker and his colleagues reported that their
AI program RoseTTAFold had solved the structures of hundreds of proteins, all from a class of common
drug targets. A week later, DeepMind scientists reported they had done the same for 350,000 proteins
found in the human body—44% of all known human proteins. In coming months, they expect their
database will grow to 100 million proteins across all species, nearly half the total number believed to exist.
The next step is to predict which of those proteins work together and how they interact. DeepMind is
already doing just that. In an October preprint, its scientists unveiled 4433 protein-protein complexes,
revealing which proteins bind to one another—and how. In November, RoseTTAFold added another 912
complexes to the tally.
Code for AlphaFold2 and RoseTTAFold is now publicly available, helping other scientists jump into the
game. In November, researchers in Germany and the United States used AlphaFold2 and cryo-EM to map
the structure of the nuclear pore complex, an assembly of 30 different proteins that controls access to the
cell nucleus. In August, Chinese researchers used AlphaFold2 to map the structures for nearly 200 proteins
that bind to DNA, which could be involved in everything from DNA repair to gene expression. Last
month, Google’s parent company, Alphabet, launched a new venture that will use predicted protein
structures to design new drug candidates. And Baker’s team is using its software to dream up novel protein
sequences that will fold into stable structures, an advance that could lead to new antivirals and catalysts.
Even now, scientists studying SARS-CoV-2 are using AlphaFold2 to model the effect of mutations in the
Omicron variant’s spike protein. By inserting larger amino acids into the protein, the mutations have
changed its shape—perhaps enough to keep antibodies from binding to it and neutralizing the virus.

Much work remains. Protein structures aren’t static; they bend and twist as they do their jobs, and
modeling those changes remains a challenge. And it’s still a daunting task to visualize most of the large,
multiprotein complexes that carry out myriad jobs in cells. But this year’s explosion of AI-driven advances
offers a view of the dance of life as never seen before, a panorama that will forever change biology and
medicine.
