0% found this document useful (0 votes)
23 views

Accurate Structure Prediction of Bimolecular Interactions With AlphaFold3

Uploaded by

yahel.godinez
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views

Accurate Structure Prediction of Bimolecular Interactions With AlphaFold3

Uploaded by

yahel.godinez
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

Article

Accurate structure prediction of


biomolecular interactions with AlphaFold 3

https://doi.org/10.1038/s41586-024-07487-w Josh Abramson1,7, Jonas Adler1,7, Jack Dunger1,7, Richard Evans1,7, Tim Green1,7,
Alexander Pritzel1,7, Olaf Ronneberger1,7, Lindsay Willmore1,7, Andrew J. Ballard1,
Received: 19 December 2023
Joshua Bambrick2, Sebastian W. Bodenstein1, David A. Evans1, Chia-Chun Hung2,
Accepted: 29 April 2024 Michael O’Neill1, David Reiman1, Kathryn Tunyasuvunakool1, Zachary Wu1, Akvilė Žemgulytė1,
Eirini Arvaniti3, Charles Beattie3, Ottavia Bertolli3, Alex Bridgland3, Alexey Cherepanov4,
Published online: 8 May 2024
Miles Congreve4, Alexander I. Cowen-Rivers3, Andrew Cowie3, Michael Figurnov3,
Open access Fabian B. Fuchs3, Hannah Gladman3, Rishub Jain3, Yousuf A. Khan3,5, Caroline M. R. Low4,
Kuba Perlin3, Anna Potapenko3, Pascal Savy4, Sukhdeep Singh3, Adrian Stecula4,
Check for updates
Ashok Thillaisundaram3, Catherine Tong4, Sergei Yakneen4, Ellen D. Zhong3,6,
Michal Zielinski3, Augustin Žídek3, Victor Bapst1,8, Pushmeet Kohli1,8, Max Jaderberg2,8 ✉,
Demis Hassabis1,2,8 ✉ & John M. Jumper1,8 ✉

The introduction of AlphaFold 21 has spurred a revolution in modelling the structure


of proteins and their interactions, enabling a huge range of applications in protein
modelling and design2–6. Here we describe our AlphaFold 3 model with a substantially
updated diffusion-based architecture that is capable of predicting the joint structure
of complexes including proteins, nucleic acids, small molecules, ions and modified
residues. The new AlphaFold model demonstrates substantially improved accuracy
over many previous specialized tools: far greater accuracy for protein–ligand
interactions compared with state-of-the-art docking tools, much higher accuracy
for protein–nucleic acid interactions compared with nucleic-acid-specific predictors
and substantially higher antibody–antigen prediction accuracy compared with
AlphaFold-Multimer v.2.37,8. Together, these results show that high-accuracy
modelling across biomolecular space is possible within a single unified deep-learning
framework.

Accurate models of biological complexes are critical to our under- Here we present AlphaFold 3 (AF3)—a model that is capable of
standing of cellular functions and for the rational design of thera­ high-accuracy prediction of complexes containing nearly all molecular
peutics2–4,9. Enormous progress has been achieved in protein structure types present in the Protein Data Bank32 (PDB) (Fig. 1a,b). In all but one
prediction with the development of AlphaFold1, and the field has category, it achieves a substantially higher performance than strong
grown tremendously with a number of later methods that build on methods that specialize in just the given task (Fig. 1c and Extended
the ideas and techniques of AlphaFold 2 (AF2)10–12. Almost immediately Data Table 1), including higher accuracy at protein structure and the
after AlphaFold became available, it was shown that simple input structure of protein–protein interactions.
modifications would enable surprisingly accurate protein interaction This is achieved by a substantial evolution of the AF2 architec-
predictions13–15 and that training AF2 specifically for protein inter­ ture and training procedure (Fig. 1d) both to accommodate more
action prediction yielded a highly accurate system7. general chemical structures and to improve the data efficiency of
These successes lead to the question of whether it is possible to learning. The system reduces the amount of multiple-sequence
accurately predict the structure of complexes containing a much wider alignment (MSA) processing by replacing the AF2 evoformer with
range of biomolecules, including ligands, ions, nucleic acids and modi- the simpler pairformer module (Fig. 2a). Furthermore it directly
fied residues, within a deep-learning framework. A wide range of pre- predicts the raw atom coordinates with a diffusion module, replac-
dictors for various specific interaction types has been developed16–28, ing the AF2 structure module that operated on amino-acid-specific
as well as one generalist method developed concurrently with the frames and side-chain torsion angles (Fig. 2b). The multiscale
present work29, but the accuracy of such deep-learning attempts has nature of the diffusion process (low noise levels induce the net-
been mixed and often below that of physics-inspired methods30,31. work to improve local structure) also enable us to eliminate
Almost all of these methods are also highly specialized to particular stereochemical losses and most special handling of bonding pat-
interaction types and cannot predict the structure of general biomo- terns in the network, easily accommodating arbitrary chemical
lecular complexes containing many types of entities. components.
1
Core Contributor, Google DeepMind, London, UK. 2Core Contributor, Isomorphic Labs, London, UK. 3Google DeepMind, London, UK. 4Isomorphic Labs, London, UK. 5Department of Molecular
and Cellular Physiology, Stanford University, Stanford, CA, USA. 6Department of Computer Science, Princeton University, Princeton, NJ, USA. 7These authors contributed equally: Josh Abramson,
Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore. 8These authors jointly supervised this work: Victor Bapst, Pushmeet Kohli,
Max Jaderberg, Demis Hassabis, John M. Jumper. ✉e-mail: [email protected]; [email protected]; [email protected]

Nature | Vol 630 | 13 June 2024 | 493


Article
a b

c Ligands PoseBusters set Nucleic acids Covalent modifications Proteins


100 AF3 AF3
*** RoseTTAFold2NA AF-M 2.3 ***
AIchemy_RNA2 (has human input)
*** ***
80
** ***
**
Success (%)

60

40

20

0
AF3 AutoDock RoseTTAFold PDB PDB CASP15 Bonded Glycosylation Modified residues All Protein– Protein
2019 cut-off Vina All-Atom protein–RNA protein–dsDNA RNA ligands n = 28 protein–protein antibody monomers
Protein DNA RNA
n = 428 n = 428 n = 427 n = 25 n = 38 n=8 n = 66 n = 1,064 n = 65 n = 338
n = 40 n = 91 n = 23

d
Template
search
Confidence
Inputs module
Genetic
search (4 blocks)

Input Template MSA


+ module module 0 100
Conformer embedder
(3 blocks) Pair (2 blocks) (4 blocks) Pairformer Diffusion
generation (48 blocks) +
Sequences, module
ligands, (3 + 24 + 3 blocks)
+
covalent Single
bonds

Recycling Diffusion iterations

Fig. 1 | AF3 accurately predicts structures across biomolecular complexes. and ranking details are provided in the Methods. For ligands, n indicates the
a,b, Example structures predicted using AF3. a, Bacterial CRP/FNR family number of targets; for nucleic acids, n indicates the number of structures; for
transcriptional regulator protein bound to DNA and cGMP (PDB 7PZB; modifications, n indicates the number of clusters; and for proteins, n indicates
full-complex LDDT47, 82.8; global distance test (GDT)48, 90.1). b, Human the number of clusters. The bar height indicates the mean; error bars indicate
coronavirus OC43 spike protein, 4,665 residues, heavily glycosylated and exact binomial distribution 95% confidence intervals for PoseBusters and by
bound by neutralizing antibodies (PDB 7PNM; full-complex LDDT, 83.0; GDT, 10,000 bootstrap resamples for all others. Significance levels were calculated
83.1). c, AF3 performance on PoseBusters (v.1, August 2023 release), our recent using two-sided Fisher’s exact tests for PoseBusters and using two-sided
PDB evaluation set and CASP15 RNA. Metrics are as follows: percentage of Wilcoxon signed-rank tests for all others; ***P < 0.001, **P < 0.01. Exact P values
pocket-aligned ligand r.m.s.d. < 2 Å for ligands and covalent modifications; (from left to right) are as follows: 2.27 × 10 −13, 2.57 × 10 −3, 2.78 × 10 −3, 7.28 × 10 −12,
interface LDDT for protein–nucleic acid complexes; LDDT for nucleic acid and 1.81 × 10 −18, 6.54 × 10 −5 and 1.74 × 10 −34. AF-M 2.3, AlphaFold-Multimer v.2.3;
protein monomers; and percentage DockQ > 0.23 for protein–protein and dsDNA, double-stranded DNA. d, AF3 architecture for inference. The rectangles
protein–antibody interfaces. All scores are reported from the top confidence- represent processing modules and the arrows show the data flow. Yellow, input
ranked sample out of five model seeds (each with five diffusion samples), data; blue, abstract network activations; green, output data. The coloured balls
except for protein–antibody scores, which were ranked across 1,000 model represent physical atom coordinates.
seeds for both models (each AF3 seed with five diffusion samples). Sampling

Compared with the original evoformer from AF2, the number of blocks
Network architecture and training is reduced to four, the processing of the MSA representation uses an
The overall structure of AF3 (Fig. 1d and Supplementary Methods 3) inexpensive pair-weighted averaging and only the pair representa-
echoes that of AF2, with a large trunk evolving a pairwise representa- tion is used for later processing steps. The ‘pairformer’ (Fig. 2a and
tion of the chemical complex followed by a structure module that uses Supplementary Methods 3.6) replaces the evoformer of AF2 as the
the pairwise representation to generate explicit atomic positions, but dominant processing block. It operates only on the pair representation
there are large differences in each major component. These modifica- and the single representation; the MSA representation is not retained
tions were driven both by the need to accommodate a wide range of and all information passes through the pair representation. The pair
chemical entities without excessive special casing and by observations processing and the number of blocks (48) is largely unchanged from
of AF2 performance with different modifications. Within the trunk, AF2. The resulting pair and single representation together with the
MSA processing is substantially de-emphasized, with a much smaller input representation are passed to the new diffusion module (Fig. 2b)
and simpler MSA embedding block (Supplementary Methods 3.3). that replaces the structure module of AF2.

494 | Nature | Vol 630 | 13 June 2024


a
Pair representation 48 blocks Pair representation
(n, n, c) (n, n, c)
Triangle Triangle
Triangle Triangle
self- self-
update update
attention attention
using + using + + + Transition +
around around
outgoing incoming
starting ending
edges edges
node node
Single Single
representation (n, c) Single representation (n, c)
attention
with pair
+ Transition +
bias

b d Fine Fine
Initial training tune 1 tune 2
100
Per-
Intraligand
token
cond. 90
Intraprotein
80 Intra-DNA

Per- Intra-RNA
atom 70

LDDT
cond.
Protein–ligand
60 Protein–protein
Rand.
rot. Protein–DNA
Seq. local Seq. local 50
trans. attention attention
(atoms) (atoms) Protein–RNA
3 blocks 3 blocks 40
Global attention
(tokens)
24 blocks 30
0 20 40 60 80 100 120 140
Steps (×103)
c
Network
trunk

Diffusion Diffusion
+ module + module
+
+
(inference) (training)
STOP

48 samples Loss
20 iterations
mini rollout Permute
ground
truth
Ground
truth STOP Confidence
module
STOP Loss
Metrics

Fig. 2 | Architectural and training details. a, The pairformer module. starting from the end of the network trunk. The coloured arrays show activations
Input and output: pair representation with dimension (n, n, c) and single from the network trunk (green, inputs; blue, pair; red, single). The blue arrows
representation with dimension (n, c). n is the number of tokens (polymer show abstract activation arrays. The yellow arrows show ground-truth data.
residues and atoms); c is the number of channels (128 for the pair representation, The green arrows show predicted data. The stop sign represents stopping of
384 for the single representation). Each of the 48 blocks has an independent set the gradient. Both depicted diffusion modules share weights. d, Training
of trainable parameters. b, The diffusion module. Input: coarse arrays depict curves for initial training and fine-tuning stages, showing the LDDT on our
per-token representations (green, inputs; blue, pair; red, single). Fine arrays evaluation set as a function of optimizer steps. The scatter plot shows the raw
depict per-atom representations. The coloured balls represent physical atom datapoints and the lines show the smoothed performance using a median filter
coordinates. Cond., conditioning; rand. rot. trans., random rotation and with a kernel width of nine datapoints. The crosses mark the point at which the
translation; seq., sequence. c, The training set-up (distogram head omitted) smoothed performance reaches 97% of its initial training maximum.

The diffusion module (Fig. 2b and Supplementary Methods 3.7) oper- network to learn protein structure at a variety of length scales, whereby
ates directly on raw atom coordinates, and on a coarse abstract token the denoising task at small noise emphasizes understanding very local
representation, without rotational frames or any equivariant process- stereochemistry and the denoising task at high noise emphasizes the
ing. We had observed in AF2 that removing most of the complexity large-scale structure of the system. At the inference time, random noise
of the structure module had only a modest effect on the prediction is sampled and then recurrently denoised to produce a final structure.
accuracy, and maintaining the backbone frame and side-chain torsion Importantly, this is a generative training procedure that produces a
representation add quite a bit of complexity for general molecular distribution of answers. This means that, for each answer, the local
graphs. Similarly AF2 required carefully tuned stereochemical viola- structure will be sharply defined (for example, side-chain bond geom-
tion penalties during training to enforce chemical plausibility of the etry) even when the network is uncertain about the positions. For this
resulting structures. We use a relatively standard diffusion approach33 reason, we are able to avoid both torsion-based parametrizations of
in which the diffusion model is trained to receive ‘noised’ atomic coor- the residues and violation losses on the structure, while handling the
dinates and then predict the true coordinates. This task requires the full complexity of general ligands. Similarly to some recent work34,

Nature | Vol 630 | 13 June 2024 | 495


Article
a b c

d e f

Fig. 3 | Examples of predicted complexes. Selected structure predictions of an EXTL3 homodimer (PDB 7AU2; mean pocket-aligned r.m.s.d., 1.10 Å).
from AF3. Predicted protein chains are shown in blue (predicted antibody in c, Mesothelin C-terminal peptide bound to the monoclonal antibody 15B6
green), predicted ligands and glycans in orange, predicted RNA in purple and (PDB 7U8C; DockQ, 0.85). d, LGK974, a clinical-stage inhibitor, bound to
the ground truth is shown in grey. a, Human 40S small ribosomal subunit (7,663 PORCN in a complex with the WNT3A peptide (PDB 7URD; ligand r.m.s.d.,
residues) including 18S ribosomal RNA and Met-tRNAiMet (opaque purple) in a 1.00 Å). e, (5S,6S)-O7-sulfo DADH bound to the AziU3/U2 complex with a novel
complex with translation initiation factors eIF1A and eIF5B (opaque blue; PDB fold (PDB 7WUX; ligand r.m.s.d., 1.92 Å). f, Analogue of NIH-12848 bound to an
7TQL; full-complex LDDT, 87.7; GDT, 86.9). b, The glycosylated globular portion allosteric site of PI5P4Kγ (PDB 7QIE; ligand r.m.s.d., 0.37 Å).

we find that no invariance or equivariance with respect to global relatively early and started to decline (most likely due to overfitting to
rotations and translation of the molecule are required in the archi- the limited number of training samples for this capability), while other
tecture and we therefore omit them to simplify the machine learning abilities were still undertrained. We addressed this by increasing or
architecture. decreasing the sampling probability for the corresponding training
The use of a generative diffusion approach comes with some technical sets (Supplementary Methods 2.5.1) and by performing early stopping
challenges that we needed to address. The biggest issue is that genera- using a weighted average of all of the above metrics and some additional
tive models are prone to hallucination35, whereby the model may invent metrics to select the best model checkpoint (Supplementary Table 7).
plausible-looking structure even in unstructured regions. To counteract The fine-tuning stages with the larger crop sizes improve the model on
this effect, we use a cross-distillation method in which we enrich the all metrics with an especially high uplift on protein–protein interfaces
training data with structures predicted by AlphaFold-Multimer (v.2.3)7,8. (Extended Data Fig. 2).
In these structures, unstructured regions are typically represented
by long extended loops instead of compact structures, and training
on them ‘teaches’ AF3 to mimic this behaviour. This cross-distillation Accuracy across complex types
greatly reduced the hallucination behaviour of AF3 (Extended Data AF3 can predict structures from input polymer sequences, residue
Fig. 1 for disorder prediction results on the CAID 236 benchmark set). modifications and ligand SMILES (simplified molecular-input line-entry
We also developed confidence measures that predict the atom-level system). In Fig. 3 we show a selection of examples highlighting the abil-
and pairwise errors in our final structures. In AF2, this was done directly ity of the model to generalize to a number of biologically important
by regressing the error in the output of the structure module during and therapeutically relevant modalities. In selecting these examples,
training. However, this procedure is not applicable to diffusion training, we considered novelty in terms of the similarity of individual chains
as only a single step of the diffusion is trained instead of a full-structure and interfaces to the training set (additional information is provided
generation (Fig. 2c). To remedy this, we developed a diffusion ‘rollout’ in Supplementary Methods 8.1).
procedure for the full-structure prediction generation during training We evaluated the performance of the system on recent interface-
(using a larger step size than normal; Fig. 2c (mini-rollout)). This pre- specific benchmarks for each complex type (Fig. 1c and Extended
dicted structure is then used to permute the symmetric ground-truth Data Table 1). Performance on protein–ligand interfaces was evalu-
chains and ligands, and to compute the performance metrics to train ated on the PoseBusters benchmark set, which is composed of 428
the confidence head. The confidence head uses the pairwise representa- protein–ligand structures released to the PDB in 2021 or later. As our
tion to predict a modified local distance difference test (pLDDT) and a standard training cut-off date is in 2021, we trained a separate AF3
predicted aligned error (PAE) matrix as in AF2, as well as a distance error model with an earlier training-set cutoff (Methods). Accuracy on the
matrix (PDE), which is the error in the distance matrix of the predicted PoseBusters set is reported as the percentage of protein–ligand pairs
structure as compared to the true structure (details are provided in with pocket-aligned ligand root mean squared deviation (r.m.s.d.) of
Supplementary Methods 4.3). less than 2 Å. The baseline models come in two categories: those that
Figure 2d shows that, during initial training, the model learns quickly use only protein sequence and ligand SMILES as an input and those that
to predict the local structures (all intrachain metrics go up quickly and additionally leak information from the solved protein–ligand test struc-
reach 97% of the maximum performance within the first 20,000 training ture. Traditional docking methods use the latter privileged information,
steps), while the model needs considerably longer to learn the global even though that information would not be available in real-world use
constellation (the interface metrics go up slowly and protein–protein cases. Even so, AF3 greatly outperforms classical docking tools such as
interface LDDT passes the 97% bar only after 60,000 steps). During Vina37,38 even while not using any structural inputs (Fisher’s exact test,
AF3 development, we observed that some model abilities topped out P = 2.27 × 10−13) and greatly outperforms all other true blind docking

496 | Nature | Vol 630 | 13 June 2024


like RoseTTAFold All-Atom (P = 4.45 × 10−25). Extended Data Fig. 3 shows antibody–protein interaction prediction in particular showing a
three examples in which AF3 achieves accurate predictions but docking marked improvement (Fig. 1c (right); paired Wilcoxon signed-rank
tools Vina and Gold do not37. PoseBusters analysis was performed using test, P = 6.5 × 10−5, predictions top-ranked from 1,000 rather than the
a training cut-off of 30 September 2019 for AF3 to ensure that the model typical 5 seeds; further details are provided in Fig. 5a). Protein monomer
was not trained on any PoseBusters structures. To compare with the LDDT improvement is also significant (paired Wilcoxon signed-rank
RoseTTAFold All-Atom results, we used PoseBusters version 1. Version test, P = 1.7 × 10−34). AF3 has a very similar dependence on MSA depth
2 (crystal contacts removed from the benchmark set) results including to AlphaFold-Multimer v.2.3; proteins with shallow MSAs are predicted
quality metrics are shown in Extended Data Fig. 4b–f and Extended Data with lower accuracy (a comparison of the dependence of single-chain
Table 1. We use multiple seeds to ensure correct chirality and avoid LDDT on MSA depth is shown in Extended Data Fig. 7a).
slight protein–ligand clashing (as opposed to a method like diffusion
guidance to enforce) but we are typically able to produce high-quality
stereochemistry. Separately, we also train a version of AF3 that receives Predicted confidences track accuracy
the ‘pocket information’ as used in some recent deep-learning work24,26 As with AF2, AF3 confidence measures are well calibrated with accuracy.
(the results are shown in Extended Data Fig. 4a). Our confidence analysis is performed on the recent PDB evaluation
AF3 predicts protein–nucleic complexes and RNA structures with set, with no homology filtering and including peptides. The ligands
higher accuracy than RoseTTAFold2NA15 (Fig. 1c (second plot)). As category is filtered to high-quality experimental structures as described
RoseTTAFold2NA is validated only on structures below 1,000 resi- above, and considers standard non-bonded ligands only. See Extended
dues, we use only structures below 1,000 residues from our recent PDB Data Fig. 8 for a similar assessment on bonded ligand and other inter-
evaluation set for this comparison (Methods). AF3 is able to predict faces. All statistics are cluster-weighted (Methods) and consider the
protein–nucleic structures with thousands of residues, an example top-ranked prediction only (ranking details are provided in Supple-
of which is shown in Fig. 3a. Note that we do not compare directly to mentary Methods 5.9.3).
RoseTTAFold All-Atom, but benchmarks indicate that RoseTTAFold In Fig. 4a (top row), we plot the chain pair interface-predicted TM
All-Atom is slightly less accurate than RoseTTAFold2NA for nucleic (ipTM) score41 (Supplementary Methods 5.9.1) against interface accu-
acid predictions29. racy measures: protein–protein DockQ, protein–nucleic interface
We also evaluated AF3 performance on the ten publicly available LDDT (iLDDT) and protein–ligand success, with success defined as the
Critical Assessment of Structure Prediction 15 (CASP15) RNA targets: percentage of examples under thresholded pocket-aligned r.m.s.d.
we achieve a higher average performance than RoseTTAFold2NA and values. In Fig. 4a (bottom row), we plot the average pLDDT per protein,
AIchemy_RNA27 (the best AI-based submission in CASP1518,31) on the nucleotide or ligand entity against our bespoke LDDT_to_polymer
respective common subsets of our and their predictions (detailed metric (metrics details are provided in the Methods), which is closely
results are shown in Extended Data Fig. 5a). We did not reach the related to the training target of the pLDDT predictor.
performance of the best human-expert-aided CASP15 submission In Fig. 4b–e, we highlight a single example prediction of 7T82, in
AIchemy_RNA239 (Fig. 1c (centre left)). Owing to limited dataset sizes, which per-atom pLDDT colouring identifies unconfident chain tails,
we do not report significance test statistics here. Further analysis of somewhat confident interfaces and otherwise confident secondary
the accuracy of predicting nucleic acids alone (without proteins) is structure. In Fig. 4c, the same prediction is coloured by chain, along with
shown in Extended Data Fig. 5b. DockQ interface scores in Fig. 4d and per-chain colouring displayed on
Covalent modifications (bonded ligands, glycosylation, and modi- the axes for reference. We see from Fig. 4e that PAE confidence is high
fied protein residues and nucleic acid bases) are also accurately pre- for pink–grey and blue–orange residue pairs for which DockQ > 0.7,
dicted by AF3 (Fig. 1c (centre right)). Modifications include those to and least confident about pink–orange and pink–blue residue pairs
any polymer residue (protein, RNA or DNA). We report accuracy as the that have DockQ ≈ 0. A similar PAE analysis of an example with protein
percentage of successful predictions (pocket r.m.s.d. < 2 Å). We apply and nucleic acid chains is shown in Extended Data Fig. 5c,d.
quality filters to the bonded ligands and glycosylation dataset (as does
PoseBusters): we include only ligands with high-quality experimental
data (ranking_model_fit > 0.5, according to the RCSB structure valida- Model limitations
tion report, that is, X-ray structures with a model quality above the We note model limitations of AF3 with respect to stereochemistry,
median). As with the PoseBusters set, the bonded ligands and glyco- hallucinations, dynamics and accuracy for certain targets.
sylation datasets are not filtered by homology to the training dataset. On stereochemistry, we note two main classes of violations. The
Filtering on the basis of the bound polymer chain homology (using first is that the model outputs do not always respect chirality (Fig. 5b),
polymer template similarity < 40) yielded only five clusters for bonded despite the model receiving reference structures with correct chirality
ligands and seven clusters for glycosylation. We exclude multi-residue as input features. To address this in the PoseBusters benchmark, we
glycans here because the RCSB validation report does not provide a included a penalty for chirality violation in our ranking formula for
ranking_model_fit value for them. The percentage of successful pre- model predictions. Despite this, we still observe a chirality violation
dictions (pocket r.m.s.d. < 2 Å) for multi-residue glycans on all-quality rate of 4.4% in the benchmark. The second class of stereochemical
experimental data is 42.1% (n = 131 clusters), which is slightly lower than violations is a tendency of the model to occasionally produce overlap-
the success rate for single-residue glycans on all-quality experimental ping (clashing) atoms in the predictions. This sometimes manifests
data of 46.1% (n = 167). The modified residues dataset is filtered similarly as extreme violations in homomers in which entire chains have been
to our other polymer test sets: it contains only modified residues in observed to overlap (Fig. 5e). Penalizing clashes during ranking (Sup-
polymer chains with low homology to the training set (Methods). See plementary Methods 5.9.3) reduces the occurrence of this failure mode
Extended Data Table 1 for detailed results, and Extended Data Fig. 6 for but does not eliminate them. Almost all remaining clashes occur for
examples of predicted protein, DNA and RNA structures with covalent protein–nucleic complexes with both greater than 100 nucleotides
modifications, including analysis of the impact of phosphorylation and greater than 2,000 residues in total.
on predictions. We note that the switch from the non-generative AF2 model to the
While expanding in modelling abilities, AF3 has also improved in diffusion-based AF3 model introduces the challenge of spurious struc-
protein complex accuracy relative to AlphaFold-Multimer (v.2.3)7,8. tural order (hallucinations) in disordered regions (Fig. 5d and Extended
Generally, protein–protein prediction success (DockQ > 0.23)40 has Data Fig. 1). Although hallucinated regions are typically marked as very
increased (paired Wilcoxon signed-rank test, P = 1.8 × 10−18), with low confidence, they can lack the distinctive ribbon-like appearance

Nature | Vol 630 | 13 June 2024 | 497


Article
a Protein–protein Nucleic acid–protein Ligand–protein
1.0 100 100 R.m.s.d.
threshold (Å)

Pocket r.m.s.d. < threshold (%)


5.0+
0.8 80 80 2.0–5.0
1.0–2.0
0.5–1.0
0.6 60 60 0–0.5

iLDDT
DockQ

0.4 40 40

0.2 20 20

0 0 0
0–0.4 0.4–0.6 0.6–0.8 0.8–0.95 0.95+ 0–0.4 0.4–0.6 0.6–0.8 0.8–0.95 0.95+ 0–0.4 0.4–0.6 0.6–0.8 0.8–0.95 0.95+
n = 834 n = 692 n = 1,574 n = 1,804 n = 229 n = 277 n = 320 n = 449 n = 344 n = 108 n = 7 n = 29 n = 92 n = 299 n = 481
Chain pair ipTM Chain pair ipTM Chain pair ipTM

Protein Nucleic acid Ligand


100 100 100

80 80 80
LDDT to polymer

LDDT to polymer
LDDT to polymer

60 60 60

40 40 40

20 20 20

0 0 0
0–50 50–70 70–90 90+ 0–50 50–70 70–90 90+ 0–50 50–70 70–90 90+
n = 107 n = 469 n = 2,283 n = 1,552 n = 227 n = 225 n = 189 n = 229 n = 57 n = 136 n = 396 n = 934
Chain pLDDT Chain pLDDT Chain pLDDT

b c d e
Interface DockQ score Predicted aligned error matrix
A C D F A C D F
0
30
A 0.003 0.003 0.721 A

Expected position error (Å)


200 25

20
Residue

C 0.003 0.740
400 C
15

D 10
0.003 0.740
600 D
5

F 0.721 800 F 0

0 200 400 600 800


Residue

Fig. 4 | AF3 confidences track accuracy. a, The accuracy of protein-containing 7T82 coloured by pLDDT (orange, 0–50; yellow, 50–70; cyan, 70–90; and
interfaces as a function of chain pair ipTM (top). Bottom, the LDDT-to-polymer blue, 90–100). c, The same prediction coloured by chain. d, DockQ scores for
accuracy was evaluated for various chain types as a function of chain-averaged protein–protein interfaces. e, PAE matrix of same prediction (darker is more
pLDDT. The box plots show the 25–75% confidence intervals (box limits), the confident), with chain colouring of c on the side bars. The dashed black lines
median (centre line) and the 5–95% confidence intervals (whiskers). n values indicate the chain boundaries.
report the number of clusters in each band. b, The predicted structure of PDB

that AF2 produces in disordered regions. To encourage ribbon-like example, E3 ubiquitin ligases natively adopt an open conformation
predictions in AF3, we use distillation training from AF2 predictions, in an apo state and have been observed only in a closed state when
and we add a ranking term to encourage results with more solvent bound to ligands, but AF3 exclusively predicts the closed state for both
accessible surface area36. holo and apo systems42 (Fig. 5c). Many methods have been developed,
A key limitation of protein structure prediction models is that they particularly around MSA resampling, that assist in generating diversity
typically predict static structures as seen in the PDB, not the dynamical from previous AlphaFold models43–45 and may also assist in multistate
behaviour of biomolecular systems in solution. This limitation persists prediction with AF3.
for AF3, in which multiple random seeds for either the diffusion head Despite the large advance in modelling accuracy in AF3, there are
or the overall network do not produce an approximation of the solu- still many targets for which accurate modelling can be challenging.
tion ensemble. To obtain the highest accuracy, it may be necessary to generate a
In some cases, the modelled conformational state may not be correct large number of predictions and rank them, which incurs an extra
or comprehensive given the specified ligands and other inputs. For computational cost. A class of targets in which we observe this effect

498 | Nature | Vol 630 | 13 June 2024


a Correct low-homology Very high accuracy b
antibodies low-homology antibodies
40

DockQ > 0.8


60

DockQ > 0.23

Percentage
30

Percentage

***

***
40 20
10
20
0
*
1 10 100 1,000 1 10 100 1,000
Seeds per target Seeds per target
AF3 AF-M 2.3

d Ground truth AF-M 2.3 AF3 e

Fig. 5 | Model limitations. a, Antibody prediction quality increases with the chirality and clash penalty. c, Conformation coverage is limited. Ground-truth
number of model seeds. The quality of top-ranked, low-homology antibody– structures (grey) of cereblon in open (apo, PDB: 8CVP; left) and closed (holo
antigen interface predictions as a function of the number of seeds. Each mezigdomide-bound, PDB: 8D7U; right) conformations. Predictions (blue) of
datapoint shows the mean over 1,000 random samples (with replacement) of both apo (with 10 overlaid samples) and holo structures are in the closed
seeds to rank over, out of 1,200 seeds. Confidence intervals are 95% bootstraps conformation. The dashed lines indicate the distance between the N-terminal
over 10,000 resamples of cluster scores at each datapoint. Samples per Lon protease-like and C-terminal thalidomide-binding domain. d, A nuclear
interface are ranked by protein–protein ipTM. Significance tests were pore complex with 1,854 unresolved residues (PDB: 7F60). The ground truth
performed using by two-sided Wilcoxon signed-rank tests. n = 65 clusters. (left) and predictions from AlphaFold-Multimer v.2.3 (middle) and AF3 (right)
Exact P values were as follows: 2.0 × 10 −5 (percentage correct) and P = 0.009 are shown. e, Prediction of a trinucleosome with overlapping DNA (pink) and
(percentage very high accuracy). b, Prediction (coloured) and ground- protein (blue) chains (PDB: 7PEU); highlighted are overlapping protein chains
truth (grey) structures of Thermotoga maritima α-glucuronidase and B and J and self-overlapping DNA chain AA. Unless otherwise stated, predictions
beta-d-glucuronic acid—a target from the PoseBusters set (PDB: 7CTM). AF3 are top-ranked by our global complex ranking metric with chiral mismatch and
predicts alpha-d-glucuronic acid; the differing chiral centre is indicated by an steric clash penalties (Supplementary Methods 5.9.1).
asterisk. The prediction shown is top-ranked by ligand–protein ipTM and with a

strongly is antibody–antigen complexes, similar to other recent work46.


Figure 5a shows that, for AF3, top-ranked predictions keep improving Discussion
with more model seeds, even at as many as 1,000 (Wilcoxon signed-rank The core challenge of molecular biology is to understand and ultimately
test between 5 and 1,000 seeds, P = 2.0 × 10−5 for percentage correct regulate the complex atomic interactions of biological systems. The AF3
and P = 0.009 for percentage very high accuracy; ranking by protein– model takes a large step in this direction, demonstrating that it is pos-
protein interface ipTM). This large improvement with many seeds sible to accurately predict the structure of a wide range of biomolecular
is not observed in general for other classes of molecules (Extended systems in a unified framework. Although there are still substantial
Data Fig. 7b). Using only one diffusion sample per model seed for the challenges to achieve highly accurate predictions across all interaction
AF3 predictions rather than five (not illustrated) does not change types, we demonstrate that it is possible to build a deep-learning system
the results significantly, indicating that running more model seeds that shows strong coverage and generalization for all of these interac-
is necessary for antibody score improvements, rather than just more tions. We also demonstrate that the lack of cross-entity evolution-
diffusion samples. ary information is not a substantial blocker to progress in predicting

Nature | Vol 630 | 13 June 2024 | 499


Article
these interactions and, moreover, substantial improvement in antibody 20. Jiang, D. et al. InteractionGraphNet: a novel and efficient deep graph representation
learning framework for accurate protein-ligand interaction predictions. J. Med. Chem. 64,
results suggests AlphaFold-derived methods are able to model the 18209–18232 (2021).
chemistry and physics of classes of molecular interactions without 21. Jiang, H. et al. Predicting protein–ligand docking structure with graph neural network.
dependence on MSAs. Finally, the large improvement in protein–ligand J. Chem. Inf. Model. https://doi.org/10.1021/acs.jcim.2c00127 (2022).
22. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and
structure prediction shows that it is possible to handle the wide diver- turns for molecular docking. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.01776
sity of chemical space within a general deep-learning framework and (2022).
without resorting to an artificial separation between protein structure 23. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, D. & Jaakkola, T. EquiBind: Geometric deep
learning for drug binding structure prediction. In Proc. 39th International Conference on
prediction and ligand docking. Machine Learning (eds Chaudhuri, K. et al.) 20503–20521 (PMLR, 2022).
The development of bottom-up modelling of cellular components 24. Liao, Z. et al. DeepDock: enhancing ligand-protein interaction prediction by a
is a key step in unravelling the complexity of molecular regulation combination of ligand and structure information. In Proc. 2019 IEEE International
Conference on Bioinformatics and Biomedicine (BIBM) 311–317 (IEEE, 2019).
within the cell, and the performance of AF3 shows that developing the 25. Lu, W. et al. TANKBind: trigonometry-aware neural networks for drug-protein binding
right deep-learning frameworks can massively reduce the amount of structure prediction. Adv. Neural Inf. Process. Syst. 35, 7236–7249 (2022).
data required to obtain biologically relevant performance on these 26. Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework.
Preprint at ChemRxiv https://chemrxiv.org/engage/chemrxiv/article-details/6402990d37
tasks and amplify the impact of the data already collected. We expect e01856dc1d1581 (2023).
that structural modelling will continue to improve not only due to 27. Shen, T. et al. E2Efold-3D: end-to-end deep learning method for accurate de novo RNA 3D
advances in deep learning but also because continuing methodologi- structure prediction. Preprint at arXiv https://arxiv.org/abs/2207.01586 (2022).
28. van Dijk, M. & Bonvin, A. M. J. J. Pushing the limits of what is achievable in protein–DNA
cal advances in experimental structure determination, such as the docking: benchmarking HADDOCK’s performance. Nucleic Acids Res. 38, 5634–5647
substantial improvements in cryo-electron microscopy and tomogra- (2010).
phy, will provide a wealth of new training data to further the improve 29. Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold
All-Atom. Science 384, eadl2528 (2024).
the generalization ability of such models. The parallel developments 30. Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods
of experimental and computational methods promise to propel us fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15,
further into an era of structurally informed biological understanding 3130–3139 (2024).
31. Das, R. et al. Assessment of three-dimensional RNA structure prediction in CASP15.
and therapeutic development. Proteins 91, 1747–1770 (2023).
32. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
33. Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the design space of diffusion-based
generative models. Adv. Neural Inf. Process. Syst. 35, 26565–26577 (2022).
Online content 34. Wang, Y., Elhag, A. A., Jaitly, N., Susskind, J. M. & Bautista, M. A. Generating molecular
Any methods, additional references, Nature Portfolio reporting summa- conformer fields. Preprint at arXiv https://doi.org/10.48550/arXiv.2311.17932 (2023).
35. Ji, Z., et al. Survey of hallucination in natural language generation. ACM Comput. Surv.
ries, source data, extended data, supplementary information, acknowl-
55, 248 (2023).
edgements, peer review information; details of author contributions 36. Del Conte, A. et al. Critical assessment of protein intrinsic disorder prediction (CAID)—
and competing interests; and statements of data and code availability results of round 2. Proteins 91, 1925–1934 (2023).
37. Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with
are available at https://doi.org/10.1038/s41586-024-07487-w.
a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31,
455–461 (2010).
38. Miller, E. B. et al. Reliable and accurate solution to the induced fit docking problem for
protein–ligand binding. J. Chem. Theory Comput. https://doi.org/10.1021/acs.jctc.1c00136
1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596,
(2021).
583–589 (2021).
39. Chen, K., Zhou, Y., Wang, S. & Xiong, P. RNA tertiary structure modeling with BRiQ potential
2. Kreitz, J. et al. Programmable protein delivery with a bacterial contractile injection system.
in CASP15. Proteins 91, 1771–1778 (2023).
Nature 616, 357–364 (2023).
40. Basu, S. & Wallner, B. DockQ: a quality measure for protein-protein docking models. PLoS
3. Lim, Y. et al. In silico protein interaction screening uncovers DONSON’s role in replication
ONE 11, e0161879 (2016).
initiation. Science 381, eadi3448 (2023).
41. Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure
4. Mosalaganti, S. et al. AI-based structure prediction empowers integrative structural
template quality. Proteins 57, 702–710 (2004).
analysis of human nuclear pores. Science 376, eabm9506 (2022).
42. Watson, E. R. et al. Molecular glue CELMoD compounds are regulators of cereblon
5. Anand, N. & Achim, T. Protein structure and sequence generation with equivariant
conformation. Science 378, 549–553 (2022).
denoising diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/
43. Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering
arXiv.2205.15019 (2022).
and AlphaFold2. Nature 625, 832–839 (2024).
6. Yang, Z., Zeng, X., Zhao, Y. & Chen, R. AlphaFold2 and its applications in the fields of
44. del Alamo, D., Sala, D., Mchaourab, H. S. & Meiler, J. Sampling alternative conformational
biology and medicine. Signal Transduct. Target. Ther. 8, 115 (2023).
states of transporters and receptors with AlphaFold2. eLife https://doi.org/10.7554/
7. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv
eLife.75751 (2022).
https://doi.org/10.1101/2021.10.04.463034 (2022).
45. Heo, L. & Feig, M. Multi-state modeling of G-protein coupled receptors at experimental
8. Židek, A. AlphaFold v.2.3.0 Technical Note. GitHub https://github.com/google-deepmind/
accuracy. Proteins 90, 1873–1885 (2022).
alphafold/blob/main/docs/technical_note_v2.3.0.md (2022).
46. Wallner, B. AFsample: improving multimer prediction with AlphaFold using massive
9. Isert, C., Atz, K. & Schneider, G. Structure-based drug design with geometric deep learning.
sampling. Bioinformatics 39, btad573 (2023).
Curr. Opin. Struct. Biol. 79, 102548 (2023).
47. Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for
10. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a
comparing protein structures and models using distance difference tests. Bioinformatics
language model. Science 379, 1123–1130 (2023).
29, 2722–2728 (2013).
11. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-
48. Zemla, A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res.
track neural network. Science https://doi.org/10.1126/science.abj8754 (2021).
31, 3370–3374 (2003).
12. Wu, R. et al. High-resolution de novo structure prediction from primary sequence.
Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500999 (2022). Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
13. Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein-protein interactions published maps and institutional affiliations.
using AlphaFold2. Nat. Commun. 13, 1265 (2022).
14. Moriwaki, Y. Post on X. X https://x.com/Ag_smith/status/1417063635000598528?lang= Open Access This article is licensed under a Creative Commons Attribution
en-GB (2021). 4.0 International License, which permits use, sharing, adaptation, distribution
15. Baek, M. Post on X. X https://x.com/minkbaek/status/1417538291709071362?lang=en and reproduction in any medium or format, as long as you give appropriate
(2021). credit to the original author(s) and the source, provide a link to the Creative Commons licence,
16. Qiao, Z. et al. State-specific protein–ligand complex structure prediction with a multiscale and indicate if changes were made. The images or other third party material in this article are
deep generative model. Nat. Mach. Intell. 6, 195–208 (2024). included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
17. Nakata, S., Mori, Y. & Tanaka, S. End-to-end protein–ligand complex structure generation to the material. If material is not included in the article’s Creative Commons licence and your
with diffusion-based generative models. BMC Bioinform. 24, 233 (2023). intended use is not permitted by statutory regulation or exceeds the permitted use, you will
18. Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using need to obtain permission directly from the copyright holder. To view a copy of this licence,
RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024). visit http://creativecommons.org/licenses/by/4.0/.
19. Townshend, R. J. L. et al. Geometric deep learning of RNA structure. Science 373,
1047–1051 (2021). © The Author(s) 2024

500 | Nature | Vol 630 | 13 June 2024


Methods it is the protein chain with the most atoms within 10 Å of the ligand; for
bonded ligand scores, it is the bonded polymer chain; and for modified
Full algorithm details residues, it is the chain in which the residue is contained (minus that
Extensive explanations of the components are available in Supplemen- residue). The pocket is used to align the predicted structure to the
tary Methods 2–5. Moreover, pseudocode is available in Supplementary ground-truth structure with least-squares rigid alignment and then
Algorithms 1–31, network diagrams in Figs. 1d and 2a–c and Supple- the r.m.s.d. is computed on all heavy atoms of the ligand.
mentary Fig. 2, input features in Supplementary Table 5 and additional
hyperparameters for training in Supplementary Tables 3, 4 and 7. Recent PDB evaluation set
General model evaluation was performed on our recent PDB set consist-
Training regime ing of 8,856 PDB complexes released between 1 May 2022 and 12 January
No structural data used during training were released after 30 Sep- 2023. The set contains almost all PDB complexes released during that
tember 2021 and, for the model used in PoseBusters evaluations, we period that are less than 5,120 model tokens in size (Supplementary
filtered out PDB32 structures released after 30 September 2021. One Methods 6.1). Single chains and interfaces within each structure were
optimizer step uses a mini batch of 256 input data samples and during scored separately rather than only looking at full complex scores, and
initial training 256 × 48 = 12,288 diffusion samples. For fine-tuning, the clustering was then applied to chains and interfaces so that scores could
number of diffusion samples is reduced to 256 × 32 = 8,192. The model is be aggregated first within clusters and then across clusters for mean
trained in three stages—the initial training with a crop size of 384 tokens scores, or using a weighting of inverse cluster size for distributional
and two sequential fine tuning stages with crop sizes of 640 and 768 statistics (Supplementary Methods 6.2 and 6.4).
tokens. Further details are provided in Supplementary Methods 5.2. Evaluation on ligands excludes standard crystallization aids (Sup-
plementary Table 8), our ligand exclusion list (Supplementary Table 9)
Inference regime and glycans (Supplementary Table 10). Bonded and non-bonded ligands
No inference time templates or reference ligand position features were are evaluated separately. Ions are only included when specifically men-
released after 30 September 2021, and in the case of PoseBusters evalu- tioned (Supplementary Table 11).
ation, an earlier cut-off date of 30 September 2019 was used. The model The recent PDB set is filtered to a low homology subset (Supplemen-
can be run with different random seeds to generate alternative results, tary Methods 6.1) for some results where stated. Homology is defined
with a batch of diffusion samples per seed. Unless otherwise stated, as sequence identity to sequences in the training set and is measured
all results are generated by selecting the top confidence sample from by template search (Supplementary Methods 2.4). Individual poly-
running 5 seeds of the same trained model, with 5 diffusion samples per mer chains in evaluation complexes are filtered out if the maximum
model seed, for a total of 25 samples to choose from. Standard crystal- sequence identity to chains in the training set is greater than 40%, where
lization aids are excluded from predictions (Supplementary Table 8). sequence identity is the percentage of residues in the evaluation set
Results are shown for the top-ranked sample and sample ranking chain that are identical to the training set chain. Individual peptide
depends on whether trying to select the overall best output globally, chains (protein chains with less than 16 residues) are always filtered
or the best output for some chain, interface or modified residue. Global out. For polymer–polymer interfaces, if both polymers have greater
ranking uses a mix of pTM and ipTM along with terms to reduce cases than 40% sequence identity to two chains in the same complex in the
with large numbers of clashes and increase rates of disorder; individual training set, then the interface is filtered out. For interfaces to a peptide,
chain ranking uses a chain specific pTM measure; interface ranking the interface is filtered out if the non-peptide entity has greater than
uses a bespoke ipTM measure for the relevant chain pair; and modi- 40% sequence identity to any chain in the training set.
fied residue ranking uses average pLDDT over the residue of interest To compare the quality of prediction of protein–protein interfaces
(Supplementary Methods 5.9.3). and protein monomers against that of AlphaFold-Multimer (v.2.3)8, and
to compare the dependence of single-protein-chain prediction quality
Metrics on MSA depth, we restrict the low-homology recent PDB set to com-
Evaluation compares a predicted structure to the corresponding plexes with fewer than 20 protein chains and fewer than 2,560 tokens.
ground-truth structure. If the complex contains multiple identical We compare against unrelaxed AlphaFold-Multimer v.2.3 predictions.
entities, assignment of the predicted units to the ground-truth units To study antibody-antigen interface prediction, we filter the low
is found by maximizing LDDT. Assignment in local symmetry groups homology recent PDB set to complexes that contain at least one
of atoms in ligands is solved by exhaustive search over the first 1,000 protein–protein interface where one of the protein chains is in one
per-residue symmetries as given by RDKit. of the two largest PDB chain clusters (these clusters are representa-
We measure the quality of the predictions with DockQ, LDDT or tive of antibodies). We further filter to complexes with at most 2,560
pocket-aligned r.m.s.d. For nucleic–protein interfaces, we measure tokens and with no unknown amino acids in the PDB to allow extensive
interface accuracy through iLDDT, which is calculated from distances comparison against relaxed predictions of AlphaFold-Multimer v2.3.
between atoms across different chains in the interface. DockQ and That leaves 71 antibody–antigen complexes, containing 166 antibody–
iLDDT are highly correlated (Extended Data Fig. 9), so the standard antigen interfaces spanning 65 interface clusters.
cut-offs for DockQ can be translated to equivalent iLDDT cut-offs. MSA depth analysis (Extended Data Fig. 7a) was based on computing
Nucleic acid LDDTs (intrachains and interface) were calculated with an the normalized number of effective sequences (Neff ) for each position of
inclusion radius of 30 Å compared with the usual 15 Å used for proteins, a query sequence. Per-residue Neff values were obtained by counting the
owing to their larger scale. For confidence calibration assessment, we number of non-gap residues in the MSA for this position and weighting
use a bespoke LDDT (LDDT_to_polymer) metric that considers differ- the sequences using the Neff scheme49 with a threshold of 80% sequence
ences from each atom of a given entity to any Cα or C1′ polymer atom identity measured on the region that is non-gap in either sequence.
within its inclusion radius. This is closely related to how the confidence
prediction is trained (Supplementary Methods 4.3.1). Nucleic acid prediction baseline
Pocket-aligned r.m.s.d. is computed as follows: the pocket is defined For benchmarking performance on nucleic acid structure prediction,
as all heavy atoms within 10 Å of any heavy atom of the ligand, restricted we report baseline comparisons to an existing machine learning sys-
to the primary polymer chain for the ligand or modified residue being tem for protein–nucleic acid and RNA tertiary structure prediction,
scored, and further restricted to only backbone atoms for proteins. RoseTTAFold2NA18. We run the open source RF2NA50 with the same
The primary polymer chain is defined variously: for PoseBusters, MSAs as those that were used for AF3 predictions. For comparison
Article
between AF3 and RF2NA, a subset of our recent PDB set was chosen Matplotlib v.3.6.1 (https://github.com/matplotlib/matplotlib), pan-
to meet the RF2NA criteria (<1,000 total residues and nucleotides). das v.2.0.3 (https://github.com/pandas-dev/pandas), statsmodels
As RF2NA was not trained to predict systems with DNA and RNA, analy- v.0.12.2 (https://github.com/statsmodels/statsmodels), RDKit v.4.3.0
sis is limited to targets with only one nucleic acid type. No system was (https://github.com/rdkit/rdkit) and Colab (https://research.google.
publicly available at time of writing for baseline comparisons on data com/colaboratory). TM-align v.20190822 (https://zhanglab.dcmb.
with arbitrary combinations of biomolecular types in PDB. med.umich.edu/TM-align/) was used for computing TM-scores. Struc-
As an additional baseline for RNA tertiary structure prediction, ture visualizations were created in Pymol v.2.55.5 (https://github.com/
we evaluate AF3 performance on CASP15 RNA targets that were pub- schrodinger/pymol-open-source).
licly available as of 1 December 2023 (R1116/8S95, R1117/8FZA, R1126
(downloaded from the CASP15 website https://predictioncenter.org/ Reporting summary
casp15/TARGETS_PDB/R1126.pdb), R1128/8BTZ, R1136/7ZJ4, R1138/ Further information on research design is available in the Nature Port-
[7PTK/7PTL], R1189/7YR7 and R1190/7YR6). We compare the top-1 folio Reporting Summary linked to this article.
ranked predictions and, where multiple ground-truth structures exist
(R1136), the prediction is scored against the closest state. We display
comparisons to RF2NA as a representative machine learning system; Data availability
AIchemy_RNA2 as the top performing entrant with human intervention; All scientific datasets used to create training and evaluation inputs are
and AIchemy_RNA as the top performing machine learning system. All freely available from public sources. Structures from the PDB were used
entrants’ predictions were downloaded from the CASP website and for training and as templates (https://files.wwpdb.org/pub/pdb/data/
scored internally. assemblies/mmCIF/; sequence clusters are available at https://cdn.rcsb.
org/resources/sequence/clusters/clusters-by-entity-40.txt; sequence
PoseBusters data are available at https://files.wwpdb.org/pub/pdb/derived_data/).
While other analyses used an AlphaFold model trained on PDB data Training used a version of the PDB downloaded 12 January 2023, while
released before a cut-off of 30 September 2021, our PoseBusters analy- template search used a version downloaded 28 September 2022. We
sis was conducted on a model (with identical architecture and similar also used the Chemical Components Dictionary downloaded on 19
training schedule) differing only in the use of an earlier 30 Septem- October 2023 (https://www.wwpdb.org/data/ccd). We show experi-
ber 2019 cut-off. This analysis therefore did not include training data, mental structures from the PDB under accession numbers 7PZB
inference time templates or ‘ref_pos’ features released after this date. (ref. 52), 7PNM (ref. 53), 7TQL (ref. 54), 7AU2 (ref. 55), 7U8C (ref. 56),
Inference was performed on the asymmetric unit from specified 7URD (ref. 57), 7WUX (ref. 58), 7QIE (ref. 59), 7T82 (ref. 60), 7CTM
PDBs, with the following minor modifications. In several PDB files, (ref. 61), 8CVP (ref. 42), 8D7U (ref. 42), 7F60 (ref. 62), 8BTI (ref. 63), 7KZ9
chains clashing with the ligand of interest were removed (7O1T, 7PUV, (ref. 64), 7XFA (ref. 65), 7PEU (ref. 66), 7SDW (ref. 67), 7TNZ (ref. 68),
7SCW, 7WJB, 7ZXV, 8AIE). Another PDB entry (8F4J) was too large to 7R6R (ref. 69), 7USR (ref. 70) and 7Z1K (ref. 71). We also used the fol-
inference the entire system (over 5,120 tokens), so we included only lowing publicly available databases for training or evaluation. Detailed
protein chains within 20 Å of the ligand of interest. Five model seeds, usage is described in Supplementary Methods 2.2 and 2.5.2. UniRef90
each with five diffusion samples, were produced per target, resulting in v.2020_01 (https://ftp.ebi.ac.uk/pub/databases/uniprot/previous_
25 predictions, which were ranked by quality and predicted accuracy: releases/release-2020_01/uniref/), UniRef90 v.2020_03 (https://ftp.
the ranking score was calculated from an ipTM aggregate (Supplemen- ebi.ac.uk/pub/databases/uniprot/previous_releases/release-2020_03/
tary Methods 5.9.3 (point 3)), then further divided by 100 if the ligand uniref/), UniRef90 v.2022_05 (https://ftp.ebi.ac.uk/pub/databases/
had chirality errors or had clashes with the protein. uniprot/previous_releases/release-2022_05/uniref/), Uniclust30
For pocket-aligned r.m.s.d., first alignment between the pre- v.2018_08 (https://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/),
dicted and ground-truth structures was conducted by aligning to the Uniclust30 v.2021_03 (https://wwwuser.gwdg.de/~compbiol/uni-
ground-truth pocket backbone atoms (CA, C or N atoms within 10 Å clust/2021_03/), MGnify clusters v.2018_12 (https://ftp.ebi.ac.uk/pub/
of the ligand of interest) from the primary protein chain (the chain databases/metagenomics/peptide_database/2018_12/), MGnify clus-
with the greatest number of contacts within 10 Å of the ligand). The ters v.2022_05 (https://ftp.ebi.ac.uk/pub/databases/metagenomics/
PoseBusters Python package v.0.2.751 was used to score r.m.s.d. and peptide_database/2022_05/), BFD (https://bfd.mmseqs.com), RFam
violations from the pocket-aligned predictions. v.14.9 (https://ftp.ebi.ac.uk/pub/databases/Rfam/14.9/), RNAcentral
While AlphaFold models are ‘blind’ to the protein pocket, docking is v.21.0 (https://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/21.0/),
often performed with knowledge of the protein pocket residues. For Nucleotide Database (as of 23 February 2023) (https://ftp.ncbi.nlm.
example, Uni-Mol specifies the pocket as any residue within 6 Å of the nih.gov/blast/db/FASTA/nt.gz), JASPAR 2022 (https://jaspar.elixir.
heavy atoms in the ligand of interest26. To evaluate the ability of AF3 to no/downloads/; see https://jaspar.elixir.no/profile-versions for
dock ligands accurately when given pocket information, we fine-tuned version information), SELEX protein sequences from the supplemen-
a 30 September 2019 cut-off AF3 model with an additional token feature tary tables of ref. 72 and SELEX protein sequences from the supple-
specifying pocket–ligand pairs (Supplementary Methods 2.8). Specifi- mentary tables of ref. 73.
cally, an additional token feature was introduced, set to true for a ligand
entity of interest and any pocket residues with heavy atoms within 6 Å
of the ligand entity. At training time, a single random ligand entity is Code availability
chosen to use in this feature. Note that multiple ligand chains with the AlphaFold 3 will be available as a non-commercial usage only server at
same entity (CCD code) may be selected. At inference time, the ligand https://www.alphafoldserver.com, with restrictions on allowed ligands
entity was chosen based on the ligand of interest’s CCD code, so again and covalent modifications. Pseudocode describing the algorithms
multiple ligand chains were occasionally chosen. The results of this is available in the Supplementary Information. Code is not provided.
analysis are shown in Extended Data Fig. 4.
49. Wu, T., Hou, J., Adhikari, B. & Cheng, J. Analysis of several key factors influencing deep
Model performance analysis and visualization learning-based inter-residue contact prediction. Bioinformatics 36, 1091–1098 (2020).
50. DiMaio, F. RF2NA v.0.2. GitHub https://github.com/uw-ipd/RoseTTAFold2NA/releases/tag/
Data analysis used Python v.3.11.7 (https://www.python.org/), NumPy
v0.2 (2023).
v.1.26.3 (https://github.com/numpy/numpy), SciPy v.1.9.3 (https://www. 51. Buttenschoen, M. PoseBusters v.0.2.7. GitHub https://github.com/maabuu/posebusters/
scipy.org/), seaborn v.0.12.2 (https://github.com/mwaskom/seaborn), releases/tag/v0.2.7 (2023).
52. Werel, L. et al. Structural basis of dual specificity of Sinorhizobium meliloti Clr, a cAMP 71. Appel, L.-M. et al. The SPOC domain is a phosphoserine binding module that bridges
and cGMP receptor protein. MBio 14, e0302822 (2023). transcription machinery with co- and post-transcriptional regulators. Nat. Commun. 14,
53. Wang, C. et al. Antigenic structure of the human coronavirus OC43 spike reveals exposed 166 (2023).
and occluded neutralizing epitopes. Nat. Commun. 13, 2921 (2022). 72. Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human
54. Lapointe, C. P. et al. eIF5B and eIF1A reorient initiator tRNA to allow ribosomal subunit transcription factors. Science 356, eaaj2239 (2017).
joining. Nature 607, 185–190 (2022). 73. Jolma, A. et al. DNA-dependent formation of transcription factor pairs alters their binding
55. Wilson, L. F. L. et al. The structure of EXTL3 helps to explain the different roles of specificity. Nature 527, 384–388 (2015).
bi-domain exostosins in heparan sulfate synthesis. Nat. Commun. 13, 3314 (2022).
56. Liu, X. et al. Highly active CAR T cells that bind to a juxtamembrane region of mesothelin
and are not blocked by shed mesothelin. Proc. Natl Acad. Sci. USA 119, e2202439119 Acknowledgements We thank G. Arena, Ž. Avsec, A. Baryshnikov, R. Bates, M. Beck, A. Bond,
(2022). N. Bradley-Schmieg, J. Cavojska, B. Coppin, E. Dupont, S. Eddy, M. Fiscato, R. Green, D. Hariharan,
57. Liu, Y. et al. Mechanisms and inhibition of Porcupine-mediated Wnt acylation. Nature 607, K. Holsheimer, N. Hurley, C. Jones, K. Kavukcuoglu, J. Kelly, E. Kim, A. Koivuniemi, O. Kovalevskiy,
816–822 (2022). D. Lasecki, M. Last, A. Laydon, W. McCorkindale, S. Miller, A. Morris, L. Nicolaisen, E. Palmer,
58. Kurosawa, S. et al. Molecular basis for enzymatic aziridine formation via sulfate elimination. A. Paterson, S. Petersen, O. Purkiss, C. Shi, G. Thomas, G. Thornton and H. Tomlinson for their
J. Am. Chem. Soc. 144, 16164–16170 (2022). contributions.
59. Boffey, H. K. et al. Development of selective phosphatidylinositol 5-phosphate 4-kinase
γ inhibitors with a non-ATP-competitive, allosteric binding mode. J. Med. Chem. 65, Author contributions The equally contributing authors are alphabetically ordered, as are the
3359–3370 (2022). remaining core contributor authors (excluding jointly supervising authors) and similar for all
60. Buckley, P. T. et al. Multivalent human antibody-centyrin fusion protein to prevent and remaining non-supervising authors. D.H., M.J. and J.M.J. led the research. M.J., J.M.J. and P.K.
treat Staphylococcus aureus infections. Cell Host Microbe 31, 751–765 (2023). developed research strategy. J. Abramson, V.B., T.G. and C.-C.H. led key research pillars. T.G.
61. Mohapatra, S. B. & Manoj, N. Structural basis of catalysis and substrate recognition by the and A. Židek led the technical framework for research. O.B., H.G. and S.S. coordinated and
NAD(H)-dependent α-d-glucuronidase from the glycoside hydrolase family 4. Biochem. J. managed the research project. J. Abramson, J. Adler, E.A., A.J.B., J.B., V.B., A.I.C.-R., J.D., R.E.,
478, 943–959 (2021). D.A.E., M.F., F.B.F., T.G., C.-C.H., M.J., J.M.J., Y.A.K., A. Potapenko, A. Pritzel, D.R., O.R., A.T., C.T.,
62. Gao, X. et al. Structural basis for Sarbecovirus ORF6 mediated blockage of K.T., L.W., Z.W. and E.D.Z. developed the neural network architecture and training procedure.
nucleocytoplasmic transport. Nat. Commun. 13, 4782 (2022). J. Abramson, A.J.B., J.B., V.B., C.B., S.W.B., A.B., A. Cherepanov, A.I.C.-R., A. Cowie, J.D., T.G., R.J.,
63. Atkinson, B. N. et al. Designed switch from covalent to non-covalent inhibitors of M.O., K.P., D.R., O.R., M.Z., A. Žemgulytė and A. Žídek developed the training, inference, data
carboxylesterase Notum activity. Eur. J. Med. Chem. 251, 115132 (2023). and evaluation infrastructure. J. Abramson, J. Adler, A.J.B., V.B., A.I.C.-R., R.E., D.A.E., T.G., D.H.,
64. Luo, S. et al. Structural basis for a bacterial Pip system plant effector recognition protein. M.J., J.M.J., P.K., K.P., A. Pritzel, O.R., P.S., S.S., A.S., K.T. and L.W. contributed to the writing of the
Proc. Natl Acad. Sci. USA 118, e2019462118 (2021). paper. M.C., C.M.R.L. and S.Y. advised on the project.
65. Liu, C. et al. Identification of monosaccharide derivatives as potent, selective, and orally
bioavailable inhibitors of human and mouse galectin-3. J. Med. Chem. 65, 11084–11099 Competing interests Author-affiliated entities have filed US provisional patent applications
(2022). including 63/611,674, 63/611,638 and 63/546,444 relating to predicting 3D structures of
66. Dombrowski, M., Engeholm, M., Dienemann, C., Dodonova, S. & Cramer, P. Histone H1 molecule complexes using embedding neural networks and generative models. All of the
binding to nucleosome arrays depends on linker DNA length and trajectory. Nat. Struct. authors other than A.B., Y.A.K. and E.D.Z. have commercial interests in the work described.
Mol. Biol. 29, 493–501 (2022).
67. Vecchioni, S. et al. Metal-mediated DNA nanotechnology in 3D: structural library by Additional information
templated diffraction. Adv. Mater. 35, e2210938 (2023). Supplementary information The online version contains supplementary material available at
68. Wang, W. & Pyle, A. M. The RIG-I receptor adopts two different conformations for https://doi.org/10.1038/s41586-024-07487-w.
distinguishing host from viral RNA ligands. Mol. Cell 82, 4131–4144 (2022). Correspondence and requests for materials should be addressed to Max Jaderberg,
69. McGinnis, R. J. et al. A monomeric mycobacteriophage immunity repressor utilizes two Demis Hassabis or John M. Jumper.
domains to recognize an asymmetric DNA sequence. Nat. Commun. 13, 4105 (2022). Peer review information Nature thanks Justas Dapkunas, Roland Dunbrack and Hashim
70. Dietrich, M. H. et al. Nanobodies against Pfs230 block Plasmodium falciparum Al-Hashimi for their contribution to the peer review of this work.
transmission. Biochem. J. 479, 2529–2546 (2022). Reprints and permissions information is available at http://www.nature.com/reprints.
Article

Extended Data Fig. 1 | Disordered region prediction. a, Example prediction yellow: 50 < pLDDT <= 70, light blue: 70 < pLDDT <= 90, and dark blue: 90 <=
for a disordered protein from AlphaFoldMultimer v2.3, AlphaFold 3, and pLDDT < 100). b, Predictions of disorder across residues in proteins in the CAID
AlphaFold 3 trained without the disordered protein PDB cross distillation set. 2 set, which are also low homology to the AF3 training set. Prediction methods
Protein is DP02376 from the CAID 2 (Critical Assessment of protein Intrinsic include RASA (relative accessible surface area) and pLDDT (N = 151 proteins;
Disorder prediction) set. Predictions coloured by pLDDT (orange: pLDDT <= 50, 46,093 residues).
Extended Data Fig. 2 | Accuracy across training. Training curves for initial to 256 * 32 = 8,192. The scatter plot shows the raw data points and the lines show
training and fine tuning showing LDDT (local distance difference test) on our the smoothed performance using a median filter with a kernel width of 9 data
evaluation set as a function of optimizer steps. One optimizer step uses a points. The dashed lines mark the points where the smoothed performance
mini batch of 256 trunk samples and during initial training 256 * 48 = 12,288 passes 90% and 97% of the initial training maximum for the first time.
diffusion samples. For fine tuning the number of diffusion samples is reduced
Article

Extended Data Fig. 3 | AlphaFold 3 predictions of PoseBusters examples b, Pseudomonas sp. PDC86 Aapf bound to HEHEAA (PDB ID 7KZ9, ligand
for which Vina and Gold were inaccurate. Predicted protein chains are RMSD: 1.3 Å). c, Human Galectin-3 carbohydrate-recognition domain in
shown in blue, predicted ligands in orange, and ground truth in grey. a, Human complex with compound 22 (PDB ID 7XFA, ligand RMSD: 0.44 Å).
Notum bound to inhibitor ARUK3004556 (PDB ID 8BTI, ligand RMSD: 0.65 Å).
Extended Data Fig. 4 | PoseBusters analysis. a, Comparison of AlphaFold 3 those characterized as “common natural” ligands and others. “Common
and baseline method protein-ligand binding success on the PoseBusters natural” ligands are defined as those which occur greater than 100 times in the
Version 1 benchmark set (V1, August 2023 release). Methods classified by the PDB and which are not non-natural (by visual inspection). A full list may be
extent of ground truth information used to make predictions. Note all methods found in Supplementary Table 15. Dark bar indicates RMSD < 2 Å and passing
that use pocket residue information except for UMol and AF3 also use ground PoseBusters validity checks (PB-valid). e, PoseBusters V2 structural accuracy
truth holo protein structures. b, PoseBusters Version 2 (V2, November 2023 and validity. Dark bar indicates RMSD < 2 Å and passing PoseBusters validity
release) comparison between the leading docking method Vina and AF3 2019 checks (PB-valid). Light hashed bar indicates RMSD < 2 Å but not PB valid.
(two-sided Fisher exact test, N = 308 targets, p = 2.3 * 10 −8). c, PoseBusters V2 f, PoseBusters V2 detailed validity check comparison. Error bars indicate exact
results of AF3 2019 on targets with low, moderate, and high protein sequence binomial distribution 95% confidence intervals. N = 427 targets for RoseTTAFold
homology (integer ranges indicate maximum sequence identity with proteins All-Atom and 428 targets for all others in Version 1; 308 targets in Version 2.
in the training set). d, PoseBusters V2 results of AF3 2019 with ligands split by
Article

Extended Data Fig. 5 | Nucleic acid prediction accuracy and confidences. test, p = 5.2 * 10 −12). Note RF2NA was only trained and evaluated on duplexes
a, CASP15 RNA prediction accuracy from AIChemy_RNA (the top AI-based (chains forming at least 10 hydrogen bonds), but some DNA structures in this
submission), RoseTTAFold2NA (the AI-based method capable of predicting set may not be duplexes. Box, centerline, and whiskers boundaries are at
proteinRNA complexes), and AlphaFold 3. Ten of the 13 targets are available in (25%, 75%) intervals, median, and (5%, 95%) intervals. c Predicted structure of
the PDB or via the CASP15 website for evaluation. Predictions are downloaded a mycobacteriophage immunity repressor protein bound to double stranded
from the CASP website for external models. b, Accuracy on structures DNA (PDB ID 7R6R), coloured by pLDDT (left; orange: 0–50, yellow: 50–70, cyan
containing low homology RNA-only or DNA-only complexes from the recent 70–90, and blue 90–100) and chain id (right). Note the disordered N-terminus
PDB evaluation set. Comparison between AlphaFold 3 and RoseTTAFold2NA not entirely shown. d, Predicted aligned error (PAE) per token-pair for the
(RF2NA) (RNA: N = 29 structures, paired Wilcoxon signed-rank test, prediction in c with rows and columns labelled by chain id and green gradient
p = 1.6 * 10 −7; DNA: N = 63 structures, paired two-sided Wilcoxon signed-rank indicating PAE.
Extended Data Fig. 6 | Analysis and examples for modified proteins and pocketaligned RMSDCα 10.261 Å). When excluding phosphorylation, AlphaFold
nucleic acids. a, Accuracy on structures. containing common phosphorylation 3 provides lower pLDDT confidence on the phosphopeptide. c, Structure of
residues (SEP, TPO, PTR, NEP, HIP) from the recent PDB evaluation set. parkin bound to two phospho-ubiquitin molecules (PDB ID 7US1), predictions
Comparison between AlphaFold 3 with phosphorylation modelled, and similarly coloured by pLDDT. Left: Phosphorylation modelled (mean pocket-
AlphaFold 3 without modelling phosphorylation (N = 76 clusters, paired aligned RMSDCα 0.424 Å). Right: Without modelling phosphorylation (mean
two-sided Wilcoxon signed-rank test, p = 1.6 * 10 −4). Note, to predict a structure pocket-aligned RMSDCα 9.706 Å). When excluding phosphorylation, AlphaFold
without modelling phosphorylation, we predict the parent (standard) residue 3 provides lower pLDDT confidence on the interface residues of the incorrectly
in place of the modification. AlphaFold 3 generally achieves better backbone predicted ubiquitin. d, Example structures with modified nucleic acids. Left:
accuracy when modelling phosphorylation. Error bars indicate exact binomial Guanosine monophosphate in RNA (PDB ID 7TNZ, mean pocket-aligned modified
distribution 95% confidence intervals. b, SPOC domain of human SHARP in residue RMSD 0.840 Å). Right: Methylated DNA cytosines (PDB ID 7SDW, mean
complex with phosphorylated RNA polymerase II C-terminal domain (PDB ID pocket-aligned modified residue RMSD 0.502 Å). Welabel residues of the
7Z1K), predictions coloured by pLDDT (orange: 0–50, yellow: 50–70, cyan predicted structure for reference. Ground truth structure in grey; predicted
70–90, and blue 90–100). Left: Phosphorylation modelled (mean pocket- protein in blue, predicted RNA in purple, predicted DNA in magenta, predicted
aligned RMSDCα 2.104 Å). Right: Without modelling phosphorylation (mean ions in orange, with predicted modifications highlighted via spheres.
Article

Extended Data Fig. 7 | Model accuracy with MSA size and number of seeds. obtained through Gaussian kernel average smoothing (window size is 0.2 units
a, Effect of MSA depth on protein prediction accuracy. Accuracy is given as in log10(N eff)); the shaded area is the 95% confidence interval estimated using
single chain LDDT score and MSA depth is computed by counting the number bootstrap of 10,000 samples. b, Increase in ranked accuracy with number of
of non-gap residues for each position in the MSA using the N eff weighting seeds for different molecule types. Predictions are ranked by confidence,
scheme and taking the median across residues (see Methods for details on N eff). and only the most confident per interface is scored. Evaluated on the low
MSA used for AF-M 2.3 differs slightly from AF3; the data uses the AF3 MSA homology recent PDB set, filtered to less than 1,536 tokens. Number of clusters
depth for both to make the comparison clearer. The analysis uses every protein evaluated: dna-intra = 386, protein-intra = 875, rnaintra = 78, protein-dna = 307,
chain in the low homology Recent PDB set, restricted to chains in complexes protein-rna = 102, protein-protein (antibody = False) = 697, protein-protein
with fewer than 20 protein chains and fewer than 2,560 tokens (see Methods (antibody = True) = 58. Confidence intervals are 95% bootstraps over 1,000
for details on Recent PDB set and comparisons to AF-M 2.3). The curves are samples.
Extended Data Fig. 8 | Relationship between confidence and accuracy The ions group includes both metals and nonmetals. N values report the
for protein interactions with ions, bonded ligands and bonded glycans. number of clusters in each band. For a similar analysis on general ligand-protein
Accuracy is given as the percentage of interface clusters under various pocket- interfaces, see Fig. 4 of main text.
aligned RMSD thresholds, as a function of the chain pair ipTM of the interface.
Article

Extended Data Fig. 9 | Correlation of DockQ and iLDDT for protein-protein interfaces. One data point per cluster, 4,182 clusters shown. Line of best fit with a
Huber regressor with epsilon 1. DockQ categories correct (>0.23), and very high accuracy (>0.8) correspond to iLDDTs of 23.6 and 77.6 respectively.
Extended Data Table 1 | Prediction accuracy across biomolecular complexes

AlphaFold 3 Performance on PoseBusters V1 (August 2023 release), PoseBusters V2 (November 6th 2023 release), and our Recent PDB evaluation set. For ligands and nucleic acids N indicates
number of structures; for covalent modifications and proteins N indicates number of clusters.

You might also like