Sequence-Only Prediction of Binding Affinity Changes: A Robust and Interpretable Model For Antibody Engineering
Chen Liu,1 Mingchen Li,1 Yang Tan,1 Wenrui Gou,1 Guisheng Fan1,∗ and Bingxin Zhou2,∗
1 School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China and 2 Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
∗ Corresponding author. [email protected]; [email protected]
Abstract
Motivation: A pivotal area of research in antibody engineering is to find effective modifications that enhance antibody-
antigen binding affinity. Traditional wet-lab experiments assess mutants in a costly and time-consuming manner. Emerging
deep learning solutions offer an alternative by modeling antibody structures to predict binding affinity changes. However,
they heavily depend on high-quality complex structures, which are frequently unavailable in practice. We therefore
propose ProtAttBA, a deep learning model that predicts binding affinity changes based solely on the sequence information
of antibody-antigen complexes.
Results: ProtAttBA employs a pre-training phase to learn protein sequence patterns, followed by a supervised training
phase that uses labeled antibody-antigen complex data to train a cross-attention-based regressor for predicting binding
affinity changes. We evaluated ProtAttBA on three open benchmarks under different conditions. Compared to both
sequence- and structure-based prediction methods, our approach achieves competitive performance, demonstrating
notable robustness, especially when complex structures are uncertain. Moreover, our method is interpretable through its
attention mechanism: we show that the learned attention scores can identify critical residues that impact binding
affinity. This work introduces a rapid and cost-effective computational tool for antibody engineering, with the potential
to accelerate the development of novel therapeutic antibodies.
Availability and implementation: Source code and data are available at https://github.com/code4luck/ProtAttBA
Key words: deep learning, antibody engineering, binding affinity change prediction, pre-trained protein language model
Fig. 1. Overview of the ProtAttBA architecture. The model predicts changes in antigen–antibody binding affinity (∆∆G) caused by amino acid mutations. Given wild-type and mutant sequence pairs, ProtAttBA first encodes antibody and antigen sequences using a frozen pre-trained protein language model to generate contextualized residue embeddings {H_ab^wt, H_ag^wt, H_ab^mt, H_ag^mt}. The attention module then applies convolutional neural networks with dual multi-head cross-attention to yield refined representations {H_ab^wt′, H_ag^wt′, H_ab^mt′, H_ag^mt′} and the corresponding pooled feature vectors {f_ab^wt, f_ag^wt, f_ab^mt, f_ag^mt} (see Sections 2.2.1 and 2.2.2). Finally, the prediction module concatenates wild-type and mutant features and regresses the ∆∆G value.
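As a rough sketch of this data flow (an illustration under our own assumptions, not the authors' released implementation: `plm` stands for any frozen encoder that maps a tokenized sequence to per-residue embeddings, and mean pooling stands in for the attention module detailed in Section 2.2.2):

```python
import torch
import torch.nn as nn

class ProtAttBASketch(nn.Module):
    """Minimal skeleton of the Fig. 1 pipeline (illustrative only)."""

    def __init__(self, plm: nn.Module, d_model: int = 1280):
        super().__init__()
        self.plm = plm.eval()                      # frozen pre-trained language model
        for p in self.plm.parameters():
            p.requires_grad_(False)
        self.head = nn.Sequential(                 # FC regression layers
            nn.Linear(4 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def pool(self, H: torch.Tensor) -> torch.Tensor:
        # Placeholder for the attention module (conv + dual cross-attention +
        # ConvPool, Section 2.2.2); plain mean pooling keeps the sketch runnable.
        return H.mean(dim=1)

    def forward(self, wt_ab, wt_ag, mt_ab, mt_ag):
        # (1) Embedding module: residue-level embeddings H of shape (B, L, d).
        with torch.no_grad():
            H = [self.plm(s) for s in (wt_ab, wt_ag, mt_ab, mt_ag)]
        # (2) Attention module: pooled feature vectors f for each sequence.
        f = [self.pool(h) for h in H]
        # (3) Prediction module: concatenate wild-type/mutant features, regress ∆∆G.
        return self.head(torch.cat(f, dim=-1)).squeeze(-1)
```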
limited or no experimental labels. However, their performance in scoring antibody-antigen binding affinity is suboptimal, possibly because these models do not account for antigen information.
In contrast, explicit scoring methods calculate the binding affinity changes (e.g., ∆∆G) of antibody mutants relative to the wild type. Two major approaches have been developed for evaluating these changes in antibody-antigen complexes: energy function calculations and data-driven prediction methods. Energy function-based methods leverage protein structural information, integrating molecular dynamics simulations and physical computations to evaluate complex interactions and affinities (Schymkowitz et al., 2005; Dehouck et al., 2013; Pires and Ascher, 2016). These approaches offer mechanistic insights and are grounded in molecular physics, but they face limitations in processing high-throughput data and achieving high predictive accuracy. On the other hand, machine learning-based prediction methods utilize large-scale data to learn implicit patterns in protein construction and make predictions about binding affinity changes (Wang et al., 2020; Yu et al., 2024). Binding is typically considered to have a stronger correlation with protein structures (Tan et al., 2024a; Huang et al., 2024). Consequently, various models incorporate structural information and achieve strong performance in standard evaluations on open benchmarks (Shan et al., 2022; Rana and Nguyen, 2023). However, these methods heavily rely on high-quality structural data inputs. Unlike other proteins, antibodies often lack accurate structural data, with relatively low prediction accuracy and confidence for inferred structures. When only sequence information is available (a common scenario in antibody engineering), structure-based methods demonstrate limited robustness, making their predictions less reliable. Some sequence-based methods address this issue by incorporating multiple sequence alignments (MSAs) to capture amino acid co-evolutionary relationships and reduce dependence on structural data (Jin et al., 2024). However, reliable antibody MSAs are often challenging to obtain, and the MSA search during training and inference renders these models slower and unsuited for high-throughput screening (Tan et al., 2024b; Misra et al., 2024).
The limitations of existing methods and the critical role of antibody engineering underscore the need for a robust and efficient tool to predict binding affinity changes in antibody-antigen complexes. Such a tool is expected to demonstrate resilience to uncertainties in input data, such as antibody structures, and efficiency in both training and inference. To address these challenges, in this study, we present ProtAttBA, a novel sequence-only method leveraging a cross-Attention mechanism for Binding Affinity change prediction. As depicted in Fig. 1, ProtAttBA consists of three key components: the embedding module, the attention module, and the prediction module. (1) The embedding module processes the wild-type and mutant sequences of both antibodies and antigens, generating residue-level latent representations using pre-trained protein language models. (2) Next, the cross-attention module refines these latent representations by emphasizing information-rich features and maintaining contextual dependencies through feature transformation and integration. This step is pivotal for capturing the intricate interactions within antigen-antibody complexes, which form the foundation for the precise prediction of binding affinity changes. This module also provides interpretability for ProtAttBA by identifying and highlighting molecular interaction patterns that significantly influence binding affinities. (3) The final prediction module integrates these interaction-informed features and produces binding affinity change predictions via learnable regression heads. As demonstrated in the Results section, ProtAttBA serves as a robust and interpretable solution for predicting binding affinity changes in antibody-antigen mutants, thus fulfilling the critical demands of antibody engineering.

Materials and Methods

2.1 Datasets
Three open benchmark datasets have been used to train and evaluate both baseline methods and ProtAttBA. AB645 (Wang et al., 2020) and S1131 (Xiong et al., 2017) consist of single-site mutations, and AB1101 (Wang et al., 2020) includes mutations across 1 to 7 residues. These datasets quantify changes in binding affinity using the difference in free energy (∆∆G_bind = ∆G_mut − ∆G_wild), where the binding free energy (∆G) was experimentally determined via surface plasmon resonance by ∆G = −RT ln(1/K_d), with R representing the universal gas constant, T the absolute temperature in Kelvin, and K_d the dissociation equilibrium constant of the complex.
The experimental values for S1131 are sourced from the SKEMPI database (Moal and Fernández-Recio, 2012), whereas those for AB645 and AB1101 originate from AB-bind (Sirin
et al., 2016). SKEMPI compiles 3,047 mutation-induced binding free energy changes in protein-protein heterodimeric complexes with experimentally determined structures. After redundancy removal, Xiong et al. (2017) curated S1131 with 1,131 interface single-point mutations. Conversely, AB-bind includes 32 complexes with 7 to 246 variants per complex, all measured using consistent experimental techniques to minimize discrepancies caused by variations in experimental conditions. From this dataset, Wang et al. (2020) selected 645 single-point mutations as AB645 and aggregated them with 456 multi-point mutations to form AB1101. All three benchmark datasets use ∆∆G_bind as the prediction target (the distribution of ∆∆G_bind across the three datasets can be found in Supplementary Fig. S1), but they exhibit distinct characteristics, such as variations in mutation sites and label distributions. This difference provides a comprehensive basis for evaluating model performance across different data patterns.

2.2 Model Architecture
Fig. 1 presents the architecture of ProtAttBA. It processes sequences of wild-type and mutant antigen-antibody complexes as input to predict the resulting change in binding affinity. The model's architecture is organized into three principal modules, operating across two conceptual phases: a pre-trained representation phase, followed by a trainable interaction and prediction phase. These modules are: an embedding module for efficient sequence representation; an attention module designed to capture high-dimensional interactions within the input complex; and a prediction module that integrates features from the preceding modules to generate the final predictions. Next, we explain each of these modules in detail.

2.2.1 Embedding Module
The first embedding module takes four protein sequences as input: the wild-type antibody, the wild-type antigen, the mutated antibody, and the mutated antigen. It generates latent representations for these sequences, denoted {H_ab^wt, H_ag^wt, H_ab^mt, H_ag^mt}, which are produced by a pre-trained protein language model. Protein language models have demonstrated enhanced scalability and stability (Bai et al., 2021) when applied to protein sequences. Common protein language models include BERT-style models (Elnaggar et al., 2021; Li et al., 2024b), which are better suited for predictive tasks, and GPT-style models (Xu et al., 2024; Xiao et al., 2024), which are more appropriate for generative tasks. Here we opted for a BERT-style model, i.e., a masked language model (MLM), which learns to infer the probability distribution of amino acids at masked positions based on the surrounding context; the resulting protein representation can serve as high-dimensional features of the protein. In empirical evaluations, we implemented four popular open-source protein language models to extract embeddings: ProtBert (Elnaggar et al., 2021), ESM1b (Rives et al., 2021), ESM2 (Lin et al., 2023), and Ankh (Elnaggar et al., 2023).

2.2.2 Attention Module
The attention module processes the hidden representations of antibody-antigen complexes. Overall, the module projects the residue-level matrix representation of each protein sequence to a vector representation. The attention module comprises three core components: a 1D convolutional operation to capture local features by modeling interactions between sequentially connected residues; a dual multi-head cross-attention to incorporate global contextual information from both the operated protein sequence and its paired sequence (e.g., if the wild-type antibody sequence is the operated protein sequence, the wild-type antigen sequence serves as the paired sequence); and a final convolutional pooling layer to compress the matrix representation into a vector representation. By integrating these components, the attention module effectively propagates both local and global interactions, ensuring a robust and comprehensive representation of antibody-antigen complex dynamics. The following sections provide detailed descriptions of each submodule. For simplicity and clarity, in this subsection we use H without superscripts and subscripts to denote the hidden representation of an arbitrary sequence.

1D Convolutional Operation. The first 1D convolutional operation learns the local patterns within the input representations. Define

H′ = softmax(Conv1D(LayerNorm(H))) ⊙ LayerNorm(H),  (2)

where LayerNorm(·) denotes layer normalization to ensure numerical stability, ⊙ represents element-wise multiplication, softmax(·) is the softmax activation function, and H ∈ R^{L×d} is the d-dimensional embedding representation of a protein sequence with L residues derived from the pre-trained language model. The Conv1D(·) operator, with a kernel size of 1, calculates spatial weights for each position in the sequence. These weights are element-wise multiplied with the original representation, allowing for adaptive feature weighting and enhancing the model's sensitivity to potentially significant positions within the sequence. The same operation in (2) applies to all four representations in parallel.

Dual Multi-head Cross-Attention. This submodule implements multi-head cross-attention to model interactions between antibody-antigen complex pairs. At this stage, following the 1D convolutional operation defined in Equation (2), each processed representation H′ corresponds to an attention score matrix Attn(H′). To compute the attention scores, we first define the query, key, and value matrices Q ∈ R^{L×d}, K ∈ R^{L×d}, and V ∈ R^{L×d}, which are parameterized by learnable weight matrices W_q, W_k, and W_v, respectively:

Q = RoPE(W_q H′), K = RoPE(W_k H′), V = W_v H′.  (3)

Rotary position embedding (RoPE) (Su et al., 2024) is applied to enhance sensitivity to spatial relationships between residues. The same computations are applied in parallel to all four input embeddings.
The next step computes cross-attention for antibody-antigen pairs. The calculation is performed separately for the wild-type and mutated pairs. For each pair, a symmetric operation is applied to the respective antibody and antigen components. The overall cross-attention mechanism is defined as follows:

Attn(H′_ab) = softmax(Q_ab (K_ag)^⊤ / √d) V_ag,  (4)
Attn(H′_ag) = softmax(Q_ag (K_ab)^⊤ / √d) V_ab.  (5)
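To make Equations (2)–(5) concrete, the following is a minimal single-head PyTorch sketch under our own assumptions; head splitting, per-chain projection weights, and the exact softmax axis in Eq. (2) may differ in the released code:

```python
import math
import torch
import torch.nn as nn

def rope(x: torch.Tensor) -> torch.Tensor:
    """Rotary position embedding (Su et al., 2024) over the sequence axis."""
    _, L, d = x.shape
    half = d // 2
    freq = 1.0 / (10000 ** (torch.arange(half, device=x.device) / half))
    ang = torch.arange(L, device=x.device)[:, None] * freq[None, :]   # (L, d/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class DualCrossAttention(nn.Module):
    """Eqs. (2)-(5): gated 1D conv, then symmetric antibody/antigen attention."""

    def __init__(self, d: int):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.conv = nn.Conv1d(d, d, kernel_size=1)   # position-wise spatial weights
        self.Wq, self.Wk, self.Wv = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.d = d

    def gate(self, H: torch.Tensor) -> torch.Tensor:
        # Eq. (2): H' = softmax(Conv1D(LN(H))) ⊙ LN(H); softmax over positions.
        Hn = self.norm(H)
        w = self.conv(Hn.transpose(1, 2)).transpose(1, 2).softmax(dim=1)
        return w * Hn

    def forward(self, H_ab: torch.Tensor, H_ag: torch.Tensor):
        Hab, Hag = self.gate(H_ab), self.gate(H_ag)
        # Eq. (3): queries and keys carry rotary position information.
        Qab, Kab, Vab = rope(self.Wq(Hab)), rope(self.Wk(Hab)), self.Wv(Hab)
        Qag, Kag, Vag = rope(self.Wq(Hag)), rope(self.Wk(Hag)), self.Wv(Hag)
        scale = math.sqrt(self.d)
        # Eqs. (4)-(5): each chain attends to its partner chain.
        attn_ab = (Qab @ Kag.transpose(1, 2) / scale).softmax(dim=-1) @ Vag
        attn_ag = (Qag @ Kab.transpose(1, 2) / scale).softmax(dim=-1) @ Vab
        return attn_ab, attn_ag
```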
Table 1. Performance comparison on three open benchmarks with K-fold validation. The best and second-best results are highlighted.
Model | AB645: RMSE, R², PCC, ρ | S1131: RMSE, R², PCC, ρ | AB1101: RMSE, R², PCC, ρ
Sequence-based Methods
DeepEP-PPI* - 0.09 0.41 - - 0.03 0.21 - - 0.28 0.54 -
LSTM-PHV* - 0.07 0.17 - - 0.19 0.39 - - 0.05 0.16 -
PIPR* - 0.10 0.20 - - 0.21 0.33 - - 0.19 0.37 -
TransPPI* - 0.07 0.18 - - 0.19 0.38 - - 0.12 0.22 -
AttABseq* 1.75 0.17 0.44 - 1.82 0.37 0.66 - 1.72 0.34 0.59 -
Structure-based Methods
BeAtMuSiC 1.98±0.23 -0.03±0.11 0.26±0.13 0.38±0.08 2.37±0.41 0.05±0.10 0.29±0.13 0.36±0.09 - - -
FoldX-PDB 2.51±0.78 -0.83±1.11 0.31±0.20 0.29±0.12 2.65±0.50 -0.22±0.36 0.43±0.08 0.47±0.09 3.40±0.48 -1.66±0.78 0.28±0.11 0.26±0.14
FoldX-AF2 3.04±1.41 -2.02±3.33 0.13±0.14 0.14±0.11 3.14±0.69 -0.85±0.94 0.39±0.08 0.49±0.08 3.96±0.79 -2.65±1.27 0.14±0.07 0.08±0.05
FoldX-ESM 3.32±1.34 -2.27±2.58 0.04±0.13 0.05±0.13 2.72±0.32 -0.28±0.12 0.08±0.07 0.09±0.07 3.89±0.96 -2.49±1.49 0.09±0.08 0.04±0.06
DDGPred-PDB 1.69±0.51 0.25±0.23 0.54±0.16 0.62±0.11 0.95±0.13 0.84±0.04 0.92±0.02 0.85±0.02 1.79±0.16 0.28±0.03 0.59±0.02 0.53±0.02
DDGPred-AF2 2.19±0.29 -0.34±0.33 0.21±0.13 0.23±0.14 1.63±0.13 0.52±0.14 0.76±0.07 0.60±0.08 2.37±0.11 -0.29±0.18 0.17±0.06 0.13±0.04
DDGPred-ESM 2.01±0.48 -0.08±0.17 0.37±0.19 0.43±0.12 2.39±0.30 0.02±0.11 0.39±0.11 0.37±0.08 2.04±0.23 0.06±0.05 0.48±0.02 0.43±0.05
Ours
ProtAttBA-ESM2 1.70±0.25 0.20±0.09 0.47±0.11 0.48±0.12 1.31±0.09 0.69±0.09 0.84±0.05 0.75±0.06 1.61±0.07 0.42±0.06 0.65±0.04 0.63±0.04
ProtAttBA-ESM1b 1.71±0.26 0.19±0.11 0.47±0.10 0.47±0.11 1.36±0.14 0.65±0.12 0.82±0.06 0.70±0.07 1.62±0.16 0.41±0.11 0.64±0.08 0.63±0.09
ProtAttBA-Ankh 1.72±0.27 0.18±0.13 0.46±0.11 0.47±0.10 1.29±0.12 0.69±0.11 0.84±0.06 0.76±0.06 1.52±0.09 0.48±0.06 0.69±0.04 0.66±0.03
ProtAttBA-ProtBert 1.73±0.28 0.18±0.11 0.47±0.09 0.49±0.13 1.37±0.13 0.64±0.16 0.81±0.10 0.71±0.09 1.62±0.07 0.41±0.07 0.65±0.05 0.62±0.04
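For reference, the ∆∆G_bind labels of Section 2.1 and the four metrics reported in Tables 1 and 2 can be reproduced along the following lines; this is a hedged sketch using standard NumPy/SciPy conventions (kcal/mol units assumed), not the authors' evaluation script:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

R = 1.987e-3  # universal gas constant in kcal/(mol*K)

def binding_free_energy(Kd: float, T: float = 298.15) -> float:
    """Section 2.1: ∆G = -RT ln(1/Kd), from a measured dissociation constant."""
    return -R * T * np.log(1.0 / Kd)

# ∆∆G_bind = ∆G_mut - ∆G_wild; a 100-fold affinity loss gives ≈ +2.7 kcal/mol.
ddG = binding_free_energy(1e-7) - binding_free_energy(1e-9)

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """RMSE, R², Pearson correlation (PCC), and Spearman ρ as reported above."""
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"RMSE": rmse,
            "R2": float(r2),
            "PCC": float(pearsonr(y_true, y_pred)[0]),
            "rho": float(spearmanr(y_true, y_pred)[0])}
```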
Table 2. Performance comparison on three open benchmarks under the sequence identity split and the mutation depth split. The best and second-best results are highlighted.
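The two splits named in this caption can be realized roughly as below; the column names (`mutations`, `cluster_id`) and the clustering tool are assumptions about the data layout rather than the published evaluation protocol:

```python
import pandas as pd

def mutation_depth_split(df: pd.DataFrame, max_train_depth: int = 1):
    """Train on shallow mutants, test on deeper ones (extrapolation setting)."""
    depth = df["mutations"].str.count(",") + 1   # e.g. "RA53Q,SB91A" -> depth 2
    return df[depth <= max_train_depth], df[depth > max_train_depth]

def sequence_identity_split(df: pd.DataFrame, held_out_clusters: set):
    """Hold out whole identity clusters (e.g., precomputed with MMseqs2/CD-HIT)
    so that test complexes share low sequence similarity with training data."""
    test_mask = df["cluster_id"].isin(held_out_clusters)
    return df[~test_mask], df[test_mask]
```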
Fig. 3. Protein structure visualization for interpretability analysis. Panels a–d depict localized views of the two antibody-antigen complexes at the mutation site, before (a, c) and after (b, d) mutation; the antigen chain is highlighted in green. Panels e and f illustrate the attention weight matrices learned by the model, where cooler colors (tending towards blue) indicate positions whose interactions with the mutated residue the model weighs more heavily.
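Panels such as Fig. 3e–f can be regenerated from a trained model by capturing the softmax weights of Eq. (4) before they are multiplied by V; a hedged plotting sketch (tensor names follow the cross-attention sketch above):

```python
import matplotlib.pyplot as plt

def plot_mutation_attention(attn_weights, mut_pos: int, title: str):
    """Heat-map one row of an (L_ab x L_ag) antibody-to-antigen attention matrix.

    Row `mut_pos` shows how strongly the mutated antibody residue attends to
    each antigen position (e.g., F-41/L-42 stand out for the 1IAR R53Q mutant).
    """
    row = attn_weights[mut_pos].detach().cpu().numpy()[None, :]   # shape (1, L_ag)
    plt.imshow(row, aspect="auto", cmap="coolwarm_r")             # cooler = higher weight
    plt.xlabel("antigen residue position")
    plt.yticks([])
    plt.title(title)
    plt.colorbar(label="attention weight")
    plt.show()
```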
framework generalizes more effectively to unseen proteins and is more robust to distributional shifts.
In the last four columns of Table 2, we present the performance comparison under the mutation depth split on AB1101, the only dataset containing multi-site mutations. This evaluation aims to examine ProtAttBA's capability in handling complex mutational effects and its ability to generalize from simpler to more complex mutations. Overall, baseline models did not show improved performance on high-depth mutations compared to the sequence identity split. In fact, several models experienced a significant performance drop, such as FoldX, especially when relying on less accurate predicted structures as input. In contrast, ProtAttBA consistently maintained strong predictive accuracy and outperformed all baseline methods under this more challenging extrapolation setting.
In summary, across the diverse evaluation scenarios we examined, ProtAttBA consistently outperformed baseline methods. The performance variability observed in structure-dependent approaches under different structure qualities and data splits highlights the robustness and broader applicability of our sequence-only framework. These results underscore ProtAttBA's strong generalization capability and its potential as a reliable tool to predict mutation-induced changes in binding affinity, particularly in settings where structural data is unavailable or inconsistent, or when evaluating mutations in proteins with low sequence similarity.

3.3 Model Interpretability with Attention Scores
An advantage of ProtAttBA is its ability to provide residue-level analysis of proteins, visualizing the impact of residue mutations on prediction outcomes. We randomly selected two complexes from the AB-bind and SKEMPI datasets for analysis, employing visualization techniques to examine the distribution of attention weights, focusing on mutation sites. Fig. 3(a–d) illustrates the hydrogen bond network alterations at mutation sites for two complexes, pre- and post-mutation. Fig. 3a and Fig. 3b depict the arginine (R) to glutamine (Q) mutation at position 53 of the antibody chain in 1IAR, while Fig. 3c and Fig. 3d show the serine (S) to alanine (A) mutation at position 91 of the antibody chain in 1DQJ. Both mutants exhibit a marked reduction in hydrogen bonds post-mutation, potentially leading to changes in antibody-antigen binding affinity. The visualization of attention weights in Fig. 3e–f reveals that ProtAttBA identifies key amino acid positions within the hydrogen bond network. For the 1IAR position 53 mutation, strong interactions are observed with positions phenylalanine-41 (F-41) and leucine-42 (L-42) of the antigen chain. Similarly, for the 1DQJ position 91 mutation,
Conclusion
This study addresses the critical task of predicting binding affinity changes in antibody–antigen complexes upon mutation, which plays a central role in antibody engineering. We introduce ProtAttBA, a novel deep learning framework that combines pre-trained protein language models and attention mechanisms to model protein features and interaction contexts from sequences alone. Unlike traditional structure-based methods, ProtAttBA avoids reliance on structural inputs, which are often unavailable or of uncertain quality, thus enhancing its robustness and real-world applicability.
We conducted a comprehensive evaluation of ProtAttBA under various experimental settings, including standard cross-validation, sequence identity-based splits, and mutation depth-based extrapolation. While structure-based methods demonstrate strong performance when high-quality crystal structures are available, their predictive accuracy drops significantly when fed predicted structures from folding models such as AlphaFold2 or ESMFold. This sensitivity to structural inputs limits their reliability in practical scenarios where such ideal data is rarely available (especially for antibodies and protein complexes). In contrast, ProtAttBA maintained stable performance across all settings, demonstrating strong generalization capabilities even in extrapolative scenarios, such as predicting the effects of multi-site mutations or mutations in proteins with low sequence similarity to the training data.
Moreover, ProtAttBA offers potential interpretability with residue-level attention scores, allowing users to identify amino acid positions with a strong influence on binding affinity changes. These attention patterns show promising alignment with known functional sites, providing mechanistic insights and enhancing the model's transparency.
While this study emphasizes the strengths of sequence-based approaches, we do not discount the value of structural modeling. Structure-based methods remain essential, particularly for tasks where spatial configuration plays a dominant role. However, our findings highlight the need for more robust integration of structural, sequence, and evolutionary information to mitigate sensitivity to imperfect structure inputs. Future research could explore hybrid models that incorporate predicted or partial structural features, binding site annotations, or contrastive learning strategies focused on antibody–antigen interfaces to further enhance model reliability.

Competing interests
No competing interest is declared.

Author contributions statement
Conceptualization of this study: C. L.; Implementation of the methodology: C. L.; Data curation: C. L. and W.R. G.; Investigation of the study: C. L. and B.X. Z.; Project administration: B.X. Z. and G.S. F.; Writing the paper: C. L. and B.X. Z.; Review and editing the paper: M.C. L., Y. T., and B.X. Z.; Supervision: B.X. Z. and G.S. F.

References
X. Liu, Y. Luo, P. Li, S. Song, and J. Peng, "Deep geometric representations for modeling effects of mutations on protein-protein binding affinity," PLOS Computational Biology, vol. 17, pp. 1–28, 2021.
G. Wang, X. Liu, K. Wang, Y. Gao, G. Li, D. Baptista-Hon, X. Yang, K. Xue, W. Tai, Z. Jiang, L. Cheng, M. Fok, J. Lau, S. Yang, L. Lu, P. Zhang, and K. Zhang, "Deep-learning-enabled protein–protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution," Nature Medicine, vol. 29, pp. 1–12, 2023.
J. Zhang, Q. Wu, Z. Liu, Q. Wang, J. Wu, Y. Hu, T. Bai, T. Xie, M. Huang, T. Wu et al., "Spike-specific circulating T follicular helper cell and cross-neutralizing antibody responses in COVID-19-convalescent individuals," Nature Microbiology, vol. 6, no. 1, pp. 51–58, 2021.
A. Beck, T. Wurch, C. Bailly, and N. Corvaia, "Strategies and challenges for the next generation of therapeutic antibodies," Nature Reviews Immunology, vol. 10, no. 5, pp. 345–352, 2010.
E. M. Brustad and F. H. Arnold, "Optimizing non-natural protein function with directed evolution," Current Opinion in Chemical Biology, vol. 15, no. 2, pp. 201–210, 2011.
P. Kouba, P. Kohout, F. Haddadi, A. Bushuiev, R. Samusevich, J. Sedlar, J. Damborsky, T. Pluskal, J. Sivic, and S. Mazurenko, "Machine learning-guided protein engineering," ACS Catalysis, vol. 13, no. 21, pp. 13863–13895, 2023.
C. Hsu, R. Verkuil, J. Liu, Z. Lin, B. Hie, T. Sercu, A. Lerer, and A. Rives, "Learning inverse folding from millions of predicted structures," in International Conference on Machine Learning. PMLR, 2022, pp. 8946–8970.
M. Li, Y. Shi, S. Hu, S. Hu, P. Guo, W. Wan, L. Y. Zhang, S. Pan, J. Li, L. Sun et al., "MVSF-AB: Accurate antibody-antigen binding affinity prediction via multi-view sequence feature learning," Bioinformatics, p. btae579, 2024.
B. Zhou, L. Zheng, B. Wu, Y. Tan, O. Lv, K. Yi, G. Fan, and L. Hong, "Protein engineering with lightweight graph denoising neural networks," Journal of Chemical Information and Modeling, 2024.
B. Zhou, Y. Tan, Y. Hu, L. Zheng, B. Zhong, and L. Hong, "Protein engineering in the deep learning era," mLife, vol. 3, no. 4, pp. 477–491, 2024.
F. Cuturello, M. Celoria, A. Ansuini, and A. Cazzaniga, "Enhancing predictions of protein stability changes induced by single mutations using MSA-based language models," Bioinformatics, vol. 40, no. 7, p. btae447, 2024.
J. Schymkowitz, J. Borg, F. Stricher, R. Nys, F. Rousseau, and L. Serrano, "The FoldX web server: an online force field," Nucleic Acids Research, vol. 33, no. suppl 2, pp. W382–W388, 2005.
Y. Dehouck, J. M. Kwasigroch, M. Rooman, and D. Gilis, "BeAtMuSiC: prediction of changes in protein–protein binding affinity on mutations," Nucleic Acids Research, vol. 41, no. W1, pp. W333–W339, 2013.
D. E. Pires and D. B. Ascher, "mCSM-AB: a web server for predicting antibody–antigen affinity changes upon mutation with graph-based signatures," Nucleic Acids Research,
vol. 44, no. W1, pp. W469–W473, 2016.
M. Wang, Z. Cang, and G.-W. Wei, "A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation," Nature Machine Intelligence, vol. 2, no. 2, pp. 116–123, 2020.
G. Yu, Q. Zhao, X. Bi, and J. Wang, "DDAffinity: predicting the changes in binding affinity of multiple point mutations using protein 3D structure," Bioinformatics, vol. 40, no. Supplement 1, pp. i418–i427, 2024.
Y. Tan, M. Li, B. Zhou, B. Zhong, L. Zheng, P. Tan, Z. Zhou, H. Yu, G. Fan, and L. Hong, "Simple, efficient, and scalable structure-aware adapter boosts protein language models," Journal of Chemical Information and Modeling, vol. 64, no. 16, pp. 6338–6349, 2024.
J. Huang, C. Sun, M. Li, R. Tang, B. Xie, S. Wang, and J.-M. Wei, "Structure-inclusive similarity based directed GNN: a method that can control information flow to predict drug–target binding affinity," Bioinformatics, vol. 40, no. 10, p. btae563, 2024.
S. Shan, S. Luo, Z. Yang, J. Hong, Y. Su, F. Ding, L. Fu, C. Li, P. Chen, J. Ma et al., "Deep learning guided optimization of human antibody against SARS-CoV-2 variants with broad neutralization," Proceedings of the National Academy of Sciences, vol. 119, no. 11, p. e2122954119, 2022.
M. M. Rana and D. D. Nguyen, "Geometric graph learning to predict changes in binding free energy and protein thermodynamic stability upon mutation," The Journal of Physical Chemistry Letters, vol. 14, no. 49, pp. 10870–10879, 2023.
R. Jin, Q. Ye, J. Wang, Z. Cao, D. Jiang, T. Wang, Y. Kang, W. Xu, C.-Y. Hsieh, and T. Hou, "AttABseq: an attention-based deep learning prediction method for antigen–antibody binding affinity changes based on protein sequences," Briefings in Bioinformatics, vol. 25, no. 4, p. bbae304, 2024.
Y. Tan, R. Wang, B. Wu, L. Hong, and B. Zhou, "Retrieval-enhanced mutation mastery: Augmenting zero-shot prediction of protein language model," arXiv:2410.21127, 2024.
M. Misra, J. Jeffy, C. Liao, S. Pickthorn, K. Wagh, and A. Herschhorn, "HIResist: a database of HIV-1 resistance to broadly neutralizing antibodies," Bioinformatics, vol. 40, no. 3, p. btae103, 2024.
P. Xiong, C. Zhang, W. Zheng, and Y. Zhang, "BindProfX: assessing mutation-induced binding affinity change by protein interface profiles with pseudo-counts," Journal of Molecular Biology, vol. 429, no. 3, pp. 426–434, 2017.
I. H. Moal and J. Fernández-Recio, "SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models," Bioinformatics, vol. 28, no. 20, pp. 2600–2607, 2012.
S. Sirin, J. R. Apgar, E. M. Bennett, and A. E. Keating, "AB-bind: antibody binding mutational database for computational affinity predictions," Protein Science, vol. 25, no. 2, pp. 393–409, 2016.
Y. Bai, J. Mei, A. L. Yuille, and C. Xie, "Are transformers more robust than CNNs?" Advances in Neural Information Processing Systems, vol. 34, pp. 26831–26843, 2021.
A. Elnaggar, M. Heinzinger, C. Dallago, G. Rehawi, Y. Wang, L. Jones, T. Gibbs, T. Feher, C. Angerer, M. Steinegger et al., "ProtTrans: Toward understanding the language of life through self-supervised learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 7112–7127, 2021.
M. Li, Y. Tan, X. Ma, B. Zhong, H. Yu, Z. Zhou, W. Ouyang, B. Zhou, P. Tan, and L. Hong, "ProSST: Protein language modeling with quantized structure and disentangled attention," in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
X. Xu, C. Xu, W. He, L. Wei, H. Li, J. Zhou, R. Zhang, Y. Wang, Y. Xiong, and X. Gao, "HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer," Bioinformatics, p. btae364, 2024.
Y. Xiao, E. Sun, Y. Jin, Q. Wang, and W. Wang, "ProteinGPT: Multimodal LLM for protein property prediction and structure understanding," arXiv preprint arXiv:2408.11363, 2024.
A. Rives, J. Meier, T. Sercu, S. Goyal, Z. Lin, J. Liu, D. Guo, M. Ott, C. L. Zitnick, J. Ma et al., "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences," Proceedings of the National Academy of Sciences, vol. 118, no. 15, p. e2016239118, 2021.
Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli et al., "Evolutionary-scale prediction of atomic-level protein structure with a language model," Science, vol. 379, no. 6637, pp. 1123–1130, 2023.
A. Elnaggar, H. Essam, W. Salah-Eldin, W. Moustafa, M. Elkerdawy, C. Rochereau, and B. Rost, "Ankh: Optimized protein language model unlocks general-purpose modelling," arXiv preprint arXiv:2301.06568, 2023.
J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu, "RoFormer: Enhanced transformer with rotary position embedding," Neurocomputing, vol. 568, p. 127063, 2024.
I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," in International Conference on Learning Representations, 2019.
Y. Yao, X. Du, Y. Diao, and H. Zhu, "An integration of deep learning with feature embedding for protein–protein interaction prediction," PeerJ, vol. 7, p. e7126, 2019.
S. Tsukiyama, M. M. Hasan, S. Fujii, and H. Kurata, "LSTM-PHV: prediction of human-virus protein–protein interactions by LSTM with word2vec," Briefings in Bioinformatics, vol. 22, no. 6, p. bbab228, 2021.
M. Chen, C. J.-T. Ju, G. Zhou, X. Chen, T. Zhang, K.-W. Chang, C. Zaniolo, and W. Wang, "Multifaceted protein–protein interaction prediction based on Siamese residual RCNN," Bioinformatics, vol. 35, no. 14, pp. i305–i314, 2019.
X. Yang, S. Yang, X. Lian, S. Wuchty, and Z. Zhang, "Transfer learning via multi-scale convolutional neural layers for human–virus protein–protein interaction prediction," Bioinformatics, vol. 37, no. 24, pp. 4771–4778, 2021.
J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko et al., "Highly accurate protein structure prediction with AlphaFold," Nature, vol. 596, no. 7873, pp. 583–589, 2021.
R. Townshend, R. Bedi, P. Suriana, and R. Dror, "End-to-end learning on 3D protein structure for interface prediction," Advances in Neural Information Processing Systems, vol. 32, 2019.
B. Zhou, L. Zheng, B. Wu, K. Yi, B. Zhong, Y. Tan, Q. Liu, P. Liò, and L. Hong, "A conditional protein diffusion model generates artificial programmable endonuclease sequences with enhanced activity," Cell Discovery, vol. 10, no. 1, p. 95, 2024.
S. Li, Y. Tan, S. Ke, L. Hong, and B. Zhou, "Immunogenicity prediction with dual attention enables vaccine target selection," in The Thirteenth International Conference on Learning Representations, 2025.