When we think of AI, artificial intelligence, we think of a tool somewhere between stupid and diabolical. We have in mind some of the things it can do, and we find them unpleasant or even frightening: the insipid texts it produces in place of human writers, for example, or the fake videos it generates, which are sometimes used to build fake news.
All of these outputs start from an instruction: “show me Kamala Harris and Donald Trump walking hand in hand and then kissing.” That video is still circulating online, but to generate it the AI first had to learn. That means digesting (real) videos of Trump and Harris, as well as learning the concepts of walking and kissing. The Nobel Prizes in Physics and Chemistry, awarded on Tuesday 8 and Wednesday 9 October, have a lot to do with this. But they suggest that we can also look at AI with respect, if not gratitude.
We need to start with proteins. They are made of amino acids such as alanine, leucine and cysteine (there are twenty of them), small molecules considered the building blocks of life. Proteins are long chains of amino acids, up to a few thousand units long. A fundamental characteristic of a protein is its structure, that is, its shape, because shape determines function. That function can range from building muscle tissue to degrading toxic molecules or transporting iron ions in the blood. But where does the shape of a protein come from? From how its amino acid chain folds and coils, which in turn depends on how amino acids in some parts of the chain interact with amino acids in other parts. So if we know the amino acid sequence of a protein, that is, the order in which the amino acids are lined up along the chain, we should also be able to work out how they interact with each other and, therefore, predict the shape the protein will take. Right? Not at all, at least until six years ago.
There are experimental techniques for determining the structure of a given protein: it is isolated, purified and crystallized, an X-ray diffraction experiment is run on the crystal and, after many calculations, out comes the structure. It’s hard work, but it has been done since the 1950s.
Proteins can also be sequenced: their amino acid composition, and the order in which the amino acids are lined up along the chain, can be determined in the laboratory. In 1994 a biennial public competition on the sequence/structure relationship was launched among biochemists: CASP (Critical Assessment of Protein Structure Prediction). Participants were given the amino acid sequences of new proteins whose structures had just been determined by crystal diffraction. The structures themselves, however, were not disclosed: they had to be predicted solely from the amino acid sequence. The result? Only failures. That is not surprising: after all, even a protein made of just 100 amino acids has 10 to the power of 47 (that is, more than a billion billion billion billion billion) potential three-dimensional structures. Things changed dramatically in 2018, when Demis Hassabis entered CASP. Hassabis is one of the three winners of this year’s Nobel Prize in Chemistry, and he is not a chemist. With a degree in computer science and a doctorate in neuroscience, and now CEO of Google DeepMind, on his CASP debut he got about 60% of the protein structure predictions right using AI.
One of his systems, AlphaFold, had been trained on all the protein structures known up to that point, together with the corresponding amino acid sequences. John Jumper, for his part, has a degree in theoretical condensed matter physics and a doctorate in theoretical chemistry. A protein expert, he joined Google DeepMind in 2017, immediately started working with Hassabis, and brought ideas for modelling proteins with neural networks into the AI approach, leading to an improved version of AlphaFold. In 2020, he and Hassabis, using AlphaFold2, got almost 100% of the CASP targets right, so much so that all participants have been using it ever since (it has been released as open source).
This year Jumper and Hassabis jointly received one half of the Nobel Prize in Chemistry. The Chemistry prize came just one day after the Physics prize, awarded to John Hopfield and Geoffrey Hinton for developing neural network technology for machine learning: the process in which a computer learns known patterns and then compares new inputs against them, and which can lead, to put it simply, to recognizing a person or an object in a photo. Or, to put it less simply, to predicting the real structure of a protein.
That leaves the third winner of the 2024 Nobel Prize in Chemistry, the one who received the other half. He is David Baker, this time a full-fledged biochemist, of the kind who works in silico (that is, on a computer) but then actually synthesizes the molecules. Baker developed a program he called Rosetta (like the Rosetta Stone), which does the opposite job to that of Hassabis and Jumper: given the structure of a protein, it finds the sequence of amino acids. Rosetta works by exploring databases of protein structures and their corresponding sequences, searching for short fragments of amino acid sequence whose structure resembles fragments of the desired one. A job for AI. And indeed, after AlphaFold2 appeared in 2020, Baker built a similar system into Rosetta, significantly improving it.
Baker’s approach has an important advantage: if you invent a protein with a given structure, Rosetta tells you which sequence of amino acids will produce it, so that you can then prepare it in the laboratory. If these impressive scientific results already command respect, this is where gratitude comes in too.
That is because it is now possible to design from scratch a protein whose shape lets it perform a useful function, determine its amino acid sequence, and then actually synthesize it: proteins that act as carriers for more effective drug delivery, selective sensors for specific molecules (such as fentanyl, the so-called “zombie drug”), or proteins that mimic the structure of influenza viruses and can be used as vaccines.
*Full professor of inorganic chemistry at the University of Pavia