Fundamentally different

Large research facilities at PSI such as the X-ray free-electron laser SwissFEL and the Swiss Light Source SLS – especially after the upgrade SLS 2.0 – deliver unimaginably vast amounts of data. Artificial intelligence is helping to evaluate data efficiently and exploit the facilities’ full potential for research.

AI image generation: © Studio HübnerBraun/Midjourney

Proteins are the workhorses of life. As tiny molecular machines, they are found in every cell and have a role in nearly all biological processes – from metabolism to cellular communication. Their diversity is enormous: in the human body alone there are hundreds of thousands of different proteins, each with its own function. Proteins are important targets for drugs, and understanding their structure and function is a central task in biological research. One challenge in drug development is to find, if possible, an active agent that interacts with just one type of protein, to the exclusion of all the rest.

To achieve such a feat, one must first understand the language of proteins. The basis of this protein language is a kind of alphabet. It essentially consists of 20 building blocks analogous to letters. In proteins, however, these building blocks are not letters but amino acids. Each protein is built up from a certain sequence of these amino acids; the sequence in turn largely determines its properties. Researchers would now like to know which protein sequence leads to which property. This is where so-called large language models such as GPT-4 come into play. The AI chatbot ChatGPT, which has been causing a stir since 2022, is built on models of the GPT family. Both were developed by the company OpenAI. ChatGPT uses an extensive dataset of texts created by humans to learn the patterns and structures of language. When the user enters a question or task, the model produces a response based on its understanding of the contexts and patterns that it learned during training. In this way it can write poems, novels and even programming code.

Flurin Hidber, a doctoral candidate supervised by Xavier Deupi, an expert in bioinformatics and protein structure at PSI, uses AI in protein research. Hidber works with a sophisticated model similar to ChatGPT that is trained to predict amino acids in protein sequences instead of generating human-like language. This ability goes beyond mimicking the predictive capabilities of language models: it provides valuable insights into the structure and function of proteins. Pharmaceutical researchers could use these insights to tailor medications and significantly shorten the process of trial and error in the laboratory, which in the end yields only a small proportion of drug candidates with promising properties.
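The idea of predicting a hidden amino acid from its surroundings can be illustrated with a deliberately tiny sketch. This is not the model used at PSI: real protein language models are large neural networks trained on millions of sequences, whereas the toy below merely counts which amino acid most often appears between two given neighbours. The training sequences and the function names `train` and `predict_masked` are invented for illustration.

```python
from collections import Counter, defaultdict

# The 20 standard amino acids in one-letter code (the protein "alphabet").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical training sequences, made up for this sketch.
TRAINING = ["MKTAYIAK", "MKTLYIAR", "GKTAYLAK"]


def train(sequences):
    """Count which amino acid occurs between each (left, right) neighbour pair."""
    context_counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(1, len(seq) - 1):
            context_counts[(seq[i - 1], seq[i + 1])][seq[i]] += 1
    return context_counts


def predict_masked(model, sequence, mask="?"):
    """Guess the masked residue from its two neighbours, or None if unknown."""
    i = sequence.index(mask)
    if i == 0 or i == len(sequence) - 1:
        return None  # this toy needs a neighbour on both sides
    counts = model.get((sequence[i - 1], sequence[i + 1]))
    if not counts:
        return None  # neighbour pair never seen during training
    return counts.most_common(1)[0][0]


model = train(TRAINING)
print(predict_masked(model, "MKT?YIAK"))
```

In the made-up training data, the pair T and Y encloses A twice and L once, so the sketch fills the gap with A. A real model weighs the entire sequence rather than just the two neighbours, which is what allows it to capture structure and function.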

Xavier Deupi (left) and Flurin Hidber from the research group for Condensed Matter Theory want to better understand how the function of proteins is related to their structure. They are targeting light-sensitive proteins in particular. © Paul Scherrer Institute/Markus Fischer; AI image generation: Studio HübnerBraun/Midjourney

An ambitious goal

Deupi and Hidber are working towards an ambitious goal: being able to determine the precise amino acid sequence that leads to a desired protein property. One focus of their research is light-sensitive proteins, a speciality of Deupi’s group and a research subject at SwissFEL. These proteins occur in many organisms, from microbes to humans, and have medical potential. Hidber’s use of AI to predict the properties of light-sensitive proteins solely on the basis of the sequence of their building blocks represents a significant advance in this field.

Through the precise prediction of the light-absorption properties of proteins, Hidber’s work could pave the way for the development of molecules with tailored properties – a step that could have a profound impact on optogenetics. This scientific technique employs light to control and monitor the activity of certain cells in living organisms, such as nerve cells in the brain. Researchers insert genes for light-sensitive proteins into these cells so they can precisely influence the cells’ behaviour by irradiating them with light.

This technology could contribute to the understanding and treatment of neurological diseases, since it provides a tool that can be used to investigate and control the activity of specific brain cells with unprecedented precision. For the future, Deupi and Hidber have set themselves the goal of reversing this process. They want to design new proteins with properties tailored to meet specific requirements, for example proteins that react to light of a particular colour. This blueprint could then be checked experimentally, and hopefully confirmed by colleagues in the laboratory.

The topic of protein dynamics is also at the heart of Cecilia Casadei’s research. The physicist has developed a new algorithm that enables more efficient evaluation of measurements at X-ray free-electron laser facilities such as SwissFEL. The building blocks of life often perform ultrafast movements. Investigating these with precision is crucial to gain a better understanding of proteins. In the long run, this can provide valuable information about disease processes and enable the development of novel medical approaches.

Evaluating extremely short flashes of X-ray light

SwissFEL delivers extremely intense and short flashes of X-ray light in laser quality to measure these ultrafast movements of proteins. Examined in crystal form, the proteins’ structure is revealed in so-called diffraction images, which arise from the regular arrangement of proteins in the crystal and are registered by a detector. But the data from a single crystal contains only two percent of the information for a complete image. Getting around this limitation usually involves dividing the data into rough time periods and averaging all data within a period. With this averaging, however, a lot of detailed information is lost. “You could say that the individual frames of the protein movie are a bit washed-out,” Casadei says. “That’s why we developed a method that gets more out of the measurement data.”

The new method that Casadei and her team developed is called low-pass spectral analysis, LPSA for short. Using sophisticated mathematical methods, the researchers remove unwanted noise from the data without losing the relevant details of protein dynamics. Instead of blurry diffraction images, sharp pictures can thus be generated for very short time intervals, smoothly tracing protein movement – like switching from an old tube television set to high-resolution digital video.
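The general principle behind low-pass filtering, namely keeping the slow components of a signal and discarding the fast, noise-dominated ones, can be sketched in a few lines. This is a generic textbook Fourier filter applied to a synthetic one-dimensional signal, not the LPSA algorithm itself, whose details are not described here. The signal, the noise level and the cutoff `keep` are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def dft(x):
    """Plain discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def low_pass(signal, keep):
    """Zero every Fourier component whose frequency index exceeds `keep`."""
    n = len(signal)
    spectrum = dft(signal)
    filtered = [c if min(k, n - k) <= keep else 0j
                for k, c in enumerate(spectrum)]
    return idft(filtered)


def rms_error(a, b):
    """Root-mean-square difference between two equally long signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


random.seed(0)
n = 128
# Synthetic "slow motion": two full oscillations over the time window.
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
# Add random measurement noise to every sample.
noisy = [c + random.gauss(0, 0.5) for c in clean]
smoothed = low_pass(noisy, keep=4)

print("error before filtering:", round(rms_error(noisy, clean), 3))
print("error after filtering: ", round(rms_error(smoothed, clean), 3))
```

Because the slow oscillation sits entirely in the retained low frequencies while the noise is spread across all of them, the filtered signal ends up much closer to the clean one than the raw measurement does.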

“The new algorithm helps the researchers here at PSI’s SwissFEL extract more information from their data,” Casadei says. The algorithm can also help to shorten the long measurement times. Since beam time is always in high demand and short supply at large research facilities in general, and at SwissFEL in particular, this is an extremely welcome prospect for protein researchers who use these top facilities.

With the SLS 2.0 project, the researchers are facing another challenge. From 2025 on, after its upgrade, the Swiss Light Source SLS will deliver many times more measurement data than before. Then even extremely high-performance computers will struggle to process it. Machine learning will therefore play a central role. The researchers have developed algorithms for SLS 2.0 that use the brightness values the detectors register to determine the phase shifts of the incoming light at high speed and provide especially valuable information about the sample. “In this, PSI is a world leader,” emphasises Gebhard Schertler, head of the Biology and Chemistry Research Division at PSI.

Rapidly revealing changes in cells

One further strength of machine learning is that it can combine data from different measurement methods. For example, pictures of cell nuclei could be made with a light microscope, and X-ray methods at SLS 2.0 could provide additional high-resolution images. AI would combine these different types of data with biochemical clinical data from patients. It’s not possible to examine one and the same cell with different analytical methods, but with machine learning, sets of data from different methods can be compared. The algorithm recognises the properties of cells from different experiments. It is almost as if one and the same cell had been examined with all methods simultaneously.

Large research facilities still indispensable

Will large research facilities such as SwissFEL and SLS soon become superfluous because everything can be investigated with AI and machine learning? Xavier Deupi says no. “Large research facilities remain indispensable even in the age of AI. Large language models do offer high-performance tools for the analysis of known data, but they can never replace the capability of these facilities to generate new fundamental data.”

The process of how science is conducted is changing fundamentally.

Xavier Deupi, Laboratory for Condensed Matter Theory

Nevertheless, AI has become an integral part of the research toolkit: from extracting insights from a large number of scientific publications to automatically generating program code or even writing articles on the basis of experimental data. “These tools are part of our daily routine,” Flurin Hidber confirms. Xavier Deupi stresses: “Despite these advances, the interpretation and critical discussion of the results will always rely on experienced researchers.” But he admits: “Today young researchers like Flurin work very differently from the way I did 20 years ago – the way science is done is changing fundamentally.”

Text: Bernd Müller

© PSI provides image and/or video material free of charge for media coverage of the content of the above text. Use of this material for other purposes is not permitted. This also includes the transfer of the image and video material into databases as well as sale by third parties.

Dr. Xavier Deupi 
Paul Scherrer Institute PSI

+41 56 310 33 37 

About PSI

The Paul Scherrer Institute PSI develops, builds and operates large, complex research facilities and makes them available to the national and international research community. The institute's own key research priorities are in the fields of future technologies, energy and climate, health innovation and fundamentals of nature. PSI is committed to the training of future generations. Therefore about one quarter of our staff are post-docs, post-graduates or apprentices. Altogether PSI employs 2200 people, thus being the largest research institute in Switzerland. The annual budget amounts to approximately CHF 420 million. PSI is part of the ETH Domain, with the other members being the two Swiss Federal Institutes of Technology, ETH Zurich and EPFL Lausanne, as well as Eawag (Swiss Federal Institute of Aquatic Science and Technology), Empa (Swiss Federal Laboratories for Materials Science and Technology) and WSL (Swiss Federal Institute for Forest, Snow and Landscape Research). (Last updated in June 2023)