Function follows form: why is RNA shape so hard to predict?

RNA folding poses unique challenges for computational models. Credit: Getty

At a virtual conference in November 2020, the winner of a biennial protein-structure prediction challenge was announced: AlphaFold. Created by Google DeepMind, the computational tool blew its competitors out of the water. It predicted dozens of protein structures with near-atomic accuracy, a feat that researchers had been attempting for decades.

The challenge, known as the Critical Assessment of Structure Prediction (CASP), was launched in 1994 to advance computational tools for modeling 3D protein conformations from amino-acid sequences. Teams of scientists pit their models against each other, trying to make the most accurate predictions of protein structures. These structures have been determined experimentally before the event, using methods such as X-ray crystallography and electron microscopy, but have not yet been made public.

AlphaFold's 2020 predictions rivaled those obtained using these tried-and-tested techniques, and the tool has since become a favorite of the structural-biology community. Its repository, the AlphaFold Protein Structure Database, contains about 200 million structures. In 2024, AlphaFold's developers shared the Nobel Prize in Chemistry for their work.

But that is proteins. In 2022, CASP organizers turned their attention to a different class of vital molecules, one that has proved far more elusive: RNA.

As with proteins, determining the structure of RNA usually requires expensive and time-consuming experimental methods. Computational tools can help, but RNA is a tougher nut to crack. One simple reason, according to Yu Li, a computer scientist at the Chinese University of Hong Kong, is historical. For a long time, most scientists did not think RNA biology was interesting enough to study. But RNA also poses unique molecular challenges, and relatively few data are available to train computational models that perform as well as those for proteins.

Still, researchers are getting creative, and a growing set of computational tools can help to predict RNA structure. Many of them draw on the latest advances in artificial intelligence (AI), including the large language models (LLMs) behind popular chatbots such as ChatGPT.

“RNA is a very difficult problem,” says Shi-Jie Chen, a biophysicist at the University of Missouri in Columbia. But AI, he adds, is getting “better and better.”

Elusive targets

For a long time, RNA was seen simply as an intermediary between two more interesting classes of molecule. These were DNA, the “blueprint of life”, and proteins, the cell's “building blocks”. Only a small fraction of the human genome codes for proteins, but much of the non-coding genome is transcribed into RNA. Over the past few decades, scientists have discovered that this non-coding RNA mediates fundamental functions in healthy cells and contributes to many diseases.

How these molecules work remains, in many cases, a mystery. By determining their shapes, researchers hope to better understand the roles these molecules play in making our cells tick, because form dictates function. “In biology, we assume that sequence is likely to determine structure, and that structure is likely to determine function.”

But computational prediction of RNA structure lags behind that of proteins. Even AlphaFold3, the latest version of DeepMind's celebrated tool, falls short when it comes to RNA.

“If you look at recent CASP competitions, we are at the point where, on the protein-structure side of things, fully automated predictions are as good as those with human input,” says Lydia Freddolino. She is a biologist at the University of Michigan in Ann Arbor and a member of the scientific advisory board at Circnova, a company that uses deep-learning tools in the life sciences. “For RNA, we are nowhere near that: all the top groups benefit greatly from human intervention.”

RNA structure prediction featured in the CASP competitions in 2022 and 2024, and Freddolino participated in both. The top-ranked team for RNA prediction at the latest event, CASP16, used a hybrid approach that combined AI with a physics-based algorithm. According to Chen, who led the winning group, the team first used AlphaFold3 to generate ensembles of candidate RNA structures. It then applied a physics-based model that exploits the “energy landscape” of possible structures to identify the conformation most likely to form. (Chen's team has licensed its software to several biotechnology companies.)
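The generate-then-rank idea can be illustrated with a toy example. This is a hypothetical sketch, not the Chen team's software. It assumes each candidate structure is reduced to one bead per nucleotide. It then scores each candidate with a simple Lennard-Jones pair potential, which stands in for a real physics-based energy function, and treats the lowest-energy candidate as the most likely conformation.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def pair_energy(r: float, sigma: float = 6.0, epsilon: float = 1.0) -> float:
    """Lennard-Jones 12-6 potential between two beads at distance r (toy units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(structure: List[Coord]) -> float:
    """Sum pair energies over all bead pairs that are not chain neighbors."""
    energy = 0.0
    n = len(structure)
    for i in range(n):
        for j in range(i + 2, n):  # skip directly bonded neighbors (i, i + 1)
            energy += pair_energy(math.dist(structure[i], structure[j]))
    return energy


def rank_candidates(candidates: List[List[Coord]]) -> List[int]:
    """Return candidate indices ordered from lowest (most favorable) to highest energy."""
    return sorted(range(len(candidates)), key=lambda k: total_energy(candidates[k]))


# Hypothetical example: three 3-bead candidates that differ only in the
# distance between the first and last bead.
clash = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0)]      # beads too close
extended = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]  # beads far apart
folded = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 6.735, 0.0)]    # near the energy minimum
ranking = rank_candidates([clash, extended, folded])  # -> [2, 1, 0]
```

The `folded` candidate places its end beads near the potential's minimum, at about 1.12 σ, so it ranks first. The steric clash ranks last. In a real pipeline, the candidates would come from an AI model and the energy function would describe RNA-specific physics.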

Researchers developing purely AI-based tools to predict RNA structure face several obstacles. One is that RNA molecules have features that make their structures inherently hard to pin down. RNA molecules have more flexible backbones than proteins do, and their structures are more dynamic. This means they can undergo major conformational changes while carrying out their biological functions.

Moreover, RNA molecules lack the chemical diversity found in proteins, such as the acidic and basic residues that allow stable interactions to form. Instead, segments of RNA interact in all sorts of “weird and wonderful ways”, says Freddolino, such as through non-canonical base pairs and shared metal ions. As a result, the subtle differences between the best and worst models are harder to detect than they are for proteins.
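One common way to quantify how close a predicted model is to an experimental structure is the root-mean-square deviation (RMSD) of corresponding atom positions. The sketch below is a minimal illustration, not CASP's official scoring pipeline, which also relies on other metrics. It assumes the two structures have already been superimposed and that atoms are matched one-to-one. It also shows why a single summary number can hide differences: two models with very different error patterns can receive the same score.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], experimental: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equal-length coordinate lists.

    Assumes the structures are already superimposed (no Kabsch alignment is
    performed here) and that atom i in one list corresponds to atom i in the other.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and the same length")
    total = sum(math.dist(p, e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(total / len(predicted))


# Hypothetical coordinates (in ångströms) for a four-atom fragment.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_a = [(x, y + 1.0, z) for x, y, z in experimental]  # every atom off by 1 Å
model_b = experimental[:3] + [(3.0, 2.0, 0.0)]           # one atom off by 2 Å
```

Both `rmsd(model_a, experimental)` and `rmsd(model_b, experimental)` come out at 1.0 Å. Yet one model is uniformly shifted and the other has a single local error, which is the kind of distinction a lone summary score can blur.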


Natural (R1116 and R1149) and synthetic (R1138) RNA structures, used as targets in the CASP15 structure-prediction challenge, as determined experimentally (gray) and as predicted using an AI tool (red). Credit: Wang et al./Nat. Commun.

RNA's chemical alphabet is also harder to decode. The four chemical bases that make up RNA are less distinctive than the 20 amino acids found in proteins, so each RNA nucleotide carries less information than an amino acid does. One reason tools such as AlphaFold are so successful, Freddolino says, is their ability to mine large sequence databases for patterns of interaction between different amino acids. That is harder to do with RNA.
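The information gap can be put in numbers. An alphabet of k symbols carries at most log2(k) bits per position: 2 bits for RNA's four bases, versus about 4.32 bits for the 20 amino acids. The sketch below also shows a simplified version of the kind of pattern that sequence databases reveal. It computes the mutual information between two columns of a multiple sequence alignment, which is high when the positions mutate together, for example to preserve a base pair. This is an illustrative stand-in that assumes a toy, gap-free alignment. Tools such as AlphaFold learn such patterns with deep neural networks rather than by computing this statistic directly.

```python
import math
from collections import Counter
from typing import List


def max_bits_per_residue(alphabet_size: int) -> float:
    """Upper bound on information per position: log2 of the alphabet size."""
    return math.log2(alphabet_size)


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (in bits) between columns i and j of a gap-free alignment.

    Columns that covary, e.g. positions that change together to keep a base
    pair intact, score high; independent columns score near zero.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Hypothetical alignment: columns 0 and 3 always form a Watson-Crick pair
# (G-C, C-G, A-U, U-A), while column 1 never varies.
alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
```

Here `mutual_information(alignment, 0, 3)` is 2.0 bits, the maximum for four equally likely bases, because knowing one base of the pair fixes the other. `mutual_information(alignment, 0, 1)` is 0.0 because column 1 carries no information.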

Then there is the scarcity of known RNA structures. The Protein Data Bank, a repository of 3D molecular structures, contains roughly 200,000 protein structures but fewer than 2,000 RNA structures. This scarcity means there is less information to feed the AI algorithms that underpin structure prediction.

“We are doing what we can with the limited data we have,” says Jim Collins, a biomedical engineer at the Massachusetts Institute of Technology in Cambridge. “The field will advance greatly with the collection and deposition of many more structures.”

Bringing in AI

Researchers are working to tackle these challenges, and in recent years many AI-based tools for RNA have emerged. Before 2020, most methods for predicting RNA structure relied on algorithms defined by physics-based or mathematical models, according to Jianyi Yang, a structural biologist at Shandong University in Qingdao, China. But AlphaFold's success inspired people in the RNA field to apply AI to this problem as well.

Yang and his colleagues designed a freely available AI tool, trRosettaRNA, which combines deep learning with elements of Rosetta. Rosetta is a computational tool for determining molecular structures, developed by David Baker at the University of Washington in Seattle, who shared the 2024 Nobel Prize in Chemistry with AlphaFold's creators.

As with proteins, RNA structure exists at multiple levels:

- the nucleotide sequence (primary);
- the intermediate structures that form when bases find their complementary partners (secondary);
- the final 3D shape (tertiary).

RNAs can also form complexes with each other and with other molecules (quaternary). trRosettaRNA first predicts lower-level structural features, including secondary structure, and then reconstructs the tertiary structure with the help of a classical physics-based model. Secondary structures, such as the “hairpins” that form when short stretches of a sequence pair with each other, matter more for RNA than for proteins, says Yang. Exploiting them is one of the keys to the model's success.
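Predicting secondary structure from sequence alone has a long algorithmic history. The classic Nussinov dynamic-programming algorithm, sketched below, finds a secondary structure that maximizes the number of complementary base pairs. It reports the result in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. The sketch only illustrates how hairpins arise from base pairing. It is not the method trRosettaRNA uses, and practical tools replace the simple pair count with thermodynamic energy models. The minimum hairpin-loop length of three unpaired bases is an assumed, commonly used default.

```python
from functools import lru_cache
from typing import Tuple

# Canonical Watson-Crick pairs plus the G-U "wobble" pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximize base pairs (Nussinov algorithm); return (pair count, dot-bracket).

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    When several structures tie, one optimal structure is returned.
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Maximum number of pairs in seq[i..j] (inclusive).
        if j - i <= min_loop:
            return 0
        score = best(i, j - 1)  # option 1: base j stays unpaired
        for k in range(i, j - min_loop):  # option 2: base j pairs with base k
            if (seq[k], seq[j]) in PAIRS:
                left = best(i, k - 1) if k > i else 0
                score = max(score, left + 1 + best(k + 1, j - 1))
        return score

    structure = ["."] * n

    def traceback(i: int, j: int) -> None:
        # Recover one structure that achieves best(i, j).
        if j - i <= min_loop:
            return
        if best(i, j) == best(i, j - 1):
            traceback(i, j - 1)
            return
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best(i, k - 1) if k > i else 0
                if left + 1 + best(k + 1, j - 1) == best(i, j):
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        traceback(i, k - 1)
                    traceback(k + 1, j - 1)
                    return

    if n:
        traceback(0, n - 1)
    return (best(0, n - 1) if n else 0), "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`. Three stacked pairs (G-C, G-C and a G-U wobble) close a hairpin loop of three adenines.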

Yang's team pitted trRosettaRNA against other automated tools and found, on the basis1. In 2024, the program ranked fourth in CASP16.
