AI cannot solve these puzzles that take people only seconds

There are many ways to test the intelligence of an artificial intelligence: conversational fluency, reading comprehension or mind-bendingly difficult physics. But some of the tests most likely to stump AIs are ones that humans find relatively easy, even fun. Although AIs increasingly excel at tasks that require high levels of human expertise, this does not mean that they are about to reach artificial general intelligence, or AGI. AGI requires that an AI can take a very small amount of information and use it to generalize and adapt to new situations. This ability, which is the basis of human learning, remains a challenge for AIs.

One test designed to assess an AI's ability to generalize is the Abstraction and Reasoning Corpus, or ARC: a collection of small, colored-grid puzzles that ask a solver to infer a hidden rule and then apply it to a new grid. It was developed by AI researcher François Chollet in 2019 and became the basis of the ARC Prize Foundation, a nonprofit program that administers the test, which is now an industry benchmark used by all major AI models. The organization also develops new tests and has been routinely using two of them (ARC-AGI-1 and its more challenging successor, ARC-AGI-2). This week the foundation is launching ARC-AGI-3, which is designed specifically to test AI agents and is based on making them play video games.
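To make "infer a hidden rule and apply it to a new grid" concrete, here is a minimal Python sketch. It assumes the JSON layout used by the public ARC-AGI-1 task files, as we understand it: a "train" list of input/output grid pairs and a "test" list of inputs, with each cell an integer from 0 to 9 standing for a color. The toy task, the candidate transformations (`flip_horizontal`, `transpose` and so on) and the `infer_rule`/`solve` helpers are hypothetical, written only for illustration; this is not how ARC is scored or how any real solver works.

```python
import json

# Hypothetical toy task in an ARC-style JSON layout (not a real ARC task).
# Grids are lists of rows; each cell is an integer 0-9 standing for a color.
TASK_JSON = """
{
  "train": [
    {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
    {"input": [[3, 3, 0], [0, 4, 0]], "output": [[0, 3, 3], [0, 4, 0]]}
  ],
  "test": [
    {"input": [[5, 0, 6], [0, 7, 0]]}
  ]
}
"""

Grid = list[list[int]]

# A tiny, hand-picked menu of candidate "hidden rules" (hypothetical).
def identity(g: Grid) -> Grid:
    return [list(row) for row in g]

def flip_horizontal(g: Grid) -> Grid:
    return [list(reversed(row)) for row in g]

def flip_vertical(g: Grid) -> Grid:
    return [list(row) for row in reversed(g)]

def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]

CANDIDATES = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}

def infer_rule(train_pairs):
    """Return the first candidate that maps every training input to its output."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in train_pairs):
            return name
    return None  # no rule in our small menu explains the examples

def solve(task):
    """Infer a rule from the training pairs, then apply it to each test input."""
    rule = infer_rule(task["train"])
    if rule is None:
        return None, []
    return rule, [CANDIDATES[rule](item["input"]) for item in task["test"]]

task = json.loads(TASK_JSON)
rule, predictions = solve(task)
print(rule, predictions)  # flip_horizontal [[[6, 0, 5], [0, 7, 0]]]
```

The brute-force search over four fixed transformations works here only because the toy rule is trivial. Real ARC tasks are meant to resist this kind of shortcut, since each one teaches a new mini skill that a small, predefined menu is unlikely to contain.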

Scientific American spoke to ARC Prize Foundation president Greg Kamradt, an AI researcher and entrepreneur, to understand how these tests evaluate AIs, what they tell us about the prospects for AGI and why they often challenge deep-learning models even though many humans find them relatively easy. Links to try the tests are at the end of the article.

[An edited transcript of the interview follows.]

What is the definition of intelligence measured by ARC-AGI-1?

Our definition of intelligence is your ability to learn new things. We already know that AI can win at chess. We know they can beat Go. But those models cannot generalize to new domains; they can't go and learn English. So what François Chollet made is a benchmark called ARC-AGI: it teaches you a mini skill in the question, and then it asks you to demonstrate that mini skill. We're basically teaching you something and asking you to repeat the skill you just learned. So the test measures a model's ability to learn within a narrow domain. But our claim is that it does not measure AGI, because it's still in a scoped domain [in which learning applies to only a limited area]. It measures that an AI can generalize, but we do not claim this is AGI.

How do you define AGI here?

There are two ways I look at it. The first is more tech-forward, which is "Can an artificial system match the learning efficiency of a human?" What I mean by that is that after humans are born, they learn a lot outside their training data. In fact, they don't really have training data, other than a few evolutionary priors. So we learn how to speak English, we learn how to drive a car and we learn how to ride a bike: all these things outside our training data. That's called generalization. When you can do things outside of what you've been trained on, we define that as intelligence. Now, an alternative definition of AGI that we use is that when we can no longer come up with problems that humans can do and AI cannot, that's when we have AGI. That's an observational definition. The flip side is also true: as long as the ARC Prize or humanity in general can still find problems that humans can do but AI cannot, we do not have AGI. One of the key factors about François Chollet's benchmark … is that we test humans on it, and the average human can do these tasks and these problems, but AI still has a really hard time with them. The reason that's so interesting is that some advanced AIs, such as Grok, can pass any graduate-level exam or do all these crazy things, but that's spiky intelligence. It still doesn't have the generalization power of a human. And that's what this benchmark shows.

How do your benchmarks differ from those that other organizations use?

One of the things that distinguishes us is that we require our benchmark to be solvable by humans. That's in contrast to other benchmarks, which pose "Ph.D.-plus-plus" problems. I don't need to be told that AI is smarter than me: I already know that OpenAI's o3 can do a lot of things better than me, but it doesn't have a human's power to generalize. That's what we measure, so we need to test humans. We actually tested 400 people on ARC-AGI-2. We got them in a room, gave them computers, did demographic screening and then gave them the test. The average person scored 66 percent on ARC-AGI-2. Collectively, though, the aggregated answers of five to 10 people will contain the correct answers to all the questions on ARC-AGI-2.

What makes this test hard for AI and relatively easy for humans?

There are two things. Humans are incredibly efficient with their learning, meaning they can look at a problem and, with maybe one or two examples, pick up the mini skill or transformation and go and do it. The algorithm running in a human's head is orders of magnitude better and more efficient than what we're seeing with AI right now.

What is the difference between ARC-AGI-1 and ARC-AGI-2?

So ARC-AGI-1, François Chollet made that himself. It was about 1,000 tasks, and that was in 2019. AI wasn't even getting close. Then the reasoning models that OpenAI released in 2024 started making progress on it, which showed a step-level change in what AI could do. Then, when we went to ARC-AGI-2, we went a little further down the rabbit hole regarding what humans can do and AI cannot. It requires a little more planning for each task. So instead of solving a task within five seconds, humans may take a minute or two. There are more complicated rules, and the grids are larger, so you have to be more precise with your answer, but it's the same concept, more or less … We are now launching a developer preview of ARC-AGI-3, and that is a complete departure from this format. The new format will actually be interactive. So think of it more as an agent benchmark.

How will ARC-AGI-3 test agents differently compared with previous tests?

If you think about everyday life, it's rare that we make a stateless decision. When I say stateless, I mean just a question and an answer. Right now all benchmarks are more or less stateless benchmarks. If you ask a language model a question, it gives you a single answer. There's a lot that you cannot test with a stateless benchmark. You cannot test planning. You cannot test exploration. You cannot test reasoning about your environment or the goals that come with it. So we are making 100 novel video games that we will use to test humans, to make sure that humans can do them, because that's the basis for our benchmark. Then we will drop AIs into these video games and see whether they can understand an environment they have never seen before. To date, in our internal testing, no AI has been able to beat even a single level of any of the games.

Can you describe the video games here?

Each "environment," or video game, is a two-dimensional, pixel-based puzzle. These games are structured as distinct levels, each designed to teach the player (human or AI) a specific mini skill. To complete a level, the player must demonstrate mastery of that skill by executing planned sequences of actions.
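The interview doesn't describe ARC-AGI-3's actual game format or agent interface, so the Python sketch below uses an invented toy level purely to illustrate the difference between a stateless question-and-answer benchmark and an interactive one: every action changes the player's state, and finishing the level means executing a planned sequence of actions. The level layout and the `step`, `plan` and `run` helpers are hypothetical. One big simplification: this planner is handed the full map and the movement rules, whereas an agent dropped into an unseen ARC-AGI-3 game would presumably have to discover both by exploring.

```python
from collections import deque

# Hypothetical toy level: '#' = wall, '.' = floor, 'P' = player start, 'G' = goal.
# This is NOT the ARC-AGI-3 format; it only illustrates "plan a sequence of actions".
# Assumes the level is fully bordered by walls, so moves never leave the grid.
LEVEL = [
    "#######",
    "#P..#.#",
    "#.#.#.#",
    "#.#...#",
    "#...#G#",
    "#######",
]

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def find(level, ch):
    """Return the (row, col) of the first occurrence of ch in the level."""
    for r, row in enumerate(level):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in level")

def step(level, pos, action):
    """Environment transition: the new state after one action (walls block moves)."""
    dr, dc = ACTIONS[action]
    r, c = pos[0] + dr, pos[1] + dc
    return pos if level[r][c] == "#" else (r, c)

def plan(level):
    """Breadth-first search for a shortest action sequence from P to G, or None."""
    start, goal = find(level, "P"), find(level, "G")
    frontier = deque([start])
    parent = {start: None}  # state -> (previous state, action taken)
    while frontier:
        pos = frontier.popleft()
        if pos == goal:
            path = []
            while parent[pos] is not None:
                prev, action = parent[pos]
                path.append(action)
                pos = prev
            return list(reversed(path))
        for action in ACTIONS:
            nxt = step(level, pos, action)
            if nxt not in parent:
                parent[nxt] = (pos, action)
                frontier.append(nxt)
    return None

def run(level, actions):
    """Execute the actions one by one and report whether the goal was reached."""
    pos = find(level, "P")
    for action in actions:
        pos = step(level, pos, action)
    return pos == find(level, "G")

actions = plan(LEVEL)
print(actions)  # ['right', 'right', 'down', 'down', 'right', 'right', 'down']
print(run(LEVEL, actions))  # True
```

Unlike a single question-and-answer exchange, the result here depends on the whole sequence: a wrong early move leaves the player in a different state and changes which later moves make sense.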

How does using video games to test for AGI differ from the ways video games have previously been used to test AI systems?

Video games have long been used as benchmarks in AI research, with Atari games being a popular example. But traditional video game benchmarks face several limitations. Popular games have extensive training data publicly available, lack standardized performance evaluation metrics and permit brute-force methods involving billions of simulations. Additionally, the developers building AI agents typically have prior knowledge of these games, unintentionally embedding their own insights into the solutions.

Try ARC-AGI-1, ARC-AGI-2 and ARC-AGI-3.
