AI could soon tackle projects that take humans weeks

A new metric is used to evaluate artificial intelligence models.

Today's artificial intelligence (AI) systems cannot outperform humans on long tasks, but they are improving rapidly and could close the gap sooner than expected, according to an analysis of leading models1.

METR, a non-profit organization in Berkeley, California, devised about 170 real-world tasks in coding, cybersecurity, general reasoning and machine learning, and then established a "human baseline" by measuring how long expert programmers took to complete them.

The team then developed a metric to assess the progress of AI models, which it calls the "task-completion time horizon". This is the amount of time that human programmers typically take to complete the tasks that AI models can complete at a given success rate.
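One way to make this definition concrete is to fit a curve of success probability against the logarithm of human completion time and read off where it crosses the chosen success rate. The sketch below does this with a logistic fit in plain Python. The task times and outcomes are invented for illustration, and the logistic form, the function names and the fitting settings are assumptions rather than details taken from the article.

```python
import math

def fit_logistic(xs, ys, steps=5000, lr=0.1):
    """Fit P(success) = sigmoid(a + b * x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon(a, b, success_rate=0.5):
    """Human task length (minutes) at which the fitted success probability equals success_rate."""
    logit = math.log(success_rate / (1.0 - success_rate))
    return 2.0 ** ((logit - a) / b)

# Hypothetical data: human completion time per task (minutes) and whether the model succeeded.
human_minutes = [1, 2, 4, 8, 15, 30, 60, 120, 240, 480]
model_succeeded = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

# Work in log2(minutes) so that each unit of x is one doubling of task length.
log_times = [math.log2(t) for t in human_minutes]
a, b = fit_logistic(log_times, model_succeeded)

horizon_50 = time_horizon(a, b, 0.5)
horizon_80 = time_horizon(a, b, 0.8)
print(f"50% time horizon: {horizon_50:.0f} minutes")
print(f"80% time horizon: {horizon_80:.0f} minutes")
```

Because the model succeeds less often on longer tasks, the fitted slope is negative, so a stricter success rate always gives a shorter horizon.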

In a preprint posted on arXiv this week, METR reports that GPT-2, an early large language model (LLM) published by OpenAI in 2019, failed on all tasks that took human experts more than one minute. Claude 3.7 Sonnet, released in February by the US start-up Anthropic, completed 50% of tasks that would take people 59 minutes.

Overall, the time horizon of the 13 leading AI models has doubled roughly every seven months since 2019. Growth in AI time horizons accelerated in 2024, when the latest models' horizons doubled roughly every three months. The work has not been formally peer reviewed.

Improving performance. The graph shows that the length of tasks that AI models can complete with 50% success has doubled roughly every seven months.

Source: T. Kwa et al. Preprint at arXiv https://doi.org/10.48550/arxiv.2503.14499 (2025).

At the 2019–2024 rate of progress, METR suggests that AI models will be able to handle tasks that take humans about a month, at 50% reliability, by 2029, and possibly sooner.
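The extrapolation is simple doubling arithmetic, sketched below. The 59-minute starting point and the seven- and three-month doubling times come from the figures reported above. The 167-hour working month and the helper name `months_until` are assumptions made for illustration.

```python
import math

def months_until(target_minutes, current_minutes, doubling_months):
    """Months needed for the horizon to grow from current to target at a fixed doubling time."""
    doublings = math.log2(target_minutes / current_minutes)
    return doublings * doubling_months

CURRENT_HORIZON_MIN = 59       # Claude 3.7 Sonnet's 50% horizon, as reported above
WORK_MONTH_MIN = 167 * 60      # assumption: one working month is about 167 hours

slow = months_until(WORK_MONTH_MIN, CURRENT_HORIZON_MIN, 7)  # 2019-2024 trend
fast = months_until(WORK_MONTH_MIN, CURRENT_HORIZON_MIN, 3)  # 2024 trend

print(f"Seven-month doubling: about {slow:.0f} months")
print(f"Three-month doubling: about {fast:.0f} months")
```

Under these assumptions, the seven-month doubling time takes roughly 52 months (about four and a third years) to reach a one-month horizon, and the three-month doubling time takes roughly 22 months.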

One month of dedicated human expertise, the paper notes, can be enough to start a new company or make scientific discoveries, for example.

But Joshua Gans, a professor of management at the University of Toronto in Canada who has written about the economics of AI, says that these kinds of predictions are not very useful. "Extrapolation is tempting to do, but there is still a lot that we do not know about how AI will actually be used for this to be meaningful," he says.

Human versus AI evaluation

The team chose the 50% success rate because it was the most robust to small changes in the data distribution. "If you pick very low or very high thresholds, removing or adding a single successful or a single failed task, respectively, changes your estimate a lot," says co-author Lawrence Chan.

Raising the reliability threshold from 50% to 80% reduced the average time horizon by a factor of five, although the overall doubling time and trend were similar.
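A short sketch shows why a stricter threshold can shrink the horizon by a constant factor without changing the doubling time. It assumes that success probability follows a logistic curve in log2 of task length, which is an assumption of this example and not something stated above. The function name and the slope value are illustrative.

```python
import math

def horizon_ratio(slope_per_doubling, low=0.5, high=0.8):
    """Ratio horizon(low) / horizon(high) when P(success) = sigmoid(a - slope * log2(minutes))."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return 2.0 ** ((logit(high) - logit(low)) / slope_per_doubling)

# Slope that would reproduce the reported factor of five (illustrative, not a fitted value).
implied_slope = math.log(4) / math.log2(5)
ratio = horizon_ratio(implied_slope)
print(f"50% horizon / 80% horizon = {ratio:.2f}")

# A constant factor does not alter the growth rate: if the 50% horizon doubles
# over some period, the 80% horizon (one-fifth as long) doubles over the same period.
h50_start, h50_end = 30.0, 60.0            # hypothetical 50% horizons, in minutes
h80_start, h80_end = h50_start / ratio, h50_end / ratio
growth_50 = h50_end / h50_start
growth_80 = h80_end / h80_start
print(f"Growth factor at 50%: {growth_50:.2f}, at 80%: {growth_80:.2f}")
```

In this simplified picture, the 80% curve is the 50% curve shifted down by a fixed factor, which is consistent with a similar doubling time at both thresholds.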

Over the past five years, improvements in the general capabilities of LLMs have been driven largely by increases in scale: the amount of training data, training time and number of model parameters. The paper attributes progress on the time-horizon metric mainly to improvements in AI's logical reasoning, tool use, error correction and self-awareness in carrying out tasks.

METR's time-horizon approach addresses some limitations of current AI benchmarks, which map only loosely onto real-world work and quickly "saturate" as models improve. It provides a continuous, intuitive measure that better captures long-term progress, says co-author Ben West.

AI models achieve superhuman performance on many benchmarks, but have had relatively little economic impact. METR's latest research offers a partial answer to this puzzle: the best models sit at a time horizon of about 40 minutes, and there is not much economically valuable work that a person can do in that time, West says.

But Anton Troynikov, an AI researcher and entrepreneur in San Francisco, California, says that AI would have a greater economic impact if organizations were more willing to experiment and invest in using models effectively.
