A substantial amount of METR's research focuses on evaluating the ability of AI systems to conduct AI research and development themselves. This includes RE-Bench, a benchmark designed to test whether AIs can "solve research engineering tasks and accelerate AI R&D".
[Figure: The length of tasks that AI models are capable of executing at a 50% success rate doubled every 7 months from 2019 to 2024. The shaded region represents a 95% confidence interval.]
In March 2025, METR published a paper noting that the length of software engineering tasks that the leading AI model could complete had a doubling time of around 7 months between 2019 and 2024. In January 2026, METR released a new version of its time horizon estimation model (Time Horizon 1.1). According to the new model, the rate of progress of AI capabilities has increased since 2023: the post-2023 doubling time is now estimated at 130.8 days (about 4.3 months), roughly 20% faster than previously estimated.
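A fixed doubling time implies a simple exponential model for the time horizon: H(t) = H0 * 2^((t - t0)/T), where T is the doubling time. The following is a minimal illustrative sketch of that extrapolation; the function name and the sample numbers are assumptions for illustration, not METR data:

```python
from datetime import date

def projected_horizon(h0_minutes: float, t0: date, t: date,
                      doubling_days: float) -> float:
    """Extrapolate a task-completion time horizon, assuming it grows
    exponentially with a fixed doubling time (in days)."""
    elapsed_days = (t - t0).days
    return h0_minutes * 2 ** (elapsed_days / doubling_days)

# Illustrative only: a 60-minute horizon growing with a 130.8-day
# doubling time reaches 120 minutes after exactly 130.8 days.
h = projected_horizon(60, date(2025, 1, 1), date(2025, 7, 1), 130.8)
```

Under this model, a shorter doubling time compounds quickly: halving T squares the growth factor over any fixed interval.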
== Time horizon measurements ==
METR releases a "task-completion time horizon" for the AI models it analyses. This measures the "task duration (measured by human expert completion time) at which an AI agent is predicted to succeed with a given level of reliability." It is released in two variants: the 50%-time horizon, which gives the task duration at which an AI model is estimated to succeed 50% of the time, and the 80%-time horizon, which gives the task duration at which it is estimated to succeed 80% of the time. There are two versions of the horizon estimates: the original Time Horizon 1.0 and Time Horizon 1.1, introduced in January 2026. The best-performing model is Claude Opus 4.6, with a 50%-time horizon of 14 hours 30 minutes and an 80%-time horizon of 1 hour and 3 minutes. The following table provides the time horizon estimates ordered by the model's release date:

== References ==