AI agents have a declining success rate over time when performing longer tasks due to a constant failure rate, similar to a half-life in radioactive decay.
A study showed that the length of tasks AI agents can reliably complete doubles every 7 months, suggesting consistent progress in AI capabilities.
The constant hazard rate model suggests that AI tasks are composed of several independent subtasks, where failure in any subtask leads to task failure.
The model used to examine AI success rates can predict how success rates change with task length and over time.
There is a difference between human and AI performance on longer tasks, as humans do not experience the same rate of success decline.
Further statistical analysis is needed to confirm the constant hazard rate model and understand its implications for AI development.
Get notified when new stories are published for "🇺🇸 Hacker News English"