ARC-AGI measures something most AI benchmarks do not: the ability to learn and reason about brand-new tasks rather than recall trained knowledge. Each task is a small visual grid puzzle where the system sees a few input-output examples and must infer the underlying rule, then apply it to a new case. These puzzles are designed to be easy for humans but difficult for machines, targeting what the field calls fluid intelligence or skill-acquisition efficiency.
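The few-shot structure described above can be made concrete with a minimal sketch. It assumes the `train`/`test` layout of input-output grid pairs used by the public ARC task files. The grids, the `CANDIDATES` table, and the `infer_rule` helper are hypothetical and exist only for illustration. Real ARC tasks draw on a far richer space of rules than four fixed transformations, so this shows the shape of the problem, not a viable solver.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # each cell is an integer colour code

# Toy task in a train/test layout; the grids are made up for illustration.
task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [{"input": [[5, 0, 0, 0], [0, 6, 0, 0]]}],
}

# Hypothetical, deliberately tiny space of candidate rules.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_left_right": lambda g: [row[::-1] for row in g],
    "flip_up_down": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def infer_rule(train: List[dict]) -> Optional[str]:
    """Return the first candidate rule that reproduces every training pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in train):
            return name
    return None  # no candidate explains the examples


# Infer the rule from the examples, then apply it to the unseen test input.
rule = infer_rule(task["train"])
prediction = CANDIDATES[rule](task["test"][0]["input"]) if rule else None
print(rule, prediction)
```

The search succeeds here only because the hidden rule happens to be in the candidate table. The benchmark is hard because each task's rule is new and cannot be enumerated in advance.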
The underlying framework comes from François Chollet’s 2019 paper “On the Measure of Intelligence,” which argued that traditional benchmarks reward task-specific skill and data saturation rather than genuine generalization, and which introduced the Abstraction and Reasoning Corpus. The work is now stewarded by the ARC Prize Foundation, a nonprofit co-founded by Chollet and Mike Knoop that runs competitions with over $2 million in prizes and describes ARC-AGI as the benchmark that measures fluid intelligence.
ARC-AGI became a standard reference point because it resists the brute-force memorization that inflates scores on other benchmarks, and changes in its results have marked turning points in AI, such as the rise of reasoning systems.
Current scores are published on the official ARC Prize leaderboard and change over time, so they are not reproduced here.