GDPval is an OpenAI benchmark introduced in the October 2025 paper “GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks.” Rather than testing models on exams or puzzles, it measures whether they can produce the actual deliverables of paid work. Its tasks span 44 occupations drawn from the nine sectors that contribute most to U.S. GDP, and they were built from representative work by industry professionals with an average of 14 years of experience. The full set contains 1,320 tasks. Of these, 220 are open-sourced as a gold subset, accompanied by a public automated grading service.
The findings are notable for how directly they speak to the economics of AI. The paper reports that frontier model performance on GDPval is improving roughly linearly over time, that the current best models are approaching industry experts in the quality of their deliverables, and that performance more than doubled from GPT-4o in spring 2024 to GPT-5 in summer 2025. It also estimates that models can complete these tasks roughly 100 times faster and 100 times cheaper than human experts. Those gains count only when the output meets expert standards, however, so deliverable quality remains the gating factor.
GDPval matters because it tries to connect benchmark scores to economic value rather than abstract capability. For leaders weighing where AI can substitute for or augment expert labor, a benchmark grounded in real occupational deliverables is far more decision-relevant than a trivia or math leaderboard.