LLMflation

“LLMflation” is a term coined by Guido Appenzeller of the venture firm Andreessen Horowitz (a16z) in a post published November 12, 2024. Despite the name, it describes deflation: the rapid fall in the cost of running large language models. Appenzeller defined it as “the rapid increase in tokens you can obtain at a constant price.” His headline finding was that, for a model of equivalent performance, inference cost was falling by a factor of about 10 every year.

A concrete example anchored the claim. When GPT-3 became publicly accessible in November 2021, generating text cost roughly 60 dollars per million tokens. By late 2024, a much smaller model reaching the same score on the MMLU benchmark cost about 0.06 dollars per million tokens, a roughly 1,000-fold reduction over three years. Appenzeller noted that this decline was even faster than the drop in compute cost during the PC revolution or in bandwidth cost during the dot-com era, invoking precedents like Moore’s Law as historical analogues.
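The 10x-per-year figure follows directly from those two price points. The sketch below checks the arithmetic; it assumes a smooth exponential decline between the two dates, which is an illustrative simplification rather than a method taken from the post, and the function name is hypothetical.

```python
def annual_decline_factor(start_price: float, end_price: float, years: float) -> float:
    """Return the constant yearly factor by which the price falls,
    assuming a smooth exponential decline between two price points."""
    total_factor = start_price / end_price  # overall reduction, e.g. about 1,000x
    return total_factor ** (1.0 / years)    # geometric mean per year


# Price points quoted in the text, in dollars per million tokens.
factor = annual_decline_factor(start_price=60.0, end_price=0.06, years=3.0)
print(f"about {factor:.1f}x cheaper per year")
```

A 1,000-fold drop spread evenly over three years is the cube root of 1,000, which is where the factor of roughly 10 per year comes from.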

The post highlighted an important nuance: the cheapest price for a given capability falls dramatically, but the frontier does not get cheaper at the same rate. OpenAI’s then-leading reasoning model cost about the same per output token as GPT-3 did in November 2021. In other words, you can now buy yesterday’s intelligence for almost nothing, while today’s best intelligence still commands a premium. The deflation happens as capabilities commoditize downward over time.

Why a business reader should care: LLMflation reframes AI budgeting. A workflow that is uneconomical at today’s prices may be cheap within a year or two, which favors building for where costs are heading rather than where they are now. The same trend squeezes the margins of any business whose value rests only on access to a model tier that is rapidly becoming a commodity.
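As a rough illustration of that budgeting logic, the sketch below estimates how long a workflow at a fixed capability level might take to fit a budget. It assumes the roughly 10x-per-year decline simply continues, which the post does not guarantee and which does not apply to frontier models; the function name and the dollar figures are hypothetical.

```python
import math


def years_until_affordable(current_cost: float, budget: float,
                           annual_factor: float = 10.0) -> float:
    """Estimate the years until a cost falls to the budget, assuming it
    shrinks by `annual_factor` every year (an extrapolation, not a forecast)."""
    if current_cost <= budget:
        return 0.0  # already affordable
    return math.log(current_cost / budget) / math.log(annual_factor)


# Hypothetical workflow: 5,000 dollars a month today against a 50 dollar budget.
wait = years_until_affordable(current_cost=5000.0, budget=50.0)
print(f"roughly {wait:.1f} years")
```

Under these assumptions a 100-fold gap closes in about two years, which is why a project that looks too expensive today can still be worth designing now.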
