The model that was 'too dangerous to release'

In February 2019 OpenAI announced GPT-2, a language model that wrote unusually coherent text, and made an unusual decision alongside it: it would not release the full, 1.5-billion-parameter model. The announcement framed this as a precaution against misuse - the worry that a model this fluent could be turned to mass-producing fake news, spam, and impersonation - and instead released only a much smaller version, calling the rollout a “staged release.” The press compressed this into a memorable phrase: the AI that was “too dangerous to release.”

OpenAI then did something the headline did not anticipate. It actually ran the staged release as an experiment in responsible disclosure. Over the following months it put out progressively larger versions - a medium model, then a 774-million-parameter model in an August 2019 follow-up that reported on partner research into misuse and detection. Each step was accompanied by analysis of whether the feared harms were showing up.

In November 2019 OpenAI released the full 1.5-billion-parameter model. Its own post explained the reasoning: studies had found the larger model only marginally more convincing to humans than the smaller ones already public, and OpenAI had “seen no strong evidence of misuse” of the released versions. The thing held back as too dangerous was, nine months later, simply published.

There are two honest ways to read the arc, and both are in this library’s spirit. One is that the danger was overstated - that the “too dangerous” framing was hype that helped make GPT-2 famous. The other is that staged release was a genuine, defensible caution that, having gathered evidence, correctly concluded the risk was manageable and reversed course. What is not in dispute is the shape: a dramatic restraint, a measured walk-back, and a model that ended up fully public after all. (The model and its capabilities are covered separately in this library’s GPT-2 milestone; this entry tells the release-hype arc.)