GPT-3 makes few-shot learning work at massive scale

In May 2020, Tom Brown and 30 co-authors at OpenAI published “Language Models are Few-Shot Learners,” the paper that introduced GPT-3. The model was an autoregressive language model with 175 billion parameters, which the paper describes as ten times more than any previous non-sparse language model.

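The headline result was few-shot learning. The paper shows that GPT-3 could perform a wide range of tasks when given only a handful of worked examples, written as plain text in its prompt, with no fine-tuning and no gradient updates to the model. It handled translation, question answering, cloze tasks, and tasks that require on-the-fly reasoning, such as unscrambling words and 3-digit arithmetic. The authors were candid that some tasks remained difficult, identifying datasets where few-shot performance still struggled, including some natural language inference and reading comprehension benchmarks.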
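To make "examples described in plain text in its prompt" concrete, here is a minimal sketch of how such a prompt might be assembled. The paper frames zero-shot, one-shot, and few-shot settings as differing only in how many demonstrations precede the query, and this sketch mirrors that idea. The helper name `build_few_shot_prompt`, the `" => "` separator, and the line-per-example layout are illustrative assumptions in the style of the paper's English-to-French illustrations, not the exact format used in its experiments. No model is called here, because the point is that the "learning" lives entirely in the text the model is asked to continue.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    task_description: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
    separator: str = " => ",
) -> str:
    """Assemble a plain-text K-shot prompt (hypothetical helper).

    The model's weights are never touched: the demonstrations are just text,
    and a model would be asked to continue after the final separator.
    """
    lines = [task_description]
    # Each demonstration pairs an input with its desired output.
    for source, target in examples:
        lines.append(f"{source}{separator}{target}")
    # The query repeats the pattern but leaves the output for the model to fill.
    lines.append(f"{query}{separator.rstrip()}")
    return "\n".join(lines)


# Three demonstrations make this a 3-shot prompt; an empty list would be zero-shot.
examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe peluche"),
]
prompt = build_few_shot_prompt("Translate English to French:", examples, "cheese")
print(prompt)
```

Switching between zero-, one-, and few-shot evaluation in this sketch means changing only the length of `examples`. That matches the paper's point that GPT-3 adapts to a task from context alone, without any gradient updates.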

The paper also reported that GPT-3 could generate news articles that human evaluators had difficulty distinguishing from articles written by humans, and it discussed the broader societal implications of that capability.

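GPT-3 is a direct technical ancestor of the conversational assistants that followed. ChatGPT, for example, was built on a model in the GPT-3.5 series that was fine-tuned with human feedback. By showing that scaling up a language model unlocked new behaviors such as learning a task from examples in the prompt, GPT-3 set the stage for the products that brought generative AI to the mainstream.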
