A large language model (LLM) is an AI system trained on enormous amounts of text so that it can predict the next piece of text and, in doing so, generate fluent, useful language. The defining trait is scale: billions of internal parameters trained on hundreds of billions of words. That scale lets a single model handle translation, summarization, question answering, coding, and more without being separately built for each.
The modern form of the LLM was crystallized by OpenAI’s 2020 paper “Language Models are Few-Shot Learners” (Brown et al.), which introduced GPT-3, a model with 175 billion parameters. The paper showed that making models much larger produced a qualitative jump: GPT-3 could perform new tasks from a few examples given in the prompt, without any update to the model’s weights. The first sentence of its abstract notes that earlier gains came from “pre-training on a large corpus of text followed by fine-tuning on a specific task.” GPT-3 reduced the need for that task-specific step.
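To make "a few examples given in the prompt" concrete, here is a minimal Python sketch of how such a prompt might be assembled. The function name `build_few_shot_prompt`, the "Input:/Output:" layout, and the translation examples are illustrative assumptions, not the format used in the paper or by any particular vendor.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, worked examples, then the new input.

    Hypothetical helper for illustration; the "Input:/Output:" layout is an assumption.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final "Output:" is left blank so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


examples = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", examples, "book")
print(prompt)
```

The model itself is unchanged by this: the examples live only in the text sent to it, which is why the same model can be redirected to a different task by swapping the examples.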
LLMs are built on the transformer architecture and learn purely from patterns in their training text. On their own, they do not look facts up; they encode statistical regularities of language, which is both their strength (flexibility) and their weakness (they can be confidently wrong).
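The idea of predicting the next piece of text from statistical patterns can be illustrated with a deliberately tiny stand-in: a word-pair (bigram) counter in Python. This is not how a transformer works internally, and the corpus and the function names `train_bigram_model` and `predict_next` are made up for illustration. It only shows the principle of predicting the next word from regularities in training text, at a vastly smaller scale.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy training corpus (an assumption for illustration only).
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))      # → cat
print(predict_next(model, "unicorn"))  # → None
```

The toy model answers "cat" after "the" because that pairing was most common in its training text, not because it checked anything. That is the same reason a much larger model can sound fluent while being wrong.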
Why business readers should care: LLMs are the engine behind tools like ChatGPT and Claude. Understanding that one general-purpose model can be pointed at many tasks helps explain why a single AI vendor relationship can touch customer support, drafting, analysis, and software development at once.