“Toolformer: Language Models Can Teach Themselves to Use Tools,” posted to arXiv on February 9, 2023 by Timo Schick and seven co-authors at Meta AI, asked a pointed question. Language models struggle with tasks that simple programs handle easily, such as arithmetic, factual lookup, date calculations, and translation, so can a model learn to call those programs itself? Toolformer’s answer was a self-supervised training method that needs only a handful of demonstrations per tool.
The trick is to let the model annotate its own training data. Starting from a few examples, the model proposes places in ordinary text where an API call might help, actually executes those calls, and keeps only the calls whose results lower its loss on the following tokens by more than a threshold, compared with making no call at all or making the call without seeing its result. The model is then fine-tuned on the original text with these filtered calls inserted. After training, it decides for itself which API to use, when to use it, and with what arguments. The paper wired up a calculator, a question-answering system, two search engines, a translation system, and a calendar.
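The filtering step is the heart of the method, so a small sketch may help. The Python below is a minimal illustration, not the paper's code: it assumes the per-token losses for the tokens after a candidate call have already been computed by the language model, and the weighting scheme (linearly decaying over the next five tokens), the threshold value, and the helper names `weighted_loss`, `keep_call`, and `insert_call` are all assumptions chosen for illustration. The bracketed `[Call → result]` format is loosely modelled on the paper's examples.

```python
from typing import Sequence


def weighted_loss(token_losses: Sequence[float]) -> float:
    """Weighted average of per-token losses after a candidate call position.

    Assumed weighting (illustrative): the t-th following token gets raw
    weight max(0, 1 - 0.2 * t), normalised to sum to 1, so tokens right
    after the call matter most and tokens from t = 5 onward are ignored.
    """
    raw = [max(0.0, 1.0 - 0.2 * t) for t in range(len(token_losses))]
    total = sum(raw)
    return sum(w / total * loss for w, loss in zip(raw, token_losses))


def keep_call(
    loss_without_call: Sequence[float],
    loss_call_without_result: Sequence[float],
    loss_with_result: Sequence[float],
    threshold: float = 1.0,  # hypothetical value, chosen only for illustration
) -> bool:
    """Keep a sampled API call only if seeing its result helps enough.

    The baseline is the better (lower) of two alternatives: no call at all,
    and the call text without its result. This way a call is not credited
    merely for inserting extra tokens into the context.
    """
    baseline = min(
        weighted_loss(loss_without_call),
        weighted_loss(loss_call_without_result),
    )
    return baseline - weighted_loss(loss_with_result) >= threshold


def insert_call(text: str, position: int, call: str, result: str) -> str:
    """Splice an executed call and its result into text at a character offset."""
    return f"{text[:position]}[{call} → {result}] {text[position:]}"


if __name__ == "__main__":
    text = "Out of 1400 participants, 400 (or 29%) passed the test."
    # Made-up per-token losses for the five tokens that follow the call site.
    no_call = [3.2, 2.9, 1.0, 0.8, 0.5]
    call_no_result = [3.0, 2.8, 1.0, 0.8, 0.5]
    with_result = [0.4, 0.6, 0.9, 0.8, 0.5]
    if keep_call(no_call, call_no_result, with_result):  # → True
        print(insert_call(text, text.index("29%"), "Calculator(400 / 1400)", "0.29"))
```

Comparing against the better of the two baselines is the design choice that matters: a call survives only when its returned result, not the mere presence of the call, makes the following text easier to predict. Calls that pass the filter are spliced into the text, and the resulting corpus is what the model is fine-tuned on.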
The result was a model, built on the 6.7B-parameter GPT-J, that substantially improved zero-shot performance across a range of downstream tasks, often competing with much larger models such as GPT-3, without sacrificing its core language modeling ability. Toolformer is a landmark because it framed tool use as something a model learns to do natively rather than something bolted on by an external orchestration loop. It is commonly cited alongside ReAct as one of the foundational papers of the tool-use era.