Ollama makes running local LLMs a one-line command

Ollama is an open-source tool, released in 2023 and distributed under the MIT license, for running large language models locally. Its goal is to remove the friction of local inference: instead of compiling code and converting weights by hand, a user runs a single command such as “ollama run gemma3” to download a model and start chatting with it.

Under the hood, Ollama uses llama.cpp, the project founded by Georgi Gerganov, as its execution backend, inheriting that project’s quantization support and broad hardware coverage. On top of it, Ollama adds a model library, a packaging format for bundling weights and prompt templates, and a local REST API with Python and JavaScript client libraries, so developers can build applications against a private model the same way they would against a cloud API. Supported models include Llama, Gemma, Mistral, Qwen, DeepSeek, and others.
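To make the "same way as a cloud API" point concrete, here is a minimal sketch of calling the local REST API from Python using only the standard library. It assumes an Ollama server is already running at its default address (http://localhost:11434), that the model named in the call has been downloaded, and that the /api/generate endpoint accepts a non-streaming request of this shape. The helper names build_generate_request and generate are illustrative and are not part of Ollama or its client libraries.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server listens elsewhere.
OLLAMA_HOST = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, host: str = OLLAMA_HOST, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server is expected to reply with a single JSON
    # object whose "response" field holds the full completion.
    return body["response"]
```

With the server running, a call such as generate("gemma3", "Why is the sky blue?") should return the model's answer as a string. The request never leaves the machine, which is the privacy property the article describes. The official Python client library wraps the same API and would normally be the more convenient choice; the standard-library version is used here only to keep the sketch free of dependencies.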

Ollama has become one of the most common front doors to local AI for developers. By turning a research-grade inference engine into a friendly, scriptable service, it made running models privately, with no data leaving the machine, a realistic default for prototyping and for privacy-sensitive deployments.

Last verified June 7, 2026