Tabby launched in 2023 as a self-hosted alternative to GitHub Copilot. It addresses a core enterprise concern: sending proprietary code to external APIs. The entire inference pipeline runs on the organisation’s own infrastructure as a Rust-based server packaged in a Docker container, with no external database or cloud dependency.
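Running "a Docker container on the organisation’s own infrastructure" comes down to a single `docker run` command. The sketch below assembles that command for the `tabbyml/tabby` image. The flags (`serve`, `--model`, `--device`, the `/data` volume) mirror the quick-start in Tabby’s README at the time of writing. The helper name `tabby_docker_argv`, the default model `StarCoder-1B`, and `--device cpu` for GPU-less hosts are illustrative assumptions, so check them against the current documentation.

```python
from pathlib import Path
from typing import List, Optional


def tabby_docker_argv(model: str = "StarCoder-1B",
                      device: str = "cuda",
                      port: int = 8080,
                      data_dir: Optional[Path] = None) -> List[str]:
    """Hypothetical helper: build the argv for running Tabby in Docker.

    Flags mirror the README quick-start at the time of writing; they are
    assumptions to verify against the current Tabby documentation.
    """
    data_dir = data_dir if data_dir is not None else Path.home() / ".tabby"
    argv = ["docker", "run", "-it"]
    if device == "cuda":
        # Expose the host's NVIDIA GPUs to the container.
        argv += ["--gpus", "all"]
    argv += [
        "-p", f"{port}:8080",       # host port -> server port inside the container
        "-v", f"{data_dir}:/data",  # models and indexes persist on the host disk
        "tabbyml/tabby",
        "serve", "--model", model, "--device", device,
    ]
    return argv
```

Joining the result with `shlex.join` yields a copy-pasteable shell command. Building it as a list keeps the GPU flag conditional, so the same helper describes both GPU and CPU-only hosts.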
Key capabilities
Self-hosted by default — Tabby runs as a Docker image or standalone binary on the team’s own hardware or private cloud. All code, completions, and chat messages stay within the organisation’s network perimeter.
Repository-level RAG context — Tabby builds a code understanding layer from the team’s own repositories, giving completions and answers that reflect the team’s actual patterns, frameworks, and naming conventions rather than generic training data.
Team Answer Engine — A shared Q&A layer that lets team members ask questions about the codebase and receive answers grounded in the team’s actual code, with past questions and answers kept in a searchable history.
Pochi agentic layer — TabbyML’s Pochi project (github.com/TabbyML/pochi) adds a full agentic coding loop on top of Tabby’s platform, with autonomous task execution in isolated git worktrees and parallel agent support.
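Repository-level RAG context works roughly like this: index the team’s code, retrieve the snippets most relevant to what is being edited, and place them in the prompt. The deliberately simplified sketch below shows that flow. It is not Tabby’s implementation, which maintains its own repository index. The fixed-size line chunking, the identifier-overlap scoring, and all function names are assumptions chosen for brevity.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Rough identifier pattern for code tokens (illustrative, language-agnostic).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass
class Chunk:
    path: str
    text: str


def chunk_files(files: Dict[str, str], lines_per_chunk: int = 20) -> List[Chunk]:
    """Split each file into fixed-size windows of lines (hypothetical chunking)."""
    chunks = []
    for path, source in files.items():
        lines = source.splitlines()
        for start in range(0, len(lines), lines_per_chunk):
            chunks.append(Chunk(path, "\n".join(lines[start:start + lines_per_chunk])))
    return chunks


def retrieve(chunks: List[Chunk], query: str, k: int = 2) -> List[Chunk]:
    """Rank chunks by the number of distinct identifiers shared with the query."""
    query_ids = set(IDENT.findall(query))
    scored = [(len(query_ids & set(IDENT.findall(c.text))), i, c)
              for i, c in enumerate(chunks)]
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [c for score, _, c in scored[:k] if score > 0]


def build_prompt(chunks: List[Chunk], prefix: str) -> str:
    """Prepend retrieved snippets as comments so the model sees team conventions."""
    context = []
    for c in retrieve(chunks, prefix):
        context.append(f"# Path: {c.path}")
        context.extend(f"# {line}" for line in c.text.splitlines())
    return "\n".join(context + [prefix])
```

For example, with a repository containing `billing/invoice.py` (defining `make_invoice(customer_id, amount_cents)`) and an unrelated `utils/strings.py`, the prefix `invoice = make_invoice(customer_id, ` pulls in only the billing snippet. The model then sees the team’s real function signature rather than guessing one. Production systems typically replace identifier overlap with lexical or embedding search, but the retrieve-then-prompt shape is the same.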
Autonomy level
Level 2 — Assisted. Tabby’s core product is code completion and chat with codebase context. The Pochi agentic layer extends this to Level 3-4 autonomous task execution, but is a separate product.
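At Level 2, the editor plugin sends the code around the cursor to the team’s own server and shows the returned suggestion. The sketch below builds, but does not send, such a request with the standard library. The `/v1/completions` path and the `language` / `segments.prefix` / `segments.suffix` payload reflect Tabby’s published HTTP API as we understand it. The base URL, the bearer-token handling, and the helper names are assumptions, so confirm them against your own deployment.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a Tabby server reachable on the internal network at this address.
TABBY_URL = "http://localhost:8080"


def build_completion_request(prefix: str, suffix: str, language: str,
                             token: Optional[str] = None,
                             base_url: str = TABBY_URL) -> urllib.request.Request:
    """Hypothetical helper: build a POST to the server's completion endpoint."""
    payload = {
        "language": language,
        # Fill-in-the-middle: the code before and after the cursor.
        "segments": {"prefix": prefix, "suffix": suffix},
    }
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the server issues per-user tokens sent as a bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def first_choice_text(response_body: bytes) -> str:
    """Extract the first suggestion from a body shaped like {"choices": [{"text": ...}]}."""
    return json.loads(response_body)["choices"][0]["text"]
```

Passing the request to `urllib.request.urlopen` sends it. Because the URL points at an internal host, the source code in `prefix` and `suffix` never leaves the organisation’s network.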
Strengths
- Complete data sovereignty — all inference runs in your own environment
- Apache 2.0 licence; 33,600 GitHub stars suggest strong adoption
- No external cloud dependency, and no per-seat fees for the open-source edition
- Supports local models (Ollama, llama.cpp) for fully air-gapped deployments
- VS Code and JetBrains plugins; enterprise features available
Limitations
- Requires infrastructure setup and maintenance (Docker, GPU for best performance)
- Self-hosting shifts operational burden to the team
- Core product is autocomplete and chat; agentic capabilities require Pochi separately
- Completion quality depends on hardware available for inference
- Without a GPU server, teams are limited to smaller models than cloud-based tools offer
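The hardware dependence can be put in rough numbers. Model weights alone occupy about parameters × bytes per parameter, so the available memory largely decides which model sizes and precisions fit. The sketch below applies that rule of thumb. It ignores the KV cache, activations, and runtime overhead beyond a flat headroom factor, and both the 1.2× headroom and the model sizes are illustrative assumptions. Treat the results as lower bounds, not sizing guidance.

```python
# Approximate bytes per parameter at common precisions. 4-bit formats are
# slightly larger in practice because they also store quantisation scales.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "q4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes).

    (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, available_gb: float,
         headroom: float = 1.2) -> bool:
    """Hypothetical check: do the weights plus a flat headroom factor fit?"""
    return weight_memory_gb(params_billions, precision) * headroom <= available_gb
```

By this estimate a 7B model needs about 14 GB for fp16 weights, which does not fit a 16 GB card once headroom is added. The same model at 4-bit needs about 3.5 GB. This is why GPU-less or small-GPU deployments end up on smaller or heavily quantised models, with a corresponding effect on completion quality.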