Triton (GPU programming language)

Triton is a programming language and compiler for writing highly efficient custom deep-learning primitives, that is, the GPU kernels that carry out operations such as matrix multiplication and attention. It was originally created by Philippe Tillet and is developed at OpenAI. Its goal, in the project’s own words, is to provide an open-source environment for writing fast GPU code with higher productivity than CUDA but with more flexibility than other domain-specific languages. The underlying ideas were published as “Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations” at MAPL 2019.

Writing a fast GPU kernel by hand in CUDA means explicitly managing the memory hierarchy, organizing work into thread blocks, and tiling data so that it fits in fast on-chip memory, all of which is difficult and error-prone. Triton lets a developer write kernels in a Python-like syntax that operates on blocks of data rather than on individual threads, while the compiler handles much of that low-level optimization automatically and produces code that can rival expert-tuned CUDA. The modern Triton compiler is built on MLIR and supports NVIDIA and AMD GPUs, with CPU support in development.
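To make the block-oriented style concrete, the following is a minimal sketch in plain Python that emulates how a blocked vector addition is usually organized: one program instance per block, a list of offsets into the data, and a mask that protects the final, partially filled block. It is an illustration only and is not Triton code. The names add_program, blocked_add, and BLOCK_SIZE are hypothetical, and the loop over program instances stands in for work that a GPU would run in parallel.

```python
# Plain-Python emulation of a blocked, masked vector add.
# This is NOT Triton code; every name here is hypothetical.

BLOCK_SIZE = 4  # elements handled by one program instance (assumed value)


def add_program(pid, x, y, out, n_elements, block_size=BLOCK_SIZE):
    """Emulate one program instance working on one block of the input."""
    block_start = pid * block_size
    offsets = [block_start + i for i in range(block_size)]
    # The mask guards the final block, which may run past the end of the data.
    mask = [off < n_elements for off in offsets]
    for off, in_bounds in zip(offsets, mask):
        if in_bounds:
            out[off] = x[off] + y[off]


def blocked_add(x, y, block_size=BLOCK_SIZE):
    """Run one emulated program per block; a GPU would run them in parallel."""
    assert len(x) == len(y)
    n_elements = len(x)
    out = [0.0] * n_elements
    grid = (n_elements + block_size - 1) // block_size  # ceiling division
    for pid in range(grid):
        add_program(pid, x, y, out, n_elements, block_size)
    return out


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [10.0 * i for i in range(10)]
    print(blocked_add(x, y))
```

In a real Triton kernel the per-element loop inside each program disappears: the developer expresses the load, add, and store on the whole block at once, and the compiler decides how to map that block onto threads and memory.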

Triton became important infrastructure because it sits between high-level frameworks and the hardware. Performance-critical operations such as fused attention kernels are written in it, and PyTorch’s compiler (torch.compile, through its TorchInductor backend) uses Triton to generate optimized GPU code. By lowering the barrier to entry, it widened the pool of people who can write hardware-efficient AI code beyond CUDA specialists.

Why business readers should care: kernels written in Triton directly affect how fast and how cheaply models train and run, and tools like it reduce the need for the scarce, expensive expertise otherwise required to wring performance out of GPUs.
