ONNX Runtime is a high-performance engine, created and maintained by Microsoft, for running machine-learning models that have been exported to ONNX, the Open Neural Network Exchange format. ONNX is a common file format that lets a model trained in one framework, such as PyTorch or TensorFlow, be represented in a neutral way; ONNX Runtime is the software that actually executes such a model efficiently.
The runtime is built to be portable and fast. It runs the same model across many hardware targets (CPUs, GPUs, and specialized accelerators) by plugging in hardware-specific execution providers, and it applies graph-level optimizations to reduce latency, increase throughput, and shrink memory use and binary size. It runs on Linux, Windows, macOS, iOS, Android, and in web browsers, with variants such as ONNX Runtime Web and ONNX Runtime Mobile, and it has bindings for Python, C#, C++, Java, JavaScript, and Rust. Microsoft uses it inside products such as Windows, Office, Azure services, and Bing. Beyond inference, it also offers training capabilities aimed at reducing the cost of large-model training and enabling on-device training.
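For readers who want to see what "plugging in an execution provider" and "applying graph-level optimizations" look like in practice, here is a minimal Python sketch. It assumes the onnxruntime package is installed. The preference order in PREFERRED, the helper names pick_providers and open_session, and the file name model.onnx are illustrative assumptions, not part of the library or of any recommended setup.

```python
# Illustrative preference order (an assumption, not a recommendation):
# try a GPU provider first, then an Apple accelerator, then plain CPU.
PREFERRED = (
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order.

    Hypothetical helper: falls back to the CPU provider if nothing matches.
    """
    chosen = [name for name in preferred if name in available]
    return chosen or ["CPUExecutionProvider"]


def open_session(model_path):
    """Open an inference session with all graph optimizations enabled."""
    import onnxruntime as ort  # third-party package: pip install onnxruntime

    options = ort.SessionOptions()
    # Ask the runtime to apply every graph-level optimization it supports.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    # Only request providers that this particular build reports as available.
    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, sess_options=options, providers=providers)
```

With a model file on disk, a call such as open_session("model.onnx") would return a session whose run method executes the model. The same script can move between a GPU server and a CPU-only laptop unchanged, because the provider list is built from whatever the local installation reports.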
Why a business reader should care: ONNX Runtime, paired with the ONNX format, is a practical way to decouple where a model is trained from where it runs, so a model can be deployed efficiently across very different devices and clouds without being rewritten.