MLPerf measures the performance of the computer hardware and software used to build and run AI, rather than the intelligence of a model. Its training benchmarks measure how quickly a system can train a model to a target quality level, and its inference benchmarks measure how fast a trained model can process inputs and produce results. Separate suites cover datacenter, edge, mobile, and tiny devices, plus a supercomputer-scale HPC variant.
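The two kinds of measurement described above can be illustrated with a toy sketch. This is not the MLPerf reference harness or its rules. The function names `time_to_target` and `throughput`, the `train_one_epoch` and `process` callbacks, and the injectable `clock` parameter are all hypothetical, chosen only to show the idea of timing a run until a quality target is reached and counting inputs processed per second.

```python
import time
from typing import Callable, Iterable, Optional


def time_to_target(
    train_one_epoch: Callable[[], float],
    target_quality: float,
    max_epochs: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Training-style metric: seconds elapsed until quality reaches the target.

    `train_one_epoch` is a hypothetical callback that runs one epoch and
    returns the current quality score (higher is better). Returns None if
    the target is not reached within `max_epochs`.
    """
    start = clock()
    for _ in range(max_epochs):
        quality = train_one_epoch()
        if quality >= target_quality:
            return clock() - start
    return None


def throughput(
    process: Callable[[object], object],
    inputs: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Inference-style metric: inputs processed per second.

    `process` is a hypothetical callback that runs the trained model on one
    input and returns its result.
    """
    start = clock()
    count = 0
    for item in inputs:
        process(item)
        count += 1
    elapsed = clock() - start
    # Guard against a zero-length interval on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")
```

In this sketch a lower `time_to_target` and a higher `throughput` both indicate a faster system. Passing the clock in as a parameter is a design choice made for the example so that the functions can be checked with a deterministic fake clock; a real benchmark additionally fixes the model, dataset, quality target, and load pattern so that results from different systems are comparable.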
MLPerf is run by MLCommons, an industry consortium, and the name is a registered trademark of MLCommons. The training suite was described in the 2019 paper “MLPerf Training Benchmark” and is maintained as a representative set of workloads intended to allow fair comparison of system performance across vendors.
MLPerf has become a widely used standard by which chip makers, cloud providers, and framework teams compare hardware on a like-for-like basis, which is why its results are frequently cited in product launches and procurement decisions.
Current results are published on the official MLCommons leaderboards for each benchmark round and are updated regularly, so they are not listed here.