perf is the standard performance analysis tool of the Linux kernel. It sits on top of a kernel subsystem originally called Performance Counters for Linux and now known as perf_events, which gives userspace a unified way to read the processor’s hardware performance monitoring unit (PMU). Through that subsystem, perf can count and sample events such as CPU cycles, retired instructions, cache misses, and branch mispredictions, on a per-thread, per-process, or per-CPU basis. The perf wiki describes the tool as powerful precisely because a single interface reaches all of these event sources.
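To make the perf_events interface more concrete, here is a minimal Python sketch that opens one hardware counter (retired user-space instructions) directly with the perf_event_open system call, which is the call perf itself is built on, enables it around a piece of work, and reads back the count. It assumes Linux on x86_64 or aarch64 (the syscall numbers are hard-coded for those two), fills in only the original 64-byte version of struct perf_event_attr, and returns None when the kernel refuses the event, which is common in containers, in VMs without a virtual PMU, or under a strict perf_event_paranoid setting. The function name count_user_instructions is hypothetical.

```python
import ctypes
import fcntl
import os
import platform
import struct

# Values from <linux/perf_event.h>; the ioctl numbers are _IO('$', n).
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_INSTRUCTIONS = 1
PERF_EVENT_IOC_ENABLE = 0x2400
PERF_EVENT_IOC_DISABLE = 0x2401
PERF_EVENT_IOC_RESET = 0x2403

# perf_event_open has no glibc wrapper and its number is per-architecture.
# Assumption: only these two architectures are handled in this sketch.
SYS_PERF_EVENT_OPEN = {"x86_64": 298, "aarch64": 241}


class PerfEventAttr(ctypes.Structure):
    # First 64 bytes of struct perf_event_attr (PERF_ATTR_SIZE_VER0);
    # the kernel accepts this shorter, older layout via the size field.
    _fields_ = [
        ("type", ctypes.c_uint32),
        ("size", ctypes.c_uint32),
        ("config", ctypes.c_uint64),
        ("sample_period", ctypes.c_uint64),
        ("sample_type", ctypes.c_uint64),
        ("read_format", ctypes.c_uint64),
        ("flags", ctypes.c_uint64),  # bitfield: disabled, inherit, ..., exclude_hv
        ("wakeup_events", ctypes.c_uint32),
        ("bp_type", ctypes.c_uint32),
        ("config1", ctypes.c_uint64),
    ]


def count_user_instructions(fn):
    """Run fn() and return user-space instructions retired by this thread,
    or None if perf_events is unavailable here."""
    nr = SYS_PERF_EVENT_OPEN.get(platform.machine())
    if nr is None:
        return None
    attr = PerfEventAttr()
    attr.type = PERF_TYPE_HARDWARE
    attr.size = ctypes.sizeof(PerfEventAttr)
    attr.config = PERF_COUNT_HW_INSTRUCTIONS
    attr.flags = (1 << 0) | (1 << 5) | (1 << 6)  # disabled, exclude_kernel, exclude_hv

    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # pid=0 (calling thread), cpu=-1 (any CPU), group_fd=-1, flags=0.
    fd = libc.syscall(ctypes.c_long(nr), ctypes.byref(attr), ctypes.c_long(0),
                      ctypes.c_long(-1), ctypes.c_long(-1), ctypes.c_ulong(0))
    if fd < 0:
        return None  # e.g. EPERM/EACCES from seccomp or perf_event_paranoid, ENOENT without a PMU
    try:
        fcntl.ioctl(fd, PERF_EVENT_IOC_RESET, 0)
        fcntl.ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)
        fn()
        fcntl.ioctl(fd, PERF_EVENT_IOC_DISABLE, 0)
        # With read_format == 0, read() returns a single u64 counter value.
        (count,) = struct.unpack("Q", os.read(fd, 8))
        return count
    finally:
        os.close(fd)


if __name__ == "__main__":
    n = count_user_instructions(lambda: sum(range(1_000_000)))
    print("perf_events unavailable here" if n is None else f"{n} instructions")
```

The count includes the Python interpreter’s own work, so it illustrates the mechanism rather than measuring the lambda precisely.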
The tool is organized as a set of subcommands documented in its man pages. perf stat runs a workload and prints aggregate event counts; perf record samples a running program and writes the samples to a perf.data file; perf report turns that file into a ranked breakdown of where the samples landed, and perf annotate maps them onto source lines or individual instructions; and perf top gives a live, continuously updating view of the hottest functions. By default perf record samples on the cycles event, which the kernel maps to an appropriate hardware PMU event, so a basic CPU profile requires no special instrumentation of the target program.
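The idea behind a ranked breakdown can be shown in a few lines. The sketch below assumes each sample has already been resolved to a (symbol, period) pair, weights each sample by its period, and reports each symbol’s share of the total, hottest first. This mirrors the idea behind the Overhead column in perf report rather than its actual implementation; the function name rank_by_period and the sample data are made up for illustration.

```python
from collections import Counter


def rank_by_period(samples):
    """Return (symbol, percent) pairs, hottest first.

    `samples` is an iterable of (symbol, period) pairs: the symbol executing
    when a sample fired and the event period that sample stands for.
    """
    totals = Counter()
    for symbol, period in samples:
        totals[symbol] += period
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    # most_common() sorts by accumulated period, largest first.
    return [(sym, 100.0 * p / grand_total) for sym, p in totals.most_common()]


if __name__ == "__main__":
    # Hypothetical samples, not real measurements.
    samples = [("parse", 300), ("hash", 100), ("parse", 200), ("memcpy", 100), ("hash", 100)]
    for symbol, pct in rank_by_period(samples):
        print(f"{pct:6.2f}%  {symbol}")
    # prints:
    #  62.50%  parse
    #  25.00%  hash
    #  12.50%  memcpy
```

Weighting by period rather than counting samples matters when the sampling period varies, as it does when sampling at a target frequency.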
Beyond hardware counters, perf reaches the kernel’s software instrumentation. It can attach to static tracepoints, which are predefined instrumentation points placed at logical locations such as system call entry, scheduler activity, TCP/IP events, and filesystem operations, and which carry negligible overhead when not enabled. It can also create dynamic probes on the fly: kprobes for almost any kernel function and uprobes for userspace functions, so that points the kernel developers never anticipated can still be measured. This combination of counters, tracepoints, and dynamic probes is what makes perf both a profiler and a tracer.
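Dynamic probes are defined to the kernel as one-line text definitions written to tracefs control files, kprobe_events for kernel probes and uprobe_events for userspace ones, as described in the kernel’s kprobetrace and uprobetracer documentation; perf probe builds on the same mechanism. The sketch below only formats such definitions. It assumes tracefs is mounted at /sys/kernel/tracing (older systems use /sys/kernel/debug/tracing), and the helper names, event names, and the uprobe file offset are hypothetical. Actually registering a probe needs root, so the register helper is shown but not called.

```python
import os

# Assumption: tracefs mount point; older systems use /sys/kernel/debug/tracing.
TRACEFS = "/sys/kernel/tracing"


def kprobe_def(event, symbol, offset=0, ret=False, group=None):
    """Build a kprobe_events line: 'p:[GRP/]EVENT SYM[+offs]', or 'r:...' for a return probe."""
    if ret and offset:
        raise ValueError("return probes attach at function entry (offset 0)")
    name = f"{group}/{event}" if group else event
    target = f"{symbol}+{offset:#x}" if offset else symbol
    return f"{'r' if ret else 'p'}:{name} {target}"


def uprobe_def(event, path, offset, ret=False, group=None):
    """Build a uprobe_events line: 'p:[GRP/]EVENT PATH:OFFSET' (offset within the file)."""
    name = f"{group}/{event}" if group else event
    return f"{'r' if ret else 'p'}:{name} {path}:{offset:#x}"


def register(definition, control_file):
    """Append a definition (root only). Append mode avoids O_TRUNC, which would clear existing probes."""
    with open(os.path.join(TRACEFS, control_file), "a") as f:
        f.write(definition + "\n")


if __name__ == "__main__":
    print(kprobe_def("myprobe", "do_sys_open"))                     # p:myprobe do_sys_open
    print(uprobe_def("readline_entry", "/bin/bash", 0x4245c0))      # p:readline_entry /bin/bash:0x4245c0
    # register(kprobe_def("myprobe", "do_sys_open"), "kprobe_events")  # needs root
```

Once registered, such a probe appears as an ordinary trace event that perf can record, just like a static tracepoint.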
perf is maintained in the kernel source tree itself, under tools/perf, which keeps it tightly coupled to the kernel version it ships with. Because perf record can capture full call stacks at each sample (for example with its -g option), it became the most common front end for generating flame graphs on Linux: perf script exports the recorded stacks, a collapsing step folds them into one line per unique stack with a sample count, and a renderer draws the now-familiar stacked visualization. The project’s documentation has since moved from the original perf.wiki.kernel.org to a maintained wiki at perfwiki.github.io, and perf is today the default starting point for CPU profiling on Linux.
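The folding step is simple enough to sketch. The Python below assumes perf script’s text output for a recording made with call stacks: a header line per sample whose first field is the command name, then indented frame lines of the form address symbol+offset (dso), leaf frame first, with a blank line between samples. It emits one "comm;root;...;leaf count" line per unique stack, the folded format that flame-graph renderers such as flamegraph.pl from the FlameGraph project consume. It is a simplification in the spirit of stackcollapse-perf.pl, not a port: command names or symbols containing spaces are not handled, and fold_perf_script and the sample text are hypothetical.

```python
from collections import Counter


def fold_perf_script(text):
    """Fold perf-script-style call stacks into 'comm;root;...;leaf count' lines."""
    folded = Counter()
    comm, frames = None, []

    for line in text.splitlines() + [""]:  # the extra "" flushes the last sample
        if not line.strip():
            if comm is not None and frames:
                # perf script prints the leaf frame first; flame graphs want root first.
                folded[";".join([comm] + frames[::-1])] += 1
            comm, frames = None, []
        elif line[0].isspace():
            fields = line.split()
            # fields: address, symbol+offset, (dso); drop a +0x... offset if present.
            symbol = fields[1] if len(fields) > 1 else fields[0]
            frames.append(symbol.rsplit("+0x", 1)[0])
        else:
            comm = line.split()[0]  # first field of the sample header line

    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


if __name__ == "__main__":
    # Hypothetical perf script output for three samples of one process.
    sample = (
        "app 4242 [001] 100.000001: 250000 cycles:\n"
        "\t401136 compute+0x16 (/tmp/app)\n"
        "\t401180 main+0x20 (/tmp/app)\n"
        "\n"
        "app 4242 [001] 100.000251: 250000 cycles:\n"
        "\t401136 compute+0x16 (/tmp/app)\n"
        "\t401180 main+0x20 (/tmp/app)\n"
        "\n"
        "app 4242 [001] 100.000501: 250000 cycles:\n"
        "\t4011a0 parse+0x8 (/tmp/app)\n"
        "\t401180 main+0x20 (/tmp/app)\n"
    )
    print("\n".join(fold_perf_script(sample)))
    # prints:
    # app;main;compute 2
    # app;main;parse 1
```

Because identical stacks collapse into a single counted line, the renderer’s job reduces to drawing each frame with a width proportional to the summed counts of the stacks that pass through it.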