A CPU cache is a small block of very fast memory placed close to the processor that holds copies of data and instructions the processor has recently used or is likely to use soon. Main memory is far slower than the processor core, so without a cache the CPU would spend most of its time idle, waiting for data to arrive. The cache hides that latency: when the data the processor wants is already in the cache, the access completes in a few cycles instead of hundreds.
The idea was set out by Maurice Wilkes in his 1965 paper “Slave Memories and Dynamic Storage Allocation.” Wilkes proposed using a fast core memory as a “slave” to a larger, slower main store, arranged so that “the effective access time is nearer that of the fast memory than that of the slow memory.” His slave memory automatically held copies of words drawn from the slow store, transparently to the program. That automatic, transparent fast buffer is what we now call a cache. Wilkes is credited with the concept, though not with the name: the word “cache” came into use a few years later, with IBM’s System/360 Model 85.
Modern processors use not one cache but several, organized into levels. The L1 cache is the smallest and fastest; it sits right next to each core and is often split into separate instruction and data caches. The L2 cache is larger and somewhat slower, and the L3 cache is larger and slower still, frequently shared among all cores on a chip. Each level trades capacity against speed, so the most-used data lives nearest the processor.
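To make the capacity-versus-speed trade concrete, the following Python sketch estimates the average cost of a memory access across a three-level hierarchy. The function name `average_access_time` and every latency and hit rate in it are illustrative assumptions, not measurements of any real processor.

```python
def average_access_time(levels, memory_latency):
    """Average cycles per access for a cache hierarchy.

    levels: list of (hit_latency_cycles, hit_rate) pairs from L1 outward,
            where hit_rate is the fraction of accesses reaching that
            level that hit there.
    memory_latency: cycles needed to fetch from main memory.
    """
    # Work from the outermost level inward: the cost of a miss at one
    # level is the average cost of going to the next level out.
    cost_of_miss = memory_latency
    for hit_latency, hit_rate in reversed(levels):
        cost_of_miss = hit_latency + (1.0 - hit_rate) * cost_of_miss
    return cost_of_miss


# Illustrative numbers only (assumed, not measured).
hierarchy = [(4, 0.95), (12, 0.80), (40, 0.50)]  # L1, L2, L3
with_caches = average_access_time(hierarchy, memory_latency=200)
without_caches = average_access_time([], memory_latency=200)
print(f"with caches: {with_caches:.2f} cycles, without: {without_caches} cycles")
```

With these assumed figures the average works out to about 6 cycles per access, against 200 cycles with no cache at all, which is the latency-hiding effect in miniature.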
Caches are organized into fixed-size blocks called lines, typically 64 bytes. When the processor requests an address, the cache checks whether the corresponding line is present. If it is, the result is a hit; if not, it is a miss, and the line must be fetched from a slower level or from main memory. Because a whole line is loaded at once, nearby data comes along for free, which is why programs that access memory in sequential or clustered patterns tend to run faster than programs that jump around memory.
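The hit-or-miss check can be sketched in a few lines of Python. This toy model assumes a direct-mapped cache, in which each memory line can live in exactly one slot; that placement rule is chosen here for simplicity and is not taken from the text above (real caches are usually set-associative). The class name `DirectMappedCache` and the 32 KiB capacity are likewise assumptions.

```python
LINE_SIZE = 64      # bytes per line, the typical size mentioned above
NUM_LINES = 512     # assumed: 512 lines * 64 B = 32 KiB


class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags, not data."""

    def __init__(self, num_lines=NUM_LINES, line_size=LINE_SIZE):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # None means the slot is empty
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up one byte address; return True on a hit, False on a miss."""
        line_number = address // self.line_size   # which line of memory
        index = line_number % self.num_lines      # which cache slot it maps to
        tag = line_number // self.num_lines       # identifies the line in that slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: "fetch" the whole line by recording its tag in the slot.
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache()
for address in range(0, 4096, 4):    # read 1024 consecutive 4-byte words
    cache.access(address)
print(cache.hits, cache.misses)      # prints "960 64"
```

The 1024 sequential reads span 64 lines, so only the first touch of each line misses; the other 15 reads per line are the data that "comes along for free."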
Hennessy and Patterson’s textbook Computer Architecture: A Quantitative Approach devotes an entire chapter to memory-hierarchy and cache design, treating the cache as one of the central performance levers in computer architecture. Cache behavior can shape how fast real software runs as much as raw clock speed does, which is why programmers who care about performance learn to write code that is friendly to the cache.
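One common illustration of cache-friendly code is traversal order over a two-dimensional array. The Python sketch below does not time real hardware; it counts misses in a toy cache model for two ways of walking the same matrix. The cache size, element size, matrix shape, fully associative LRU policy, and the helper names `count_misses` and `address_of` are all assumptions made for the example.

```python
from collections import OrderedDict

LINE_SIZE = 64          # bytes per cache line
NUM_LINES = 64          # assumed tiny cache: 64 lines * 64 B = 4 KiB
ELEMENT_SIZE = 8        # assumed 8-byte elements
ROWS, COLS = 256, 256   # assumed matrix shape, stored row by row


def count_misses(addresses):
    """Count misses in a toy fully associative cache with LRU replacement."""
    resident = OrderedDict()   # line number -> None, oldest entry first
    misses = 0
    for address in addresses:
        line = address // LINE_SIZE
        if line in resident:
            resident.move_to_end(line)         # mark as most recently used
        else:
            misses += 1
            resident[line] = None
            if len(resident) > NUM_LINES:
                resident.popitem(last=False)   # evict the least recently used
    return misses


def address_of(row, col):
    """Byte address of element (row, col) in a row-major layout."""
    return (row * COLS + col) * ELEMENT_SIZE


# Same elements, two visiting orders.
row_order = [address_of(r, c) for r in range(ROWS) for c in range(COLS)]
col_order = [address_of(r, c) for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print(row_misses, col_misses)
```

Under these assumptions the row-by-row walk misses once per line (8192 misses), while the column-by-column walk misses on every access (65536), even though both touch exactly the same elements.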
The cache works only because real programs reuse data (temporal locality) and touch nearby addresses (spatial locality), a property called locality of reference. Without that empirical regularity, a small cache could never stand in for a large memory; with it, a few kilobytes can capture the bulk of a program’s accesses.
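The dependence on locality can be seen by feeding two synthetic access traces to the same small cache model. In this Python sketch, one trace sends most accesses to a small hot set and the other picks lines uniformly at random. The trace shapes, sizes, seed, LRU policy, and the helper name `hit_rate` are all assumptions for illustration, not a model of any particular program.

```python
import random
from collections import OrderedDict


def hit_rate(line_trace, capacity):
    """Fraction of accesses that hit in a toy LRU cache of `capacity` lines."""
    resident = OrderedDict()   # line number -> None, oldest entry first
    hits = 0
    for line in line_trace:
        if line in resident:
            hits += 1
            resident.move_to_end(line)         # mark as most recently used
        else:
            resident[line] = None
            if len(resident) > capacity:
                resident.popitem(last=False)   # evict the least recently used
    return hits / len(line_trace)


rng = random.Random(0)     # fixed seed so runs are repeatable
MEMORY_LINES = 100_000     # assumed size of main memory, in lines
CACHE_LINES = 256          # assumed cache capacity, in lines
ACCESSES = 50_000

# Trace with locality: 90% of accesses go to a hot set of 128 lines.
hot_set = list(range(128))
local_trace = [
    rng.choice(hot_set) if rng.random() < 0.9 else rng.randrange(MEMORY_LINES)
    for _ in range(ACCESSES)
]
# Trace without locality: every access is uniform over all of memory.
uniform_trace = [rng.randrange(MEMORY_LINES) for _ in range(ACCESSES)]

print(f"with locality:    {hit_rate(local_trace, CACHE_LINES):.1%}")
print(f"without locality: {hit_rate(uniform_trace, CACHE_LINES):.1%}")
```

The cache here holds well under one percent of the simulated memory. On the trace with a hot set it should still serve most accesses, while on the uniform trace it can do little better than its share of memory.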