Cache Coherence

Cache coherence is the problem of keeping the private caches of several processors consistent when they all share the same main memory. In a multiprocessor or multi-core system, each core typically has its own cache. If two cores both hold a copy of the same memory location and one core writes to it, the other core’s copy becomes stale. Cache coherence is the set of rules and hardware that guarantee every core eventually sees a single, agreed-upon value for each location, despite the private copies.
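The stale-copy problem described above can be shown with a toy model. This is only a sketch under simplifying assumptions: the class name NaiveCache, the dictionary standing in for main memory, and the address 0x40 are all hypothetical, and each "cache line" holds a single value.

```python
class NaiveCache:
    """A private write-through cache with no coherence mechanism at all."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory (a plain dict here)
        self.lines = {}        # this core's private copies: address -> value

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, trust the private copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Update the private copy and memory, but tell no other cache.
        self.lines[addr] = value
        self.memory[addr] = value


memory = {0x40: 1}
core0 = NaiveCache(memory)
core1 = NaiveCache(memory)

core0.read(0x40)           # both cores cache the original value
core1.read(0x40)
core0.write(0x40, 2)       # core 0 updates its copy and memory

stale = core1.read(0x40)   # core 1 hits its own cache and still sees 1
```

Here memory already holds 2, yet core 1 keeps returning 1 because nothing ever invalidates or updates its private copy. That gap is what a coherence protocol closes.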

Without coherence, shared-memory parallel programming would be almost impossible to reason about, because a value written by one thread might never become visible to another. A coherence protocol restores the simple intuition that memory holds one value per address: a read of an address returns the value of the most recent write to that address, no matter which core performed it.

The dominant family of solutions tracks the state of each cache line. The widely used MESI protocol gives every line one of four states: Modified, Exclusive, Shared, or Invalid. A line in Modified state is the only valid copy and differs from memory; Exclusive means this cache has the only copy and it matches memory; Shared means several caches may hold identical clean copies; and Invalid means the line holds no usable data. Transitions between these states, driven by local reads and writes and by the requests of other caches, enforce the rule that at most one cache may hold a writable copy at any moment, and that while it does, no other cache holds a valid copy.
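The state transitions can be sketched as a small simulation. This is an illustrative model under stated assumptions, not a description of any particular processor: the names Bus, MesiCache, snoop_read, and snoop_invalidate are hypothetical, each line holds a single value, there is no eviction, and bus transactions are modeled as direct method calls on the other caches.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Bus:
    """Hypothetical shared bus: holds main memory and the attached caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []


class MesiCache:
    def __init__(self, bus):
        self.bus = bus
        self.state = {}   # address -> State (absent means INVALID)
        self.data = {}    # address -> cached value
        bus.caches.append(self)

    def state_of(self, addr):
        return self.state.get(addr, State.INVALID)

    def _others(self):
        return [c for c in self.bus.caches if c is not self]

    def read(self, addr):
        if self.state_of(addr) is State.INVALID:
            # Read miss: every other cache snoops the request. A Modified
            # holder writes back first, so memory is current when we load.
            responses = [c.snoop_read(addr) for c in self._others()]
            self.data[addr] = self.bus.memory[addr]
            self.state[addr] = State.SHARED if any(responses) else State.EXCLUSIVE
        return self.data[addr]

    def write(self, addr, value):
        # From Shared or Invalid, other copies must be invalidated first.
        # From Exclusive or Modified, no bus traffic is needed.
        if self.state_of(addr) in (State.SHARED, State.INVALID):
            for c in self._others():
                c.snoop_invalidate(addr)
        self.data[addr] = value
        self.state[addr] = State.MODIFIED

    def snoop_read(self, addr):
        """Another cache is reading addr; return True if we hold a copy."""
        st = self.state_of(addr)
        if st is State.INVALID:
            return False
        if st is State.MODIFIED:
            self.bus.memory[addr] = self.data[addr]   # write back dirty data
        self.state[addr] = State.SHARED
        return True

    def snoop_invalidate(self, addr):
        """Another cache wants to write addr; give up our copy."""
        if self.state_of(addr) is State.MODIFIED:
            self.bus.memory[addr] = self.data[addr]
        self.state[addr] = State.INVALID


bus = Bus({0x40: 1})
c0, c1 = MesiCache(bus), MesiCache(bus)

c0.read(0x40)            # c0: Exclusive (no other cache holds the line)
c1.read(0x40)            # c0 and c1: Shared
c0.write(0x40, 2)        # c0: Modified, c1: Invalid
value = c1.read(0x40)    # c0 writes back; both end Shared; value is 2
```

In this model the single-writer rule holds by construction: a cache reaches Modified only after every other copy has been set to Invalid, and a Modified line is written back before any other cache can read it.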

The foundational paper behind this approach is Mark Papamarcos and Janak Patel’s 1984 work, “A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories,” presented at the International Symposium on Computer Architecture. They built their scheme around a shared bus, with each cache watching, or “snooping,” the bus traffic of the others. Their stated aim was a solution that reduced bus traffic and “does not use any global tables,” making the design modular and easy to extend by adding more processors.

Snooping protocols like this work well when a single bus connects a modest number of cores, because every cache can observe every transaction. As systems grow to many cores, the shared bus becomes a bottleneck, so larger machines use directory-based coherence instead, where a directory (which may be centralized or distributed across the machine) records which caches hold each line and sends targeted messages rather than broadcasting to everyone.
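The difference in message traffic can be illustrated with a minimal directory sketch. The class Directory, its methods, and the broadcast_cost helper are hypothetical names chosen for illustration; the model tracks only the set of sharers per line and counts invalidation messages, ignoring acknowledgements, data transfers, and directory storage limits.

```python
class Directory:
    """Hypothetical directory: records which cores hold a copy of each line."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.sharers = {}          # address -> set of core ids holding the line
        self.messages_sent = 0     # invalidation messages issued so far

    def record_read(self, core, addr):
        # A read adds the requesting core to the line's sharer set.
        self.sharers.setdefault(addr, set()).add(core)

    def record_write(self, core, addr):
        # A write invalidates only the cores the directory knows about.
        targets = self.sharers.get(addr, set()) - {core}
        self.messages_sent += len(targets)
        self.sharers[addr] = {core}   # the writer is now the sole holder
        return sorted(targets)


def broadcast_cost(num_cores):
    """Caches that must observe one write under a broadcast scheme."""
    return num_cores - 1


directory = Directory(num_cores=64)
directory.record_read(3, 0x40)
directory.record_read(7, 0x40)

invalidated = directory.record_write(3, 0x40)   # [7]: one targeted message
snoop_equivalent = broadcast_cost(64)           # 63 caches would be disturbed
```

With two sharers among 64 cores, the directory sends one invalidation, whereas a broadcast would reach all 63 other caches. The price is the storage and lookup latency of the directory itself, which this sketch does not model.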

Hennessy and Patterson’s “Computer Architecture: A Quantitative Approach” treats both snooping and directory coherence as core material in its chapters on multiprocessors, framing coherence as one of the defining challenges of parallel hardware. The choice between snooping and directories is one of the central trade-offs in designing a multi-core processor.