B-tree

A B-tree is a balanced search tree designed for data that lives on disk rather than in memory. Instead of storing one key per node like a binary tree, a B-tree packs many keys into each node, and every node corresponds to a block (or “page”) of storage. Because the tree is kept balanced, the path from the root to any key is short, so the database reads only a handful of pages to find, insert, or delete a key even when the index holds millions of entries.

The structure was introduced by Rudolf Bayer and Edward McCreight in their 1972 paper “Organization and Maintenance of Large Ordered Indexes,” published in Acta Informatica. Their key result was that an index organized this way allows retrieval, insertion, and deletion of keys in time proportional to log_k of the index size, where k is a device-dependent number chosen so the scheme performs near-optimally. Tying the node size to the physical block size of the storage device is what makes the structure efficient: each step down the tree costs roughly one disk read.

The B-tree stays balanced automatically. When a node fills up, it splits, and the split can cascade upward; when nodes become too empty, they merge. These local rebalancing operations keep every leaf at the same depth, which is what guarantees the logarithmic worst case for every operation.

The design proved so durable that it became the default index structure across the industry. PostgreSQL, for example, ships a standard B-tree implementation and notes that “any data type that can be sorted into a well-defined linear order can be indexed by a btree index.” Variants of the B-tree also sit underneath SQLite, filesystems, and key-value stores, making it one of the most widely deployed data structures in computing.

Sources

Related