Bigtable: A Distributed Storage System for Structured Data

“Bigtable: A Distributed Storage System for Structured Data” by Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber was presented at the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI) in 2006. The paper describes a system for managing structured data that scales to petabytes across thousands of commodity servers.

Bigtable is not a relational database. Its data model is a sparse, distributed, persistent multidimensional sorted map, indexed by a row key, a column key, and a timestamp, with each value stored as an uninterpreted array of bytes. This flexible model lets very different applications store their data in the same underlying system, and because the map is sparse, cells that are never written consume no storage.
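The map described above can be pictured with a small in-memory sketch. This is only an illustration of the (row key, column key, timestamp) indexing, not Bigtable's implementation or API: the class name `SparseTable`, its methods, and the sample keys are hypothetical, and distribution and persistence are left out entirely.

```python
class SparseTable:
    """Toy model of a sparse sorted map: (row, column, timestamp) -> bytes.

    Hypothetical names throughout; single process, in memory only.
    """

    def __init__(self):
        # Only cells that have been written get an entry,
        # so an absent cell occupies no space at all.
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp`.

        With no timestamp, return the most recent version.
        Returns None when the cell (or a qualifying version) is absent.
        """
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row.

        Rows and columns come back in sorted key order,
        and versions within a cell come back newest first.
        """
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                versions = self._cells[(row, column)]
                for ts in sorted(versions, reverse=True):
                    yield row, column, ts, versions[ts]


# Illustrative usage with made-up keys and values.
table = SparseTable()
table.put("com.example.www", "contents:", 3, b"<html>v3</html>")
table.put("com.example.www", "contents:", 5, b"<html>v5</html>")
table.put("com.example.www", "anchor:other.example", 4, b"Example")

latest = table.get("com.example.www", "contents:")
as_of_4 = table.get("com.example.www", "contents:", timestamp=4)
missing = table.get("com.example.www", "anchor:nobody.example")
```

Keeping a dictionary entry only for written cells is what makes the toy table sparse, and sorting keys at scan time stands in for the sorted order that a real system would maintain on disk.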

The paper reports that Bigtable was used by more than sixty Google products and projects at the time, including web indexing, Google Earth, and Google Finance. These applications place very different demands on the system, from high-throughput batch jobs to latency-sensitive serving of data to users, and the authors describe how a single design handles both.

Bigtable, built on top of the Google File System (GFS), completed the trio of Google infrastructure papers alongside the GFS and MapReduce papers. Its column-family data model directly inspired open-source wide-column stores such as Apache HBase and influenced the design of Apache Cassandra, helping define an entire branch of the NoSQL movement.
