Snowflake is a cloud data warehouse built so that storage and compute scale independently. Snowflake’s own documentation describes it as a “self-managed service” with “an advanced data platform” that is “natively designed for the cloud,” running on “public cloud infrastructure to host virtual compute instances and persistent data storage.” Crucially, it “separates storage and compute, which simplifies some traditional challenges of data engineering, such as infrastructure management and performance tuning.”
That separation is the system’s defining architectural idea. The company’s research paper, “The Snowflake Elastic Data Warehouse,” was accepted for publication and presentation at SIGMOD 2016, where Snowflake’s engineers described their distinctive architecture, the ability to resize compute dynamically, and support for varied workloads on shared storage. Decoupling storage from compute lets many independent compute clusters read the same data, and lets each be scaled or paused on demand.
Snowflake also popularized data sharing, which lets one account give another live, query-able access to data without copying it. Combined with elastic per-second compute, this reframed the warehouse from a fixed cluster into an on-demand service.
The company was founded in 2012 and made its product generally available in 2014. Its 2020 initial public offering was, at the time, the largest software IPO on record, a milestone that signaled how central cloud data warehousing had become to the industry.