Feature Store

A feature store is a piece of machine-learning infrastructure that manages features, the processed input signals a model consumes, as a shared, governed resource rather than something each project rebuilds on its own. It computes features from raw data, stores both historical values (for training) and fresh values (for live serving), and serves them through a common interface so that the same feature definition is used in both places.
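The compute, store, and serve flow described above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the class `FeatureStore`, its methods, and the feature `avg_order_value` are hypothetical names invented for this example, not the API of Michelangelo or any real product. It assumes records arrive in timestamp order, so the most recently ingested value is the freshest one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FeatureStore:
    """Toy in-memory feature store (hypothetical, for illustration only)."""

    # Feature name -> function that computes the feature from one raw record.
    definitions: Dict[str, Callable[[dict], float]] = field(default_factory=dict)
    # Offline store: full history as (entity_id, timestamp, value) per feature.
    offline: Dict[str, List[Tuple[str, int, float]]] = field(default_factory=dict)
    # Online store: latest value per (entity_id, feature name).
    online: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        """Record the single definition used for both training and serving."""
        self.definitions[name] = fn
        self.offline.setdefault(name, [])

    def ingest(self, entity_id: str, timestamp: int, raw: dict) -> None:
        """Compute every registered feature from a raw record and store it."""
        for name, fn in self.definitions.items():
            value = fn(raw)
            # Keep history for training.
            self.offline[name].append((entity_id, timestamp, value))
            # Overwrite the fresh value for serving (assumes in-order arrival).
            self.online[(entity_id, name)] = value

    def get_training_rows(self, name: str) -> List[Tuple[str, int, float]]:
        """Return the historical values of one feature."""
        return list(self.offline[name])

    def get_online(self, entity_id: str, name: str) -> float:
        """Return the freshest value of one feature for one entity."""
        return self.online[(entity_id, name)]


store = FeatureStore()
store.register(
    "avg_order_value",
    lambda raw: raw["order_total"] / raw["order_count"],
)
store.ingest("customer_1", 1, {"order_total": 100.0, "order_count": 4})
store.ingest("customer_1", 2, {"order_total": 180.0, "order_count": 6})

training_rows = store.get_training_rows("avg_order_value")
latest = store.get_online("customer_1", "avg_order_value")
```

Here `training_rows` holds both historical values while `latest` holds only the most recent one, and both come from the same registered function. Production systems typically back the two stores with different technologies, but the shared definition is the point of the sketch.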

The concept was popularized by Uber’s Michelangelo platform, described publicly in September 2017. Michelangelo included a centralized Feature Store holding on the order of ten thousand features that any team could discover and reuse, with features automatically calculated and updated on a schedule. A shared store of this kind addresses two persistent problems. The first is duplication: without a shared store, many teams independently re-implement the same features, wasting effort and introducing inconsistencies. The second, and less obvious, is training-serving skew: when the code that builds a feature for training differs from the code that builds it at prediction time, the model sees slightly different inputs in production than it was trained on. A feature store counters this by enforcing a single definition that both training and serving use. Open-source and commercial feature stores later generalized the idea beyond Uber.
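Training-serving skew is easier to see with a concrete case. The sketch below is a made-up example, not taken from Uber's system: the feature "days since signup" and all function names are hypothetical. Two teams implement the feature separately and disagree on whether the signup day counts as day 0 or day 1, and then both paths are pointed at one shared definition.

```python
def days_since_signup_training(signup_day: int, event_day: int) -> int:
    # Training pipeline (hypothetical): counts the signup day itself as day 1.
    return event_day - signup_day + 1


def days_since_signup_serving(signup_day: int, event_day: int) -> int:
    # Serving code written separately (hypothetical): counts it as day 0.
    return event_day - signup_day


# Same raw inputs, different feature values: this gap is the skew.
trained_on = days_since_signup_training(signup_day=10, event_day=17)
served_with = days_since_signup_serving(signup_day=10, event_day=17)
skewed = trained_on != served_with


def days_since_signup(signup_day: int, event_day: int) -> int:
    # Single shared definition, as a feature store would hold it.
    return event_day - signup_day


def build_training_value(signup_day: int, event_day: int) -> int:
    # Training path calls the shared definition.
    return days_since_signup(signup_day, event_day)


def build_serving_value(signup_day: int, event_day: int) -> int:
    # Serving path calls the same shared definition.
    return days_since_signup(signup_day, event_day)


consistent = build_training_value(10, 17) == build_serving_value(10, 17)
```

With separate implementations the model is trained on one value and served another for the same customer. With the shared definition the two paths cannot drift apart, because there is only one piece of code to change.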

Why a business reader should care: a feature store turns the messy, repeated data-preparation work behind every model into reusable infrastructure, which is one of the practical foundations of running machine learning reliably at scale.

Related