Dirichlet Process

The Dirichlet process is a probability distribution over probability distributions, introduced by Thomas Ferguson in his 1973 Annals of Statistics paper. It is the workhorse of Bayesian nonparametric modeling, where the goal is to learn the shape of an unknown distribution without committing in advance to a fixed number of parameters. The Dirichlet process serves as a prior on that unknown distribution, and Ferguson proved it has the convenient property that the posterior, after seeing data, is again a Dirichlet process.
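
The conjugacy property can be written compactly. The notation below is an assumption, not something fixed by the text above: \alpha > 0 is a concentration parameter, G_0 is a base distribution, and \delta_x is a point mass at x.

```latex
G \sim \mathrm{DP}(\alpha, G_0), \qquad
x_1, \dots, x_n \mid G \overset{\text{iid}}{\sim} G
\;\Longrightarrow\;
G \mid x_1, \dots, x_n \sim
\mathrm{DP}\!\left(\alpha + n,\;
\frac{\alpha\, G_0 + \sum_{i=1}^{n} \delta_{x_i}}{\alpha + n}\right)
```

Read this way, the updated base distribution is a weighted mix of the prior guess G_0 and the empirical distribution of the data, and the data dominate as n grows.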

A defining feature is that distributions drawn from a Dirichlet process are discrete with probability one: each places all its mass on a countable set of points, so values sampled from it tend to repeat. This is often pictured through the Chinese restaurant process, a metaphor in which each new data point either joins an existing group with probability proportional to that group's current size or starts a new group with probability proportional to a fixed concentration parameter.
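
The seating rule in the restaurant metaphor can be simulated in a few lines. The following is a minimal sketch, not a reference implementation: the function name `chinese_restaurant_process` and the parameter `alpha` (a concentration parameter that weights the chance of opening a new group) are illustrative assumptions.

```python
import random


def chinese_restaurant_process(n, alpha, seed=None):
    """Assign n data points to groups using the Chinese restaurant rule.

    alpha (assumed name) is the concentration parameter and must be > 0.
    Returns (assignments, sizes): the group index of each point, and the
    final size of each group.
    """
    rng = random.Random(seed)
    assignments = []
    sizes = []  # sizes[k] = number of points currently in group k
    for _ in range(n):
        # An existing group k has weight sizes[k]; a new group has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # start a new group
        else:
            sizes[k] += 1     # join an existing group
        assignments.append(k)
    return assignments, sizes


if __name__ == "__main__":
    assignments, sizes = chinese_restaurant_process(n=100, alpha=1.0, seed=0)
    print("number of groups:", len(sizes))
    print("group sizes:", sizes)
```

Larger values of `alpha` make new groups more likely. Smaller values concentrate the points in a few large groups.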

That self-reinforcing, rich-get-richer behavior is what makes the Dirichlet process useful for clustering. Rather than telling the model how many clusters exist, you let it create as many as the data support; under the prior, the expected number of clusters grows only about logarithmically with the number of observations.
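
How slowly the number of clusters grows can be made concrete. Under the Chinese restaurant construction with a concentration parameter alpha (an assumed symbol), the i-th data point, counting from zero, starts a new group with probability alpha / (alpha + i). Summing these probabilities gives the expected number of groups after n points. The sketch below computes that sum; the function names are illustrative, and the logarithmic expression is a standard approximation of the same order, not an exact identity.

```python
import math


def expected_num_clusters(n, alpha):
    """Exact prior expectation of the number of groups after n points.

    Point i (0-based) opens a new group with probability alpha / (alpha + i).
    """
    return sum(alpha / (alpha + i) for i in range(n))


def log_approximation(n, alpha):
    """Logarithmic approximation with the same order of growth."""
    return alpha * math.log(1.0 + n / alpha)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        exact = expected_num_clusters(n, alpha=1.0)
        approx = log_approximation(n, alpha=1.0)
        print(f"n={n:>6}  exact={exact:.2f}  log approx={approx:.2f}")
```

With alpha = 1, increasing n from 10 to 10,000 adds only a handful of expected groups, which is the slow growth described above.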

For a general reader, the appeal is adaptivity. In tasks like discovering customer segments or document topics, the right number of groups is rarely known beforehand, and the Dirichlet process provides a principled way to infer it from the data instead of guessing.