Two-Tower Model

A two-tower model, also called a dual encoder, is an architecture for retrieving relevant items from a huge collection quickly. It uses two separate neural networks: one “tower” encodes the query (a search string, or a user and their context) into a vector, and the other encodes each candidate item into a vector in the same space. Relevance is measured by how close the two vectors are, typically a dot product or cosine similarity.

The reason this design matters is efficiency. Because the item tower does not depend on the query, every item’s vector can be computed once, in advance, and stored in an index. At serving time the system only needs to encode the incoming query and then find its nearest item vectors, a fast approximate nearest neighbor search over precomputed embeddings. This makes it feasible to retrieve good candidates from catalogs of millions or billions of items in milliseconds, which a model that had to jointly process the query with every item could never do. The Dense Passage Retrieval work by Karpukhin and colleagues in 2020 is a clear published example, using a two-tower BERT setup that outperformed keyword search; the same pattern appears throughout large-scale recommendation.

Two-tower models typically serve as the first, recall-oriented stage of a system: they cheaply narrow billions of items down to a few hundred candidates, which a heavier ranking model then orders precisely.

For a business reader, the two-tower model is the standard way modern platforms solve the “needle in a haystack” step: it turns recommendation and semantic search into a geometry problem, finding the items whose vectors sit nearest to yours.

Sources

Related