Word Embeddings

A word embedding represents a word as a list of numbers (a vector), typically a few hundred values long, learned so that words used in similar contexts end up close together. This replaces older “one-hot” schemes, in which every word was an isolated symbol with no notion of similarity, and gives software a usable sense of meaning derived purely from how words co-occur in text.
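To make the contrast with one-hot schemes concrete, here is a minimal sketch in Python. The three-number vectors are invented for illustration (real embeddings are learned from text and are much longer), and the `cosine` helper is just the standard cosine similarity written out by hand.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot: each word gets its own slot, so no two words share anything.
one_hot = {
    "cat": [1, 0, 0],
    "dog": [0, 1, 0],
    "car": [0, 0, 1],
}

# Hypothetical dense embeddings (made-up values, not from a trained model).
dense = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

# Under one-hot, "cat" and "dog" are exactly as unrelated as "cat" and "car".
one_hot_cat_dog = cosine(one_hot["cat"], one_hot["dog"])
one_hot_cat_car = cosine(one_hot["cat"], one_hot["car"])

# Under the dense vectors, "cat" sits much closer to "dog" than to "car".
dense_cat_dog = cosine(dense["cat"], dense["dog"])
dense_cat_car = cosine(dense["cat"], dense["car"])

print(one_hot_cat_dog, one_hot_cat_car)
print(round(dense_cat_dog, 2), round(dense_cat_car, 2))
```

With one-hot vectors every pair of distinct words scores zero, so there is nothing to rank by. With dense vectors the scores differ, which is what lets software treat some words as nearer in meaning than others.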

The idea became practical in 2013 when Tomas Mikolov and colleagues at Google released word2vec, described in “Efficient Estimation of Word Representations in Vector Space.” The word2vec models trained quickly on huge corpora and produced embeddings with a striking property: simple vector arithmetic captured analogies, so that the vector for “king” minus the vector for “man” plus the vector for “woman” landed near the vector for “queen.” Stanford’s GloVe followed in 2014 with a related method based on global word co-occurrence statistics. Both became standard building blocks across translation, search, classification, and recommendation.
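The analogy arithmetic can be sketched in a few lines of Python. The vectors below are hand-picked toy values chosen so the example works, not output from word2vec or GloVe, and the `analogy` function is a hypothetical helper written for this illustration. As is common when evaluating analogies, the three input words are excluded from the candidates.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (assumed values). Roughly: [royalty, male(+)/female(-), food].
toy_vectors = {
    "king":  [0.95,  0.90, 0.05],
    "queen": [0.93, -0.88, 0.04],
    "man":   [0.05,  0.92, 0.06],
    "woman": [0.04, -0.90, 0.05],
    "apple": [0.02,  0.01, 0.97],
}

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the query words themselves so the answer is a different word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

On these toy values the nearest remaining word is "queen". With real trained embeddings the same procedure runs over a vocabulary of many thousands of words, and the result is approximate rather than guaranteed.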

Word embeddings were a key step toward today’s large language models. Modern transformers still begin by turning tokens into learned vectors, but instead of a single fixed vector per word they produce context-dependent representations that change with the surrounding text.

Why business readers should care: word embeddings are the reason search, chatbots, and recommendation systems can match on meaning rather than exact keywords, and they remain the entry layer of nearly every language AI in production.

Last verified June 7, 2026