Query Optimizer

A query optimizer is the part of a relational database that decides how to actually run a query. SQL lets a user say what data they want without saying how to fetch it; the optimizer fills that gap by choosing among many possible execution strategies, such as scanning a whole table versus using an index, and picking the order in which to join tables.

The foundational treatment is the System R paper “Access Path Selection in a Relational Database Management System” by Patricia Selinger and colleagues at IBM (ACM SIGMOD, 1979). It describes System R as “an experimental database management system” and addresses, in its own words, how the system “chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a nonprocedural query expression.”

The key idea introduced there is cost-based optimization. Rather than rely on the programmer or on fixed rules, the optimizer estimates the cost of each candidate plan using statistics about the data, then selects the cheapest. The paper’s dynamic-programming approach to choosing join orders and access paths is still the conceptual backbone of optimizers in modern database systems.

Because the optimizer makes SQL practical, it is one of the reasons the relational model won. Users could write queries declaratively and trust the system to find an efficient plan, instead of hand-coding access paths as in the navigational databases that came before.