A Relational Model of Data for Large Shared Data Banks (1970)

“A Relational Model of Data for Large Shared Data Banks” by E. F. Codd was published in Communications of the ACM, Volume 13, Number 6, pages 377 to 387, in June 1970. It is the paper that introduced the relational model of data, on which relational databases are built. IBM’s own publication record lists it under Codd’s name with the CACM venue and the June 1970 date.

The paper opens by arguing that future users of large data banks should be protected from having to know how the data is organized in the machine. Codd called this protection from the internal representation “data independence,” and he made it the goal of the whole approach: programs and queries should keep working even when the storage structure, indexes, or access paths change underneath them.
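As a small illustration of that goal, the sketch below runs the same query before and after an index is added and checks that the answer is unchanged. It uses Python's standard `sqlite3` module and SQL, both of which postdate the paper; the table `supply`, the index name, the helper `parts_supplied_by`, and the sample rows are all hypothetical and chosen only for this example.

```python
import sqlite3


def parts_supplied_by(conn, supplier):
    """Return the parts a supplier provides, stating only WHAT is wanted."""
    cur = conn.execute(
        "SELECT part FROM supply WHERE supplier = ? ORDER BY part",
        (supplier,),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical sample data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supply (supplier TEXT, part TEXT)")
conn.executemany(
    "INSERT INTO supply VALUES (?, ?)",
    [("s1", "p1"), ("s1", "p2"), ("s2", "p1")],
)

before = parts_supplied_by(conn, "s1")

# Change the access path underneath the query: add an index.
conn.execute("CREATE INDEX supply_by_supplier ON supply (supplier)")

after = parts_supplied_by(conn, "s1")
```

The query text never mentions the index, so `before` and `after` hold the same list. The engine may choose a different access path, but the program that asked the question does not need to change.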

To achieve this, Codd proposed organizing data as relations, which can be pictured as tables of rows and columns. He defined a relation formally as a set of tuples drawn from the Cartesian product of its domains and described operations on relations, such as projection and join. He also introduced the idea of normalizing relations, which in this paper means restructuring them so that each column holds simple values rather than nested relations, and he examined how certain kinds of redundancy can arise among stored relations. Because the model rested on the mathematics of sets and relations, queries could be expressed in a high-level, non-procedural way rather than as step-by-step navigation.
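The idea of a relation as a set of tuples can be sketched in a few lines of Python. This is a minimal illustration, not Codd's notation: a relation is represented here as a heading (a tuple of column names) plus a Python `set` of row tuples, and the function names `project` and `natural_join`, along with the supplier and part data, are assumptions made for the example.

```python
def project(heading, rows, attrs):
    """Keep only the named columns; the set drops duplicate rows."""
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}


def natural_join(h1, r1, h2, r2):
    """Combine rows from two relations that agree on all shared columns."""
    common = [a for a in h1 if a in h2]
    extra = [a for a in h2 if a not in common]
    i1 = [h1.index(a) for a in common]
    i2 = [h2.index(a) for a in common]
    e2 = [h2.index(a) for a in extra]
    heading = tuple(h1) + tuple(extra)
    rows = {
        t1 + tuple(t2[i] for i in e2)
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[b] for a, b in zip(i1, i2))
    }
    return heading, rows


# Hypothetical sample relations.
supply_heading = ("supplier", "part")
supply = {("s1", "p1"), ("s1", "p2"), ("s2", "p1")}

location_heading = ("supplier", "city")
location = {("s1", "London"), ("s2", "Paris")}

# Which parts are supplied from which cities?
joined_heading, joined = natural_join(
    supply_heading, supply, location_heading, location
)
cities_heading, cities = project(joined_heading, joined, ("city",))
```

Here `joined` has the heading `("supplier", "part", "city")` and three rows, while `cities` contains only `("London",)` and `("Paris",)`, because a set holds each tuple once. The caller states which columns and which matching condition it wants and never walks pointers or record chains, which is the non-procedural style the paragraph describes.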

The paper redirected database research away from the hierarchical and network systems that dominated at the time and toward the relational approach. Within a decade it had inspired working systems and query languages, including IBM's System R with SEQUEL (later SQL) and Ingres at the University of California, Berkeley, with QUEL. It remains the foundational reference for the relational databases in use today.