GPT-4 Passes the Bar Exam (Katz et al., 2023)

In March 2023, days after OpenAI announced GPT-4, a team led by Daniel Martin Katz and Michael Bommarito (with Shang Gao and Pablo Arredondo) released “GPT-4 Passes the Bar Exam.” The paper experimentally evaluated GPT-4’s zero-shot performance on the entire Uniform Bar Examination (UBE): not just the multiple-choice Multistate Bar Exam (MBE) but also the open-ended Multistate Essay Exam (MEE) and the Multistate Performance Test (MPT). The work was later published in Philosophical Transactions of the Royal Society A (2024).

The reported results were striking. On the MBE, GPT-4 significantly outperformed both human test-takers and prior GPT models, scoring roughly 26% higher than ChatGPT (GPT-3.5) and beating humans in five of seven subject areas. On the essays and performance test, GPT-4 averaged 4.2 out of 6.0. Scored across all UBE components the way a human applicant would be, GPT-4 reached an estimated 297 of 400 points, comfortably above the passing threshold (typically 266 to 270) in every UBE jurisdiction. The widely repeated headline was that GPT-4 scored near the “90th percentile” of human test-takers.
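To make "comfortably above" concrete, the short Python sketch below works through the arithmetic using only the figures quoted above (297 of 400, and thresholds of 266 to 270). The variable names are illustrative, and the script is a reader's check, not anything taken from the paper.

```python
# Figures quoted in the text; variable names are illustrative assumptions.
ube_max = 400
gpt4_estimated_total = 297
pass_threshold_low, pass_threshold_high = 266, 270

# Margin over the most and least demanding thresholds quoted in the text.
margin_over_highest = gpt4_estimated_total - pass_threshold_high
margin_over_lowest = gpt4_estimated_total - pass_threshold_low

# Share of the total points available on the 400-point UBE scale.
share_of_max = gpt4_estimated_total / ube_max

print(f"Margin over a 270 threshold: {margin_over_highest} points")
print(f"Margin over a 266 threshold: {margin_over_lowest} points")
print(f"Share of available points: {share_of_max:.2%}")
```

Even against the higher threshold, the estimated score clears the bar by 27 points. Note that clearing a pass line says nothing, on its own, about where the score ranks among human test-takers.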

The result became one of the most-cited demonstrations that large language models could perform a complex, high-stakes professional task. It was also quickly contested: a separate analysis by Eric Martinez argued the 90th-percentile framing was inflated by the unusual makeup of the comparison group, and put GPT-4’s true standing far lower. The two papers are best read together.
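The comparison-group point can be seen with a small Python sketch. The two score lists are SYNTHETIC, invented purely to make the effect visible; they are not real bar exam data and do not reproduce either paper's calculations. The helper percentile_rank is a hypothetical name, and it uses one simple definition of a percentile (the share of reference scores strictly below the given score).

```python
from bisect import bisect_left

def percentile_rank(score, reference_scores):
    """Percent of reference scores strictly below `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)

# SYNTHETIC scores, invented for illustration only; not real exam data.
lower_scoring_group = [240, 250, 255, 260, 265, 270, 275, 280, 290, 300]
higher_scoring_group = [265, 275, 285, 290, 295, 300, 305, 310, 315, 320]

score = 297  # the estimated UBE total quoted in the text

print(percentile_rank(score, lower_scoring_group))
print(percentile_rank(score, higher_scoring_group))
```

The same score ranks very differently depending on who it is compared against, which is why a percentile claim is only meaningful alongside a description of the reference population.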

Why business readers should care: this paper crystallized the moment professionals across many fields started asking whether AI could do parts of their job. It also became a case study in benchmark interpretation. The difference between “passed the exam” and “outperformed 90% of humans” turned out to matter enormously, and the gap between those two claims drove a careful re-examination of the underlying statistics.