John Schulman

John Schulman is an American AI researcher best known for the reinforcement learning algorithms behind modern language-model training. He earned a physics degree from Caltech in 2010 and a Ph.D. at UC Berkeley advised by Pieter Abbeel, where his work focused on deep reinforcement learning for robotics and control.

During his Berkeley years and after, Schulman developed two algorithms that became foundations of deep reinforcement learning: Trust Region Policy Optimization (TRPO) in 2015, which made policy-gradient training stable by constraining how far each update could move the policy, and Proximal Policy Optimization (PPO) in 2017, which kept the stability benefits with a far simpler clipped objective. PPO became the field’s default policy-optimization method and the workhorse of reinforcement learning from human feedback (RLHF), the technique used to align ChatGPT, Claude, and other instruction-following models.

In December 2015 Schulman co-founded OpenAI alongside Sam Altman, Elon Musk, Ilya Sutskever, Greg Brockman, and others. He led OpenAI’s reinforcement learning work for years and has been described as a central architect of ChatGPT. In August 2024 he announced he was leaving OpenAI for Anthropic, and in February 2025 he left again to become chief scientist at Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati.

Sources

Related