Jan Leike

Jan Leike is an AI safety researcher focused on alignment, the problem of getting advanced AI systems to reliably do what their operators intend. According to his personal website, he holds a PhD in reinforcement learning theory from the Australian National University and has worked at DeepMind, OpenAI, and Anthropic.

Leike frames his central question as “the hard problem of alignment”: how can we train AI systems to follow human intent on tasks that are difficult for humans to evaluate directly? At DeepMind he was an alignment researcher who worked on reinforcement learning from human feedback. At OpenAI he co-led the Superalignment team and contributed to the alignment of InstructGPT, ChatGPT, and GPT-4. He now leads the Alignment Science team at Anthropic, where his work covers automated alignment researchers, scalable oversight, weak-to-strong generalization, and jailbreak robustness.

TIME magazine named Leike one of the 100 most influential people in AI in both 2023 and 2024. His career tracks the rise of alignment from an academic niche to a core function at the leading AI labs.
