Redwood Research

Redwood Research is a nonprofit AI safety and security research organization focused on understanding and mitigating risks from advanced AI systems that might purposefully act against the interests of their developers. It frames its goal as better understanding these risks and developing methodologies to manage them while still realizing the benefits of AI.

Redwood is best known for introducing the research area it calls AI control, which proposes protocols for monitoring and constraining potentially malign LLM agents so they can be deployed safely even if they are not fully trusted. The organization describes this as a bedrock approach for mitigating catastrophic risk from misaligned AI, distinct from trying to guarantee alignment in advance.

The lab has also worked on strategic deception. In collaboration with Anthropic, Redwood demonstrated that large language models can hide misaligned intentions, contributing what it calls the strongest concrete evidence that LLMs might naturally fake alignment. Beyond research, Redwood advises AI companies, including Google DeepMind and Anthropic, and governments on assessing and mitigating risks from misaligned systems.

For a business reader, Redwood’s contribution is a pragmatic one: rather than assuming we can make AI trustworthy, it asks how to get useful work out of systems we cannot fully trust.

Sources

Related