Devin and the AI-agent hype

On March 12, 2024, the startup Cognition introduced Devin in a blog post that presented it as “the first AI software engineer.” The launch came with demonstration videos in which Devin took a task and carried it through to a finished result on its own, and it became a focal point for a broader surge of interest in autonomous AI coding agents.

Cognition’s own post made specific capability and benchmark claims. Among the abilities it listed, it said Devin can “learn unfamiliar technologies,” “build and deploy apps end to end,” “autonomously find and fix bugs,” “train and fine tune its own AI models,” and “contribute to mature production repositories.” On the SWE-bench coding benchmark, the post stated that “Devin correctly resolves 13.86% of the issues end-to-end, far exceeding the previous state-of-the-art of 1.96%.” It added that even when previous models were given the exact files to edit, they resolved only 4.80% of issues, compared with the 13.86% Devin achieved without that assistance.

The launch drew skepticism about whether the demonstrations reflected what the system reliably did in practice. Much of that scrutiny came from third-party video analyses and press commentary, which are secondary sources and are not reproduced here as primary claims. What primary material does document is a gap within the same post: it framed Devin as “the first AI software engineer,” capable of completing real freelance jobs, while its own benchmark figure showed Devin resolving fewer than 14 percent of issues end-to-end. The episode illustrates how a single launch can set expectations for an entire product category well ahead of measured capability.

Sources

Related