Developers Think AI Makes Them 24 Percent Faster but It Actually Makes Them 19 Percent Slower
A rigorous study of experienced programmers working in their own codebases reveals a 43-point gap between expected and measured productivity
The study is notable for its methodology. Rather than testing developers on artificial coding challenges, METR, the research organization behind the study, measured experienced programmers working on real tasks in codebases they already knew well. The developers selected their own tasks and used their preferred AI tools, making the results harder to dismiss as artificial or unrepresentative.
The slowdown came from activities that feel productive but consume time: reviewing AI-generated suggestions, debugging code the AI wrote incorrectly, re-prompting after poor outputs, and context-switching between the developer's own reasoning and the AI's suggestions. Each of these micro-interruptions adds friction that accumulates over a working session.
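The accumulation effect can be made concrete with a back-of-the-envelope calculation. The numbers below are purely hypothetical, chosen for illustration and not taken from the study; the point is only that several modest overheads can outweigh the time the AI saves on code generation.

```python
# Illustrative sketch (hypothetical numbers, not from the METR study): how
# small per-interaction overheads can turn a perceived speedup into a net
# slowdown over a working session.

BASELINE_MIN = 60.0          # time to finish a task unassisted (hypothetical)
GENERATION_SAVED_MIN = 25.0  # typing/boilerplate time the AI saves (hypothetical)

# Overheads that feel productive but consume time (all hypothetical):
overheads_min = {
    "reviewing AI suggestions": 12.0,
    "debugging incorrect AI code": 10.0,
    "re-prompting after poor outputs": 5.0,
    "context-switching": 8.0,
}

assisted_min = BASELINE_MIN - GENERATION_SAVED_MIN + sum(overheads_min.values())
change = (assisted_min - BASELINE_MIN) / BASELINE_MIN

print(f"unassisted: {BASELINE_MIN:.0f} min, assisted: {assisted_min:.0f} min")
print(f"net change: {change:+.0%}")
```

With these made-up figures the assisted workflow takes 70 minutes against a 60-minute baseline: a net slowdown, even though the generation step itself was genuinely faster. The developer experiences the 25 minutes saved directly, while the 35 minutes of overhead arrive as many small, easily discounted interruptions.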
The findings align with the Stack Overflow 2025 Developer Survey, which found that trust in AI coding accuracy dropped to 29 percent from 40 percent the previous year. Nearly half of developers now actively distrust the accuracy of AI tools, yet usage continues to rise, creating an unusual dynamic in which adoption and scepticism increase simultaneously.
The implications extend beyond individual productivity. Companies making hiring and staffing decisions based on assumed AI productivity gains may be operating on flawed assumptions. The perception gap suggests that self-reported productivity metrics from developers using AI tools are unreliable.
Analysis
Why This Matters
Billions of dollars in enterprise software spending are predicated on AI coding tools delivering productivity gains. If those gains are illusory, the economic case for many AI investments needs reassessment.
Background
Previous studies of AI coding productivity have generally used less rigorous methodologies, often measuring completion of isolated tasks rather than real-world development workflows.
Key Perspectives
AI tool vendors dispute the findings, pointing to their own internal metrics. Independent researchers note that the perception gap itself is the most concerning finding, as it suggests developers cannot self-assess AI's impact.
What to Watch
Whether companies adjust their AI tool strategies based on this data, and whether larger-scale replication studies confirm the perception gap finding.