SWITCHCASE STUDIOS
switchcasestudios.com
FIELD NOTES · AI & PRODUCTIVITY
SwitchCase Studios · AI Strategy

AI Is a Power Tool,
Not a New Hire

The productivity gains are real, measurable, and conditional — and quietly dependent on a human who knows what they're doing at the controls.

01

The Gains Are Real
(and Bigger Than Most)

The same headline keeps coming around, just with the noun swapped out. AI replaces developers. AI replaces analysts. AI replaces your whole marketing team. It makes for a great tweet and a terrible strategy.

The good news: we no longer have to argue from vibes. There's a solid stack of real studies — randomized trials, field experiments, large-scale telemetry — and they all rhyme. Start with the study that kicked the field off.

Erik Brynjolfsson and colleagues studied 5,179 customer support agents at a Fortune 500 software firm as the firm rolled out a generative-AI assistant. Agents with the tool resolved 14% more issues per hour on average. Customers were happier, agents quit less, and resolution quality held steady. Brynjolfsson's own reaction is the telling part: he'd spent years studying IT rollouts where a 1–2% productivity gain was cause for celebration. Fourteen percent isn't an incremental improvement; it's a regime change.

Then there's the BCG / Harvard "jagged frontier" experiment: 758 management consultants, randomized, given realistic consulting tasks. For work that sat inside what the AI was good at, consultants using GPT-4 finished 25% faster, completed 12% more tasks, and produced output rated more than 40% higher in quality than the control group's. These aren't entry-level temps (BCG admits roughly 1% of applicants), and the AI still moved the needle hard.

+34% productivity gain for novice support agents (Brynjolfsson, Li & Raymond · NBER 2023)
+14% average gain across all support agents (Brynjolfsson, Li & Raymond · NBER 2023)
+25% speed gain for consultants on in-frontier tasks (Dell'Acqua et al. · BCG–Harvard 2023)
+12% more tasks completed by AI-assisted consultants (Dell'Acqua et al. · BCG–Harvard 2023)

So far this is the slide every vendor shows you. The next section has the one they don't.

Measured productivity change by population (sources vary; see bibliography):
Novice support agents: +34%
All support agents: +14%
Consultants, speed (in-frontier): +25%
Consultants, output (in-frontier): +12%
Consultants, accuracy (outside frontier): −19 percentage points
Gains skew toward less-experienced workers. Outside AI's competence, results go negative.
02

The Productivity
Placebo

In 2025, the research nonprofit METR ran a proper randomized controlled trial — the clinical-drug-trial kind — on 16 experienced open-source developers working through 246 real tasks on codebases they'd lived in for an average of five years.

Going in, the developers predicted AI would make them 24% faster. After they finished, they reported feeling about 20% faster. The actual result:

19% slower
What actually happened when those experienced developers used AI tools — while feeling 20% faster the whole time.

Let that gap sink in, because it's the single most important finding in this entire space. It's not just that AI didn't help these particular developers — it's that they had no idea. The acceleration was a feeling, not a fact. Generate code, watch it appear instantly, feel fast — but the clock doesn't lie.

METR RCT 2025: predicted vs. felt vs. actual (16 experienced developers, 246 tasks)
Predicted before: 24% faster (expected)
Felt after: 20% faster (perceived)
Actually happened: 19% slower (measured)
That is a 43-percentage-point gap between expectation and reality, and a 39-point gap between how fast the developers felt and what the clock measured.
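The gap arithmetic is easy to garble, so here is a minimal Python sketch of it. It assumes, as METR's write-up frames the numbers, that all three figures are changes in task completion time (negative means faster), and it works only from the headline percentages, not the study's raw data. The constant and function names are illustrative.

```python
# Headline METR figures as the change in task completion time, in percent.
# Negative = faster, positive = slower. Summary numbers only, not raw data.
PREDICTED_PCT = -24  # forecast before the study: about 24% faster
PERCEIVED_PCT = -20  # self-estimate afterward: about 20% faster
MEASURED_PCT = 19    # measured result: 19% slower


def gap_points(belief_pct: int, measured_pct: int) -> int:
    """Distance in percentage points between a belief and the measurement."""
    return measured_pct - belief_pct


def hours_for(baseline_hours: float, change_pct: int) -> float:
    """Time a task takes after applying a percentage change to its baseline."""
    return baseline_hours * (1 + change_pct / 100)


if __name__ == "__main__":
    print("expectation gap:", gap_points(PREDICTED_PCT, MEASURED_PCT), "points")
    print("perception gap: ", gap_points(PERCEIVED_PCT, MEASURED_PCT), "points")
    # Translate each figure into what a task that normally takes one hour costs.
    for label, pct in [("predicted", PREDICTED_PCT),
                       ("felt", PERCEIVED_PCT),
                       ("measured", MEASURED_PCT)]:
        print(f"{label:>9}: {hours_for(1.0, pct):.2f} h for a 1-hour task")
```

Put in hours, the developers expected a one-hour task to take about 45 minutes and believed afterward that it had taken about 48; the clock said about 71.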

Before anyone screenshots that chart with a "SEE? AI IS USELESS" caption — slow down. The METR result isn't the opposite of the Brynjolfsson result. It's the same lesson from a different angle.

These were senior engineers on codebases they knew cold, holding work to a high standard: exactly the kind of population that gained the least in the studies above. Brynjolfsson found the 14% average masked a 34% jump for novices and minimal gains for the most experienced agents. The BCG consultants only won on tasks inside the AI's frontier; push them outside it and using AI made them 19 percentage points less likely to get the right answer. METR's developers spent a big chunk of their "saved" time reviewing and cleaning up generated code that didn't meet their standards.

AI's value depends entirely on who's holding it and what you point it at. That's not a knock on the tools — that's the definition of a power tool.

— SWITCHCASE STUDIOS · AI PRACTICE
03

Why the Human Still
Earns Their Seat

If AI were actually a replacement, we'd be shipping its output straight to production. We are emphatically not doing that.

71% of developers won't merge AI code without manual review (Second Talent · DORA survey data)
48% of AI-generated code flagged for potential security issues (Second Talent · DORA survey data)
1.7× relative defect rate of AI vs. human-written code (CodeRabbit · 470-PR analysis, Dec 2025)
3.8% report enough confidence to ship AI code unreviewed (Qodo · State of AI Code Quality 2025)

None of this means the tools are bad. The workflow that actually works has a human in the loop by design — not as a bottleneck, but as the part of the system that supplies judgment, taste, context, and a sense of what "done" means. The AI drafts; you decide. That's the whole game.

This is also why the "just let it run" agentic fantasy is dangerous. An agent can burn through real money grinding on a problem a human would've abandoned after the second try, because the agent doesn't know when it's stuck. Knowing when to stop is judgment. Judgment is still ours.
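One way to hand the "knowing when to stop" decision back to a human is to wrap every agent run in hard limits. The Python below is a hypothetical guard, not any agent framework's API: `attempt` stands in for one agent iteration, and the attempt cap and dollar budget are placeholder values you would tune to your own costs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttemptResult:
    succeeded: bool
    cost_usd: float
    output: Optional[str] = None


def run_with_budget(
    attempt: Callable[[int], AttemptResult],
    max_attempts: int = 3,
    max_cost_usd: float = 5.0,
) -> tuple[str, Optional[str], float]:
    """Run an agent step until it succeeds or a limit trips.

    Returns (status, output, total_cost). Any status starting with
    "escalate" means: stop spending and put a human on it.
    """
    total_cost = 0.0
    for attempt_no in range(1, max_attempts + 1):
        # The budget is checked before each attempt, so the last attempt
        # can overshoot the cap by at most its own cost.
        if total_cost >= max_cost_usd:
            return ("escalate: budget exhausted", None, total_cost)
        result = attempt(attempt_no)
        total_cost += result.cost_usd
        if result.succeeded:
            return ("done", result.output, total_cost)
    return ("escalate: too many failed attempts", None, total_cost)


if __name__ == "__main__":
    # Hypothetical agent that never converges and costs $2 per try.
    def stuck_agent(attempt_no: int) -> AttemptResult:
        return AttemptResult(succeeded=False, cost_usd=2.0)

    print(run_with_budget(stuck_agent, max_attempts=10, max_cost_usd=5.0))
```

The guard doesn't make the agent smarter; it just turns "grinding forever" into a bounded bill and a handoff, which is where the human judgment comes back in.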

04

Centaurs, Cyborgs &
How to Actually Use It

The BCG researchers put labels on people who use AI well, and they've stuck. Centaurs split the work cleanly — human does the parts humans are good at, AI handles the rest, with a clear boundary. Cyborgs blend continuously, handing micro-tasks back and forth in a tight loop. Both beat the people who either ignored the AI or trusted it blindly. The consistent losers were the ones who outsourced their judgment.

Four Things the Research Actually Tells You to Do

🎯

Point it at the right work

AI shines on boilerplate, first drafts, unfamiliar territory, and high-volume repetitive tasks. It struggles on deep, context-heavy work in domains you already know cold. Know which task you're on before you reach for the tool.

📏

Stay calibrated

The feeling of being faster is not evidence of being faster. If it matters, measure it. The most dangerous person on your team is the one who's certain the AI doubled their output and has never once checked.
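A minimal way to check the feeling against the clock, assuming you log wall-clock time per task and can flip a coin for whether AI is allowed on each one (that coin flip is what makes a study like METR's an RCT). Comparing geometric means is one reasonable choice for skewed task durations; this is a sketch, not METR's analysis method, and the sample durations are made up.

```python
import math
import random
from statistics import mean


def assign_condition(task_ids: list[str], seed: int = 0) -> dict[str, bool]:
    """Coin-flip each task into 'AI allowed' (True) or 'no AI' (False)."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {task: rng.random() < 0.5 for task in task_ids}


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of strictly positive values; damps one very long task."""
    return math.exp(mean([math.log(v) for v in values]))


def time_ratio(with_ai: list[float], without_ai: list[float]) -> float:
    """Ratio of typical completion times: > 1.0 means AI-assisted tasks took longer."""
    return geometric_mean(with_ai) / geometric_mean(without_ai)


if __name__ == "__main__":
    # Decide which tasks get AI before any work starts.
    plan = assign_condition(["T-101", "T-102", "T-103", "T-104"], seed=7)
    print(plan)

    # Made-up durations in hours, logged after the tasks were done.
    hours_with_ai = [1.2, 2.4, 0.6]
    hours_without_ai = [1.0, 2.0, 0.5]
    ratio = time_ratio(hours_with_ai, hours_without_ai)
    print(f"AI-assisted tasks took {ratio:.2f}x as long")
```

With only a handful of tasks, noise will swamp any real effect, so treat a single ratio as a reason to collect more data rather than as a verdict.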

🔁

Keep the review loop tight

The gains live downstream of human verification. Skip the review and you're not saving time — you're just moving the cost to whoever debugs it in production at 2 a.m. (Probably also you.)

📈

Use it to level up, not to coast

The Brynjolfsson study's most striking finding: AI worked like a knowledge equalizer — it handed novices the patterns of the best workers and pulled them up the experience curve fast. Used that way, it's a teacher. Used as a crutch, it's a way to never learn anything.

The unglamorous conclusion

AI isn't coming for your job. A person who's genuinely good with AI might be — and the difference between those two sentences is the entire point.

Accelerator, not replacement. Power tool, not new hire. Boring framing — but it's the one the data keeps backing up.

The productivity gains are real. So is the productivity placebo. So is the 1.7× defect rate CodeRabbit measured in AI-generated code. What unifies all of it is that the human is still the critical variable: in skill, in judgment, and in the ability to know which tool to reach for, when, and on what.

Bibliography

  1. Brynjolfsson, E., Li, D., & Raymond, L. Generative AI at Work. NBER Working Paper No. 31161, 2023. Available: nber.org/papers/w31161
  2. METR. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. July 2025. Available: metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
  3. Dell'Acqua, F., McFowland, E., Mollick, E., et al. Navigating the Jagged Technological Frontier (BCG / Harvard Business School). Organization Science, 2026. Available: pubsonline.informs.org/doi/10.1287/orsc.2025.21838
  4. CodeRabbit. State of AI vs Human Code Generation — 470-PR analysis. December 2025.
  5. Qodo. State of AI Code Quality 2025. Available: qodo.ai/reports/state-of-ai-code-quality/
  6. Second Talent. AI Coding Assistant Statistics & Trends, 2025 (DORA survey data).