
What an AI ROI number can and can't tell you

Dana Okafor, Eng Leadership

A few months into a Claude rollout, someone is going to ask you for the ROI number. Probably finance, probably in a planning meeting, probably with a spreadsheet already open. And you will feel the pull to give them something clean: a multiplier, a dollar figure, a slide.

Resist that pull a little. The clean number is usually the dishonest one.

I've sat on both sides of this. I've built the dashboard that says "$2.4M saved this quarter," and then watched a CFO poke exactly one hole in it and deflate the whole thing. So this is the version I wish someone had handed me earlier: what the math actually supports, and what it never will.

Why the easy formula collapses

The standard AI ROI calculation is some flavor of: hours saved per engineer, times headcount, times a loaded hourly rate. It looks rigorous because it has three numbers in it. It isn't, because every one of those numbers is soft.
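Written out, the standard calculation is a one-liner. This sketch uses hypothetical names and placeholder inputs (26 hours saved per engineer per quarter, 50 engineers, a $120 loaded hourly rate) purely to show its shape: three estimates go in, and a single confident figure comes out with no trace of how uncertain each input was.

```python
def naive_ai_savings(hours_saved_per_engineer: float,
                     headcount: int,
                     loaded_hourly_rate: float) -> float:
    """The standard formula: three soft inputs multiplied into one hard-looking number."""
    return hours_saved_per_engineer * headcount * loaded_hourly_rate


# Hypothetical placeholder inputs; every one of them is an estimate, not a measurement.
estimate = naive_ai_savings(hours_saved_per_engineer=26, headcount=50, loaded_hourly_rate=120)
print(f"${estimate:,.0f} saved this quarter")
```

With these placeholders it prints "$156,000 saved this quarter," which looks exact precisely because nothing in the function records how shaky each argument is.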

The multiplier problem. "Claude makes us 30% faster" — where does the 30% come from? Usually a survey, or a vendor benchmark, or a vibe. The honest answer for most teams is that nobody has a clean before-and-after on the same work, because the work isn't repeatable. You shipped feature A with AI and feature B without it, but A and B weren't the same difficulty, the same person, or the same week. The multiplier is doing enormous load-bearing work and it's almost always made up.

The baseline problem. Hours saved against what? You're comparing actual time spent to a hypothetical world where the same engineer did the same task without Claude — a world that doesn't exist and can't be observed. Self-reported estimates are the usual fallback, and people are reliably bad at this. They overestimate time saved on tasks that felt good and forget the twenty minutes they spent re-prompting something into the ground.

The attribution problem. Velocity went up this quarter. Was that Claude? Or the two senior hires, the dependency upgrade that stopped the flaky-test bleeding, or the fact that you finally killed that one cursed service? AI shows up in the same window as a dozen other changes and confidently claims credit for all of them.

Stack three soft numbers and multiply, and the error doesn't average out. It compounds.
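To see the compounding concretely, here is a minimal sketch using plain interval bounds (a worst-case view, not a statistical error model). Each of the three inputs gets a hypothetical ±20% band, and the ends of the bands are multiplied. The function names and numbers are illustrative, not drawn from any real team.

```python
from math import prod


def product_interval(bounds):
    """Multiply positive (low, high) intervals.

    For strictly positive values, the smallest product is the product of the lows
    and the largest is the product of the highs.
    """
    if any(low <= 0 or low > high for low, high in bounds):
        raise ValueError("expects positive intervals with low <= high")
    return prod(low for low, _ in bounds), prod(high for _, high in bounds)


def widen(point, rel):
    """Turn a point estimate into a symmetric relative band, e.g. rel=0.2 for +/-20%."""
    return point * (1 - rel), point * (1 + rel)


# Hypothetical inputs: each soft number is only known to within +/-20%.
hours = widen(26, 0.20)
headcount = widen(50, 0.20)
rate = widen(120, 0.20)

point = 26 * 50 * 120
low, high = product_interval([hours, headcount, rate])
print(f"point estimate: ${point:,.0f}")
print(f"worst-case range: ${low:,.0f} to ${high:,.0f}")
print(f"relative to point: {low / point - 1:+.1%} to {high / point - 1:+.1%}")
```

A ±20% wobble on each input becomes roughly -49% to +73% on the result (0.8³ = 0.512 and 1.2³ = 1.728), and the band is lopsided upward. This assumes the errors lean the same way, which is roughly what inflated self-reports and generous attribution do.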

Activity is not value

There's a quieter failure mode underneath the math. It's tempting to measure what's easy to count: messages sent, sessions opened, tokens burned, lines suggested. Those are activity metrics, and activity is not the same as value.

An engineer who runs forty Claude sessions a day might be shipping more, or might be stuck in a loop, asking the model to fix something it keeps getting wrong. A busy tool is not a productive one. If your ROI story is really just an engagement chart wearing a dollar sign, the first skeptical question will take it apart, and it should.

What you can actually defend

Here is the reframe that has held up for me in front of finance: stop trying to prove the big number. Build a small, conservative, bounded estimate you can stand behind, and present it as a range with the assumptions written down next to it.

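That means grounding the math in things you can actually point to: the loaded hourly rate finance already uses, session counts pulled from real usage data, and a per-session time estimate small enough that a skeptic will grant it.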

This is the one place I'll mention what we build: Mojule grounds the calculation in real configured hourly rates and actual session counts with idle sessions filtered out, rather than handing you an invented multiplier. That doesn't make the number true. It makes it defensible, which is a lower and far more useful bar.
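Idle filtering is easy to get wrong in either direction, so here is a minimal sketch of the idea. It is not Mojule's implementation. The `Session` record, its fields, and both thresholds are hypothetical and would need tuning against your own usage logs.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical shape of a usage-log record; real exports will differ.
    session_id: str
    messages: int
    active_minutes: float


def qualifying_sessions(sessions, min_messages=2, min_active_minutes=1.0):
    """Keep sessions that clear both assumed activity thresholds; drop the rest as idle."""
    return [
        s for s in sessions
        if s.messages >= min_messages and s.active_minutes >= min_active_minutes
    ]


sessions = [
    Session("a", messages=0, active_minutes=0.0),   # opened and abandoned
    Session("b", messages=1, active_minutes=0.5),   # a single quick ping
    Session("c", messages=6, active_minutes=14.0),
    Session("d", messages=3, active_minutes=4.5),
]
kept = qualifying_sessions(sessions)
print(len(sessions), "raw sessions,", len(kept), "qualifying")
```

Making the thresholds explicit parameters turns them into one more written-down assumption that someone can argue with, rather than a filter buried inside a dashboard.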

Then present it honestly. Not "$2.4M saved." Something closer to: "Assuming a conservative 8 minutes saved per qualifying session and our standard loaded rate, this quarter sits somewhere between $180K and $340K, and here are the four assumptions that move that range." A CFO can work with that. A CFO cannot work with a magic number that evaporates under one question.
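Here is a hedged sketch of that kind of range, with the assumptions carried alongside the result rather than tucked into a footnote. The `Assumptions` fields and every input value are hypothetical placeholders, not the figures behind the quote above. Only minutes saved per session is treated as uncertain here; a fuller version would put bands on the other inputs too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumptions:
    # All values are hypothetical placeholders; replace them with your own and keep them visible.
    qualifying_sessions: int     # counted from usage data after idle filtering
    minutes_saved_low: float     # conservative end of the per-session estimate
    minutes_saved_high: float    # optimistic end of the per-session estimate
    loaded_hourly_rate: float    # your standard loaded rate, in dollars per hour


def savings_range(a: Assumptions) -> tuple[float, float]:
    """Return (low, high) dollar savings from session count, minutes saved, and rate."""
    hours_low = a.qualifying_sessions * a.minutes_saved_low / 60
    hours_high = a.qualifying_sessions * a.minutes_saved_high / 60
    return hours_low * a.loaded_hourly_rate, hours_high * a.loaded_hourly_rate


def report(a: Assumptions) -> str:
    """Print the range and, directly underneath, every input that produced it."""
    low, high = savings_range(a)
    lines = [f"Estimated savings: ${low:,.0f} to ${high:,.0f}", "Assumptions:"]
    lines += [f"  {name} = {value}" for name, value in vars(a).items()]
    return "\n".join(lines)


a = Assumptions(qualifying_sessions=4000, minutes_saved_low=8,
                minutes_saved_high=15, loaded_hourly_rate=120)
print(report(a))
```

With these placeholder inputs the report reads "$64,000 to $120,000," followed by the four inputs that produced it. Anyone who disagrees with the result can point at the specific line they dispute.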

A defensible range beats an impressive point estimate every time. The range survives scrutiny. The point estimate is built to be attacked.

What it will never capture

Be just as clear about the limits, because they're real and pretending otherwise is how you lose credibility.

ROI math can't see capability shifts — a junior engineer now attempting work they'd have escalated before. It can't see the morale difference between fighting boilerplate and getting to the interesting part. It can't measure the work that didn't happen: the incident that didn't fire, the refactor someone finally felt safe doing. And it can't catch second-order effects, like a team moving faster because the person who used to be the bottleneck got their afternoons back.

None of that lands in a spreadsheet. That's fine. The mistake is forcing those benefits into the dollar figure to make it bigger, which is exactly what makes the dollar figure fragile. Let the number be the number, and describe the rest in plain language as the things you believe are happening but aren't claiming to have measured.

Where this leaves you

Measure conservatively. Write your assumptions next to the result so anyone can check your work or argue with it. Treat the output as a directional signal — "this is paying for itself, probably by a healthy margin" — and not as proof.

The goal was never a number that wins the meeting. It's a number that's still standing after the meeting, when someone goes looking for the hole.