marginalium
AI Dark Output. I vacillate about how useful AI is. It’s got classic Malcolm Gladwell Shit/Karstic energy. It feels productive for many of the people using it. But there are very few signs of this productivity in productivity measures. I’ve had two big arguments about this, and both times I’ve come out thinking the opposite thing at the end. Here are some of my other posts on this:
- The productivity case for my skepticism: AI isn’t changing anything yet and the slow gains despite apparent capability bursts and the uneven adoption across sectors.
- The case for the likelihood that we’ll overestimate the benefits of AI, like we always overestimate tech benefits in the early stages (though, per Amara’s law, we underestimate in the long run).
- It only pays where you find the actual use—not a drop-in replacement (and also here), more procrastination than productivity tool.
- One of these is quite clearly discrete tasks which are chained together in a useful-for-AI shape.
Anyway. This paper is actually not really worth reading in full, because it gets all excited about its own little dark-matter analogy. What it does indicate is that productivity measures don’t necessarily capture the areas where AI could be adding value.
This seems likely, but equally, it might be that getting tasks into a useful-for-AI shape is a hard and mostly unsolved problem.
That is, AI might be adding value we’re not measuring, or AI might not be adding value because any gains we perceive are eaten by the fact that we then need to take the output and do other slow and tedious stuff with it—email people, insert into word docs, take into a meeting. If an AI gives you a productivity boost of 50% on something that’s 1% of your job, you only get a 0.5% increase in productivity.
Both of these things would produce a change in productivity that’s difficult to distinguish from human noise.
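The arithmetic in the 1%-of-your-job example can be sketched in a few lines. This is a rough, Amdahl's-law-style calculation, not anything from the paper: `overall_gain` and its parameters are names I made up, and it assumes a "50% boost" means 50% more output per hour on that task.

```python
def overall_gain(task_share: float, task_boost: float) -> float:
    """Overall productivity gain when only one slice of the job speeds up.

    task_share: fraction of working time spent on the task (0 to 1).
    task_boost: boost on that task, e.g. 0.5 for 50% more output per hour.
    Both parameters are illustrative assumptions, not measured values.
    """
    # Time for the whole job afterwards, with the old total normalised to 1.
    new_time = (1 - task_share) + task_share / (1 + task_boost)
    return 1 / new_time - 1


share, boost = 0.01, 0.5
print(f"back-of-envelope (share x boost): {share * boost:.2%}")
print(f"time-based estimate: {overall_gain(share, boost):.2%}")
```

The back-of-envelope product gives the 0.5% above, and the time-based version comes out a little lower. Either way it stays well under 1%, which is the point.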
There are a couple of other problems at the top of my mind. One is a verification problem.
A lot of people talk about AI’s jagged edge: it’s (increasingly less) surprisingly good at some stuff and (increasingly less predictably) bad at other stuff.
Verification is the same thing but one level up. AI is getting better across the board, but we need to be able to check whether AI is getting better.
For something like mammography, human analysis is expensive. If an AI can triage which mammograms to read, then you get a huge workload cut. Verifying this is cheap—you can easily analyse how well an AI reads a mammogram for oddities. It’s a fairly mechanical thing.
In contrast, if you want to work out how good AI is at solving difficult or intricate coding tasks, you need someone who understands the code as well as if they’d written it themselves in order to verify that. Regression tests don’t capture future-oriented code structure—code designed for new features or version changes. This kind of verification is do-able, but the verification process is really expensive. This is at least part of that infamous study, in which AI use seemed to slow expert coders.
So, if you can’t predict how good an LLM is at a task, you need to factor verification cost into the process. Productivity gains require verification to be cheaper than production. This will improve as we get better at making things AI-shaped: we can verify a bundle of tasks at once, rather than slowing down to collaborate with AI more frequently. But you still have the fixed cost of verification to worry about.
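To make the verification point concrete, here is a toy model. Everything in it is an assumption of mine: the function name, the minute values, and the simplification that a failed check means a human redoes the task from scratch.

```python
def net_minutes_saved(human_minutes: float, ai_minutes: float,
                      verify_minutes: float, p_ok: float) -> float:
    """Expected minutes saved per task by using AI and then checking it.

    Toy assumption: if the check fails (probability 1 - p_ok), a human
    redoes the whole task. All inputs are hypothetical.
    """
    ai_path = ai_minutes + verify_minutes + (1 - p_ok) * human_minutes
    return human_minutes - ai_path


# Cheap verification: checking takes a fraction of the time doing it would.
print(net_minutes_saved(human_minutes=60, ai_minutes=1, verify_minutes=5, p_ok=0.9))

# Expensive verification: checking costs nearly as much as doing it yourself.
print(net_minutes_saved(human_minutes=60, ai_minutes=1, verify_minutes=50, p_ok=0.8))
```

With these made-up numbers the first case saves most of the hour and the second comes out slightly negative, even though the AI part itself is nearly free.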
The last thing I’ll mention here, since I’ll probably want to find it again somewhere, is scope creep. AI helps people do tasks they wouldn’t have otherwise done. Backlogs and deferred projects. Vibe-coding helper tools. If AI boosts productivity outside of the stuff that characterises your productivity, then is it a productivity gain, or is it procrastination? Open question.
So. I guess it’s not just the productivity measures that are making me skeptical.
It occurs to me that my fixation on this is probably a case of my own inability to tolerate incoherence.