The Speed Drug: What a 19th-Century Economist Tells Us About Throttling AI's Most Productive Users
Mar 11, 2026 · 10 min read · Harsha Cheruku
There’s a moment every power user of an LLM knows.
You’re in the middle of something — a document, a codebase, a strategy deck. The thinking is flowing. You send a prompt. The cursor blinks. You wait. Three seconds. Five. Ten.
And something breaks.
Not the tool. Something in you.
A flicker of irritation. An urge to open another tab. A slight, involuntary recalibration of how much you actually valued what you were doing.
That moment isn’t impatience. It’s the precise instant you realize a tool has restructured your cognition around its availability — and is now making you feel the terms.
This is worth examining carefully, because the pattern behind it is 160 years old. And understanding it tells you something important about where AI productivity is actually headed — including for the people who have it fastest.
Part 1: An Old Pattern, Cleanly Stated
In 1865, the British economist William Stanley Jevons was studying steam engine efficiency. Engineers had made real improvements — each engine now produced more work per unit of coal. The natural expectation was that coal consumption would stabilize, maybe decline.
The opposite happened. Coal consumption doubled.
Jevons’ observation: when you make a productive resource more efficient, you don’t use the same amount more carefully. You expand what you try to do. Efficiency creates demand. Cheap and easy means more, not less.
The pattern has held across 160 years and every major productive technology:
- Cars got more fuel-efficient. People drove more miles, not fewer. Total fuel consumption rose.
- Broadband speeds increased tenfold. Data consumption didn’t plateau — it exploded. Video, streaming, always-on applications filled every new bit of headroom.
- Smartphones dropped from $600 to free-with-contract. Screen time went from minutes to hours per day.
Efficiency creates consumption, not conservation. You don’t use a faster, cheaper tool the same way. You use it for things you wouldn’t have attempted before.
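Jevons' logic can be made concrete with a toy constant-elasticity demand model. Everything here is hypothetical and chosen only to show the direction of the effect, not to estimate any real market:

```python
# Illustrative sketch of the Jevons rebound, using a constant-elasticity
# demand curve. All parameter values are hypothetical.

def resource_consumed(efficiency_gain: float, demand_elasticity: float) -> float:
    """Total resource consumption relative to a baseline of 1.0.

    efficiency_gain: fractional improvement, e.g. 0.5 = 50% more work per unit.
    demand_elasticity: how strongly demand responds to the lower effective price.
    """
    effective_price = 1.0 / (1.0 + efficiency_gain)   # each unit of work now costs less
    demand = effective_price ** (-demand_elasticity)  # constant-elasticity demand response
    return demand * effective_price                   # units of resource actually burned

# Inelastic demand (elasticity < 1): efficiency conserves the resource.
print(resource_consumed(0.5, 0.4))   # below 1.0: consumption falls

# Elastic demand (elasticity > 1): the rebound overshoots. This is Jevons' case.
print(resource_consumed(0.5, 2.0))   # above 1.0: consumption rises
```

The crossover at elasticity 1 is the whole argument in miniature: whether efficiency saves or consumes the resource depends entirely on how much latent demand the lower price unlocks.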
AI is not going to be the exception. Power users running 50, 100, even several hundred prompts a day haven't used the same effort more efficiently. They've expanded what they try to do — strategies that used to take a week now take a day, codebases that needed a team now need one person with a good prompting habit.
The question isn’t whether Jevons applies to AI. It’s what happens when you run the engine backward. And what happens when you run it forward too fast.
Part 2: Jevons in the Time Dimension
The efficiency principle operates on cost and price. But it also operates on speed — and the speed version has an additional property that makes it harder to undo.
Every technology that dramatically raised the speed baseline didn’t just increase usage. It moved the patience threshold permanently upward.
In 1844, the first commercial telegraph line opened between Washington and Baltimore. Same-day communication replaced two-week letters. Within a decade, American business had reorganized itself around telegrams. Then, in the 1870s, demand began outstripping cable capacity. Delays crept back in. The merchants who had rebuilt their operations around instant communication were furious. They hadn’t gone back to tolerating two-week letters. They had gone forward to tolerating nothing.
The telephone repeated the pattern. In the 1880s, talking across a city was a marvel. By 1920, “hold” — a few seconds of silence — was a professional offense.
Then the internet sharpened this to a measurable point. In 1993, Jakob Nielsen codified response-time limits for interfaces, including the 10-second ceiling beyond which users' attention drifts. That was considered a reasonable target at the time. By 2009, Google's internal research found that adding 400 milliseconds of latency (a delay most users would barely register) reduced daily search queries by roughly 8 million. Amazon found that every 100 milliseconds of additional load time correlated with approximately 1% of lost revenue.
That’s the speed ratchet: every generation of technology moves the threshold, and thresholds, once moved, do not move back.
The LLM ratchet is already in motion. Users who have calibrated their thinking pace to instant response are not going to recalibrate downward. Jevons’ principle guarantees it — they’ve expanded what they try to do to fill the available speed.
Part 3: The Part Where This Gets Different
Every prior application of these patterns operated on communication or logistics. Faster packages. Faster messages. Faster retrieval of information already in your head.
LLMs are the first technology where both principles — the efficiency expansion and the speed ratchet — operate on thinking itself.
When a search page loads slowly, you’re waiting for information. When an LLM responds slowly, you’re waiting for the next step in your own cognitive process. The latency isn’t between you and a tool. It’s inside the thinking.
One person using AI seriously today is doing the analytical work that used to require a team of three. One engineer is producing code at the pace of four. The Jevons expansion isn’t happening at the level of convenience. It’s happening at the level of cognitive output.
This changes what happens when you restrict supply — and, as we’ll see, what happens even when you don’t.
Part 4: Running the Engine Backward
So: what actually happens when cost pressure, infrastructure limits, or pricing policy forces the tap to slow?
The intuitive answer is proportional reduction. Less supply → less usage. People adjust.
That’s not what happens with inelastic cognitive dependencies.
The people who have most deeply integrated the tool into their thinking are the ones for whom it’s most valuable — and they will pay, route around, or optimize aggressively to maintain their access level. Gloria Mark’s research at UC Irvine found that once a task is interrupted, resuming deep cognitive work takes an average of 23 minutes. An LLM that responds in 2 seconds keeps you in flow. One that takes 15 seconds kicks you out of it entirely. The quality of the work degrades — not just the pace — because the interruption tax is fixed regardless of what the work was worth.
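A back-of-envelope calculation makes the asymmetry stark. The 23-minute resumption figure is Mark's; the flow threshold and prompt counts below are assumptions chosen for illustration:

```python
# Illustrative arithmetic: why the interruption tax is fixed once latency
# crosses the flow threshold. The 23-minute resumption cost comes from
# Gloria Mark's study cited above; the threshold and rates are assumptions.

FLOW_THRESHOLD_S = 10       # assumed latency above which flow breaks
RESUMPTION_COST_MIN = 23    # Mark (2008): average time to resume deep work

def hourly_overhead_min(latency_s: float, prompts_per_hour: int) -> float:
    """Minutes per hour lost to waiting, plus resumption cost if flow breaks."""
    waiting = prompts_per_hour * latency_s / 60
    resumption = (prompts_per_hour * RESUMPTION_COST_MIN
                  if latency_s > FLOW_THRESHOLD_S else 0)
    return waiting + resumption

print(hourly_overhead_min(2, 10))   # 2s responses, 10 prompts: well under a minute lost
print(hourly_overhead_min(15, 2))   # 15s responses, just 2 flow breaks: most of the hour gone
```

The model is crude on purpose: the waiting term scales smoothly with latency, but the resumption term is a step function. That step is why a 15-second response is not merely 7x worse than a 2-second one.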
What throttling actually produces is not reduced consumption. It’s stratification.
This is Jevons in reverse. The forward direction: efficiency → everyone expands usage. The reverse direction: restriction → the highest-value users find workarounds and consume at maximum budget; lower tiers fall behind. Demand doesn’t compress proportionally. It bifurcates.
The enterprise license holder gets the fast model. The free tier gets throttled to something slower and less capable. The gap isn’t just a speed difference — it’s an output quality difference. Not because one person is smarter or works harder. Because they have access to faster loops.
Broadband is the clearest prior example. In the early 2000s, high-speed internet access was strongly correlated with income. Rural and lower-income households stayed on dial-up years after urban professionals moved to cable. The productivity gap between the two groups is hard to measure precisely, but the structural advantage was real and it compounded.
Token throttling, priced by tier, risks reproducing that dynamic — but at the level of cognitive output rather than information access. Not what you can find. What you can think through, at speed. That’s a different and harder inequality than anything the internet produced, because it operates invisibly, inside the quality of decisions made and strategies formed.
Part 5: The Question Jevons Didn’t Have to Ask
Jevons was worried about coal running out. That was the right worry in 1865.
We are not running out of AI. But the stratification problem in Part 4 is only half the picture — and it only affects people without fast access. There’s a second problem, less discussed, that affects the people who have it fastest.
Consider a donut machine. If it produces donuts at 100x the rate you can eat them, it doesn't make you better fed. It makes you anxious about the backlog. You start rushing through each one without tasting it. You grab the next before finishing the last. You are technically consuming more and meaningfully nourishing less.
The same dynamic applies to AI output. Even the power user with the fastest access hits a consumption ceiling — because human attention is still the scarce resource even when generation is abundant. The bottleneck doesn’t disappear. It relocates from generation to absorption.
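The relocated bottleneck can be stated as simple rate arithmetic. The rates here are hypothetical; only the ratio matters:

```python
# Illustrative sketch: when generation outpaces absorption, the backlog
# grows without bound, and the share of output that gets genuinely
# processed falls toward absorption_rate / generation_rate.
# All rates are hypothetical (think items per day).

def absorbed_fraction(generation_rate: float, absorption_rate: float) -> float:
    """Long-run share of generated output that actually gets absorbed."""
    return min(1.0, absorption_rate / generation_rate)

print(absorbed_fraction(10, 12))    # generation under the ceiling: everything is absorbed
print(absorbed_fraction(100, 12))   # the 100x machine: only 12% is ever digested
```

Speeding up generation moves the first argument, not the second. Past the crossover, every further speedup only shrinks the fraction of your own output you actually think through.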
And the super user’s response to this isn’t to slow down. It’s to generate more, faster, and process each thing more shallowly. You start skimming your own outputs. You produce the next thing before fully synthesizing the last. You move through ideas instead of actually thinking them. The tool is fast enough to outrun the reflection it was supposed to support.
This is the mechanism behind what might be the most important sentence in this whole piece: there is a version of LLM dependency that produces the output of many people but the wisdom of none. It doesn’t happen despite being a power user. It happens because of it.
Speed, applied to logistics, is almost unambiguously good. Faster packages, faster search — delivery mechanisms for things that already exist.
Speed applied to thinking is different. Thinking has necessary friction: uncertainty, revision, the discomfort of not-yet-knowing. A certain amount of slowness isn’t a bug in cognition. It’s where synthesis forms. When you shorten the feedback loop between question and answer dramatically enough, you change which questions get asked. Fast, conversational, iterative questions flourish. The slow, structural ones that require sitting with ambiguity get quietly deprioritized — not because they’re less important, but because they don’t fit the interaction model.
This isn’t an argument for throttling. It’s an argument for being deliberate about what we route through the fast loop — and what we’re choosing, consciously or not, to skip.
The cursor is still blinking.
You know what’s happening now: it’s not just impatience. It’s the ratchet. It’s the dependency the tool installed quietly, that you only noticed when the response was late. Jevons running in real time, in your browser, in your thinking.
Whether the tap speeds up or slows down from here, the principle holds: you’ll use more, not less. You’ll expand to fill the available speed. But expanding to fill speed and actually absorbing what you produce are two different things — and the gap between them is where the wisdom goes.
The telegram merchants didn’t choose to become impatient with letters. The expectation arrived with the technology and was only visible in retrospect.
We have more notice than they did.
Data references: William Jevons, “The Coal Question” (1865); Jakob Nielsen, response time limits (1993); Google “Speed Matters” internal study (2009); Greg Linden, “Make Data Useful” (Amazon, 2006); Gloria Mark, “The Cost of Interrupted Work” (UC Irvine, 2008); Cal Newport, “Deep Work” (2016); Csikszentmihalyi, “Flow: The Psychology of Optimal Experience” (1990).