In Today’s Issue:
💸 Why capable inference has fallen to a fraction of a cent, and how much of that is architecture rather than subsidy
📊 What four independent evaluations say about how far behind the best Chinese and open models really are
⚖️ The two-market split at the heart of the story: developers versus enterprises
🔓 Why "open weights" is not the same as "China," and what that does to the causation question
🏛️ The fight over who gets to write the rules, and who quietly gets squeezed
A note from us: University students receive our Saturday Deepdive for free when they register with their university email address at: https://getsuperintel.com/plus-whitelist
Dear Readers,
On a routing dashboard this week, a developer chooses between two models for the same job. One is an American flagship that charges thirty dollars to write a million tokens of output. The other, a Chinese model anyone can download for free, charges twenty-eight cents. Not thirty percent less. A hundred and seven times less. A year ago the American labs supplied roughly seventy percent of the tokens moving through that particular marketplace. This June they supplied about thirty.
That collapse has acquired a name that is starting to feel inevitable: the commoditization of intelligence. The claim behind it is that a capable language model, the kind that writes usable code or answers a hard question, is turning into a raw input priced near its cost of production, like electricity or memory, rather than a scarce luxury. Two loud stories have grown up around that claim. One says the frontier labs are already doomed, undercut by Chinese open weights that do nearly the same work for a fraction of the money. The other says a capability gap and a trust barrier still protect the West, and the cheap models are a sideshow. Both are half right, and each turns misleading the moment it is told as the whole truth.
They mislead because they fold two different things into one. On the developer side, capable inference really has become close to a commodity, with identical open weights served by dozens of competing hosts and priced toward the marginal cost of running them. On the enterprise and frontier side, a defended tier persists, where American closed labs still take most of the spending and still hold a measurable lead on the hardest tasks. Sitting on top is a second question the headlines rarely separate from the first: even where prices are genuinely collapsing, is China's strategy of giving its models away the cause, or is China riding an economics of cheap inference that would have arrived anyway? Prices are plainly falling, so the yes-or-no version of the question is already settled. What is worth pulling apart is harder: which intelligence is becoming cheap, for whom, and whether China is driving the collapse or merely riding it.
All the best,

Kim Isenberg

How Cheap Intelligence Got, and What It Still Cannot Buy
The price collapse is real, and not all of it is subsidy
Start with the number no one disputes. DeepSeek's V4-Pro, the flagship of the Chinese lab that reset the industry's price expectations, officially costs $0.435 per million input tokens and $0.87 per million output tokens, and its smaller V4-Flash costs $0.14 and $0.28 (DeepSeek pricing docs, 07/03/2026). Set against Anthropic's Claude Opus 4.8 at $5 and $25, that is roughly 11.5 times cheaper on input and 28.7 times on output, and against the $10 and $50 Anthropic charges for its top Fable 5 model it is cheaper still (Anthropic pricing, 07/03/2026). Put V4-Flash next to OpenAI's GPT-5.5 at $5 and $30 and the ratio reaches about 36 times on input and 107 times on output (CSIS, 07/02/2026; LMArena price column, 07/01/2026). Tokens, for readers new to the jargon, are the chunks of text a model reads and writes, and the price per million of them is the closest thing the field has to a unit cost.
One caveat rides with those ratios, and the commodity thesis tends to skip it. The order-of-magnitude gap is a DeepSeek phenomenon, not a general property of open models. Zhipu's GLM-5.2, also open-weight and also Chinese, costs $1.40 and $4.40, which makes it only about 3.6 times cheaper than Opus on input and 5.7 on output (CSIS, 07/02/2026). The floor is falling fastest under one aggressive player, not evenly across the open field. Per-token prices are not clean like-for-like comparators either: Anthropic's newer models use a tokenizer that produces roughly 30 percent more tokens for the same text, so the same page costs more on those models than the per-token price suggests. Unless the cheaper models' tokenizers inflate token counts by as much, the real gap in cost-per-page is a little wider than the sticker ratio implies (Anthropic pricing, 07/03/2026).

The gap is not in dispute; its cause is. DeepSeek V4-Pro runs about 11.5x/28.7x below Claude Opus 4.8, and V4-Flash about 36x/107x below GPT-5.5. But the order-of-magnitude gap is a DeepSeek phenomenon: GLM-5.2 is only around 3.6x cheaper than Opus. (Source: DeepSeek and Anthropic pricing, 07/03/2026; CSIS, 07/02/2026; LMArena, 07/01/2026)
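For readers who want to check the ratios themselves, the arithmetic is a pair of divisions per comparison. The sketch below only reuses the sticker prices quoted above; the dictionary layout and the function name `times_cheaper` are illustrative assumptions, and it deliberately ignores tokenizer differences.

```python
# Prices in USD per million tokens as (input, output), copied from the
# figures quoted in the text. Tokenizer differences are NOT modeled here.
PRICES = {
    "DeepSeek V4-Pro": (0.435, 0.87),
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GLM-5.2": (1.40, 4.40),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def times_cheaper(cheap: str, expensive: str) -> tuple[float, float]:
    """Return how many times cheaper `cheap` is on (input, output), to one decimal."""
    cheap_in, cheap_out = PRICES[cheap]
    exp_in, exp_out = PRICES[expensive]
    return round(exp_in / cheap_in, 1), round(exp_out / cheap_out, 1)


if __name__ == "__main__":
    comparisons = [
        ("DeepSeek V4-Pro", "Claude Opus 4.8"),
        ("DeepSeek V4-Flash", "GPT-5.5"),
        ("GLM-5.2", "Claude Opus 4.8"),
    ]
    for cheap, expensive in comparisons:
        r_in, r_out = times_cheaper(cheap, expensive)
        print(f"{cheap} vs {expensive}: {r_in}x input, {r_out}x output")
```

Run as written, the three comparisons show the DeepSeek gaps in the tens to low hundreds and the GLM-5.2 gap in single digits, which is the point of the caveat.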
The live question is whether this is a real cost floor set by engineering or a subsidized loss leader set by policy, and for once both sides have named evidence. The structural case is strong. DeepSeek's V4 technical report describes V4-Pro as a mixture-of-experts model trained on 33 trillion tokens, with 1.6 trillion total parameters but only 49 billion active at any moment, meaning it fires roughly 3 percent of itself per token (DeepSeek V4 tech report via The Register, 04/24/2026). A mixture-of-experts design routes each token to a small subset of specialized sub-networks instead of the whole model, which is much of why serving it is cheap. The report pairs that with a compressed form of sparse attention that delivers a million-token context window using 9.5 to 13.7 times less memory than the prior version, plus low-precision FP8 and FP4 arithmetic. Notably, it makes no claim that the model was trained on Huawei's Ascend chips; it validates them only for serving, after an earlier Ascend training run reportedly failed and sent the lab back to Nvidia (The Register, 04/24/2026). Even Dario Amodei, no friend of the Chinese labs, calls DeepSeek's cost an "expected point on an ongoing cost reduction curve" of roughly fourfold a year, the only novelty being that a Chinese firm reached it first (Amodei, 01/2025).
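Two of the numbers in that structural case can be reproduced in a few lines. The sketch below is a back-of-the-envelope illustration, not DeepSeek's or Amodei's own calculation: the function names are hypothetical, and the cost curve assumes a steady fourfold decline per year, which is a simplification of the "roughly fourfold" figure quoted above.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    return active_params / total_params


def relative_cost(years: float, annual_factor: float = 4.0) -> float:
    """Cost relative to today after `years`, assuming (as a simplification)
    a constant `annual_factor`-fold cost reduction every year."""
    return annual_factor ** -years


if __name__ == "__main__":
    # V4-Pro figures as quoted: 1.6 trillion total, 49 billion active.
    share = active_fraction(1.6e12, 49e9)
    print(f"Active share per token: {share:.1%}")

    # How far a steady fourfold-per-year curve moves the cost of a fixed capability.
    for years in (1, 2, 3):
        print(f"After {years} year(s): {relative_cost(years):.4f} of today's cost")
```

On those assumptions, three years of a fourfold curve cuts the cost of a fixed capability to about one sixty-fourth, which is why a large price gap alone does not prove subsidy.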
The subsidy case is now concrete rather than speculative. A US government commission found that the provinces of Gansu, Guizhou and Inner Mongolia offer cloud providers up to 50 percent discounts on electricity for AI training and inference, while "Beijing is subsidizing user access to existing models through APIs and the purchase of pre-trained model licenses outright" (USCC, 03/23/2026). The same report supplies the deflating detail that data centers in those subsidized regions run at only 20 to 30 percent utilization, and that Chinese consumers pay less for software in general, so aggressive pricing is partly a market necessity. The source's angle deserves naming: the commission is a hawkish US body, so its subsidy findings read as directionally credible but motivated. The honest synthesis is that both mechanisms operate at once, architecture lowering the true cost of serving while subsidies push the sticker price lower still, and no public number cleanly separates the two.


