In Today’s Issue:

💥 Google pressures OpenAI in consumer AI

🛡️ Anthropic explains how it contains agentic products

💸 Uber's AI coding bill gets harder to justify

⬇️ Xiaomi pushes API prices toward the floor

And more AI goodness…

The Signal

Today's issue is about AI research escaping the paper and landing in the product bill. DeepSWE shows that coding agents need harder, cleaner tests before developers can trust leaderboard jumps; DeepSeek's visual-token work shows why efficiency research matters; and Xiaomi's MiMo pricing move shows what happens when inference systems get optimized hard enough to turn model access into a commodity fight.

The important point is not that one paper caused one price cut. It is that the AI race is splitting into two connected tracks: who can make models more capable, and who can make those capabilities cheap enough to use at scale.

All the best,

Kim Isenberg

🔎 Google Is Turning Distribution Into an AI Weapon

The Economist argues that Google is clawing back consumer AI momentum by pushing Gemini and AI Search through products people already use every day. The key story is distribution: Search, Android, Workspace, Chrome, and YouTube give Google a path to make AI feel less like a separate chatbot and more like default internet infrastructure.

👉 tl;dr: OpenAI may still own the cultural center of AI, but Google owns the surfaces where billions of users already live.

🧱 Anthropic Shows the New Safety Problem: Containment

Anthropic's engineering team says the central question for agentic AI is no longer just whether the model behaves well, but what it is physically able to touch.

Their post breaks containment into product-specific architectures for claude.ai, Claude Code, and Claude Cowork, with sandboxes, virtual machines, egress controls, and access boundaries doing more work than endless permission prompts.

👉 tl;dr: As agents get useful enough to deploy, safety becomes infrastructure: limit the blast radius before the model gets a chance to make a high-impact mistake.

💸 Uber's AI Coding Bill Meets the CFO Question

Fortune reports that Uber burned through its 2026 AI coding tools budget in four months, even as executives struggle to link usage to consumer-facing product gains.

COO Andrew Macdonald said the connection between higher Claude Code use and more useful customer features is still hard to draw.

👉 tl;dr: Cheaper tokens do not automatically mean cheaper AI work. Agentic workflows can consume so much more compute that the ROI question comes back sharper, not softer.
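A back-of-the-envelope sketch makes the point concrete. Every number below is hypothetical (these are not Uber's figures or any vendor's prices), and the function names are made up for illustration: a 4x price cut is wiped out as soon as a workflow consumes more than 4x the tokens.

```python
def break_even_token_multiplier(old_price: float, new_price: float) -> float:
    """How many times more tokens a workflow can use before a price cut is fully offset."""
    if old_price <= 0 or new_price <= 0:
        raise ValueError("prices must be positive")
    return old_price / new_price


def monthly_bill(tokens_millions: float, price_per_million: float) -> float:
    """Spend in dollars for a given token volume (in millions of tokens)."""
    return tokens_millions * price_per_million


# Hypothetical prices in dollars per 1M tokens, before and after a 4x cut.
old_price = 2.00
new_price = 0.50

# Hypothetical volumes: chat-style usage vs. an agentic workflow using 10x the tokens.
chat_tokens_m = 100
agent_tokens_m = 1000

multiplier = break_even_token_multiplier(old_price, new_price)
old_bill = monthly_bill(chat_tokens_m, old_price)
new_bill = monthly_bill(agent_tokens_m, new_price)

print(f"Break-even token multiplier: {multiplier:.1f}x")
print(f"Old bill: ${old_bill:.2f}, new bill: ${new_bill:.2f}")
```

With these made-up inputs, the per-token price falls 4x but the bill still rises, because token volume grew 10x.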

When a model vendor announces a big price cut, ask your assistant to separate model capability, serving efficiency, and pricing strategy before you react.

Why it helps: A cheaper API can come from better architecture, better caching, lower margins, a strategic land grab, or a mix of all four. Treating every price drop as a pure research breakthrough will make you miss the business incentives.

Try this: Paste a pricing announcement and ask: "List the technical changes that could reduce serving cost, the business reasons to lower price anyway, and the evidence I would need to tell those apart."

🎬 Watch This

Two Minute Papers' video "DeepSeek’s New AI Is A Game Changer" looks at visual-token efficiency: the idea that models can reason over images more cheaply when they use structured visual references instead of spending huge token budgets describing everything in words. The bigger lesson for today’s issue is simple: research that reduces token waste does not stay abstract for long; it eventually shows up as faster inference, cheaper products, or pressure on everyone else’s pricing.

– Xiaomi MiMo API Platform, MiMo-V2.5 price adjustment announcement

DeepSWE Shows Why Coding Agents Need Harder Tests

The Takeaway

👉 DeepSWE is a new long-horizon software engineering benchmark built from original tasks, not copied PRs or existing commits.

👉 Its 113 tasks span 91 active open-source repositories across TypeScript, Go, Python, JavaScript, and Rust.

👉 The tasks are shorter to describe but much larger to solve: reference solutions average 668 lines of code versus 120 for SWE-Bench Pro.

👉 The benchmark also audits grading quality, reporting far lower verifier error rates than SWE-Bench Pro in its sample.

The uncomfortable truth about coding agents is that the easy benchmarks are starting to lie by omission. DeepSWE was built to test the kind of engineering work developers actually hand to agents: short natural-language requests, unfamiliar repositories, multi-file changes, and behavior that has to work rather than merely match a known patch.

The benchmark's design attacks two weak spots at once. First, its tasks are original, which reduces contamination risk from models memorizing public fixes. Second, its verifiers are written to check behavior, not one exact implementation path, so agents get credit for solving the problem rather than guessing the benchmark author's code shape.
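To see the difference between grading a patch and grading behavior, here is a toy sketch. It is not DeepSWE's harness: the `slugify` task, the test cases, and both grader functions are hypothetical. Two correct solutions with different code shapes should both pass a behavioral check, while an exact-patch check accepts only the reference.

```python
# Reference solution, stored as source text the way a benchmark might keep a patch.
REFERENCE_SRC = '''
import re
def slugify(title):
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
'''

# A different but equally correct implementation.
ALTERNATIVE_SRC = '''
import string
ALLOWED = set(string.ascii_lowercase + string.digits)
def slugify(title):
    words, current = [], []
    for ch in title.lower():
        if ch in ALLOWED:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return "-".join(words)
'''

# A plausible-looking submission that gets punctuation wrong.
BROKEN_SRC = '''
def slugify(title):
    return title.lower().replace(" ", "-")
'''

# Hypothetical behavioral test cases: (input, expected output).
CASES = [
    ("Hello, World!", "hello-world"),
    ("  DeepSWE  v2 ", "deepswe-v2"),
    ("already-slugged", "already-slugged"),
    ("", ""),
]


def patch_match_grader(src: str) -> bool:
    """Brittle grader: passes only if the submission matches the reference text."""
    return src.strip() == REFERENCE_SRC.strip()


def behavioral_grader(src: str) -> bool:
    """Behavior grader: runs the submission and checks outputs, ignoring code shape."""
    namespace = {}
    try:
        # A real harness would run untrusted code in a sandbox, not a bare exec.
        exec(src, namespace)
        fn = namespace["slugify"]
        return all(fn(given) == expected for given, expected in CASES)
    except Exception:
        return False


for name, src in [("reference", REFERENCE_SRC),
                  ("alternative", ALTERNATIVE_SRC),
                  ("broken", BROKEN_SRC)]:
    print(name, patch_match_grader(src), behavioral_grader(src))
```

The alternative solution fails the patch-match grader despite being correct, which is the failure mode behavior-based verifiers are meant to avoid.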

DeepSWE's results also show how wide the gap can be once the task becomes more realistic. On its public leaderboard, top configurations separate much more sharply than they do on existing SWE benchmarks; models that look nearly tied elsewhere can diverge badly when the job requires exploration, integration, and sustained implementation work.

That is the research story underneath the product hype. Coding agents are getting real enough that companies are willing to spend serious money on them, but the measurement layer has to catch up. If the benchmark is too narrow, the market buys a score. If the benchmark is closer to work, the market buys capability.

Why it matters: The next coding-agent race will not be won only by the model with the best demo. It will be won by systems that can reliably solve messy, multi-step engineering tasks without hiding behind contaminated benchmarks or brittle graders.

Sources:

🔗 https://deepswe.datacurve.ai/blog

🔗 https://datacurve.ai/research

Today’s benchmark is not model performance, but model economics.

The chart: Xiaomi permanently cut MiMo-V2.5 API pricing by up to 99%, with MiMo-V2.5 now at $0.14 input / $0.28 output per 1M tokens and MiMo-V2.5-Pro at $0.435 input / $0.87 output. Cache-hit input is now almost free.

The lesson: The next AI race is not only about who has the smartest model. It is also about who can make inference cheap enough for developers to actually build, test, and scale products at massive volume.

The caveat: Low prices are not the same as frontier performance. The real question is whether MiMo can match Western and Chinese competitors on reliability, latency, tool use, coding, multilingual quality, and ecosystem support. But economically, the signal is clear: AI model access is becoming aggressively commoditized.
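For a sense of scale, the listed prices can be plugged into a small calculator. The per-1M-token prices come from the announcement quoted above; the request size (20,000 input tokens, 2,000 output tokens) is a hypothetical workload, and cache-hit pricing is left out because no figure is given here.

```python
# Listed prices in dollars per 1M tokens: (input, output).
PRICES = {
    "MiMo-V2.5": (0.14, 0.28),
    "MiMo-V2.5-Pro": (0.435, 0.87),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list price, ignoring cache-hit discounts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical request: a long prompt with a short answer.
INPUT_TOKENS = 20_000
OUTPUT_TOKENS = 2_000

for model in PRICES:
    per_request = request_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${per_request:.5f} per request, "
          f"${per_request * 1_000_000:,.0f} per million requests")
```

Under these assumptions a request costs a fraction of a cent on either model, so at this price level the volume of calls, not the unit price, becomes the main budget variable.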

Why AI APIs Are Suddenly Getting So Cheap

⚡ Bottom line: The Wednesday research story is not just better models. It is better methods for spending fewer tokens, moving less cache data, and serving more intelligence per dollar.

💡 Why it matters: When inference gets cheaper, AI stops being a premium feature and starts becoming infrastructure that developers can call constantly.

🔎 What it means: The price cuts are not proof that every model is frontier-grade. They are proof that the cost floor is falling fast, and that research papers, systems engineering, and pricing strategy are now tangled together.

AI pricing is falling for a simple reason with complicated machinery underneath: providers are learning how to waste less compute. Xiaomi says its MiMo-V2.5 price cut is backed by inference-system work: sliding-window attention (SWA) based on SGLang HiCache, less KV-cache data movement across GPU memory, CPU memory, and SSD, higher cacheable-token capacity, and better expert parallelism and input bucketing.

DeepSeek is pushing the same broad direction from the research side. Its API docs now list DeepSeek-V4-Flash at $0.14 per 1M cache-miss input tokens and $0.28 per 1M output tokens, while V4-Pro pricing is set to remain at one quarter of its original listed rate after the promotion window. Separately, DeepSeek-OCR explores a more fundamental trick: use visual representations as a compressed medium for long text, reaching roughly 9-12x text compression with high OCR precision in the paper's reported setup.
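The compression figure is easier to appreciate as token counts. This sketch simply divides a text-token budget by the reported 9-12x range; the 10,000-token document is a hypothetical example, the function name is made up, and the calculation assumes the ratio holds uniformly, which the paper's reported setup does not guarantee for every input.

```python
import math


def vision_tokens_needed(text_tokens: int, compression_ratio: float) -> int:
    """Vision tokens needed to carry `text_tokens` of text at a given compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return math.ceil(text_tokens / compression_ratio)


# Hypothetical long document of 10,000 text tokens.
DOC_TEXT_TOKENS = 10_000

low = vision_tokens_needed(DOC_TEXT_TOKENS, 9)    # conservative end of the reported range
high = vision_tokens_needed(DOC_TEXT_TOKENS, 12)  # optimistic end of the reported range

print(f"{DOC_TEXT_TOKENS} text tokens -> between {high} and {low} vision tokens")
```

If that ratio carried over to production serving, the same context window would hold roughly an order of magnitude more source text, which is why this kind of research feeds directly into pricing pressure.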

The important caution is that prices are not a clean benchmark. A low price can mean better inference, a more efficient architecture, aggressive competition, or a provider choosing to buy market share. But the direction is hard to miss: model access is being pushed toward utility economics, where the winner is not only the smartest model but the one cheap enough to call millions of times. And Chinese providers keep pushing prices lower.
