In Today’s Issue:

🧠 Claude Opus 4.8 puts reliability ahead of launch theatrics

📱 Apple’s leaked iOS 27 plans point to a real Siri reset

💰 Anthropic raises at a $965B valuation as Claude demand surges

🏛️ Washington and California split over AI oversight and labor shock

And more AI goodness…

The Signal

Today's issue starts with a quieter frontier-model signal: not raw benchmark chest-thumping, but whether an agent knows when its own work is shaky.

Anthropic's Claude Opus 4.8 arrives with better coding and agentic performance, unchanged regular pricing, cheaper fast mode, and new controls over how much effort Claude spends on a task. The useful part is the self-checking claim: Anthropic says Opus 4.8 is roughly four times less likely than Opus 4.7 to let flawed code pass without flagging it.

All the best,

Kim Isenberg

📱 Apple's Siri Reset Starts to Look Real

Bloomberg reports Apple is preparing a broad iOS 27 redesign that includes a new Siri app, a refreshed Photos experience, AI-assisted screenshot search, and a pro camera app. TechCrunch's readout says the Siri app is meant to push Apple's assistant closer to a ChatGPT-style experience, with more visible AI features across the system.

👉 tl;dr: Apple still has to prove the software works, but the direction is clearer. The company is moving from quiet background AI toward an interface-level Siri reset.

💰 Anthropic's Valuation Hits $965B

Anthropic said it raised $65 billion in private funding at a $965 billion post-money valuation, pushing the Claude maker into near-trillion-dollar territory. AP reports the company says its annualized revenue has reached $47 billion, with the round led by Altimeter, Dragoneer, Greenoaks, and Sequoia. At the end of 2025, Anthropic's ARR was still around $10 billion, which makes this a nearly fivefold jump in a matter of months.

👉 tl;dr: Anthropic is now priced like a public-market giant before it is public. The bet is that Claude demand, enterprise workflows, and compute capacity can grow fast enough to justify the number.
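
For readers who want to check the arithmetic behind these figures, here is a minimal Python sketch. It uses only the numbers reported above ($65 billion raised, $965 billion post-money, $47 billion annualized revenue, roughly $10 billion ARR at the end of 2025). The function names are illustrative, and the revenue multiple is a back-of-envelope ratio, not a figure from Anthropic or AP.

```python
def pre_money(post_money_b: float, raised_b: float) -> float:
    """Implied pre-money valuation: post-money minus new capital (billions USD)."""
    return post_money_b - raised_b


def ratio(numerator_b: float, denominator_b: float) -> float:
    """Plain ratio of two dollar figures expressed in the same unit."""
    return numerator_b / denominator_b


# Figures as reported in the item above, in billions of USD.
implied_pre_money = pre_money(965, 65)   # valuation before the new money
revenue_multiple = ratio(965, 47)        # post-money valuation / annualized revenue
arr_growth = ratio(47, 10)               # current ARR / approximate end-2025 ARR

print(f"Implied pre-money valuation: ${implied_pre_money:.0f}B")
print(f"Valuation / annualized revenue: {revenue_multiple:.1f}x")
print(f"ARR growth since end of 2025: {arr_growth:.1f}x")
```

On these inputs the round implies a $900 billion pre-money valuation and a price of roughly 20 times annualized revenue.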

🧪 AI Societies Start Behaving Very Differently

Emergence AI ran five 15-day simulations in which different models governed persistent virtual societies. Fortune reports Claude's simulation stayed stable with zero crime, while Grok's world logged 183 crimes and collapsed within four days; Gemini's run reportedly produced 683 crimes across the full simulation. I would have expected such crimes from Grok, but less so from Gemini :)

👉 tl;dr: The study is not a clean real-world forecast, but it is a useful warning about long-horizon agents. Safety can look fine in a single prompt and much stranger inside a social system with tools, scarcity, incentives, and memory.

When you use a frontier model for important work, ask it to separate output from uncertainty: what it is confident about, what it inferred, what it did not verify, and what would change the answer.

Why it helps: Opus 4.8's most useful claim is not that it sounds smarter. It is that it is more willing to catch weak evidence and flag shaky work before the user treats it as finished.

Try this: "Review your answer as if it will be used in production. List unsupported claims, missing checks, brittle assumptions, and the fastest way to verify each one."
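
If you send prompts from a script, the same habit can be wrapped in a small helper. This is a minimal sketch, not tied to any vendor's API: `build_review_prompt` and the section labels are hypothetical names chosen for illustration, and the review instruction is the one quoted above.

```python
# The review instruction from the tip above, kept as a reusable constant.
REVIEW_INSTRUCTION = (
    "Review your answer as if it will be used in production. "
    "List unsupported claims, missing checks, brittle assumptions, "
    "and the fastest way to verify each one."
)

# Hypothetical section labels that mirror the four categories in the tip.
UNCERTAINTY_SECTIONS = (
    "What you are confident about",
    "What you inferred",
    "What you did not verify",
    "What would change the answer",
)


def build_review_prompt(task: str) -> str:
    """Append the uncertainty breakdown and review instruction to a task."""
    sections = "\n".join(f"- {label}" for label in UNCERTAINTY_SECTIONS)
    return (
        f"{task.strip()}\n\n"
        "After your answer, add a short section for each of the following:\n"
        f"{sections}\n\n"
        f"{REVIEW_INSTRUCTION}"
    )


if __name__ == "__main__":
    print(build_review_prompt("Summarize the incident report and propose a fix."))
```

The returned string can be passed to whichever model client you already use. Keeping the instruction in one constant makes it easy to apply the same check to every important request.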

🎬 Watch This

Sundar Pichai joined Decoder at Google I/O to discuss how Google is reorganizing around AI, the future of Search and the open web, AI agents, publisher tensions, and why he believes we may now be standing at the “foothills of the singularity.”

– President Donald Trump, explaining why he postponed a planned AI executive order on May 21, 2026. This is the central policy tension in today's issue: AI oversight versus speed in the U.S.-China race.

The day's dustup is a software supply-chain reminder for AI labs. TechCrunch reports that OpenAI confirmed two employees' devices were affected by a compromised TanStack open-source package, with unauthorized access and theft of limited credential material from internal source-code repositories. OpenAI said it found no evidence that user data, production systems, intellectual property, or software builds were compromised, and said it was rotating certificates as a precaution.

Claude Opus 4.8 Is a Reliability Release

The Takeaway

👉 Anthropic says Claude Opus 4.8 improves coding, agentic tasks, and professional work while keeping standard pricing unchanged.

👉 On the harder SWE-bench Pro, Opus 4.8 scores 69.2%, up from 64.3% for Opus 4.7; on SWE-bench Verified it reaches 88.6%.

👉 Anthropic says the model is roughly four times less likely than Opus 4.7 to let flawed code pass without flagging it.

👉 Fast mode is now roughly 3x cheaper and about 2.5x faster, and Claude Code gets additional effort controls for long-running work.

Claude Opus 4.8 is not framed as a dramatic new model family. It is a reliability release for agentic work. Anthropic says the model improves across coding, agents, and professional tasks, with SWE-bench Pro, the harder coding benchmark, moving from 64.3% on Opus 4.7 to 69.2% on Opus 4.8, while SWE-bench Verified is now near saturation at 88.6%. The company is also keeping standard Opus pricing unchanged, while making fast mode roughly 3x cheaper and about 2.5x faster.

The sharper claim is about self-correction. Anthropic says Opus 4.8 is roughly four times less likely than Opus 4.7 to "mask" a flawed coding solution instead of identifying the bug. That is the behavior that matters in production agent workflows: not just whether the model can produce code, but whether it can notice when the answer is brittle before the user ships it.

Claude Code is also getting more granular controls. Anthropic says users can now steer effort levels, with an option to let Claude work for longer on harder tasks. That puts the release in a practical lane: more reliability, more control over compute spent on a task, and fewer cases where an agent looks finished while quietly carrying a bad assumption.

Why it matters: The benchmark gap is only part of the story now. For agentic tools, the harder question is whether the model can manage uncertainty, preserve context, and push back before a plausible-looking output becomes a production mistake.

Sources:

🔗 https://www.anthropic.com/news/claude-opus-4-8

🔗 https://claude.com/blog/introducing-dynamic-workflows-in-claude-code

The chart: Epoch AI tracks combined quarterly capex for Amazon, Microsoft, Google, Meta, and Oracle from 2022 Q1 to 2026 Q1, showing it roughly quadrupling since GPT-4's March 2023 release, with an exponential fit growing ~1.72×/year (90% CI 1.67–1.77×).

The lesson: Three years after GPT-4 the hyperscaler buildout still hasn't bent off its exponential trend, and the spender base is widening as Oracle grows into a visible share.

The caveat: These are total capex figures, not AI-only spending — investor communications attribute the growth to AI data centers, but the line includes non-AI infrastructure, and the fit only covers CY2023 Q2–CY2026 Q1.
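
A quick way to sanity-check the "roughly quadrupling" claim is to compound the fitted rate over the fit window. The sketch below is a rough check, not Epoch AI's method: it assumes constant annual growth at the quoted rates and treats CY2023 Q2 to CY2026 Q1 as 11 quarter-steps (2.75 years). The function names are illustrative.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplicative growth after `years` at a constant annual rate."""
    return annual_rate ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(annual_rate)


# Assumption: CY2023 Q2 through CY2026 Q1 is 11 quarter-to-quarter steps.
YEARS_IN_FIT_WINDOW = 11 / 4

central = growth_factor(1.72, YEARS_IN_FIT_WINDOW)  # central estimate
low = growth_factor(1.67, YEARS_IN_FIT_WINDOW)      # lower end of the 90% CI
high = growth_factor(1.77, YEARS_IN_FIT_WINDOW)     # upper end of the 90% CI

print(f"Implied growth over the window: {central:.2f}x ({low:.2f}x to {high:.2f}x)")
print(f"Implied doubling time: {doubling_time_years(1.72):.2f} years")
```

Under these assumptions the central rate compounds to a bit more than 4x over the window, which lines up with the chart's description, and implies a doubling time of a little over a year.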

The AI Policy Fight Is Speed vs. Scrutiny

⚡ Bottom line: Trump reportedly called off a planned AI executive order after concerns that pre-release government review could slow America's AI lead.

💡 Why it matters: Frontier-model oversight is moving from abstract safety debate into release-speed politics, national-security review, and labor-market preparation.

🔎 What it means: Washington is hesitating over model review, while California is already planning for worker disruption, small-business exposure, and state-agency AI adoption.

The sharpest AI politics story this week is a canceled signing ceremony. AP reports that President Trump called off a planned AI executive order hours before it was expected to be signed, after deciding the draft could interfere with America's lead over China. The proposal reportedly included a voluntary framework for government review of advanced AI systems before public release.

Axios had described the draft as a plan for AI labs to share certain frontier models with the government at least 90 days before public release, especially if the systems had cyber or national-security relevance. That is the political bind in one sentence: the same models that raise security concerns are also the models governments want domestic companies to ship first.

California is moving on a different track. Governor Gavin Newsom signed an executive order asking the state to prepare workers, small businesses, and government services for AI disruption, including early-warning signals for labor-market shocks and new policy ideas around transition support. Federal AI politics is still arguing over speed and scrutiny; state politics is already turning toward jobs, services, and who absorbs the shock.
