Episode #9 · May 14, 2026

Lehrwerkstatt and the Agent's Dead Silence

Video is only available in Russian

Sources

Primary Sources

  • Tobias Lütke. Learning on the Shop Floor (personal blog, May 11, 2026). Lütke's original post laying out the architectural decision behind River and the Lehrwerkstatt metaphor. — Accessible via Simon Willison's link post (see below). As of May 12, the direct link on tobi.lutke.com may be unstable.
  • Simon Willison. Learning on the Shop Floor (link post, May 11, 2026). URL: https://simonwillison.net/2026/May/11/learning-on-the-shop-floor/ — The principal channel through which Lütke's post entered the engineering conversation. Contains extensive quotations from the original, plus Willison's brief commentary on the Midjourney parallel.

Quantitative Data

  • ZenML LLM-Ops. Write-up on Shopify's River agent (2026). The source of the usage figures quoted in the episode: 5,938 employees interacting with River in a single month, 4,450 channels, roughly one in eight merged monorepo pull requests opened by River, and the merge rate climbing from 36% to 77% over two months.

Lütke as a Public Figure — Context

  • Tobi Lütke on the 20VC with Harry Stebbings podcast (recorded early May 2026). A key reference for understanding Lütke's public rhetoric on AI and labor.
  • Tobi Lütke: AI Is the "Perfect Scapegoat" for Layoffs That Were Already Coming. BigGo Finance, early May 2026. URL: https://finance.biggo.com/news/045855c1fc0780ec — Lütke's remarks on AI-washing, the plan to hold headcount at 7,500–8,000 for the next five years, and the "context engineer" framing as a new engineering role.
  • Tobias Lütke. No New Hires Without Proving AI Can't Do the Job (internal memo, April 2025). The leaked memo in which Lütke requires that any new hire be justified by a demonstration that AI cannot perform the same work. Essential context for the River narrative.

Historical Parallels for the Lehrwerkstatt Discussion

  • Kent Beck, Ward Cunningham, et al. The XP (Extreme Programming) and pair programming literature, 1999–2005. The standard references for the principle that code is written better when written in front of others.
  • Eric S. Raymond. The Cathedral and the Bazaar. The foundational text on open-source development. Relevant as the theoretical case for visibility of work as an engineering tool, not merely a social one.
  • The Midjourney Discord channels of 2022–2024 — the historical parallel Lütke himself draws in the original post.

AI SLOPCAST #9: Lehrwerkstatt and the Agent's Dead Silence

I've been writing about cybersecurity for weeks now, and I've had enough of it. Let's talk about something else — the social mechanics of working with AI agents, which turn out to be more interesting than the security stuff anyway.

On May eleventh, Tobi Lütke, the chief executive of Shopify, posted a piece on his blog called "Learning on the Shop Floor."

The post is about Shopify's in-house coding agent, which goes by the name River. It is, in most respects, a perfectly unremarkable agent. Lütke doesn't tell us which model is inside — Claude, ChatGPT, some chimera of the two — and the question turns out not to matter. What matters is a single peculiar fact.

River will not work in direct messages.

Slide into its DMs for help with some code, and you get a polite brush-off: sorry, I don't do private; please open a public channel and we'll continue there. Thanks for understanding.
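Shopify hasn't published River's internals, so here is only a minimal sketch of what such a guard might look like. The channel-ID convention is borrowed from Slack (direct-message channel IDs start with "D", public channels with "C"); the function names and the brush-off text are invented for illustration:

```python
# Hypothetical sketch of a "no DMs" guard for a chat-based coding agent.
# All names and the message text are invented; Slack DM channel IDs
# really do start with "D" and public channel IDs with "C".

BRUSH_OFF = (
    "Sorry, I don't do private. "
    "Please open a public channel and we'll continue there. "
    "Thanks for understanding."
)

def run_agent(channel_id: str, prompt: str) -> str:
    """Stand-in for the real model call."""
    return f"[agent reply in {channel_id}] {prompt}"

def handle_request(channel_id: str, prompt: str) -> str:
    """Decline work in DMs; otherwise hand the prompt to the agent."""
    if channel_id.startswith("D"):  # direct message: politely refuse
        return BRUSH_OFF
    return run_agent(channel_id, prompt)  # public channel: visible to all
```

The point of the sketch is how small the mechanism is: the entire "architectural decision" is a few lines of routing policy, and everything interesting happens downstream of it.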

The thing is striking for two reasons.

Your first instinct is that it has to be a bug, or some forgotten configuration toggle. It isn't. It's deliberate, and Lütke says so in so many words. Private conversations with the agent are forbidden at Shopify, on purpose.

The second reason is what's actually going on inside Shopify's Slack. This is the largest tech company in Canada — a serious operation by any measure — and it has gone all in on radical transparency. Every interaction any engineer has with the AI is visible to colleagues. Not tucked away in some security audit log, but right there in the channel, alongside the office banter. Lütke himself works with River in a channel called #tobi_river, which has more than a hundred people in it, all of whom can sit and watch their CEO ask an AI to write some code.

There are three things I want to get into here. Why this strange ban on private messages actually works, and works well. What gets quietly omitted from this very tidy corporate parable — because something always gets quietly omitted from a very tidy corporate parable, usually involving some flavor of corporate serfdom and human chow. And what the whole story means for small teams and open-source projects, a question Lütke conspicuously declines to ask.

Let's go.


Lehrwerkstatt

Lütke pegs his idea to a German word — Lehrwerkstatt, pronounced roughly LEHR-vehrk-shtaht, the stress on the first syllable. It translates, literally, as "teaching workshop": a workshop floor that is at the same time producing and instructing. An apprentice doesn't sit in a classroom; he stands next to a master, watches him work, and absorbs the trade by osmosis.

This, incidentally, is one of the things that has long distinguished the old German engineering culture from the American one, with its lecture halls and its MBAs. The Germans had, and still have, a deep conviction that knowledge transmits through presence — that you learn by standing beside someone who knows how, and watching him do it.

Lütke, who is himself German by way of Canada, takes this cultural inheritance and bolts it onto the practice of working with an AI agent.

The logic is simple. If every interaction with River takes place in a public channel, the company is continuously teaching itself. I ask the agent a question, and twenty people see how I asked it. I write a bad prompt, and they all see exactly what made it bad. I land on something that works, and anyone can copy it. River fumbles my task, and someone more experienced strolls into the thread to show me how to reframe.

Every exchange becomes course material for everyone else. No internal training program required, no documentation, no formal onboarding. Watching is enough.

I do livestreams on YouTube, and I find I have some sympathy for this idea. I don't stream because the streams are some Library of Alexandria of accumulated wisdom — they aren't — and I don't make a dime from them. But there's a real thing in just demonstrating, by example, what I'm doing with neural networks all day: what worked, what didn't, what was a waste of an afternoon. Anyone can drop in and watch.

Back to Lütke. It's a pretty story. And — unusually, for pretty stories — there are numbers behind it. The ZenML LLM-Ops write-up has the details. In a single month, 5,938 Shopify employees interacted with River, across 4,450 channels. River opens roughly one out of every eight pull requests merged in the company's main monorepo.

But the figure that really matters is this one. Over two months, the share of River's pull requests that actually get merged climbed from thirty-six percent to seventy-seven. That is, more than twice as much of what the agent produces is now considered good enough to ship. No model update. No fine-tuning. No new prompt.
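The "more than twice" claim is easy to sanity-check from the quoted figures:

```python
# Merge-rate figures as quoted in the episode (via the ZenML write-up).
before, after = 0.36, 0.77

ratio = after / before
assert ratio > 2  # more than twice as much of River's output now ships
print(f"{ratio:.2f}x improvement in merge rate")
```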

What changed? The company learned how to use the thing.

This, I think, is the actual lesson of the story. The largest performance gain from a coding agent may have nothing to do with miraculous progress in machine learning, and everything to do with how the work around the agent is organized — and how much the team is genuinely using it.

We are accustomed, in this industry, to thinking about it the other way around. To get more out of an agent, you swap in a bigger model, or polish the prompt, or wire up more tools. All of these interventions happen inside the agent. What Lütke is proposing is that you leave the agent more or less alone and reshape the environment around it.

This is, when you sit with it, a genuinely counterintuitive move. AI performance is a function of both the model and the organization that wields it. And the organizational half can dominate the technical one. The tech is not the end of the story. The tech is the beginning.

Why the Ban Works: Three Mechanisms

Let's pry apart what's actually happening when you forbid an agent from working in DMs. There isn't one effect; there are three, operating on different time horizons.

The first is incidental learning — education by drift. Short horizon. I happen to scroll past a channel where a colleague is doing something with River. I don't have to stop. I can keep scrolling. But if the task looks like mine, I linger and study what's going on. We learn to write prompts by reading other people's mail. A kind of sanctioned workplace voyeurism. It is an extraordinarily cheap form of education. No courses to enroll in, no docs to crawl through, no real effort at all. You just live in the shared room, and the room teaches you.

The second is the diffusion of know-how. Medium horizon. When somebody hits on a good technique — a prompt that consistently works, a clever way to chop a task into pieces, some tool worth handing the agent — that technique is right there in the open. From there, it travels like a meme. A few weeks later, the company has informal conventions for working with the agent that nobody mandated from above. They coalesced organically, the way folklore does.

The third is collective debugging. Long horizon. When enough people watch River systematically blow it on certain kinds of tasks, the information accumulates. Somebody documents the parts of the codebase River doesn't understand. Somebody else adds an instruction in MCP that compensates for the gap. A third person writes the agent a custom tool that papers over a recurring failure. What you end up with is crowdsourced collective improvement of the agent, without ever retraining the model. Instead of retraining the model, you retrain the society and the tooling around it.
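A toy sketch of that third mechanism, with every name invented: the team keeps a shared list of compensating instructions, and every prompt folds them in, so one person's documented fix reaches everyone without touching the model:

```python
# Toy illustration of collective debugging (all names hypothetical).
# The team maintains a shared list of compensating instructions;
# every prompt prepends them, so one person's fix helps everyone.

SHARED_FIXES: list[str] = []

def document_fix(instruction: str) -> None:
    """Anyone who catches a systematic agent failure appends a workaround."""
    SHARED_FIXES.append(instruction)

def build_prompt(task: str) -> str:
    """Fold the accumulated fixes into every request to the agent."""
    preamble = "\n".join(f"- {fix}" for fix in SHARED_FIXES)
    return f"Known pitfalls to avoid:\n{preamble}\n\nTask: {task}"

# Two engineers document failures they watched River make in public channels:
document_fix("The billing module uses legacy snake_case APIs; don't modernize them.")
document_fix("Never touch files under vendor/ in this monorepo.")

print(build_prompt("Add a retry to the invoice webhook."))
```

In a real setup the list would live in a repo file or an MCP instruction set rather than a Python list, but the mechanic is the same: the corrections accumulate in the environment, not in the model weights.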

The third mechanism is almost certainly what accounts for that climbing merge rate. The agent did not get smarter. The company has just accumulated a corpus of small fixes that, together, gradually fill in its weak spots.

That kind of accumulation can only happen if the interactions are visible to everyone. If every engineer is working from their own private Cursor on their own laptop, every engineer is fighting the same fights alone, and the shared corpus of fixes never coalesces. Everybody reinvents the wheel.

Which, by the way, is why so many companies that have deployed private coding agents see nothing like Shopify's gains. They invest in the model. They don't invest in making the practice of using the model legible to anyone else. And without that legibility, the collective learning never catches fire.


The Parallel with Pair Programming

In his post, Lütke draws one parallel — to Midjourney, whose early users worked primarily in public Discord channels, producing the same kind of teaching effect. The parallel is fine, but it's a weak one.

A much sharper parallel sits right underneath everything Lütke is describing: the culture of pair programming and code review in engineering teams. The culture that emerged in the seventies and never quite went mainstream.

Since the seventies, programming has had a cluster of practices animated by a single conviction: code comes out better when it's written in front of other people. Pair programming was a pillar of Extreme Programming. Code review has been a baseline expectation in any halfway serious team for twenty years. Open source as a way of working is, at bottom, the proposition that your code is visible to the entire planet. These ideas were all in the air half a century ago.

And they all run on the same principle as Lütke's Lehrwerkstatt. The visibility of work is, in itself, an engineering tool.

The objection to pair programming and code review has always been the same. It's slower. A programmer writes more slowly with a colleague at his shoulder. Code review takes time, and a second person. These practices don't scale gracefully, which is why most teams have either abandoned them or kept them on as a formality.

The agent changes one thing about all of this, and it changes it completely. The agent doesn't mind being watched.

It has no ego. It has no stage fright. It is not embarrassed by a hundred people watching it solve a problem. And, crucially, it does not slow down under observation.

What this amounts to is that visibility, as an engineering practice, has just become cheap, in a domain where it was once expensive.

That is the real reversal in the story. Visibility used to come at a cost, and you applied it selectively. With an agent, it costs nothing, and you can apply it by default. Lehrwerkstatt as a mass practice has become possible for the first time, precisely because the principal worker on the shop floor is no longer a human being and no longer suffers from being watched.

This is a class of organizational arrangement that is available only now, in the age of agents. No company could have operated this way in 2020. Not for lack of imagination, but because the tools didn't allow it. Now they do, and Shopify is out ahead of everyone else.


A Touch of Corporate Cringe

Now, a dose of skepticism. Lütke is a gifted storyteller, and the story has the unmistakable cadence of a Steve Jobs production, minus the funky turtleneck and the beat-up sneakers. And any time a story is built to that template, it's worth asking what isn't being said.

To begin with: the numbers are, almost certainly, a curated selection. They are all positive. The merge rate is up. The channel count is up. The user count is up. And the bug count? Production incidents in code River wrote? Technical debt accruing in the corners? The load on whoever has to review River's pull requests — because if River is opening one out of every eight, somebody is reading them, and I'll bet my last dollar that somebody is the senior engineers, whose time is not free?

All of these negative metrics exist inside Shopify. They have production monitoring. They have incident reports. They have internal surveys. None of it is in the post. Only the things that fit the story made it in.

This is normal corporate behavior. Shopify is not exceptional here; the corporate fable is its own well-developed literary genre. But the entire River narrative rests on this curated set of metrics, and not a single person in the broader discussion is asking the obvious question: what got worse over those two months? Not in PR terms — overall.

Second. The ban on private chats is, yes, textbook Lehrwerkstatt. But beyond the teaching-workshop frame, there is something else going on that Lütke tactfully neglects to mention.

When I'm working with an agent, I think out loud. I say what I don't understand. I expose the gaps in my knowledge. I make false assumptions. I make mistakes. This is a deeply intimate part of engineering work. With "your private Cursor and Claude on your laptop," that intimacy is yours. With "River in a public channel," it is broadcast to the entire company — and archived for years.

For an HR department, this is a treasure trove. You can track who gets stuck on easy problems. Who leans on the agent too hard. Who never touches it at all, and why. Who has what gaps. In sheer resolution, this kind of data leaves any 360, any 5+, any other instrument the corporate handlers have come up with in the dust. It is the platonic ideal of corporate serfdom.

Lütke does not say any of this out loud. It's possible that, inside Shopify, none of it is being used for performance evaluation — officially. But the capability is wired into the architecture, and the precedent has been set. Companies that copy River's design will inherit, along with the metric gains, the most fine-grained employee-surveillance system anyone has ever built.

Lütke is not eager to put his name to the proposition that River is the most exquisite instrument of employee monitoring Shopify, or arguably anyone, has ever deployed, and that it has been rather elegantly disguised as a teaching workshop.

And third — the part that, to me, is the most unsettling.

River is hitting seventy-seven percent on its merge rate. That is, roughly, the bar for a junior engineer at a large company. Lütke is, in parallel, saying out loud that Shopify intends to hold its headcount at seven and a half to eight thousand people for the next five years. Which is to say: no new juniors. The people already inside will keep working.

Do the math. Lehrwerkstatt works wonderfully for engineers who are already Shopify employees. They learn through observation, and their productivity climbs. But where, exactly, is the next cohort of senior engineers going to come from, if the junior positions have evaporated? Somebody has to walk the road from junior to senior. People enter a company by being hired — a thin steady stream. Pinch off that stream, and in five years your problem is not a shortage of juniors. Your problem is that you have no one to grow into a senior.

The Lehrwerkstatt architecture is engineered to keep current employees productive as the work shifts wholesale to AI agents. It has nothing to say about where the industry's next generation of engineers is supposed to come from. Shopify's management has nothing to say about it either.

Lütke partly dances around the problem by gesturing at new roles emerging in the company — the rise of the "context engineer," a person who manages agents rather than writing code directly. The best context engineers, he says, are former team leads — people with experience "prompting" actual humans. This is a convenient story for the current seniors and managers. For a junior, that career path is effectively shut. And new hires will never accumulate the requisite experience, for the simple reason that juniors aren't being hired anymore.

You end up in an organizational cul-de-sac. The new context-engineer role is available only to people who already have the experience. And no new experience is being produced, because juniors no longer have anywhere to grow.

I run into the same effect, on a smaller scale, all the time. I spend a lot of energy telling people how to work with AI, how to work with agents, and they simply can't apply the advice. They have never delegated a development task to another human being. They have no managerial reflexes. To them, my instructions read as empty noise. So when you find yourself eyeing some hustler's course called "Become an AI Engineer in 30 Days," understand that it is, in all likelihood, complete garbage. You do not need more videos. You need real experience.


What This Means for Small Teams

A few words on practice, before I wrap up.

Shopify is seven and a half thousand people. They have Slack, a sprawling monorepo, and a thicket of formal development processes. River, with some effort, fits.

What does any of this mean for a team of ten? Or for an open-source project?

The good news, for the team of ten, is that the Lehrwerkstatt effect will be even stronger. Everybody already knows what everybody else is working on. Banning private chats with the agent, in such a team, is an even more natural decision than it is at Shopify. At Shopify, nobody can read every chat — the volume is infinite. In a team of ten, reading every work chat is not a burden at all.

A simple experiment you could run tomorrow: configure your AI agent so that every action it takes is published into a single shared team channel. No private Cursor sessions. No off-the-books Claude Code. Everything in plain view. I suspect that, for a small team, the effect will be much the same — and you can set it up in an evening.
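Here is a minimal sketch of that experiment, assuming a Slack-style incoming webhook (the URL is a placeholder, and run_agent stands in for whatever agent you actually call; any messenger with webhooks works the same way):

```python
import json
import urllib.request

# Sketch of the "everything in one shared channel" experiment.
# WEBHOOK_URL is a placeholder for your own incoming-webhook URL;
# the payload shape shown is Slack's.

WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def post_to_channel(text: str) -> None:
    """Mirror one message into the shared team channel."""
    payload = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def run_agent(prompt: str) -> str:
    """Stand-in for the real model call (Claude Code, Cursor, etc.)."""
    return f"(agent reply to: {prompt})"

def transparent_agent_call(user: str, prompt: str, publish=post_to_channel) -> str:
    """Run the agent, publishing both sides of the exchange as you go."""
    publish(f"{user} -> agent: {prompt}")
    reply = run_agent(prompt)
    publish(f"agent -> {user}: {reply}")
    return reply
```

The publish parameter is there so the mirroring is a policy you wrap around the agent, not something buried inside it; swap in any transport you like.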

The one wrinkle, for a small team, is that you are probably running Claude Code on a subscription, not against the API. Corporates don't have to think about this — their plans are essentially priced against API usage anyway. For you, it matters. You can't operate through the API. Which is why I'll soon be publishing a hack of my own that lets you run a Lehrwerkstatt-style setup over Telegram and other messengers using ordinary Claude Code. Watch the YouTube channel and subscribe on Telegram; it'll be there.

For open source, the story gets more interesting still. Open-source projects, by their nature, already operate in public. Every PR, every discussion, every review is visible to anyone who cares to look. Drop agents into an open-source project, and remarkably little changes. The publicness is baked in, and the Lehrwerkstatt effect arrives free of charge.

Which yields a curious prediction. Open-source projects may end up better at adopting agents than commercial companies are. Not because they're smarter. Because public work is a native property of open source, and it delivers the Lehrwerkstatt effect at no cost. Corporations have to manufacture that publicness artificially, with tools like Slack. Open source gets it for nothing.

My bet is that, a year from now, the best practices for working with agents in engineering teams will come not from corporations but from open-source communities. The Linux Foundation. Apache. The Rust Foundation. These organizations have everything they need to wring the most out of public agent work. Assuming, of course, they can summon the political will to begin.


In Closing

So, what have we learned.

One. Tobi Lütke made a small architectural and organizational decision — he forbade River from working in DMs — and got a large and surprising effect. The merge rate climbed from thirty-six percent to seventy-seven over two months, with no model update of any kind. It is a vivid demonstration that, in the age of agents, the right architecture of a company or a community may yield more interesting results than any quantity of ML wizardry.

Two. The ban on DMs works through three mechanisms: incidental learning by observation, the diffusion of techniques inside the company, and collective debugging of the agent's weak spots. The third, it seems, is currently where most of the gains are concentrated.

Three. The story rhymes loudly with the older histories of pair programming and code review, but with one critical difference. Visibility of the work — visibility of the production process itself — used to be expensive, because human beings tire of being watched. With an agent, all of it comes for free. The agent does not get self-conscious. The agent does not get tired. This opens up an entire class of organizational arrangements that simply weren't available before.

Four — the unpleasant part — is the corporate cringe. The numbers in the public version of this story have been carefully curated to fit the KPI. Employee surveillance is wired into the architecture, as a direct consequence of the design, even though nobody is saying so out loud. And the central question the industry still has no answer to: where, exactly, are the next seniors coming from, when the junior positions go away?

Five. There's a window cracked open in here. The real take-up of this idea may happen not in corporations but in open-source communities, because they have publicness as a free property. And we — as the community gathered around this channel, on VK, on YouTube, on Telegram, and elsewhere — will absolutely be making use of that fact.

AI · neural networks · AI agents · Shopify · Tobi Lütke · River · Lehrwerkstatt · Learning on the Shop Floor · teaching workshop · public agents · pair programming · code review · Extreme Programming · Slack · public channels · AI productivity · employee surveillance · corporate cringe · context engineer · junior engineers · open source · Linux Foundation · Apache · Rust Foundation · Midjourney · Discord · Anthropic · Claude Code · Cursor · Simon Willison · ZenML LLM-Ops · podcast · Oleg Chirukhin · 1red2black · GitVerse