AI SLOPCAST #10: The Magnet in the Machine
Here's a line I've loved ever since I first hit it in Max Tegmark: anything that exists mathematically exists physically. Good luck proving that over coffee — which is exactly why I get a little thrill on the rare day when some purely mathematical creature, a neural network, say, suddenly turns up with one foot planted in the ordinary, knock-on-it world.
And come on — everybody loves a lone-genius story. Tony Stark in the cave, banging a flying-suit reactor together out of a box of junk. This is, give or take, one of those.
On May 11th a modest little paper turned up on arXiv, tucked into the statistical physics of condensed matter — the corner where everyone's usually fighting about magnets and boiling water. One author: Cristiano De Nobili. One guy, in other words, who woke up one morning figuring he had the time and the chops to take a crack at a question like this, and took it.
The paper's called — and let me get through this without flinching — "Collective Alignment in Multi-Agent LLM Systems: Disentangling Intrinsic Bias and Cooperation with Statistical Physics."
Fair warning: what's coming is thick with jargon. I've dropped the whole thing — every term spelled out, every claim linked — on my site, at oleg.guru, so you can check I'm not making it up as I go. (You'd be amazed how much of the gee-whiz physics floating around online — the black-hole stuff, the quantum-whatever stuff — is just machine-spun slop. I try to lie a little less than that. Sometimes I lie a little. But the facts, at least, are nailed down.)
Okay. Here's the setup.
Picture a square grid, sixteen by sixteen, say. Every cell holds a copy of the exact same language model: same architecture, same weights, all the way down. Each one's parked on a single value, plus-one or minus-one: yes or no, an answer to some question. Every tick, each cell turns to its model and basically asks: my four neighbors just said this; what do I say? Round and round it goes. The one and only knob on the whole contraption is the sampler's temperature. Yeah, that dial between zero and two you nudge when you're poking at OpenAI, or some local model through Ollama. Old friend.
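If it helps to see the contraption as code, here's a toy sketch of that loop. Big caveats: ask_model is a made-up stand-in, not a real LLM call and not the paper's code. It fakes the model with a logistic "go with the local majority" rule. The wrap-around edges and the everybody-updates-at-once schedule are my assumptions too.

```python
import math
import random

def ask_model(neighbor_values, temperature, rng):
    """Hypothetical stand-in for one LLM call; returns +1 or -1.

    A real run would put the neighbors' answers into a prompt and sample
    the model at the given temperature. Here we fake it with a logistic
    rule: agree with the local majority, more reliably when colder.
    """
    total = sum(neighbor_values)
    if temperature <= 0:
        # Greedy decoding: follow the majority, break ties at random.
        if total == 0:
            return rng.choice((-1, 1))
        return 1 if total > 0 else -1
    # Clamp the exponent so very low temperatures cannot overflow exp().
    x = max(-50.0, min(50.0, 2.0 * total / temperature))
    p_yes = 1.0 / (1.0 + math.exp(-x))
    return 1 if rng.random() < p_yes else -1

def tick(grid, temperature, rng):
    """One synchronous update: every cell answers from the OLD grid."""
    size = len(grid)
    new_grid = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            # Four nearest neighbors, with wrap-around edges (assumption).
            neighbors = [
                grid[(r - 1) % size][c],
                grid[(r + 1) % size][c],
                grid[r][(c - 1) % size],
                grid[r][(c + 1) % size],
            ]
            new_grid[r][c] = ask_model(neighbors, temperature, rng)
    return new_grid

def run(size=16, temperature=1.0, ticks=50, seed=0):
    """Start from random answers and let the grid argue for a while."""
    rng = random.Random(seed)
    grid = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
    for _ in range(ticks):
        grid = tick(grid, temperature, rng)
    return grid
```

Swap the body of ask_model for an actual model call and you have the shape of the experiment: one call per cell per tick, which is exactly why it gets expensive fast.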
From there De Nobili grabs the tools statistical physicists usually point at magnets. Magnetization — how close the whole grid is to agreeing with itself. Susceptibility — how hard the crowd jumps when you flick it. He runs the numbers on grids of different sizes and watches how they scale as the grid grows. That's finite-size scaling, and what drops out the bottom are the critical exponents: the numbers that nail down how a system behaves right next to a phase transition. And this is where it gets weird.
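For the curious, here's roughly what those measurements look like as code. This is a sketch using the textbook estimators, and I'm assuming (not claiming) that the paper normalizes things the same way. Magnetization is the absolute average over the grid, susceptibility is how much that average jitters across snapshots, and the finite-size-scaling step is a straight-line fit on a log-log plot: the slope of peak susceptibility against grid side estimates the ratio of two critical exponents, gamma over nu.

```python
import math

def magnetization(grid):
    """Absolute mean of all cells: 1.0 = full agreement, near 0 = static."""
    cells = [value for row in grid for value in row]
    return abs(sum(cells)) / len(cells)

def susceptibility(snapshots):
    """Fluctuation estimate: N * (<m^2> - <m>^2) over many snapshots.

    N is the number of cells; m is the per-snapshot magnetization.
    Any temperature prefactor is left out (an assumption of this sketch).
    """
    n_cells = sum(len(row) for row in snapshots[0])
    ms = [magnetization(grid) for grid in snapshots]
    mean_m = sum(ms) / len(ms)
    mean_m2 = sum(m * m for m in ms) / len(ms)
    return n_cells * (mean_m2 - mean_m * mean_m)

def loglog_slope(sizes, values):
    """Least-squares slope of log(values) against log(sizes)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den
```

In use, you'd collect snapshots at each temperature for several grid sizes, take the peak susceptibility per size, and hand the sizes and peaks to loglog_slope. For the exact two-dimensional Ising model that slope comes out at 7/4.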
What He Found
He ran three models a guy on a budget can actually afford — Llama 3.1 at eight billion parameters, Phi-4 mini, Mistral at seven. (No millionaire, clearly: the big models are out of reach, but these cheap little things do the job.) All three throw a temperature phase transition. Crank the temperature down and the crowd snaps into lockstep — every cell screaming yes, or every cell screaming no. Crank it up and you get static. Somewhere in the middle, a critical point.
Worth pausing on what a critical point even is, because the phrase is about to pull a lot of weight.
In physics, a critical point is a setting of some knob — usually temperature — where, right around that value, a system starts doing something genuinely spooky. To one side, order: water holds together as a liquid, a magnet's spins all face the same way. To the other side, chaos: the water's boiled off to steam, the magnet's gone limp. And smack at the critical point, the good part — the system teeters on the razor's edge between order and chaos, and the tiniest twitch rings out across the whole thing. Look at water sitting right at its critical point — and you can actually pull this off in a lab — and it goes milky and shimmering, because vapor bubbles and liquid droplets are blended together at every scale at once, from the molecular on up to what your eye can catch. They call it critical opalescence, and half of twentieth-century physics is propped on it.
Critical exponents are the numbers that say exactly how a system acts at that magic point — how fast the magnetization craters as you close in, how its reaction to a poke depends on its size. You can measure them, you can derive them, and — here's the kicker — you can lay them side by side across systems that have nothing in common.
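For anyone who wants the symbols, here's the standard textbook shorthand (the notation is the usual convention, not necessarily the paper's). T_c is the critical temperature, t is the distance from it, M is magnetization, chi is susceptibility, xi is the correlation length, and L is the side of the grid. The first line says how things behave as you close in on T_c; the second says how they scale with grid size right at the critical point.

```latex
t = \frac{T - T_c}{T_c}, \qquad
M \sim (-t)^{\beta} \ (t < 0), \qquad
\chi \sim |t|^{-\gamma}, \qquad
\xi \sim |t|^{-\nu}

M(T_c, L) \sim L^{-\beta/\nu}, \qquad
\chi_{\max}(L) \sim L^{\gamma/\nu}
```

The three Greek letters beta, gamma and nu are the critical exponents. Those are the numbers you lay side by side.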
De Nobili's exponents landed close to the two-dimensional Ising model — but not on it. And, weirder still, they landed differently for different models.
Quick word on the Ising model, because it's about to be everywhere. It's the most famous toy in all of statistical physics: a bare-bones cartoon of a magnet, proposed by the German physicist Wilhelm Lenz in 1920 and worked out in one dimension by his student Ernst Ising, who published the result in 1925. Take a grid (a line, a plane, whatever) and drop a spin on every site, pointing either up or down. Neighbors haggle, each one wanting to face the same way as the next, while heat keeps shoving them around. Cold: everybody lines up, magnet's a magnet. Hot: pure mush, magnet's dead. In between: the critical point. The two-dimensional version got solved exactly in 1944 by the Norwegian-born Lars Onsager, and it's still on the short list of the century's great physics wins. The magic of it is that it's dead simple and yet it nails the behavior of a ridiculous range of real-world systems.
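Written down, the whole toy fits on one line. Here's the standard form, as a reference sketch: each spin s_i is plus or minus one, the first sum runs over neighboring pairs, J is how badly neighbors want to agree, and h is an outside field nudging everyone one way. Below it are the exactly known critical temperature for the square grid with h set to zero, and the two-dimensional exponents for magnetization (beta), susceptibility (gamma) and correlation length (nu).

```latex
H = -J \sum_{\langle i,j \rangle} s_i s_j \;-\; h \sum_i s_i,
\qquad s_i = \pm 1

\frac{k_B T_c}{J} = \frac{2}{\ln\!\left(1 + \sqrt{2}\right)} \approx 2.269

\beta = \tfrac{1}{8}, \qquad \gamma = \tfrac{7}{4}, \qquad \nu = 1
```

Those three fractions are the target on the wall. "Close to two-dimensional Ising" means close to these.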
To see why De Nobili's result is a big deal, I've got to walk you through universality — an idea that, back in the day, knocked a lot of physicists sideways.
Here's the empirical punchline. Take a magnet whose spins can only point up or down along a single axis, heat it to its Curie point (the temperature where it quits being a magnet) and clock how its magnetic behavior moves near that point. Out comes a set of numbers. Now take water, squeeze it and heat it to its own critical point, and clock how its density moves near there. Out comes a second set of numbers. And then, to basically everyone's shock, the two sets turn out to be the same. Not in the ballpark. The same, to within what the experiments can resolve.
Which means that magnet and a glass of water, near their critical points, behave in structurally identical ways, even though one's built of atoms hauling around magnetic moments and the other's a bunch of water molecules lashed together by hydrogen bonds. The microscopic detail just gets wiped. Two things survive: the symmetry of the problem (does the system get a choice between two even-money states: up or down for the magnet, liquid or vapor for the water) and the number of dimensions it lives in. That's it.
This is universality. Systems sort themselves into classes, and inside a class the critical exponents are identical. Which class you're in comes down to symmetry and dimension alone — not what the thing's made of.
The roster of best-known classes is shockingly short, and worth rattling off, because they'll keep crashing the party. There's two-dimensional Ising: anything picking between two even-money states on a flat surface, like a magnet on a flat grid, liquid versus vapor, or for-or-against votes on a forum, if you squint and build the analogy carefully. There's three-dimensional Ising: same deal, one dimension up, new numbers because the dimension changed; that's where real water lives, along with magnets whose spins can only point up or down along one axis. There's the XY model, where the choice isn't yes/no but a full circle (think a compass needle free to swing anywhere in the plane); thin films of superfluid helium hang out here, plus some liquid crystals. There's Heisenberg, XY's three-dimensional sibling, the needle now free to point anywhere in space, like the magnetic moment of an honest-to-god electron; three-dimensional ferromagnets like iron sit there. There's Potts: Ising, but with more than two options per site. And there's percolation, which is about how connection spreads through a random mess: the puzzle, when you dump water on a pile of sand, of how loosely packed the sand has to be before the water always finds a way clean through.
And here's the punchline: just about any system of collectively behaving parts that modern physics knows how to describe drops into one of these boxes. Magnets, fluids, epidemics, neural networks, crowds of people, opinions in a group, forest fires, tumors spreading — all of them land in Ising, or Potts, or percolation, or some other known class.
A crowd of language models, going by De Nobili's numbers, lands in none of them. The exponents sit close to two-dimensional Ising, but they miss it systematically, and they miss it in different directions depending on which model you've stuffed into the cells.
If the result holds, it means one guy, working alone, found that language models in a crowd behave in a way qualitatively unlike every collective-behavior system physicists know about. That's the discovery of a new universality class — and those come along in physics maybe once every few decades. I keep saying "if." We'll get back to that "if."
Why an Engineer Should Care
If you build stuff out of a bunch of agents — on AutoGen, LangGraph, CrewAI — sooner or later you smack into an awkward question. Your agents all landed on the same answer. Cool: is that because they're right, or because they're all wrong in the exact same way? Or — worse — because under the hood they're literally the same model run a bunch of times, with one baked-in bias Xeroxed across your whole "crowd"? Engineering doesn't have a clean answer. It just treats the symptoms: different prompts, different vendors, roles rigged to argue with each other.
De Nobili's pitch is that you can turn this into a measuring stick — and the stick's got two dials. The first he calls h-tilde: the model's own bias, whatever it leans toward all on its lonesome, neighbors be damned. Call it stubbornness, so we don't lose the thread. The second is J-tilde: how hard the model caves to its neighbors. Call it conformity. And the claim is you can pry those two apart in the lab, that they belong to the model and not the task, and that you can clock them up front — before you've bolted the thing into any multi-agent Rube Goldberg machine at all.
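Here's one way you could imagine prying the two dials apart, as a toy sketch. To be loud about it: this is my illustration, not De Nobili's procedure. I'm assuming the chance of a "yes" follows the Ising-style rule P(yes | S) = sigmoid(2 * (h + J * S)), where S is the sum of the four neighbors' answers. Under that assumption the log-odds of "yes" are a straight line in S: the intercept gives stubbornness, the slope gives conformity. The data below is synthetic, generated from known values just to show the fit recovers them.

```python
import math
import random

def simulate_answers(h_true, j_true, samples_per_sum, rng):
    """Synthetic data: for each neighbor sum S, count 'yes' answers.

    Assumes P(yes | S) = sigmoid(2 * (h + J * S)), the Ising-style rule.
    """
    counts = {}
    for s in (-4, -2, 0, 2, 4):
        p_yes = 1.0 / (1.0 + math.exp(-2.0 * (h_true + j_true * s)))
        yes = sum(1 for _ in range(samples_per_sum) if rng.random() < p_yes)
        counts[s] = (yes, samples_per_sum)
    return counts

def fit_bias_and_coupling(counts):
    """Fit log-odds = 2*h + 2*J*S by least squares; return (h, J)."""
    xs, ys = [], []
    for s, (yes, total) in sorted(counts.items()):
        # Nudge frequencies off 0 and 1 so the log-odds stay finite.
        p = (yes + 0.5) / (total + 1.0)
        xs.append(float(s))
        ys.append(math.log(p / (1.0 - p)))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept / 2.0, slope / 2.0

rng = random.Random(42)
counts = simulate_answers(h_true=0.1, j_true=0.3, samples_per_sum=20000, rng=rng)
h_est, j_est = fit_bias_and_coupling(counts)
print(f"stubbornness h ~ {h_est:.2f}, conformity J ~ {j_est:.2f}")
```

With a real model you'd replace simulate_answers with actual calls: show the model each possible mix of neighbor answers many times, count the yeses, and fit the same line. If the points don't fall on a line, that by itself tells you the Ising-style assumption is off.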
If the method catches on, picking a model for a collective gets a brand-new yardstick. Right now you pick by benchmarks, price, and speed — nice legible numbers. A model's collective behavior, though? Nobody measures that. You find it out the hard way, once everything's built and something's already on fire. Say you're running a content-moderation rig of twenty agents. If your model's conformity runs high, all it takes is one agent yelling "spam!" and the whole crowd goes down like dominoes — and no amount of prompting stops the slide, because it's wired into how the model reacts to other answers, wired in deep, straight out of training. But you can spot it ahead of time, in one dumb little grid experiment, and make the call before any of it ships.
A Few Gripes
Now the gripes. These are mine — a non-expert's gripes — so if I say something dumb, by all means roast me in the comments. This isn't a famous paper; I couldn't just go crib the takedowns somebody smarter already wrote.
Every bit of De Nobili's math leans on one buried assumption: that "yes" and "no" are symmetric for a language model. Physicists call it Z₂ symmetry, and without it the two-dimensional Ising model doesn't apply here at all. Full stop.
And there's good reason to think the symmetry just isn't there. Language models tilt, systematically, toward "yes." The fancy name is acquiescence bias; the plain one is sycophancy — a model would way rather agree with you than tell you you're wrong. Anthropic put out a whole careful paper on it back in 2023. It's baked in from the training data, where "yes" just shows up more — because people online are, by and large, polite, and because the way humans phrase questions tends to nudge toward yes. If you spent your formative years on the anonymous boards of 4chan this might not ring true, but on the whole people are pretty nice to each other, pretty courteous, and pretty quick to nod along.
Which leads somewhere unpleasant. The second you turn a crowd of agents loose on a real job — moderation, code review, sniffing out a medical problem, a legal once-over — "yes" and "no" are never symmetric. Z₂ is broken from minute one, because the model breaks it, with its built-in lean toward yes. So De Nobili's gorgeous physics only really runs in some spotless theoretical regime that live production never actually touches. And if, out in the wild, a model's stubbornness always beats its conformity — which, honestly, sounds about right — then the smart play for a crowd of agents flips. His version: tweak the prompts to dial conformity down. The real-world version: mix in models from different vendors, because different models come with different amounts of stubbornness. Engineers already do exactly that — but on a gut feeling, on instinct, not off some proven piece of physics.
The second gripe bugs me even more. How do we know the model didn't read up on the Ising model during training? Of course it did — there are thousands of papers, textbooks, Wikipedia pages on the thing. Which cracks open a whole other reading. De Nobili finds the crowd behaves "almost like Ising." But maybe the model remembers how Ising systems are supposed to act and just apes it the moment the task starts to look, syntactically, like Ising — especially with an arbiter model in the loop that can plainly see the user's fishing for Ising.
Put bluntly: we might not be watching the physics of language models at all. We might be watching a language model do an impression of physics, because we asked it to — out loud, or under our breath.
The obvious control experiment writes itself. Reframe the task so it doesn't reek of physics. Not "binary states on a lattice" but, I don't know, "do you like that cheese they're putting out in the cafeteria, given what the folks ahead of you in line think." Same exponents? Then De Nobili's onto something real. Different ones? Then what you measured is a language model's knack for doing a physics impression when you ask it about physics — which, hey, is also interesting, just a completely different flavor of interesting.
There's no control experiment yet. My hunch is there won't be, because one run eats millions of model calls and nobody's lining up to foot the bill. The vendors themselves — your Anthropics, your OpenAIs — could pull it off. But why on earth would they?
The Man Himself
It's a funny story, when you step back. Some guy in Milan, on his own, on a laptop, no budget, no colleagues, shows that crowds of language models are now a juicy enough thing to study as a physical object. Not "let's use physics to understand LLMs" — the other way round: here's a brand-new system, mathematical to the bone by design, so let's go see what physics is going on inside it.
De Nobili works solo. Far as I can tell (far as I could google) there's no institute at his back, no grant bankrolling this particular thing, no lab buddies to kick it around with, no PR team, no marketers shoving it onto Twitter. Look at the models he ran: dinky jobs of eight billion parameters or fewer, the kind a regular person can swing. A regular person, as a hobby, can't go buy the cluster you'd need to run full DeepSeek; in dollars you're talking several hundred grand. You got that lying around? Yeah, neither does he.
If a paper with this exact content had dropped out of DeepMind or Anthropic, every trade blog would've covered it inside a day, and Yann LeCun would already be ten replies deep in some thread, lobbing gloriously wrong takes. De Nobili just posted it himself; it's sitting on arXiv like any other preprint; and a week in, maybe a few hundred people have so much as glanced at it — one of them yours truly.
There's an asymmetry here that feels very much of the moment. The big companies blast their message out of every screen, and they're the ones setting the AI news cycle: the launches, the benchmarks, the partnerships. Meanwhile the genuinely interesting intellectual moves get made by loners out on the fringe — people with no investors to answer to and nothing to sell. What the loners have instead is a north star: some weird question that's eating at them way past all reason, and the time to chase it down.
Physics in the early twentieth century ran the same way. We remember Einstein, Bohr, Heisenberg, because they wrote the papers that ended up in every textbook. But a mountain of the foundational grunt work got done by people nobody can name anymore, grinding away at provincial universities, teaching their students and, on the side, scratching out calculations on paper. Their names survive only in the footnotes of the famous stuff. And without them — without their arithmetic — Bohr and Heisenberg couldn't have done a thing.
Machine learning, near as I can tell, is sitting in about the same spot right now. We watch the San Francisco press releases because they're loud, and because somebody spent a dump truck of money advertising them on social. And meanwhile Cristiano De Nobili, parked in Italy, might've just laid the first brick of a whole new wing of statistical physics. Or maybe he hasn't. Maybe a year from now it turns out he's got errors in there — artifacts from grids that were too small, sloppy averaging. But the swing itself is worth tipping your hat to.
(That I get to stand here broadcasting to a crowd this size is, by the way, a wild privilege of its own. I try to spend it not hawking you stuff you don't need, but getting across something about a thing that actually lasts, and is actually interesting.)
Where to Stop
Am I sure De Nobili's right? No. For starters I'm no expert, and I read the paper the way everybody reads everything now: by running it through Claude. And his grids are obviously small, because inference is too pricey for one guy, as opposed to a megacorp. The Z₂ symmetry might be busted from the jump. And this whole gorgeous edifice might boil down to a model that once read a stat-phys textbook and is now reciting it back, word-perfect. For now it's a hypothesis. So treat the paper as a conjecture, not a new law of nature.
Handy bit of history. When Ole Rømer first clocked the speed of light back in 1676, off the eclipses of Jupiter's moon Io, he missed by about a quarter. But the man was no fool, and he was first out of the gate on the problem. A much tighter answer didn't show up for another half century, when James Bradley got one out of the aberration of starlight in 1728, and it was somebody else's job entirely.
If De Nobili's paper turns out to be the Rømer of the statistical physics of language agents — let's go ahead and name the new field that — then ten years from now it'll get cited as the very first sighting. And even if it's flat wrong, it'll still have done one thing: it points the questions in the right direction. And the right questions, as anybody in any field will tell you, are the rare currency.