Move 37 for Math

In March 2016, during the second game of a five-match series in Seoul, Google DeepMind's AlphaGo placed its 37th stone on the board against Lee Sedol, one of the greatest Go players in history. The commentators hesitated. The 9-dan professional relaying the game for the livestream audience fumbled, unsure whether to even place the stone. Fan Hui, the European champion who had previously lost to AlphaGo, captured what everyone was feeling: "When I see this move, for me, it's just a big shock. Normally, humans, we never play this one because it's bad. It's just bad. We don't know why. It's bad!"

AlphaGo's own analysis estimated a 1 in 10,000 chance that a human would make that move. Lee Sedol left the room to think for fifteen minutes. When he came back, the game was already slipping away from him. He later said something that has stayed with me for a decade: "I thought AlphaGo was based on probability calculation and that it was merely a machine. But when I saw this move, I changed my mind. Surely, AlphaGo is creative."

Move 37 became shorthand for a specific kind of AI moment: the machine sees something that humans, constrained by centuries of accumulated expertise and convention, cannot. Not because humans lack intelligence, but because expertise itself creates blind spots. The more deeply you know a domain, the more firmly certain possibilities get classified as "bad" or "irrelevant" or "not worth exploring," and eventually those classifications calcify into walls that nobody even tests anymore.

Last week, that shorthand got a new chapter.

The Preprint Nobody Knew Existed

On March 5, Epoch AI reported that OpenAI's GPT-5.4 Pro had set a new record on FrontierMath, a benchmark of extraordinarily difficult math problems created by over 60 professional mathematicians specifically to resist AI. The problems are original, unpublished, and designed so that expert mathematicians need hours to days to solve them. The hardest tier, Tier 4, contains problems where even specialists in the relevant subfield might need weeks.

GPT-5.4 Pro scored 50% on Tiers 1 through 3, and 38% on Tier 4. Both are records. But the headline number isn't what matters. What matters is how the model solved one particular Tier 4 problem that no AI had ever cracked before.

In Epoch AI's preliminary analysis, GPT-5.4 appeared to have located a preprint from 2011 that the problem's author did not know existed. The preprint contained work that allowed the model to shortcut much of the intended mathematical heavy lifting. The problem had been specifically designed to be unsolvable by current AI. A human mathematician had invested significant effort crafting it. And the model found a path through it by surfacing a piece of forgotten human knowledge that had been sitting in the open literature for fourteen years, unread by the one person who needed it most.

This is worth sitting with for a moment. The problem wasn't solved through brute computational force. It wasn't solved by some alien mathematical insight that no human could understand. It was solved because the model connected two pieces of existing human knowledge that no human had connected. The 2011 preprint existed. The 2026 problem existed. The bridge between them was there for anyone to walk across. Nobody did, because nobody knew to look.

The 13-Page Problem

The same evaluation run produced a second result that is, if anything, more striking. GPT-5.4 solved another Tier 4 problem that no model had previously cracked, this one created by Bartosz Naskręcki, Vice-Dean of the Faculty of Mathematics and Computer Science at Adam Mickiewicz University in Poznań and one of only five European mathematicians invited to contribute to FrontierMath's hardest tier.

Naskręcki had poured fifteen years of research expertise into his problem. The intended solution ran to 13 dense pages of mathematics. He was confident it would stand for years. As recently as mid-2025, he had publicly assessed that AI systems lacked the reasoning depth and creativity of expert mathematicians, that they remained fundamentally sophisticated calculators.

GPT-5.4 solved it cleanly. Naskręcki described the solution as "almost human." And the phrase he reached for to describe the experience was the one that connects all of this together. He called it his personal "Move 37 or more."

Unlike the preprint problem, this solution didn't rely on finding forgotten literature. FrontierMath problems are specifically designed to be original, with no existing solutions online. The model appears to have reasoned its way through novel mathematics, producing a clean solution to a problem that its creator expected to resist all AI attempts. A mathematician who had spent his career studying elliptic curves and arithmetic geometry watched a machine produce work in his own specialty that he described as elegant.

What Move 37 Actually Means

The temptation when telling stories like this is to frame them as "AI replaces human mathematicians" or "AI is now smarter than people." Both framings miss the point, and they miss it in the same way people missed the point of AlphaGo's Move 37 in 2016.

Move 37 wasn't significant because AlphaGo was smarter than Lee Sedol. It was significant because AlphaGo wasn't constrained by the same priors. Thousands of years of accumulated Go wisdom had classified fifth-line shoulder hits in that position as bad moves. Not through analysis, not through proof, but through the accumulated weight of convention. Generation after generation of players learned what "good" moves looked like, and the space of possibilities narrowed. AlphaGo, trained on millions of games but unconstrained by the social transmission of "this is how we do things," explored a part of the search space that humans had walled off.

The preprint story works the same way. Mathematics has a literature problem. There are roughly 2 million mathematical papers in existence, and the number grows by tens of thousands each year. No mathematician can read even a fraction of what's relevant to their own subfield. The 2011 preprint that unlocked the FrontierMath problem wasn't hidden. It wasn't classified. It was sitting on arXiv, publicly accessible, for fourteen years. The barrier wasn't access. It was attention. Human attention is finite, and the mathematical literature has long since exceeded any individual's capacity to survey it.

GPT-5.4 doesn't have that constraint. It can traverse the literature at a scale and speed that no human can match, and it can make connections between papers that no human would think to put side by side. This is not "artificial intelligence" in the way science fiction imagines it. It is something more prosaic and, in practical terms, more transformative: artificial breadth of attention.

Ernest Ryu, a mathematician at Seoul National University, had a similar experience with GPT-5 when it helped him solve a 40-year-old open problem about Nesterov's Accelerated Gradient method. The model didn't produce the proof. Ryu did. But across dozens of sessions, GPT-5 proposed approaches, suggested techniques from adjacent subfields that Ryu wouldn't have encountered on his own, and explored dead ends at a pace that would have taken him months to exhaust manually. Several of the key steps in the final proof were suggested by the model. Ryu described it as exploring a massive maze with a companion who could reveal new paths instantly.

The Index Is the Infrastructure

There's something easy to miss in the preprint story, and it might be the most important part.

GPT-5.4 found that 2011 paper because arXiv exists. Because decades ago, Paul Ginsparg built a system, first at Los Alamos and later hosted at Cornell, where physicists and mathematicians could deposit their work in a standardized, searchable, globally accessible repository. Before arXiv, preprints circulated through departmental mailing lists, physical mail, and personal networks. A 2011 result in an adjacent subfield would have been, for practical purposes, invisible to anyone who wasn't on the right mailing list or in the right department.

The discovery, in other words, depended on the data being indexed. Not just stored. Not just "available" in some theoretical sense. Indexed, structured, and findable at global scale. arXiv did the unglamorous upstream work of making mathematical knowledge machine-traversable, and that work is what allowed the connection to happen fourteen years later.
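To make "indexed, structured, and findable" concrete, here is a toy sketch of the oldest trick in information retrieval: an inverted index mapping each term to the documents that contain it. The document IDs and text below are invented for illustration; real systems like arXiv's search layer through far more machinery at tokenization, ranking, and scale, but the core idea is this small.

```python
from collections import defaultdict

def build_index(docs):
    """Build a toy inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query token."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    results = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Hypothetical corpus: two "preprints" and a "problem statement".
docs = {
    "arxiv:1104.0001": "explicit bounds for torsion points on elliptic curves",
    "arxiv:1807.0002": "a survey of modular forms and galois representations",
    "frontier-problem": "compute torsion points on a family of elliptic curves",
}

index = build_index(docs)
# The 2011-style preprint and the problem statement surface together,
# even though no human ever put them side by side.
print(search(index, "torsion elliptic"))
```

The point of the sketch is the asymmetry: building the index is boring, upfront work, but once it exists, any query can connect documents whose authors never knew about each other. Without that upfront work, the connection simply never gets made.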

Now think about all the domains where that indexing work has not been done.

An estimated 75% of enterprise data is generated outside traditional data centers: on factory floors, in hospital systems, at points of sale, inside logistics networks. Clinical trial data sits in thousands of separate hospital systems across dozens of countries, governed by different regulatory regimes and stored in incompatible formats. Manufacturing sensor data is generated at massive volume and discarded within hours because nobody has built the pipeline to make it findable. Supply chain signals, maintenance logs, quality control records, operational telemetry: the equivalent of millions of unread preprints, each potentially holding the bridge to someone else's unsolved problem.

The mathematical preprint server is actually one of the best-case scenarios for knowledge infrastructure. It's open, standardized, centrally indexed, and globally searchable. Most enterprise data is none of those things. It's siloed by department, locked behind access controls, stored in proprietary formats, and governed by compliance regimes that make consolidation difficult or illegal. Data preparation still consumes up to 80% of the time in ML projects, not because data scientists don't know what they're doing, but because the indexing and preparation work hasn't been done up front, and everyone pays the cost of doing it ad hoc, over and over, every time they need to connect two pieces of knowledge.

The lesson from the preprint is not just "AI can find things humans can't." It's that AI can only find things that have been made findable. The 2011 paper was discoverable because arXiv had done the work. Imagine how many equivalent breakthroughs are locked inside enterprise systems, hospital databases, and manufacturing logs that have never been indexed, never been structured, never been made available for a model to traverse. The Move 37 moment only happens when the data is ready. Without the index, the connection never gets made, and the preprint stays forgotten.

The Discovery Layer

This has profound implications for how we think about AI infrastructure. The dominant narrative in AI investment right now is about training larger models on more data with more compute. The hyperscalers are spending $600+ billion this year building centralized GPU farms on the assumption that the bottleneck is raw computational power. And for training frontier models, that may be true.

But the Move 37 pattern suggests something different about where AI creates value in practice. The moments that shift understanding don't come from the largest models or the most FLOPS. They come from AI systems that can connect knowledge across boundaries that humans cannot cross: disciplinary boundaries, attention boundaries, the sheer volume of existing but unprocessed human knowledge. And they only work when someone has done the patient, expensive, unglamorous work of making that knowledge traversable in the first place.

The mathematical literature is distributed across thousands of journals and preprint servers. Clinical trial data sits in hospital systems across dozens of countries. Manufacturing sensor data lives on factory floors. The connections between these distributed knowledge stores are where the discoveries hide, in the spaces between silos, in the preprints nobody read, in the fifth-line shoulder hits that convention dismissed. But those connections are only possible when the data has been cataloged, cleaned, structured, and indexed before the moment of need arrives.

You don't need a $200 billion data center to find a 2011 preprint. You need two things: AI that can reach the data where it lives, and someone who has done the work to make that data reachable. The second part is where the industry is systematically underinvesting.

What Changes

Google DeepMind's AlphaEvolve has been running a version of this approach at scale, using LLM-guided evolutionary search across 67 mathematical problems spanning analysis, combinatorics, geometry, and number theory. It rediscovered the best-known solutions in most cases and improved on several. The European Commission just invested €75 million in EURO-3C, a federated infrastructure project connecting distributed compute nodes across 13 countries, because the architecture of connecting existing knowledge is fundamentally different from the architecture of concentrating compute.

The pattern repeats: the value is in the connections, not the concentration. And connections require preparation.

After Move 37, Lee Sedol didn't quit Go. He played differently. He started exploring lines of play he'd previously dismissed. Other professional players did the same. The entire field of competitive Go shifted, because one move revealed how much unexplored territory remained in a game humans had played for thousands of years.

Naskręcki's reaction follows the same arc. A mathematician who had publicly declared AI to be a sophisticated calculator watched it produce elegant work in his own specialty and called it his personal Move 37. Not because the machine was smarter than he was, but because it showed him how much he hadn't seen.

The 2011 preprint sat on arXiv for fourteen years. The answer to a problem that a mathematician spent months crafting was hiding in plain sight. The only thing missing was something with the breadth to look.

But the preprint was findable because arXiv had done the work. The next Move 37, the one that transforms drug discovery, or materials science, or supply chain optimization, depends on whether someone does the equivalent work for the data in those domains. Building the index. Cleaning the records. Making the knowledge traversable. That isn't the glamorous part of AI. It's the part that makes everything else possible.

Not superhuman intelligence. Superhuman attention. Applied to data that someone had the foresight to prepare.


Want to learn how intelligent data pipelines can reduce your AI costs? Check out Expanso. Or don't. Who am I to tell you what to do.

NOTE: I'm currently writing a book based on what I've seen of the real-world challenges of data preparation for machine learning, focusing on operations, compliance, and cost. I'd love to hear your thoughts!