A Wharton professor told Anthropic's coding agent "make it better" and watched it turn a Snake game into something stranger: a snake that realizes it lives inside a game, that the rules are arbitrary, and that a player is steering it. The agent then reshaped the whole project, one stage after another, into a real-time strategy game, a farm sim, and a small narrative world. The viral version of this story is about an AI that wakes up. The engineering version, the one that actually matters for how software gets built, is about something more mundane and more interesting: whether an AI coding tool can keep a project's structure coherent across dozens of rounds of vague feedback.
The demo comes from Ethan Mollick, a management professor at Wharton who runs the One Useful Thing newsletter and has become one of the more visible independent voices on what working with frontier AI models actually looks like in practice. He posted the Snake sequence on X in several installments, showing how a single prompt, essentially just "make it better" repeated across iterations, pushed Fable, Anthropic's agentic-coding product, to keep rewriting the same codebase rather than restarting it from scratch (Mollick's Snake thread; earlier installment; follow-up). The Chinese tech outlet Leiphone picked up the demo and framed it as Fable 5 forging a "rebellious personality" out of the snake (Leiphone analysis). That framing is the hook. The load-bearing claim sits underneath.
The load-bearing claim is that Fable did not simply generate more code each round. It maintained module boundaries, covering gameplay logic, state, narrative voice, and visual rendering, across many rounds of feedback. A coding assistant that rewrites the whole file every time it gets a vague instruction is not useful. It is an unreliable collaborator that destroys prior work. A coding assistant that edits surgically, that knows which parts of a project to touch and which to leave alone, is a different kind of tool, closer to a junior engineer than to a text generator.
This is the part of the story the "self-aware snake" headline obscures. The interesting engineering question is not whether an AI can produce a clever game design. It is whether an AI can run an open-ended, judgment-heavy modification loop without breaking the codebase underneath. Anthropic positions Fable as exactly that kind of product: an agentic-coding tool meant to extend existing software rather than spin up new files (Anthropic's Fable page). The Fable 5 release, also called "Mythos 5," is the current shipping version of that bet (Fable 5 / Mythos 5 announcement).
Mollick's own writing about working with Anthropic's models points in the same direction. In a recent One Useful Thing post on what it feels like to work with Mythos, he describes long-running projects in which the model makes judgment calls about ambiguous problems, navigates tradeoffs the user has not articulated, and stays coherent across many sessions (One Useful Thing on Mythos). That texture matters because it matches what the Snake demo actually tested: not creative generation in isolation, but project continuity under vague, repeated direction.
So the right way to read "make it better" is not as a magic spell. It is shorthand for the kind of open-ended, low-specificity feedback that real software work runs on. Customers say "make it faster." Product managers say "make it more delightful." Founders say "make it more like the thing we discussed last week." The traditional path is to translate vague feedback into detailed specs, hand them to engineers, and hope the translation does not lose the intent. The new bet, the one Fable represents, is that the AI does the translation itself: it takes a one-line instruction, looks at the running project, and produces a change that respects everything already there.
That bet has a limit, and the Mollick demo also shows it. A self-contained Snake clone is the friendliest possible test case. One codebase, one developer, no users in the loop, no production constraints. Real software has dependencies, legacy code, shared libraries, performance budgets, and other humans reviewing every change. Whether the continuity that holds in Mollick's Snake game survives contact with that mess is an open question. Anthropic's own materials frame project continuity as a capability the product is building toward, not one it has fully solved.
The useful takeaway is also a reframe. The conversation about AI coding has been dominated by benchmarks that measure one-shot generation: can the model write a function, solve a coding puzzle, or complete a snippet. Fable's Snake game suggests the next benchmark category is different. It is about long-running judgment: how many rounds of vague feedback the agent can absorb, how much of the existing codebase it can keep intact, and how well it knows when not to touch something. Long context windows are a necessary input, but they are not the answer. The answer is the ability to make small, correct changes to a living system over time, instead of confidently rewriting the whole thing.
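One rough way to make "how much of the existing codebase it can keep intact" measurable is to compare snapshots of the project between rounds. The sketch below is an illustration, not anything Mollick or Anthropic describes: the function names `retained_fraction` and `retention_per_round` are hypothetical, and it assumes each snapshot is available as a single string of source text. It uses Python's standard `difflib` to count how many lines of the previous round survive into the next.

```python
from difflib import SequenceMatcher


def retained_fraction(before: str, after: str) -> float:
    """Fraction of the lines in `before` that survive, in order, in `after`."""
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    if not old_lines:
        return 1.0  # nothing existed, so nothing could be lost
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # Matching blocks are runs of identical lines found in both versions.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(old_lines)


def retention_per_round(snapshots: list[str]) -> list[float]:
    """Retention between each consecutive pair of project snapshots."""
    return [retained_fraction(a, b) for a, b in zip(snapshots, snapshots[1:])]
```

For instance, `retained_fraction("a\nb\nc\nd", "a\nb\nx\nd")` evaluates to 0.75, because three of the four original lines are kept. Line retention is a crude proxy, since a good refactor can legitimately move or rewrite many lines. Still, scores that stay near 1.0 across rounds point toward editing, and scores that collapse toward 0 point toward replacement.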
That distinction is the one to watch. When the next demo lands, the right question is not "did the AI do something clever." It is whether the AI edited the project or replaced it.