The $100 ChatGPT: Why Karpathy's nanochat Represents the Next Big Thing

Andrej Karpathy just released nanochat - "The best ChatGPT that $100 can buy." In 4 hours on an 8xH100 node, you get a working ChatGPT clone. Not a toy. An actual LLM that writes stories, answers questions, and attempts math problems.
This isn't about edge computing or distributed systems. This is about the democratization of training - putting into anyone's hands the ability to train a model that, two years ago, would have been shockingly great. And it's just the start.
The Real Revolution: Readable AI
The entire nanochat codebase is 8,304 lines. To prepare for this article, I packaged it into a single file and fed it to Claude so I could ask questions about it. Think about that - the entire system fits in a single context window.
Compare that to:
- PyTorch: 1.8 million lines of code
- TensorFlow: 2.1 million lines
- Hugging Face Transformers: 500,000+ lines
For the first time, a regular developer can understand every line of a complete LLM training and deployment pipeline. No black boxes. No framework magic. Just clean, readable code that does exactly what it says.
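If you want to repeat the Claude trick yourself, packing a repo into one file takes just a few lines. Here's a minimal sketch - the checkout path and the list of file extensions are my assumptions, so adjust for your setup:

```python
import pathlib

# Concatenate a repo's source files into one text blob for an LLM context window.
# Assumes the repo is checked out at ./nanochat; tweak extensions as needed.
repo = pathlib.Path("nanochat")
parts = []
for path in sorted(repo.rglob("*")):
    if path.is_file() and path.suffix in {".py", ".md", ".sh", ".rs", ".html"}:
        parts.append(f"\n--- {path} ---\n{path.read_text(errors='ignore')}")

blob = "".join(parts)
pathlib.Path("nanochat_packed.txt").write_text(blob)
print(f"{len(blob):,} characters, roughly {len(blob) // 4:,} tokens")
```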
The Numbers That Actually Matter
Everyone's focused on the $100 price tag, but look at what you get:
- 560M parameters trained to GPT-2 level performance
- CORE score of 0.22 (better than GPT-2 large at 0.21)
- 4e19 FLOPs total compute (1/1000th of GPT-3; quick sanity check below)
- Full stack: tokenizer, pretraining, finetuning, RL, inference, web UI
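That 4e19 figure is consistent with the advertised hardware and runtime. A quick back-of-envelope check - the ~40% utilization number is my assumption, not a measured figure:

```python
# Back-of-envelope: does 4e19 FLOPs fit in ~4 hours on 8x H100?
H100_PEAK_FLOPS = 989e12   # bf16 dense peak per GPU
GPUS = 8
MFU = 0.40                 # assumed model FLOPs utilization, not a measured number

effective = H100_PEAK_FLOPS * GPUS * MFU   # ~3.2e15 FLOP/s for the node
hours = 4e19 / effective / 3600
print(f"~{hours:.1f} hours")               # ~3.5 hours - right in the ballpark
```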
After supervised fine-tuning, it hits 31.5% on MMLU and 38.8% on ARC-Easy. Random chance on these is 25%. This isn't just chance; it's learning.
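To see why 31.5% is signal and not noise: against the 25% guess rate over MMLU's roughly 14,000 four-choice questions (an approximate count on my part), the gap is about 18 standard errors:

```python
import math

# Is 31.5% on a 4-choice benchmark distinguishable from guessing?
n = 14_000          # approximate number of MMLU test questions
p0 = 0.25           # random-guess accuracy
observed = 0.315

se = math.sqrt(p0 * (1 - p0) / n)   # std error of the chance baseline
z = (observed - p0) / se
print(f"z = {z:.1f}")               # ~17.8 standard errors above chance
```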
Check out Karpathy's original Twitter thread where he walks through the entire process, or dive deep with his detailed technical walkthrough.
Why This Breaks the Moat Wide Open
For years, the narrative has been that you need:
- Millions in compute
- A team of ML PhDs
- Proprietary training infrastructure
- Access to massive datasets
Karpathy just proved you need:
- $100
- The ability to run `bash speedrun.sh`
- 4 hours of patience
- Public data from Hugging Face's FineWeb-EDU
Machine learning has been trending this way for a long time, but this is a genuine breakthrough: ML development stops being the exclusive domain of big tech and becomes something any developer can understand and modify.
I'm really optimistic!
The Hidden Genius: Progressive Complexity
Look at how nanochat structures the training pipeline:
- Base training (3 hours): Learn to predict next tokens
- Midtraining (8 minutes): Learn conversation format and tools
- SFT (7 minutes): Tighten performance on clean data
- RL (optional): Optimize for specific tasks
Each stage is comprehensible. Each stage is hackable. You can literally watch it get smarter in real-time through the wandb plots.
The midtraining phase alone is so... simple? Teaching the model to take multiple-choice quizzes by just showing it 100K examples from MMLU. Teaching it to use Python by wrapping code in special tokens. No complex reward modeling. No constitutional AI. Just examples. It feels REALLY similar to the simplicity behind the AlphaZero breakthrough.
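To make that concrete, here's roughly what the chat format looks like. This is a minimal sketch, not nanochat's actual code - the special-token names are illustrative, so check the repo's tokenizer for the real ones:

```python
# A minimal sketch of chat-format rendering in the spirit of nanochat's
# midtraining. Special-token names here are illustrative.
def render(conversation: list[dict]) -> str:
    """Flatten {role, content} turns into one training string with special tokens."""
    return "".join(
        f"<|{turn['role']}_start|>{turn['content']}<|{turn['role']}_end|>"
        for turn in conversation
    )

# Tool use is taught the same way: Python calls are just text wrapped in
# special tokens, followed by the interpreter's output.
example = [
    {"role": "user", "content": "What is 17 * 24?"},
    {"role": "assistant",
     "content": "<|python_start|>print(17 * 24)<|python_end|>408"},
]
print(render(example))
```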
What You Can Actually Do Right Now
Here's what's possible today with nanochat:
- For Researchers: Test hypotheses without waiting for compute allocation. Change one line, retrain, see what happens. The entire experimental cycle in an afternoon. This builds on Karpathy's earlier nanoGPT but goes SO much further.
- For Startups: Build domain-specific models for less than a day of your AWS bill. A legal-focused variant. A medical assistant. A code reviewer. All trainable for pocket change. Compare that to the $4-6 million that foundation-model companies spend on training runs.
- For Educators: Something students can actually run and understand. Not a Colab notebook that calls mysterious APIs. Real training on real hardware. Perfect companion to Karpathy's upcoming LLM101n course at Eureka Labs.
- For Enterprises: Proof-of-concept custom models before committing millions. Test whether your proprietary data actually improves performance. No vendor lock-in.
The Scaling Laws SEEM To Hold Up
Here's the mind-bending part: nanochat scales linearly with investment.
- $100 (depth=20): GPT-2 level, handles basic tasks
- $300 (depth=26): Surpasses GPT-2, coherent conversations
- $1000 (depth=30): Approaching GPT-3 small capabilities
Want GPT-3.5 level? Probably $10,000 and a week. Want GPT-4? You can PROBABLY do it, but... why? A more focused model at GPT-3.5 scale, trained on exactly your data, is probably plenty. (Caveat emptor! But there's certainly data to say it's possible, and at so much lower cost, why not?)
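Where do those depth numbers come from? For a GPT-style transformer, you can estimate parameter count from depth alone. A rough sketch, assuming nanochat's aspect ratio (width = 64 × depth) and a 65,536-token vocabulary - both my reading of the defaults, so double-check against the repo:

```python
# Rough parameter count for a GPT-style model as a function of depth.
# Assumes width = 64 * depth and untied input/output embeddings over a
# 65,536-token vocab - my assumptions, so verify against nanochat itself.
def approx_params(depth: int, vocab: int = 65_536) -> int:
    width = 64 * depth
    blocks = 12 * width**2 * depth   # attention (4 w^2) + MLP (8 w^2) per layer
    embeddings = 2 * vocab * width   # token embedding + lm head
    return blocks + embeddings

for depth in (20, 26, 30):
    print(depth, f"{approx_params(depth) / 1e6:.0f}M")
# depth=20 -> ~561M, landing almost exactly on the 560M figure above
```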
This transparent scaling breaks the black box of AI costs. You're not paying for "API credits" or "compute units." You're paying for specific, measurable improvements in capability.
The Cultural Shift This Enables
We're about to see an explosion of AI understanding. Not usage - understanding.
When you can read the entire codebase, you can:
- Actually debug when things go wrong (unlike with LangChain's 300+ abstractions)
- Understand WHY models hallucinate
- See exactly how attention mechanisms work
- Trace errors back to training data
This is the difference between being a user and being a builder. Between consuming AI and creating it.
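Take the attention point: the mechanism at the heart of all of this fits in a dozen lines. A minimal sketch of standard scaled dot-product attention - not nanochat's exact implementation, which adds refinements like rotary embeddings:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    """Scaled dot-product attention: (batch, seq, dim) -> (batch, seq, dim)."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # query-key similarity
    weights = F.softmax(scores, dim=-1)                     # normalize to attention weights
    return weights @ v                                      # weighted sum of values

x = torch.randn(1, 8, 64)    # one sequence of 8 tokens, 64-dim each
out = attention(x, x, x)     # self-attention: q, k, v from the same input
print(out.shape)             # torch.Size([1, 8, 64])
```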
The modded-nanoGPT community has already shown what happens when you make models hackable - rapid innovation and real understanding. Now nanochat takes that philosophy to the entire stack.
What Karpathy Didn't Say (But Definitely Knows)
This release is perfectly timed. Right as everyone's getting disillusioned with the "just scale it" approach (see the Chinchilla paper and the debates that followed - it came out in 2022 but feels like a million years ago), here comes proof that you can do meaningful work at tiny scale. More recently, this SLM paper from Samsung tells a similar story.
The subtext is clear: The future isn't exclusively one massive model controlled by a hyperscaler/hyper-AI company. It's millions of small, specialized, understood models built by developers who know exactly how they work.
This feels a lot like the PC revolution all over again. IBM thought computing meant mainframes. Then suddenly everyone had a computer on their desk. Foundation model companies think AI means massive models behind APIs. Now everyone can train their own.
Short Nvidia? Slow Down.
If you take away one thing from nanochat, let it be this: training your own model is now more accessible than ever.
Does this mean that centralized models and massive hardware build-outs are done? No, not even close. There will still be enormous need for foundation models that generalize to hundreds of millions or billions of people. What this says is that we're going to augment them with additional layers of models that are now more approachable than ever.
A complete LLM system that fits in 8,000 lines of clean code will likely be the path to do so. When training costs $100, there's no excuse for not experimenting. When the entire pipeline is readable, there's no excuse for black-box thinking. Karpathy didn't just democratize AI. He ALSO demystified it.
Onward!
Want to learn how intelligent data pipelines can reduce your AI costs? Check out Expanso. Or don't. Who am I to tell you what to do.
NOTE: I'm currently writing a book based on what I've seen of the real-world challenges of data preparation for machine learning, focusing on operations, compliance, and cost. I'd love to hear your thoughts!