They Don't Have the Money (And Neither Do You): The Coming Era of Small Models

OpenAI is burning $115B through 2029. If they don't have the money, you definitely don't. The future of production AI isn't trillion-dollar datacenters—it's small, efficient, specialized models running on your infrastructure with your data.


Charles Fitzgerald at Platformonomics just laid out the math on OpenAI, and it's brutal.

$115 billion burn through 2029. That was before the latest announcements. $300 billion for Oracle. $22.4 billion for CoreWeave. $10 billion for Broadcom. The FT estimates they've "signed about $1tn in deals this year for computing power."

While the numbers are absolutely eye-watering, at least OpenAI has a vision for what to do with all that compute (even if they can't fund it without the GDP of a medium-sized country). Your takeaway? If OpenAI doesn't have the money, you DEFINITELY don't have the money.

The Wile E. Coyote Moment

I love the Wile E. Coyote metaphor. OpenAI keeps running off the cliff, suspended in midair by nothing but investor vibes and AGI fever dreams. Theoretically they could fly! Possibly! But only a few people can maintain a reality-distortion field forever, and most folks eventually fall prey to gravity.

The NVIDIA investment? $100 billion headline, $10 billion upfront. To unlock the next $10 billion, OpenAI needs to spend $50+ billion first. That's not really funding at all; it's vendor financing with a press release.

Bespoke Investment Group warned: "If NVDA has to provide the capital that becomes its revenues in order to maintain growth, the whole ecosystem may be unsustainable." NVIDIA has become "the funder of last resort." That's a lot of risk concentrated in one company.

Folks over there keep saying they have "a plan" but won't tell us what it is. Great. Very reassuring. Meanwhile they're throwing apps at every wall, trying to take on Amazon, Apple, Google, Meta, Microsoft, LinkedIn, and TikTok simultaneously.

The Harsh Math

Bain & Company's 2025 Technology Report projects that by 2030, AI companies will need $2 trillion in annual revenue just to fund the compute demand. Even if companies shift all their on-premise IT budgets to cloud AND reinvest all projected AI savings, they'll still fall $800 billion short.

Building enough datacenters to meet demand would cost $500 billion per year. The industry simply doesn't have the revenue to cover it.

I'm not an AI doomer. I genuinely believe this technology is transformative. But we can't just draw a straight line from what's happening now into the future; even at linear growth (let alone exponential), we'll run out of everything.

What Actually Happens Next

Here's what's coming (and what should happen):

1. Foundation models become commodities

OpenAI, Anthropic, Google, Meta—they'll keep building massive models. But those models will be infrastructure, not products. Margins compress. Differentiation narrows. They look more like cloud providers than AI companies.

2. Value moves to distillation and specialization

You don't need 175 billion parameters to answer customer service questions.
You don't need GPT-5 to extract structured data from internal documents.

You need a small, fast, cheap model distilled from a foundation model and fine-tuned on your specific data.

3. Data governance becomes the moat

The Anthropic settlement quantified the risk: $3,000 per work, tens of billions in potential liability.

But if you're training on your own data (data you own, licensed, or generated yourself), those risks evaporate.

Your legal data governance isn't compliance theater anymore. It's your competitive advantage.

4. Edge and on-premises models dominate

What's cheaper than renting cloud GPUs at scale? Running inference on a model that fits in a few GB of RAM on existing infrastructure.

What's faster than OpenAI API calls? Local inference.

What's more private? Not sending proprietary data to someone else's datacenter.

How This Actually Works

The pattern:

  1. Start with a foundation model (Claude, GPT-4, Llama)
  2. Distill it into a smaller, specialized model
  3. Fine-tune on your specific data (the data you own)
  4. Deploy locally where you control costs, latency, privacy
  5. Iterate on your metrics instead of hoping OpenAI's next update doesn't break everything

This isn't far-future vision. Companies are doing this now. The tools exist. The techniques are well-understood.
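
Concretely, the whole pattern fits in a page of Python. Here's a minimal sketch of steps 2-4 using Hugging Face's transformers library: a stubbed-out teacher call stands in for whatever foundation model you'd use to label your data, and the model name, label set, and training settings are illustrative placeholders, not a recipe.

```python
# A minimal sketch of steps 2-4: use a foundation model ("teacher") to label
# your own data, then fine-tune a small open model ("student") on the result.
# The teacher call is stubbed out; model names, labels, and settings are
# illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["billing", "shipping", "returns"]  # your own taxonomy

def label_with_teacher(text: str) -> int:
    """Stub: in practice, prompt your foundation model of choice to classify
    `text` and map its answer onto LABELS."""
    return 0

tickets = ["Where is my refund?", "My package never arrived.", "I was charged twice."]
data = Dataset.from_dict({
    "text": tickets,
    "label": [label_with_teacher(t) for t in tickets],  # teacher-generated labels
})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS))

data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=64))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()                      # fine-tune the small student
trainer.save_model("student")        # deploy it wherever you control the stack
tokenizer.save_pretrained("student")
```

In production you'd swap the toy ticket list for your real corpus and hold out an evaluation set, but the shape of the work doesn't change.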

I should stress: this does NOT mean you remove the foundation/cloud-scale models from your portfolio! You just don't use them for everything.

The research backs this up:

Google's "Distilling Step-by-Step" method showed that a 770M parameter T5 model can outperform a 540B parameter PaLM model using only 80% of training data. That's a 700x model size reduction with better performance.

Berkeley's NovaSky lab trained their Sky-T1 reasoning model for under $450, achieving results comparable to much larger models. DeepSeek demonstrated that smarter algorithmic design can push compute efficiency dramatically forward.

The tooling has caught up, too. Google's Vertex AI, Amazon SageMaker, Microsoft's ONNX Runtime, and OpenAI's fine-tuning API all support distillation workflows now. Distillation has become a fundamental technique, not an experimental one.
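
If you want to see how small the core idea is, here's the classic knowledge-distillation loss in a dozen lines of PyTorch: the student is trained to match the teacher's softened output distribution as well as the hard labels. The temperature and weighting below are illustrative defaults, not tuned values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of (a) KL divergence against the teacher's softened distribution
    and (b) ordinary cross-entropy on the true labels. temperature and alpha
    are illustrative knobs, not canonical values."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature**2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 4 examples over 3 classes.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```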

What's missing is the institutional knowledge that this is how you should be doing AI.

The Economics Are Obvious (Once You Do The Math)

Say you're spending $50K/month on cloud API calls. That's $600K/year.

For many use cases, you could:

  • Hire two ML engineers
  • Distill and fine-tune a specialized model
  • Host it yourself for a fraction of inference costs
  • Own your entire stack
  • Eliminate vendor lock-in

Break even in 12-18 months. Save money every year after.
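
Here's that math as a back-of-the-envelope calculator. Every number in it is an assumption (salaries, hosting, migration time); swap in your own and see where you land.

```python
# Back-of-the-envelope break-even math. Every figure below is an illustrative
# assumption -- substitute your own API spend, salaries, and hosting costs.
api_spend = 600_000        # $/yr on cloud API calls ($50K/month)
engineers = 2 * 180_000    # assumed fully loaded cost of two ML engineers, $/yr
hosting = 40_000           # assumed self-hosting + periodic fine-tune compute, $/yr
build_months = 6           # assumed months you pay for both stacks during migration

in_house = engineers + hosting                    # ongoing cost once migrated, $/yr
monthly_savings = (api_spend - in_house) / 12     # pocketed after cutover
migration_premium = build_months * in_house / 12  # extra spend while still on the API

breakeven = build_months + migration_premium / monthly_savings
print(f"In-house: ${in_house:,.0f}/yr vs API: ${api_spend:,.0f}/yr")
print(f"Savings after cutover: ${monthly_savings:,.0f}/month")
print(f"Break even around month {breakeven:.0f}")  # ~18 with these assumptions
```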

But doesn't that require expensive GPUs? Not really. Inference on a properly-sized model runs fine on CPUs or modest GPU instances. Training (fine-tuning, really) happens periodically on rented compute.
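
For scale: a distilled classifier like the one sketched earlier serves comfortably from a plain CPU box. A minimal example, assuming the fine-tuned "student" checkpoint saved above (any small local checkpoint works the same way):

```python
# Serving the distilled model from a plain CPU box -- no GPU fleet required.
from transformers import pipeline

classifier = pipeline("text-classification", model="student", device=-1)  # -1 = CPU

print(classifier("My package arrived damaged, how do I send it back?"))
# A model this size typically answers in well under a second on commodity
# hardware, and the request never leaves your infrastructure.
```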

You're not building a gigawatt data center. You're running a focused, practical system.

The Great Unbundling

What we're witnessing is the unbundling of AI.

The mega-models will exist. They'll be plumbing. The real value—the part that actually makes money—will be targeted applications built on small, distilled models that understand your domain using your data.

You don't need to play their game.

You can't afford to play their game.

What You Can Do RIGHT NOW

The future of production AI isn't in datacenters with trillion-dollar valuations.

It's in small, efficient, specialized models running on your infrastructure, trained on your data, solving your actual problems.

Here's your roadmap:

  1. Audit your current AI spend - Where is the money going? What are you actually getting?
  2. Identify high-volume, low-complexity use cases - Customer service, data extraction, classification
  3. Evaluate distillation options - Managed platforms like Snorkel, open-source distillation frameworks, or roll your own
  4. Start with one use case - Prove the economics on something small
  5. Build the muscle - This becomes your core competency, not your API credit budget

Birds gotta fly. Bees gotta sting. AI companies gotta take over the future (and burn capital).

But you? You can build something sustainable.

Onward!

Want to learn how intelligent data pipelines can reduce your AI costs? Check out Expanso. Or don't. Who am I to tell you what to do?

NOTE: I'm currently writing a book, based on what I've seen in the field, about the real-world challenges of data preparation for machine learning, focusing on operations, compliance, and cost. I'd love to hear your thoughts!