In Cloud Native Compute, VMs Feel Like the Enemy (But They Aren't)

A quick confession

The first production system I was ever a part of, long before anyone said "Kubernetes," ran on a lovingly hand-tuned SunOS box at the NIH with 2 (TWO!) CPUs. We cursed at shared library issues, noisy neighbors, and slow disks, and swore that if we could only break free and add one more machine, life would be glorious.

Then came VMs. I was lucky enough to help ship many more production systems, this time at Microsoft, Google, and Amazon, on virtual machines instead of physical hardware. Many problems were solved. And many more sprang up.

Fast-forward a few more years and, sure enough, we have a new layer of abstraction: containers. But here's the twist: the VM never actually left. It just sank a layer lower in the stack and quietly kept the lights on.

Why VMs Started to Feel Like the Bad Guy

Containers gave us millisecond start times, tiny footprints, and the fantasy that hardware had vanished. By comparison, VMs looked heavy, slow, and dangerously stateful. At scale, those traits translated into higher bills, fragile infrastructure, and longer lead times to ship code. The industry raced to abstract VMs away entirely, hoping to let a thousand microservices bloom.

Here's what made VMs feel like the enemy:

  • Slow Boot Time: VMs often took 30-90 seconds to start versus a container's sub-second startup. When you're auto-scaling under pressure, every second counts. That latency directly impacts your application's responsiveness and your cloud bill.
  • Slower Deployment Time: Setting up the VM was one thing, but deploying the application on top of it required an entirely separate toolchain. The most common path was to rebuild your entire virtual machine image from scratch, leading to extremely slow pipelines that delayed deployments even more.
  • Resource Waste: Each VM carried a full guest operating system, eating 1-2GB of RAM and significant CPU just to exist before your application even started. Containers share the host kernel, letting you pack 10x more workloads per machine.
  • Snowflake Syndrome: VMs encouraged "pet" infrastructure, manually configured servers that became irreplaceable and impossible to recreate accurately. This mutable approach meant configuration drift was inevitable, leading to those nightmarish "it works on my machine" debugging sessions.
  • Patching Nightmares: Managing OS updates and security patches across a fleet of VMs meant coordinating maintenance windows, testing for compatibility, and praying that a patch didn't break a critical application. With containers, you just rebuild the image with the new base layer and redeploy, as sketched after this list.
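
As a minimal sketch of that container-side workflow (the base image, tags, and binary name here are illustrative), patching becomes a one-line edit followed by a rebuild, with no server to SSH into:

    # Dockerfile (illustrative): the OS ships as an image layer, not a server.
    # Patching means bumping the base tag (here 3.19 -> 3.20), rebuilding,
    # and redeploying. No SSH, no maintenance window, no in-place drift.
    FROM alpine:3.20
    # Assumes a prebuilt static binary named "app" in the build context.
    COPY app /usr/local/bin/app
    ENTRYPOINT ["/usr/local/bin/app"]

Every running instance built from the old base gets replaced by rolling out the new image, rather than patched in place.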

The Reality Check

Strip away all the noise, and three things are still immutably true. First, isolation is non-negotiable. Second, legacy isn't something you can just discard. Third, knowing the messy underlying details provides significant value. VMs solve these problems, and here's why we still desperately need them:

  • Security Boundaries That Matter: When you're running untrusted code or handling regulated data, the hypervisor-level isolation of a VM is the gold standard. Container escapes are rare but real; VM escapes are vanishingly rare in the wild.
  • Hardware Affinity: Need to pin specific GPU models, NUMA topologies, or specialized accelerator chips for an ML workload? VMs give you bare-metal levels of control without the bare-metal headaches.
  • Compliance Checkboxes: Auditors understand VM isolation. Explaining container namespaces and cgroups in a SOC 2 report is a special kind of fun no one wants. The distinct, auditable boundary of a VM simplifies those conversations immensely.
  • Legacy and COTS Compatibility: That critical Oracle database or third-party Windows app that powers your entire business? It's probably happier and more stable living in a VM than it would be after an attempt to containerize 20 years of vendor-specific tuning.
  • Multi-Tenancy at Scale: Every major cloud provider uses VMs to safely run your workloads next to someone else's Bitcoin miner without either of you knowing. They are the foundation of the public cloud.

Every managed container service—from GKE to EKS to AKS—rides on a foundation of VMs for these exact reasons.

The Declarative North Star

The real enemy was never the VM. It was snowflake infrastructure that demanded imperative, manual babysitting. Containers felt liberating because kubectl apply finally let us declare our intent and let the system chase reality.

What "declarative" actually gives you is transformative:

  • Describe Intent, Not Steps: Instead of writing a script that says "SSH to server A, run script B," you write a manifest that says, "I need 3 replicas of this app running." The system figures out the how (see the manifest sketch after this list).
  • Self-Healing by Default: When a server fails or a container crashes, the orchestrator automatically detects the deviation from your declared state and works to fix it—no 3 AM pager duty to manually restart a service.
  • Reproducible Environments: The same YAML file creates identical setups in dev, staging, and production. Configuration drift becomes a thing of the past.
  • Version Control for Infrastructure: Your entire system state lives in Git. Rolling back a failed deployment is as simple as reverting a commit.
  • Audit Trails That Matter: Every change is tracked, reviewed, and applied through a pull request. Compliance teams can see exactly who changed what, when, and why.
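
To make the list above concrete, here is a minimal sketch of that kind of manifest (the names, image, and port are placeholders; the Deployment API itself is standard Kubernetes): you declare three replicas, and the orchestrator owns the how.

    # deployment.yaml: intent, not steps. "I need 3 replicas of this app."
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3               # declared state; the controller chases it
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: registry.example.com/web:v42   # placeholder image
              ports:
                - containerPort: 8080

Check the file into Git, run kubectl apply -f deployment.yaml, and the rollback story above falls out for free: revert the commit and apply again. If a pod dies at 3 AM, the controller notices the count is two, not three, and starts a replacement.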

This exact same lesson is repeating itself in the world of data, right now.

From Compute to Data Locality

Moving an application into a container shaved hours off deployments. But moving petabytes of data into a single, centralized data warehouse before you can analyze it adds weeks to every experiment, all while landing you with a compliance migraine and a shocking egress bill. The data layer still behaves like it's 2010: copy first, process later, and justify the transfer cost in a finance meeting next quarter.

We need the data analogue of container orchestration: the ability to declare the job, point it at the data where it's being created, and let a smart scheduler handle the rest while keeping governance and security baked in by design.
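
To make the idea concrete, here is a hypothetical job spec (every field name is invented for illustration; this is not Expanso's API or anyone else's): declare the work and where the data lives, and let the scheduler move the compute instead of the petabytes.

    # datajob.yaml (hypothetical): declare the work and the data constraint.
    kind: DataJob
    metadata:
      name: telemetry-rollup
    spec:
      task:
        image: registry.example.com/rollup:v7     # placeholder analysis container
      input:
        path: s3://telemetry/eu-central/*.parquet
        locality: required                        # run next to the bytes
      policy:
        dataMovement: none                        # governance: raw data never leaves the region
      output:
        path: s3://telemetry-results/rollup/

The point is not the field names; it is that locality and governance are declared up front and enforced by the scheduler, exactly as replica counts are today.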

Where Expanso Fits

Expanso exists to make "compute where the data is being created" as reliable and boring as launching a container. You declare the workload, and we schedule it to the dataset, whether that's a Parquet file in Frankfurt, an S3 bucket in São Paulo, or a real-time LIDAR stream on a wind turbine in the North Sea.

The underlying runtime might do any number of things to get the job done: run a script, start a container, invoke a serverless function. You won't care, and that is exactly the point. Abstraction wins when it solves a real headache and then gets out of the way.

Call to Action

If you spend more time arguing about replication factors and data transfer costs than you do about business outcomes, let's chat. The next decade belongs to teams that treat data locality and governance as compile-time decisions, not post-mortem regrets. I would love to compare scars.


Series Roadmap

This piece continues our series examining Kubernetes at ten: where it's excelled, where it's stalled, and how we bridge the gap between compute and data in a truly distributed world.