The Myth of Portability: Helm, Kubernetes, and the Data Pipeline Problem

I spent years helping companies migrate from bash scripts to Chef, and later to containers. The conversation always started the same way: "We want modern infrastructure, but it has to work with what we have."

The early migrations were brutal. We'd take a cookbook that deployed an app across a dozen servers, containerize it, and then hand-write YAML files to get it running in Kubernetes. Every ConfigMap, Service, and Deployment was crafted manually. One typo in an indent and the entire system failed.

The application itself ran beautifully once you got it working. You got automatic scaling, rolling updates, and identical behavior in dev, staging, and prod. But getting there meant taming an explosion of interdependent YAML files that broke in mysterious ways. Then came the real questions. Where does the app read its configuration? How does it manage state? How does it connect to the database?

The application was in a container, but everything it depended on—its configuration, its state, its data—was still locked to a specific environment.

The YAML Explosion

This pattern played out everywhere. An infrastructure team would champion Kubernetes for its operational benefits, containerize a few applications, and immediately hit a configuration management wall.

The problem was that every deployment was bespoke. Each environment needed slightly different YAML. Each application had its own set of interconnected resources. When something broke, you had to trace dependencies across dozens of files to figure out which of a myriad of possible problems was the culprit.

The real work started when we tried to migrate applications that mattered: complex data workflows with dependencies on shared filesystems, databases, and batch systems. Each one required its own sprawling collection of YAML files, all manually kept in sync.

Helm: A Breakthrough in Templating

When Helm arrived, it felt revolutionary. Instead of managing static YAML, you could template an entire application stack and deploy it consistently. You could parameterize database connections, adjust resource limits, and toggle features with simple value overrides.
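
A minimal sketch of what that looked like in practice (the chart name and values here are hypothetical): one values file holds the defaults, and each environment overrides only what differs.

```yaml
# values.yaml -- defaults for a hypothetical "payments" chart
# (override per environment, e.g.:
#   helm install payments ./payments-chart -f values-prod.yaml --set replicaCount=4)
replicaCount: 2
database:
  host: postgres.staging.internal   # parameterized connection target
  port: 5432
features:
  asyncReports: false               # feature toggle flipped per environment
resources:
  limits:
    cpu: 500m
    memory: 512Mi
```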

This wasn't a panacea. As Brian Grant has talked about in "I shouldn't have to read installer code every day," templating often forces you to read and write far too much code just to get a simple outcome. There should be a better way.

Even so, Helm solved one of the biggest problems killing Kubernetes adoption. You could version complex deployments and share them across teams, moving from artisanal YAML crafting to industrial-scale orchestration.

But Helm also introduced its own fragility. Your chart worked perfectly until someone changed a dependency, updated a sub-chart, or modified a shared values file. The templating that made deployment flexible also made it brittle. The Kubernetes ecosystem is still working on a better way to manage dependencies.
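
The coupling usually lives in the chart's dependency declarations. A rough sketch, with illustrative chart names and versions: a parent chart pins a sub-chart by version range, so a bump to that range, or to the shared values the sub-chart consumes, ripples into every release built on it.

```yaml
# Chart.yaml -- a hypothetical parent chart pulling in a shared sub-chart
apiVersion: v2
name: payments
version: 1.4.0
dependencies:
  - name: postgresql
    version: "12.x.x"                 # any release in this range satisfies the dependency
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled     # toggled from the shared values file
```

The next time someone runs helm dependency update, they pull in whatever currently satisfies that range, which is exactly where the "it worked last week" failures come from.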

The Parallel with Data Pipelines

Here's the lesson every one of those migrations taught me: one of the biggest underlying problems was that we were solving the application packaging problem while completely ignoring the data/state dependency problem.

When you write a Helm chart that references ${DATABASE_HOST}, you aren't just parameterizing a connection string. You're making a huge assumption that the database exists, the network path to it is open, the security policies allow access, and the schema matches what your application expects. Move that workload to another cloud or region, and the Helm chart deploys perfectly but the application fails. It can't work because it's fundamentally coupled to a data environment that stayed behind.
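
For concreteness, here is roughly what that reference looks like inside a chart template (an illustrative excerpt, not from any particular project). Note how every one of those assumptions lives outside the file:

```yaml
# templates/deployment.yaml (excerpt from a hypothetical chart)
env:
  - name: DATABASE_HOST
    value: {{ .Values.database.host | quote }}   # assumes this host exists and is reachable
  - name: DATABASE_PORT
    value: {{ .Values.database.port | quote }}   # assumes network policy allows the connection
# Nothing here asserts that credentials are valid or that the schema matches
# what the application expects -- the chart renders cleanly either way.
```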

I watched this happen repeatedly. Infrastructure teams would celebrate a successful Kubernetes deployment, while data teams quietly maintained a parallel universe of region-specific configs and manual processes for every new environment. Data pipelines today are exactly where Kubernetes applications were before Helm. They are brittle collections of scripts and environment-specific assumptions that break the moment you try to move them.

This fragility isn't a bug. It's the natural result of tightly coupling processing logic to specific data locations. Every assumption about where data lives becomes a potential point of failure.

True Portability is Data Independence

Real portability means an application can ask for "customer transaction data from the last 30 days" without caring if that data lives in an on-prem Oracle database, a cloud data warehouse, or a collection of microservice APIs. This isn't about abstraction layers. It's about building infrastructure that can move data to compute as easily as we move compute between environments. When your Helm chart deploys in a new region, the system should ensure the data your application needs is already there, or it should know how to get it there efficiently and securely.
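
What might that look like? A purely hypothetical sketch, not a real API from Expanso or anyone else, just to make the idea concrete: the workload declares the data it needs and the guarantees it cares about, and says nothing about where that data currently lives.

```yaml
# Hypothetical data-requirement declaration -- illustrative only, not a real spec
dataRequirements:
  - name: customer-transactions
    window: 30d              # "the last 30 days"
    format: parquet
    freshness: 1h            # how stale a local copy is allowed to be
    placement: near-compute  # the platform decides whether to move, cache, or mount the data
```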

The Expanso Vision

This is what we are focused on at Expanso. We believe you should declaratively state what data an application needs and how it should be delivered, not where that data lives. The system should be responsible for figuring out how to get it there. Your application becomes truly portable because it is no longer coupled to where its data happens to be today.

The cloud-native revolution gave us portable applications, but we chained them to fragile data pipelines. True workload portability isn't about moving stateless code between clusters; it's about making applications independent of data location.

Is your application truly portable if moving it requires rebuilding your entire data pipeline from scratch? What would change if you could deploy workloads anywhere, confident the data would follow?