The Time Value of Data
Your data is rotting.
Not metaphorically. Not in some hand-wavy "data quality matters" sense. Your data is losing measurable economic value with every hour it sits unprocessed, and the infrastructure you built to manage it has no concept of that loss. No depreciation schedule. No freshness SLOs. No expiration dates. Just an ever-growing lake of records that your systems treat as equally authoritative whether they were ingested this morning or eighteen months ago.
Finance figured this out centuries ago. A dollar today is worth more than a dollar tomorrow. That single insight, the time value of money, underpins every financial instrument on earth: discounted cash flow, net present value, bond pricing, options theory. The entire machinery of modern capital allocation is built on the assumption that value decays over time unless actively maintained.
Data has the same property. We just pretend it doesn't.
The Depreciation Curve Nobody Draws
A Harvard Business School study by Valavi, Hestness, Ardalani, and Iansiti put numbers on what practitioners already suspected: 100MB of text data, after seven years, becomes equivalent in predictive value to just 50MB of current data. Half. Not slightly degraded, not "still useful with caveats." Half the value, gone.
And the decay isn't linear. More stale data doesn't compensate for the staleness. You can't make up for outdated observations by accumulating more of them, because the underlying distribution has shifted and the old data is measuring a world that no longer exists. The study's conclusion is direct: massive data accumulation over time does not create a significant competitive barrier if the data-generating distribution changes. A competitor with less data but fresher data builds a more accurate model.
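One way to make the curve concrete is to model depreciation as decay with a seven-year half-life. The half-life figure comes from the study; the exponential shape is an illustrative assumption, not the study's fitted curve:

```python
# Illustrative depreciation model. The ~7-year half-life for text data
# comes from the HBS study; the exponential shape is an assumption
# made here for illustration, not the study's fitted curve.
HALF_LIFE_YEARS = 7.0

def effective_value(size_mb: float, age_years: float) -> float:
    """Value of a dataset expressed in 'fresh-data megabytes'."""
    return size_mb * 0.5 ** (age_years / HALF_LIFE_YEARS)

# 100MB of seven-year-old text is worth ~50MB of current data...
print(effective_value(100, 7))   # 50.0
# ...and volume doesn't compensate for staleness: 400MB of
# fourteen-year-old data is still only worth 100MB fresh.
print(effective_value(400, 14))  # 100.0
```

Run the second line against the first and the competitive point falls out: 60MB of fresh data beats 100MB of seven-year-old data.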
We depreciate hardware. Three to five years and it's written down on the books. We depreciate software licenses, office equipment, vehicles. We have detailed accounting frameworks for tracking how physical and intangible assets lose value over time, and we make capital allocation decisions based on those curves every quarter.
Data gets none of this. Gartner estimates bad data costs companies 15% of revenue annually. IBM puts the aggregate US figure at $3.1 trillion per year. Those numbers would trigger an audit for literally any other asset class. For data, they're treated as the cost of doing business.
Why AI Makes This Catastrophically Worse
Traditional software using stale data produces wrong answers that look wrong. A dashboard shows last week's numbers. A report pulls from a cached table. Someone notices the graph looks off and files a ticket.
AI using stale data produces wrong answers that look perfect. The model is fluent, confident, and completely authoritative about information that was true three weeks ago and isn't anymore. Glen Rhodes described this as "freshness rot": a silent failure mode. No error spike, no latency blip, no alert. The system just gets progressively less accurate in ways that are impossible to attribute without specifically looking for them. The evals stay green. The answers drift toward wrong.
And the problem compounds dramatically when agents start making decisions, not just generating text. A chatbot serving stale information wastes a customer's time. An autonomous agent acting on stale information executes. If a fraud detection model is scoring transactions against behavioral profiles last updated an hour ago, every minute of staleness is an open window for fraud. If an inventory agent is optimizing stock levels based on demand signals from yesterday's batch job, it's optimizing for a market that has already moved.
The agent doesn't know the data is stale. It can't know, because nothing in the pipeline tells it.
This is the amplification effect. Stale data in a spreadsheet is a nuisance. Stale data feeding an autonomous system is a liability that scales with every decision the system makes.
What an AI Coding Agent Learned the Hard Way
There's a feature rolling out in Claude Code right now called Auto Dream. The mechanics are simple: while you're not using the coding agent, a background process wakes up, reviews everything the agent has written down about your project across previous sessions, throws out contradictions, converts relative dates to absolute ones, merges duplicates, prunes references to deleted files, and rebuilds the index. The system prompt says: "You are performing a dream, a reflective pass over your memory files."
One observed cycle processed 913 sessions of accumulated project notes in under nine minutes. The feature exists because without it, the agent's own memory becomes adversarial. After about 20 sessions, the notes that were supposed to help Claude remember your project start actively confusing it. "Yesterday we decided X" means nothing six weeks later. A debugging fix for a file you deleted in Sprint 4 is worse than useless in Sprint 9; it's actively misleading.
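Auto Dream's internals aren't public, but the maintenance pass described above can be sketched in miniature. Everything here, the note shape, the field names, the rules, is invented for illustration:

```python
from datetime import date, timedelta

def consolidate(notes: list[dict], existing_files: set[str]) -> list[dict]:
    """Toy consolidation pass in the spirit of the dream cycle:
    prune notes about deleted files, pin relative dates to absolute
    ones, and merge duplicates. The note schema is hypothetical."""
    cleaned, seen = [], set()
    for note in notes:
        # Prune notes that reference files which no longer exist.
        if note.get("file") and note["file"] not in existing_files:
            continue
        # Convert relative dates to absolute, anchored to when the
        # note was written, so "yesterday" can't drift in meaning.
        text = note["text"]
        if "yesterday" in text:
            absolute = (note["written"] - timedelta(days=1)).isoformat()
            text = text.replace("yesterday", f"on {absolute}")
        # Merge exact duplicates.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append({**note, "text": text})
    return cleaned
```

The real feature presumably does far more (contradiction detection alone needs an LLM pass), but the shape is the same: every rule encodes an assumption about when a memory stops being true.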
The theoretical foundation comes from UC Berkeley and Letta's "sleep-time compute" research: AI systems doing useful preprocessing during idle periods can reduce inference-time compute by roughly 5x while maintaining or improving accuracy. Charles Packer from Letta put it precisely: sleep-time compute is deeply tied to persistent state. You can only do useful work during idle time if the agent has memory that can be rewritten and improved.
What's interesting about Auto Dream is not the feature itself. It's the implicit admission it represents. A major AI lab built a coding agent, gave it memory, watched that memory degrade over time, and had to build what amounts to a garbage collector for stale knowledge. The agent's own accumulated context was losing value on a predictable curve, and without active maintenance, it crossed from helpful to harmful.
That's the time value of data made concrete. In a single application. Running on a single user's laptop.
Now think about what's happening in every enterprise data pipeline feeding every production AI system, at a scale of millions of records instead of hundreds of session notes.
The Batch Mindset Is Destroying Value
Most organizations treat data movement as a cost to minimize. Moving data from edge to cloud costs bandwidth. Processing it costs compute. Storing it costs storage. The natural optimization under that framing is to move less, process in batches, and store everything in case you need it later.
But if data's value is decaying continuously, the batch mindset is systematically destroying value. A Promethium analysis found that if a core data feed updates once daily at midnight and agents start processing at 6 AM, they're already working with data six hours stale. By late afternoon, decisions are based on data more than 18 hours old.
Think about what that means economically. The batch pipeline that runs once a day isn't free. It costs compute, storage, and engineering time. But the data it delivers has already depreciated by the time it arrives. You're paying full price for a depreciating asset and delivering it after most of the value has evaporated.
That's not cost optimization. That's systematic value destruction, and we've been calling it architecture.
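The arithmetic is worth writing down. Assuming a feed refreshed once at midnight and consumed evenly through the day:

```python
from datetime import datetime, timedelta

def staleness(last_refresh: datetime, consumed_at: datetime) -> timedelta:
    """Age of the data at the moment a consumer reads it."""
    return consumed_at - last_refresh

midnight = datetime(2025, 6, 1, 0, 0)
print(staleness(midnight, datetime(2025, 6, 1, 6, 0)))    # 6:00:00, agents start
print(staleness(midnight, datetime(2025, 6, 1, 18, 30)))  # 18:30:00, late afternoon

# Consumed uniformly across 24 hours, a daily batch serves data
# that is ~11.5 hours old on average.
avg_age_hours = sum(range(24)) / 24
print(avg_age_hours)  # 11.5
```

Half a day of average staleness is the baseline cost of the once-daily cadence, before any pipeline delay is added on top.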
The economic logic points the other direction entirely: process data where it's generated, as close to the moment of generation as possible, and move only the results. Not because distributed processing is trendy, but because the depreciation curve means every hour of latency between generation and consumption is destroying measurable business value. The freshest data is the data that never had to travel.
Expiration Dates for Everything
Auto Dream's four-phase cycle works because it encodes temporal assumptions about relevance. A debugging note about a deleted file expires immediately. A relative date reference expires the moment it's ambiguous. An architecture decision persists until contradicted. The system doesn't treat all memory as equally valuable. It assigns implicit shelf lives.
Production data pipelines need the same discipline. Rhodes advocates explicit TTLs at ingestion time, categorized by content type: policy documents get one window, product specs get another, anything tied to pricing or people or external integrations gets a short one. When a record exceeds its TTL, it should be re-ingested from source or pulled from retrieval until verified.
This isn't conceptually hard. It's operationally neglected. DataKitchen's 2026 analysis puts it directly: a single schema drift that once meant a broken report now means thousands of incorrect predictions per second, because AI amplifies data quality failures exponentially. The consequences of ignoring temporal value have scaled with the capabilities of the systems consuming the data.
The practical framework is three questions. Every data source gets a freshness SLO: how old can this data be before it's dangerous? Every pipeline stage gets a latency budget: how much of the freshness SLO does this transformation consume? Every downstream consumer gets a staleness circuit breaker: if the data feeding this model exceeds its SLO, fail loud instead of fail silent.
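The third question, the circuit breaker, is the one most pipelines are missing, and it's a few lines of code. A sketch with hypothetical SLO values:

```python
from datetime import datetime, timedelta

class StaleDataError(Exception):
    """Raised when data exceeds its freshness SLO: fail loud."""

# Hypothetical freshness SLOs per source and latency budgets per
# pipeline stage; real values come from answering the questions above.
FRESHNESS_SLO = {"behavioral_profiles": timedelta(hours=1)}
STAGE_BUDGET = {"feature_join": timedelta(minutes=10)}

def check_freshness(source: str, generated_at: datetime, now: datetime) -> None:
    """Staleness circuit breaker in front of a model or agent."""
    age = now - generated_at
    if age > FRESHNESS_SLO[source]:
        raise StaleDataError(
            f"{source} is {age} old, exceeds SLO {FRESHNESS_SLO[source]}"
        )
```

A raised exception here is the whole point: the silent failure mode becomes a visible one, and stale data stops a decision instead of quietly shaping it.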
92% of data leaders say data observability will be core to their strategy in the next one to three years. Most of them are still running nightly batch jobs and hoping the data is current.
The Consolidation Imperative
The reason Auto Dream had to exist isn't specific to AI coding agents. It's structural.
Data engineering grew up in a world where the primary risk was losing data. The entire discipline is oriented around durability, completeness, and never dropping a record. Those instincts are valuable, and they're also incomplete for the world we're building. In a world where AI systems act on data autonomously, at speed and at scale, the risk of serving stale data can exceed the risk of losing data entirely. A missing record produces an error that somebody catches. A stale record produces a confident wrong answer that propagates through every downstream system before anyone notices.
Operating systems have had garbage collection, memory compaction, and cache invalidation for decades. We consider it a fundamental failure if a filesystem doesn't reclaim space from deleted files or if a cache serves explicitly invalidated data. And yet we run data pipelines that accumulate without consolidating, that never assess whether their contents have depreciated, and that treat a record from three years ago with the same authority as one from this morning.
The MemGPT team that proposed the sleep-time compute framework started from work on giving LLMs OS-like memory management. That framing is exactly right. What Auto Dream does for an individual agent's memory is what every data pipeline in production should be doing for the knowledge it manages: orient, gather signal, consolidate, prune. Continuously. Not as a quarterly data quality initiative, but as a core operational capability running alongside ingestion and delivery.
The time value of data isn't a theoretical framework. It's an operational reality that most data infrastructure ignores, and that AI makes impossible to keep ignoring.
A coding agent needed to dream because its own accumulated knowledge was rotting. Every production data pipeline has the same problem. The difference is that when a coding agent serves stale context, a developer gets a confusing suggestion. When an enterprise pipeline serves stale context to an autonomous agent making real decisions, the consequences compound at machine speed.
The question isn't whether your data is depreciating. It is. The question is whether you'll start accounting for it before or after the wrong answers ship.
Want to learn how intelligent data pipelines can reduce your AI costs? Check out Expanso. Or don't. Who am I to tell you what to do?
NOTE: I'm currently writing a book based on what I have seen about the real-world challenges of data preparation for machine learning, focusing on operations, compliance, and cost. I'd love to hear your thoughts!