data-quality - Distributed Thoughts

Distributed Thoughts

data-quality

A collection of 3 posts

The Natasha Problem: Why Your Data Pipeline Only Fits One Person

data-engineering

Most folks probably don’t think much about clothing sizes. There’s a number, you pick it, you try on the clothes, and if they fit, then congrats, you’re that number. But how’d they pick that number? And why does every style/line/person fit slightly differently?

Your 2026 Resolution: Add Context to Your Data (Before It Breaks You)

data-engineering

Last week I sat in an executive review where two teams spent forty minutes arguing about "active users." Not about strategy. Not about growth. About what the number meant. One team counted anyone who logged in. The other excluded users who bounced in under 30 seconds. Neither knew

Why Your 'AI-Ready' Data Isn't: The Hidden Pipeline Problem Breaking Production AI

Companies spent millions on GPUs and AI talent, only to discover their data pipelines can't actually feed production AI. The revolution isn't waiting for better models—it's waiting for intelligent data pipelines.