Distributed Thoughts

data-quality

A collection of 3 posts
data-engineering

The Natasha Problem: Why Your Data Pipeline Only Fits One Person

Most folks probably don’t think much about clothing sizes. There’s a number, you pick it, you try on the clothes, and if they fit, congrats, you’re that number. But how’d they pick that number? And why does every style/line/person fit slightly differently?
08 Jan 2026 5 min read
data-engineering

Your 2026 Resolution: Add Context to Your Data (Before It Breaks You)

Last week I sat in an executive review where two teams spent forty minutes arguing about "active users." Not about strategy. Not about growth. About what the number meant. One team counted anyone who logged in. The other excluded users who bounced in under 30 seconds. Neither knew
06 Jan 2026 8 min read
ML & AI

Why Your 'AI-Ready' Data Isn't: The Hidden Pipeline Problem Breaking Production AI

Companies spent millions on GPUs and AI talent, only to discover their data pipelines can't actually feed production AI. The revolution isn't waiting for better models—it's waiting for intelligent data pipelines.
09 Sep 2025 5 min read
Distributed Thoughts © 2026