vision-models - Distributed Thoughts

Distributed Thoughts

Sign in Subscribe

vision-models

A collection of 1 post

A Picture Is Worth Ten Thousand Tokens

ai-infrastructure

A Picture Is Worth Ten Thousand Tokens

DeepSeek's new OCR model achieves 97% accuracy while using one-tenth the tokens by rendering text as images. The computationally "heavy" modality turns out to be the efficient one, which should make us question some basic assumptions about how we architect these systems.