DeepSeek's new OCR model achieves 97% accuracy while using one-tenth the tokens by rendering text as images. The computationally "heavy" modality turns out to be the efficient one, which should make us question some basic assumptions about how we architect these systems.