This post collects the notes I jotted on Threads at different times, all answering some version of “which model should I use.” It’s not a systematic benchmark — just the raw thoughts as they were, lined up by topic.
Codex: Complex Bugs
Using Codex: it's genuinely god-tier for solving complex bugs. (But slow.)
Sonnet: Good Enough for Documents
As for everyday document work, someone recently brought up “which model should I use,” and my one-line reply was: for documents, Sonnet is good enough.
Haiku 4.5: Full-Stack Dev
Most pretentious flex of 2026: doing full-stack development with Haiku 4.5. Not bravado — only Haiku has the cost-and-speed combination to sustain a “tweak, verify, tweak again” full-stack rhythm.
DeepSeek: The Best Value Alternative
If you're looking for an alternative to your main model: honestly, compliance requirements aside, DeepSeek is probably the best value-for-money option right now, and they'll keep cutting prices once their compute comes online.
Qwen 3.6 Plus: Wait and See
Another one worth watching, though I’m still on the sidelines, is Qwen. After the core team exodus drama, Alibaba released a preview of their next-gen model Qwen 3.6 Plus on OpenRouter. They claim enhanced coding, agentic capability, frontend development, and complex problem solving. Note: the preview version collects prompts and completion output — be careful in production.
Small Models for Local Deployment: Depends on the Scenario
If we're talking about models around 3B parameters, the biggest appeal is that they're lightweight and can be deployed locally. After all, many OCR scenarios involve highly sensitive data that can't leave the premises. Add a federated learning architecture on top and it gets even better.
Gemini Supports Skills Now: Staying on the Sidelines
Last, Gemini. Now Gemini supports Skills too. But given Gemini’s track record, I’m staying on the sidelines.
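Pulling the notes above together, here is a minimal sketch of how I'd write these picks down as a lookup table. The task keys (`complex_bug`, `documents`, and so on) and the `pick_model` helper are made up for illustration. The values are display names from this post, not API model IDs.

```python
from typing import Optional

# Task-to-model picks taken from the notes in this post.
# The task keys are hypothetical labels chosen for this sketch.
PICKS = {
    "complex_bug": "Codex",            # god-tier on hard bugs, but slow
    "documents": "Sonnet",             # good enough for everyday documents
    "full_stack": "Haiku 4.5",         # cheap and fast enough for tweak/verify loops
    "budget_alternative": "DeepSeek",  # best value, compliance requirements aside
}

# Models the notes mention but do not recommend yet.
WAIT_AND_SEE = ("Qwen 3.6 Plus", "Gemini")


def pick_model(task: str) -> Optional[str]:
    """Return the model these notes suggest for a task, or None if no note covers it."""
    return PICKS.get(task)


if __name__ == "__main__":
    for task in ("complex_bug", "documents", "something_else"):
        print(task, "->", pick_model(task))
```

Anything the notes don't cover returns `None` on purpose. The small local models, for example, depend too much on the scenario to pin to one entry.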