May 20, 2026 · vibeprompt · 7 min read
Which LLM for Which Step: A Model Strategy Matrix for Vibe Coders
Opus for planning, Sonnet for building, Haiku for chores. Stop paying Opus prices for git commits — a model-per-step playbook with current 2026 pricing.
Most vibe coders pick one model and use it for everything. Opus for the rename, Opus for the commit message, Opus for the typo fix. It works, but you're paying Opus prices for Haiku work — and worse, you're using a slow model when you want fast feedback.
The better setup is a model strategy matrix: a default model for each step of the workflow. Plan with the smart one, build with the workhorse, fix typos with the cheap one. You ship the same code at a third of the cost, with snappier iteration where it matters.
This article maps every step of the 10-step vibe coding workflow to a recommended model — across Claude, GPT-5, and Gemini 3 — and explains the reasoning behind each pick.
The three model tiers (and what they actually do well)
Every major lab now ships three tiers. The names differ, the shape doesn't:
| Tier | Claude | OpenAI | |
|---|---|---|---|
| Heavy reasoning | Opus 4.7 | GPT-5 Pro | Gemini 3 Pro |
| Workhorse | Sonnet 4.6 | GPT-5 | Gemini 3 |
| Fast & cheap | Haiku 4.5 | GPT-5 mini | Gemini 3 Flash |
The split is roughly: heavy tier costs 5-10× the workhorse, runs 2-4× slower, and beats it on multi-hop reasoning, long-context synthesis, and architecture decisions. Workhorse handles 80% of real coding work just fine. Fast tier is for batch tasks where latency and cost matter more than nuance.
The mistake is treating these like a single dial. They're three separate tools.
Step-by-step model picks
Step 00 — Deep Research / market validation
Use: Heavy tier (Opus 4.7 / GPT-5 Pro / Gemini 3 Pro)
Market validation is a synthesis task. You're asking: does this problem actually exist, who has it, what do they currently do about it, how big is the wedge. The model needs to read a lot, hold competing claims in mind, and surface a confident take.
Heavy-tier models are dramatically better at this. Workhorse models tend to give you a pleasant-sounding summary that doesn't actually push back on weak ideas.
Worth the cost: you do this once per project.
Step 01 — Plan
Use: Heavy tier with extended thinking enabled
Planning is the single best place to spend Opus tokens. A bad plan compounds across every later step. A good plan saves hours of "wait, we should have done X" later.
Specifically, planning needs:
- Multi-hop reasoning ("if we pick Supabase, the auth flow looks like X, which means the onboarding screen needs Y")
- Trade-off analysis (heavy models are noticeably less agreeable — they push back on flawed assumptions)
- Long-context synthesis (you're feeding it your PRD, your architecture notes, and asking for an ordered implementation plan)
Pro tip: ask for three plans with different trade-offs (fastest to MVP, lowest infrastructure cost, most robust). Heavy tier produces meaningfully different alternatives. Workhorse tends to produce three variations of the same plan.
Step 02 — Set up context (AGENTS.md, PRD)
Use: Workhorse (Sonnet 4.6 / GPT-5 / Gemini 3)
This is structured drafting from a known shape. You're not reasoning about anything new — you're filling in a template. Workhorse-tier models are perfectly capable, and you save ~70% on cost vs heavy.
Step 03 — Build (core features)
Use: Workhorse — escalate to heavy only when stuck
Real coding work is where workhorse models shine. Sonnet 4.6 ships ~90% of production code in the field, and the community consensus is that defaulting to Opus for everyday building is wasteful.
The escalation rule: if Sonnet has produced two failed attempts at the same problem, switch to Opus. Don't sit on the same model hoping the third try works — it's almost always the model that's the bottleneck, not the prompt.
Step 04 — Test
Use: Workhorse for writing tests, fast tier for running batches
Writing tests is workhorse work. Running a long suite of generated tests, summarizing failures, or batch-generating fixtures is fast-tier work — you want throughput, not nuance.
Step 05 — Ship
Use: Fast tier (Haiku 4.5 / GPT-5 mini / Gemini 3 Flash)
Commit messages. Changelog drafts. README updates. Renaming variables. Bumping versions. All chores. None require reasoning beyond "follow the existing style." Fast tier does these in under a second and costs roughly nothing.
If you're using Claude Code or Codex CLI for git operations, run them on Haiku. You'll save 80% on a category of calls that runs constantly.
Step 06 — Audit / security review
Use: Heavy tier
Security review is the other place heavy reasoning pays off. You want the model to catch the non-obvious issue: the SQL injection that only triggers on a specific Unicode normalization, the race condition between two endpoints, the auth bypass that requires three steps of misuse.
Workhorse models will catch the obvious stuff. Heavy models catch the obvious stuff and the rest.
Step 07 — Polish / UX
Use: Workhorse with vision
UX polish is iterative and visual. Take a screenshot, paste it in, ask for the fix. Workhorse-tier vision models (Sonnet, GPT-5, Gemini 3) are all strong here and the latency matters — you're iterating in seconds, not minutes.
Step 08 — Launch (first 100 users)
Use: Workhorse for content drafts, fast tier for variants
Drafting your Show HN post, your Product Hunt copy, your launch tweet — workhorse work. Generating 30 variants of the same tweet to A/B test? Fast tier, in parallel.
Step 09 — Iterate
Use: Default to workhorse, escalate by signal
Post-launch you're juggling bug reports, feature requests, and small improvements. Default to workhorse for everything. Escalate to heavy when:
- A bug has resisted two workhorse attempts
- You're making an architectural change (not a feature change)
- The decision affects more than one screen / route / model
Pricing reality check (2026)
Approximate output token costs as of May 2026:
| Provider | Heavy | Workhorse | Fast |
|---|---|---|---|
| Anthropic | $75 / Mtok | $15 / Mtok | $1.25 / Mtok |
| OpenAI | $60 / Mtok | $10 / Mtok | $1 / Mtok |
| $40 / Mtok | $7 / Mtok | $0.50 / Mtok |
Heavy tier is roughly 5× workhorse, 50× fast. If 80% of your work is on workhorse and 15% on fast, swapping that for "Opus everywhere" makes your monthly bill 5-10×.
(Specific pricing changes monthly. The ratio between tiers is what to plan around.)
How to actually set this up
The model strategy matrix is useless if you have to manually switch every prompt. The setup that works:
Default to workhorse globally. In Claude Code:
/model sonnet. In Codex CLI:/model gpt-5-medium. In Cursor: set "Auto" to Sonnet, leave Opus available on shortcut.One-tap escalation. Bind a hotkey or alias to switch to heavy tier. In Claude Code:
/model opus. Use it intentionally — for planning, security review, or when stuck.Fast tier on infrastructure. Configure your git tooling, commit hooks, or shell wrappers to call fast tier explicitly. Example: if you use an AI commit message tool, point it at
haiku-4-5notclaude-opus-4-7.Cost dashboard, always on. Look at your monthly spend by model. If your Opus % is over 30, you're overpaying. If your Haiku % is under 20, you're leaving savings on the table.
When to break the rules
Three cases where the matrix above is wrong:
- You're learning. If a new tool/library/concept is the bottleneck, run heavy tier for everything for a week. The extra context-awareness from a smarter model accelerates the learning, even if it's expensive.
- You're on a deadline. Heavy tier's slower per-token, but it's faster per task because it gets to a correct answer in fewer turns. Under pressure, "Opus for everything" can actually be the right call.
- You're shipping production. For the final audit pass before launch, run heavy tier on the diff. Cheap insurance.
TL;DR
| Step | Model tier |
|---|---|
| 00 Deep Research | Heavy |
| 01 Plan | Heavy + thinking |
| 02 Context setup | Workhorse |
| 03 Build | Workhorse, escalate when stuck |
| 04 Test | Workhorse + fast |
| 05 Ship (commits, changelog) | Fast |
| 06 Audit | Heavy |
| 07 Polish (UX) | Workhorse + vision |
| 08 Launch content | Workhorse + fast for variants |
| 09 Iterate | Workhorse by default |
The pattern: heavy tier for thinking, workhorse for building, fast tier for chores. Set defaults once, escalate by signal. Your bill drops 60-70% and you ship the same code.