shadcn/improve: The Agent Skill That Splits AI Coding Into Expensive Thinking and Cheap Execution

Here's what nobody talks about when the AI coding demo goes viral: running a Mythos-class model in a tight agentic loop on a 50k-line codebase is expensive. Not "oops I overspent on coffee" expensive. More like "this feature cost three times what we estimated and the inference bill arrives on the first of the month" expensive.
Teams are learning this the hard way in 2026. Claude Fable 5 — Anthropic's new Mythos-class model launched June 9 — hits around 80% on SWE-Bench Pro. It can work autonomously for days. It's genuinely impressive. It also costs $10 per million input tokens and $50 per million output tokens, double the price of Opus 4.8. When you're feeding it 200k tokens of context every other turn in an agentic loop, those numbers stop being abstract.
But here's the thing most teams haven't quantified yet: most of those tokens aren't doing the hard work. Understanding a codebase, spotting systemic issues, ranking what actually matters — that part requires intelligence. Actually typing the refactor, adding the tests, running the verification commands? Once you have a precise spec, a much cheaper model can do that reliably.
shadcn just shipped the tool that formalizes that split.
What shadcn/improve Actually Is
Launched June 10, 2026 — the day after Fable 5 dropped, and that timing is not accidental — shadcn/improve is an Agent Skill with one job: use your most capable model to audit a codebase and write implementation plans precise enough that cheaper models can execute them without you babysitting the session.
One hard rule embedded into the skill itself: it never edits source code. Read-only, always. The plan is the product.
you → /improve (Fable 5, advises)
plans/ → 001-fix-n-plus-one.md (self-contained specs)
other agent → implements, tests, ships (cheap model, executes)
shadcn is the same developer who built shadcn/ui (100k+ GitHub stars) and designed the shadcn/skills release in March's CLI v4 update. The throughline is consistent: you should own your components, your design system should be agentic-ready, and your AI orchestration should be deliberate. /improve is the orchestration layer for the hard part: the codebase audit.
1.4k+ GitHub stars inside 48 hours. Developer communities called it "clever," "exactly the right way," "the senior architect handing plans to junior devs." The framing resonated immediately because it maps onto something engineers already believe: good specs matter more than smart executors.
Why Fable 5 + This Skill = A Window You Should Use Now
Paid Claude subscribers get free access to Fable 5 through June 22, 2026. After that: $10/M input, $50/M output.
shadcn explicitly designed /improve around this window. Run the smartest model now on the hard reasoning work — deep codebase understanding, systemic issue detection, prioritized implementation specs. Let those plans execute on cheaper models for weeks afterward. One expensive audit session versus many cheap execution passes.
Fable 5 is purpose-built for exactly this kind of task. Anthropic's own documentation describes it as capable of planning across stages, delegating to sub-agents, and running for days in a Claude Code harness. GitHub Copilot's internal benchmarks found Fable 5 completed equivalent work with fewer tool calls and lower token consumption than prior Opus-tier models. On the hardest multi-step codebase reasoning tasks — the kind /improve needs — the capability delta over cheaper models is largest.
The question isn't whether Fable is better at auditing your code than Haiku. It obviously is. The question is whether you need Fable for every code edit that follows. You don't.
How the Workflow Runs, Step by Step
Install with:
npx skills add shadcn/improve
Works in Claude Code, Cursor, and any Agent Skills-compatible host.
The skill moves through six phases:
Recon — Maps your stack, build commands, conventions, and git history. Establishes the factual baseline before any judgments get made.
Parallel Audit — Spawns subagents across nine categories simultaneously: correctness/bugs, security, performance, test coverage, tech debt, dependencies/migrations, DX/tooling, docs, and product direction. Every finding requires a file:line reference, an impact assessment, an effort estimate (S/M/L), and a confidence rating. No vague suggestions allowed.
Vet — The main advisor re-validates every finding to kill false positives. Rejected findings are recorded with reasons so they don't resurface next run.
Prioritize — Rank by impact, not severity theater.
Plan — Write self-contained implementation specs in plans/. Each file contains everything a different model needs — one that wasn't in the audit session and has never seen the codebase — to implement correctly.
Execute + Reconcile — /improve execute <plan> dispatches a cheaper model in an isolated git worktree, then has the advisor review the diff. You control merging. /improve reconcile handles hygiene between sessions: verifying what landed, refreshing what drifted, unblocking what got stuck.
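To make the audit and prioritize phases concrete, here is a minimal Python sketch of what a finding record and a ranking pass could look like. Everything in it is an assumption for illustration: the field names, the 1 to 5 impact scale, the confidence weights, and the tie-break on effort are not taken from the skill. The structural check is also only a gate for the required fields, not a stand-in for the advisor's Vet phase.

```python
import re
from dataclasses import dataclass

# Hypothetical schema and weights, invented for illustration only.
EFFORT_COST = {"S": 1, "M": 2, "L": 3}
CONFIDENCE_WEIGHT = {"LOW": 0.3, "MEDIUM": 0.6, "HIGH": 0.9}
FILE_LINE = re.compile(r"^[^\s:]+:\d+$")  # e.g. "search.ts:31"


@dataclass(frozen=True)
class Finding:
    title: str
    category: str
    location: str    # file:line reference
    impact: int      # 1 (minor) to 5 (severe), an assumed scale
    effort: str      # "S", "M", or "L"
    confidence: str  # "LOW", "MEDIUM", or "HIGH"


def is_well_formed(f: Finding) -> bool:
    """Reject findings with no file:line reference or with unknown ratings."""
    return (
        bool(FILE_LINE.match(f.location))
        and 1 <= f.impact <= 5
        and f.effort in EFFORT_COST
        and f.confidence in CONFIDENCE_WEIGHT
    )


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Drop malformed findings, rank by confidence-weighted impact,
    and break ties in favor of the smaller effort estimate."""
    def key(f: Finding) -> tuple[float, int]:
        return (-f.impact * CONFIDENCE_WEIGHT[f.confidence], EFFORT_COST[f.effort])
    return sorted((f for f in findings if is_well_formed(f)), key=key)


# Titles and locations echo the article's example; impact scores are made up.
findings = [
    Finding("shadow-config duplicated", "tech-debt", "search.ts:31", 3, "M", "HIGH"),
    Finding("O(n^2) icon migration", "perf", "migrate-icons.ts:168", 4, "S", "HIGH"),
    Finding("docs feel stale", "docs", "somewhere", 5, "S", "HIGH"),  # no file:line
]
for f in prioritize(findings):
    print(f.location, f.title)
```

The third finding is dropped despite its high impact score because it cites no file and line, which is the "no vague suggestions" rule expressed as a filter.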
The Full Command Surface
| Command | When to Use |
|---|---|
| /improve | Full audit, medium to large repos |
| /improve quick | Fast scan, low token cost, good starting point |
| /improve deep | Thorough analysis, highest ceiling |
| /improve branch | Scoped to current branch changes only |
| /improve security (or perf, docs, etc.) | Category-focused runs |
| /improve next | Roadmap and feature suggestions |
| /improve plan <desc> | Write a plan for something you already know needs doing |
| /improve execute <plan> | Dispatch cheap executor on one plan |
| /improve reconcile | Backlog hygiene between sessions |
| --issues | Publish plans as GitHub issues |
Start with /improve quick or /improve branch. Don't burn 200k tokens on a full deep audit before you understand what the skill surfaces on your specific repo.
Why Cheap Models Can Actually Execute These Plans
Most "AI plans" fail for weaker models for three predictable reasons: missing context, vague steps, no verification gates. A plan that says "refactor the authentication module to be more modular" is useless to a less capable model. It'll hallucinate an approach, mutate half the codebase, and break three things you didn't intend to touch.
Here are two real findings from a run against shadcn/ui itself:
| # | Finding | Category | Effort | Confidence |
|---|---|---|---|---|
| 1 | shadow-config duplicated in search.ts/view.ts, copies already drifted (TODO at search.ts:31) | tech-debt | M | HIGH |
| 2 | O(n²) icon migration (migrate-icons.ts:168) | perf | S | HIGH |
The resulting plan for finding #1 includes all of this: a drift check with a git commit stamp so the executor knows if the codebase moved, current-state code excerpts with exact line references, the specific repo commands to run with expected outputs, a narrow in-scope/out-of-scope declaration, numbered steps each ending in a verification gate, a test plan, machine-checkable done criteria, explicit STOP conditions, and maintenance notes.
The README calls this being written for the "weakest plausible executor." Not as a slight — as an acknowledgment that good specs enable reliable execution regardless of model capability. Engineers have understood this for decades. You write detailed tickets not because your teammates are incapable, but because ambiguity is expensive at execution time. The AI workflow equivalent of this is only now getting formalized.
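One way to picture "machine-checkable" is a small linter that refuses to hand a plan to an executor until every expected section is present. The Python sketch below assumes plans are markdown files with one heading per section. The heading names are guesses based on the list of plan contents above, not the skill's actual template.

```python
import re

# Assumed heading names; the real plan template may word these differently.
REQUIRED_SECTIONS = [
    "Drift check",
    "Current state",
    "Commands",
    "Scope",
    "Steps",
    "Test plan",
    "Done criteria",
    "STOP conditions",
    "Maintenance notes",
]

HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(plan_markdown: str) -> list[str]:
    """Return the required section names that have no matching markdown heading."""
    headings = {m.group(1).lower() for m in HEADING.finditer(plan_markdown)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


draft = "# 001 fix n plus one\n\n## Drift check\n...\n\n## Steps\n1. ...\n"
print(missing_sections(draft))
```

A check like this only proves the sections exist. Whether the steps are precise enough for a weaker model is still a judgment the advisor has to make.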
When you add --issues, plans get published directly as GitHub issues. Your team or other agents can pick them up in whatever workflow they already use.
The Token Economics at Scale
Here's where this becomes interesting for anyone running AI-assisted development on a SaaS product.
The math works like this: one audit session with Fable 5, say 400k input tokens to map a medium-sized codebase, costs about $4 in input (400k tokens × $10 per million) before any output tokens are counted. That's not negligible, but it's a one-time cost per codebase state.
The execution passes? A plan that says "extract shadow-config resolution into a shared helper, update 4 files, verify with pnpm test --filter=shadow" can be executed by Claude Haiku or a comparable cheap model. Cents per execution rather than dollars.
Compound this across real scenarios where the savings stack:
- A mature SaaS codebase with ongoing tech debt cleanup
- Security and performance audits before compliance reviews or launches
- Onboarding a new agent to a complex repo (hand it a plan, not a 200k-token context dump)
- Running reconcile regularly instead of expensive full re-audits every sprint
The pattern: intelligence once, execution many times. You're not re-reasoning about the codebase every time you touch a file. You're executing against a validated, scoped spec that was produced when the expensive reasoning was worth it.
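The arithmetic is simple enough to keep in a scratch file. The Python sketch below uses the Fable 5 input price quoted in this article. The 25-turn loop, the 30k-token plan, and the executor price are placeholder assumptions, so swap in your own numbers. It also ignores output tokens and prompt caching.

```python
FABLE_INPUT_USD_PER_MTOK = 10.0    # price quoted in this article
EXECUTOR_INPUT_USD_PER_MTOK = 1.0  # placeholder assumption, not a quoted price


def token_cost(tokens: int, usd_per_mtok: float) -> float:
    """Cost in USD for a token count at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_mtok


def loop_input_cost(context_tokens: int, turns: int, usd_per_mtok: float) -> float:
    """Input-side cost of re-sending the same context on every turn, with no caching."""
    return token_cost(context_tokens * turns, usd_per_mtok)


audit = token_cost(400_000, FABLE_INPUT_USD_PER_MTOK)
loop = loop_input_cost(200_000, 25, FABLE_INPUT_USD_PER_MTOK)  # 25 turns is hypothetical
per_plan = token_cost(30_000, EXECUTOR_INPUT_USD_PER_MTOK)     # 30k-token plan is hypothetical

print(f"one-time audit, input only:  ${audit:.2f}")     # $4.00
print(f"25-turn frontier loop:       ${loop:.2f}")      # $50.00
print(f"one plan on a cheap model:   ${per_plan:.2f}")  # $0.03
```

Under these assumptions the audit costs less than a tenth of a single long frontier-model loop, and each execution pass costs a few cents.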
For AI SaaS companies building agentic features, this pattern is directly productizable. A planner tier on Fable/Opus and an executor tier on Haiku/Sonnet, with plans as the handoff artifact, is a real architecture for controlling agentic feature margins. It loosely resembles mixture-of-experts routing, but applied at the workflow level, where it is often easier to build and reason about than routing logic hidden inside a single model call.
Where It Falls Short (And When Not to Reach for It)
The advisor model matters a lot. Plans written by a weaker model are weaker plans. If you run /improve on Haiku to save costs, you're defeating the whole point. The value proposition assumes you have access to a genuinely capable advisor for the audit phase.
Repos with minimal test coverage or broken CI create problems. The verification gates in plans depend on pnpm test (or your equivalent) actually working. That said, the skill surfaces this as a high-priority finding, so you'll know about it.
For pure greenfield invention — "what should we build next?" with nothing in the codebase to cite — the skill is honest about its constraints. Every directional suggestion has to be grounded in repo evidence. You can't use it to generate product strategy from thin air, and it won't pretend otherwise.
Plans drift. Codebases keep moving. Running /improve reconcile regularly is not optional overhead — it's the mechanism that keeps specs accurate as development continues. Treat the plans/ folder like living documentation, not a one-time artifact.
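The core of a drift check is small. Here is a rough Python sketch, assuming each plan records the commit it was written against on a line like "Commit: 3f2a9c1". That stamp format and the function names are hypothetical. The only real CLI call is git rev-parse HEAD.

```python
import re
import subprocess
from typing import Optional

# Assumed stamp format inside a plan file: "Commit: <7 to 40 hex characters>".
STAMP = re.compile(r"^Commit:[ \t]*([0-9a-f]{7,40})[ \t]*$", re.MULTILINE)


def stamped_commit(plan_text: str) -> Optional[str]:
    """Extract the commit hash a plan was written against, if it has one."""
    match = STAMP.search(plan_text)
    return match.group(1) if match else None


def has_drifted(stamp: str, head: str) -> bool:
    """True when HEAD no longer matches the stamped (possibly abbreviated) hash."""
    return not head.startswith(stamp)


def current_head(repo_dir: str = ".") -> str:
    """Ask git for the full hash of HEAD in the given repository."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


plan = "# 001 fix n plus one\nCommit: 3f2a9c1\n"
stamp = stamped_commit(plan)
print(stamp, has_drifted(stamp, "3f2a9c1d4e5f60718293a4b5c6d7e8f901234567"))
```

This is deliberately coarse: a moved HEAD does not mean the files a plan touches have changed. It only tells you which plans deserve a second look before an executor runs them.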
Where This Fits in the Bigger Picture
This is the clearest practical implementation yet of something that's been obvious in theory but hard to act on: deliberate model orchestration based on where intelligence actually compounds.
The 2026 agentic coding wave has produced genuinely impressive infrastructure — Claude Code subagents, nested agent architectures, the Agent Skills ecosystem that shadcn has been systematically building into. But a lot of teams are still treating every AI task identically: give the frontier model all the context, let it do everything, reconcile the bill at the end of the month.
shadcn/improve gives you a practical answer to that. Not theoretical. Not a whitepaper. A skill you can run today, during a window when Fable 5 access is free for paid subscribers, against a real codebase, and have a plans/ folder of scoped, verifiable implementation specs by end of week.
The teams that build a durable advantage in AI-assisted development won't be the ones with the biggest models in every loop. They'll be the ones who know exactly where the expensive reasoning earns its keep — and have the infrastructure to make that reasoning count downstream.
shadcn/improve is that infrastructure. Version one.
Install today: npx skills add shadcn/improve
GitHub: github.com/shadcn/improve (MIT license)
Fable 5 free access window: June 9–22, 2026 for paid Claude subscribers
Start with /improve quick on a real branch. See what it finds. Then decide how much of your inference budget is actually earning its keep.