Claude Code on Large & Existing Codebases in 2026: CLAUDE.md Layering, Semantic Search & Subagents That Scale

The "open a folder and start prompting" workflow that's magic on a fresh project falls apart on a 100k-line monorepo or a legacy codebase nobody fully understands. Not because Claude Code can't handle scale — because the default habits don't. Here's what changes.
This builds on the CLAUDE.md reference and the subagents guide — read those for the fundamentals; this is the scale layer.
Why standard workflows break at scale
- Context isn't infinite, and it isn't free. Even with a large window, every file Claude reads costs tokens and crowds out what matters.
- Default search is agentic. Claude finds code with Grep/Glob/Read on demand. Great for normal repos; on huge ones it can read widely and burn tokens before it finds the right place.
- The silent killer: a bloated CLAUDE.md. On a big repo it's tempting to over-document. A long root file loads every session and reduces adherence across the whole project.
Hierarchical CLAUDE.md for monorepos
The fix is layering, not a bigger root file:
- Lean root CLAUDE.md — repo-wide overview, shared commands, the handful of universal gotchas. Under 200 lines.
- Per-directory CLAUDE.md — package- or module-specific rules in
packages/foo/CLAUDE.md. These load on demand when Claude reads files in that directory, so irrelevant package rules never enter context. - Point at existing docs. If you already maintain architecture docs, import them instead of rewriting:
@docs/architecture.mdpulls them in. (This is exactly how a real project's CLAUDE.md should reference its owndocs/.) claudeMdExcludes— in a shared monorepo, ancestor CLAUDE.md files from other teams get picked up. Skip them by glob in.claude/settings.local.json:
{
"claudeMdExcludes": ["**/monorepo/other-team/CLAUDE.md"]
}
The official large-codebases guide covers the full root + per-directory layout.
Semantic search for very large repos
When agentic Grep/Read is clearly the bottleneck (hundreds of thousands to millions of lines), add a code-search MCP server:
- zilliztech/claude-context (~11.9k★) — "make your entire codebase the context for Claude Code." It adds vector/semantic indexing so Claude locates relevant code by meaning, not just by grepping filenames — cutting the token cost of finding things in massive repos.
Wire it up like any MCP server. Don't add it reflexively — it's for genuine scale, where default search visibly struggles.
Explore before you edit
This is the highest-impact habit on unfamiliar large code:
- Use the Explore subagent (or a custom one) for understanding. Explore reads in its own context and returns a summary — heavy file reads don't pollute your main session. It's read-only and can run on a cheaper model like Haiku to keep exploration cheap.
- Plan mode for any non-trivial change.
Shift+Tabinto plan mode; let Claude lay out the change across files and approve it before a single edit. Never let it edit on first contact with code it hasn't read. - Fresh-context review. After implementing, a review subagent that didn't write the code catches more than self-review.
Context hygiene at scale
/clearbetween unrelated tasks — non-negotiable on big repos; context accumulates fast./compactkeeps the root CLAUDE.md — after compaction Claude re-reads the project-root CLAUDE.md from disk, so your core instructions survive. Nested CLAUDE.md files reload the next time Claude reads that directory.- Let auto memory accumulate. Per-repo, Claude records build quirks and debugging insights to its own
MEMORY.md— on a long-lived codebase this compounds. Run/memoryto see what it's learned.
Legacy-specific discipline
- Surgical changes only. Read callers and exports first; change what the task requires and nothing else.
- Path-scoped rules for known traps. Put "this module uses a deprecated pattern — migrate via X" in a
.claude/rules/file scoped to that path, so the warning loads exactly when Claude touches it. - Debug what's loading. If instructions seem ignored, the
InstructionsLoadedhook logs which instruction files loaded and when — useful for diagnosing path-scoped rules in a big tree.
Onboarding playbook for an existing repo
- Bootstrap with
/init— generates a starting CLAUDE.md from what it can discover (and reads an existing AGENTS.md /.cursorrules). Then trim and customize. - Codebase Q&A first — before any edits. Ask Claude to explain the unfamiliar parts: "where is auth handled?", "how is this class used?", "trace the git history of this function." Let it build a mental model in an Explore subagent.
- Plan, then implement, then verify. Force a plan for the first real task, give it a checkable success criterion (tests pass, lint clean), and review the diff with a fresh subagent.
- Capture as you go. Promote recurring corrections into CLAUDE.md or path-scoped rules; let auto memory hold the rest.
Pitfalls
- One giant root CLAUDE.md instead of per-directory files.
- Editing before exploring on unfamiliar code.
- Adding a semantic-search MCP when plain agentic search was fine (overhead with no payoff).
- Skipping verification because "it's just a small change in a big repo."
Go deeper
- CLAUDE.md fundamentals + template: CLAUDE.md templates that work.
- The subagent mechanics: subagents & agent teams.
- The MCP layer: best MCP servers. The broader workflow: advanced tips.
- Shipped on top of a big codebase with this? List it free on SaaSCity.
FAQ
Does Claude Code index my repo automatically? No — it searches agentically (Grep/Glob/Read). For very large repos, add a code-search MCP like zilliztech/claude-context.
Keeping CLAUDE.md lean in a monorepo?
Lean root + per-directory CLAUDE.md (load on demand) + path-scoped rules + claudeMdExcludes for other teams' files.
Legacy workflow? Explore and trace first, plan mode, surgical edits, path-scoped gotcha rules, verify every change.
Root CLAUDE.md size for 100k+ LOC? Still under 200 lines — push detail to per-directory files and rules.
Get your SaaS in front of founders
List your product on the SaaSCity live city map — a permanent listing, real discovery, and a backlink from a high-DR directory. Free to start; upgrade for a dofollow link and a building on the map.


