Claude Code on Large & Existing Codebases in 2026: CLAUDE.md Layering, Semantic Search & Subagents That Scale

ghosty
Founder, SaaSCity

The "open a folder and start prompting" workflow that's magic on a fresh project falls apart on a 100k-line monorepo or a legacy codebase nobody fully understands. Not because Claude Code can't handle scale — because the default habits don't. Here's what changes.

This builds on the CLAUDE.md reference and the subagents guide — read those for the fundamentals; this is the scale layer.

Why standard workflows break at scale

  • Context isn't infinite, and it isn't free. Even with a large window, every file Claude reads costs tokens and crowds out what matters.
  • Default search is agentic. Claude finds code with Grep/Glob/Read on demand. Great for normal repos; on huge ones it can read widely and burn tokens before it finds the right place.
  • The silent killer: a bloated CLAUDE.md. On a big repo it's tempting to over-document, but a long root file loads in full every session, and the more it holds, the less reliably Claude follows any single instruction in it.
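
To make the token cost concrete, here is a back-of-the-envelope sketch. It assumes roughly 4 characters per token and a 200,000-token window; both numbers are illustrative assumptions rather than measured values, and the function names are hypothetical.

```python
# Rough context-budget arithmetic. CHARS_PER_TOKEN and CONTEXT_WINDOW are
# illustrative assumptions, not values reported by Claude Code.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 200_000


def estimate_tokens(text: str) -> int:
    """Approximate token count from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def window_share(texts: list[str]) -> float:
    """Fraction of the assumed window consumed by reading all of `texts`."""
    return sum(estimate_tokens(t) for t in texts) / CONTEXT_WINDOW


if __name__ == "__main__":
    # Twenty files of 500 lines at about 40 characters per line.
    files = ["x" * 40 * 500] * 20
    print(f"{window_share(files):.0%} of the window")  # prints "50% of the window"
```

Under these assumptions, twenty mid-sized files already take half the window before any reasoning or editing happens, which is why wide, unfocused reads hurt on large repos.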

Hierarchical CLAUDE.md for monorepos

The fix is layering, not a bigger root file:

  • Lean root CLAUDE.md — repo-wide overview, shared commands, the handful of universal gotchas. Under 200 lines.
  • Per-directory CLAUDE.md — package- or module-specific rules in packages/foo/CLAUDE.md. These load on demand when Claude reads files in that directory, so irrelevant package rules never enter context.
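  • Point at existing docs. If you already maintain architecture docs, import them instead of rewriting them: a line such as @docs/architecture.md in CLAUDE.md pulls that file in, which is how a project's CLAUDE.md should reference its own docs/ directory. Imported files load together with the CLAUDE.md that references them, so they still count toward context; import only what every session needs.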
  • claudeMdExcludes — in a shared monorepo, ancestor CLAUDE.md files from other teams get picked up. Skip them by glob in .claude/settings.local.json:
{
  "claudeMdExcludes": ["**/monorepo/other-team/CLAUDE.md"]
}

The official large-codebases guide covers the full root + per-directory layout.
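
One way to keep the layering honest is to audit it. The sketch below walks a repo, finds every CLAUDE.md, and reports the ones over the 200-line budget mentioned above. The function name and the node_modules exclusion are assumptions for illustration; this is not a Claude Code feature.

```python
from pathlib import Path

MAX_LINES = 200  # the budget suggested above; adjust to taste


def audit_claude_md(repo_root: str, max_lines: int = MAX_LINES) -> list[tuple[str, int]]:
    """Return (relative path, line count) for every CLAUDE.md over budget."""
    root = Path(repo_root)
    oversized = []
    for path in sorted(root.rglob("CLAUDE.md")):
        relative = path.relative_to(root)
        # Skip vendored copies; excluding node_modules is an assumed convention.
        if "node_modules" in relative.parts:
            continue
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count > max_lines:
            oversized.append((relative.as_posix(), line_count))
    return oversized
```

Calling audit_claude_md(".") from the repo root returns the offenders, which makes it easy to run in CI or before a cleanup pass.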

Semantic search for very large repos

When agentic Grep/Read is clearly the bottleneck (hundreds of thousands to millions of lines), add a code-search MCP server:

  • zilliztech/claude-context (~11.9k★): "make your entire codebase the context for Claude Code." It adds vector/semantic indexing so Claude locates relevant code by meaning rather than only by exact text or filename matches, which cuts the token cost of finding things in massive repos.

Wire it up like any MCP server. Don't add it reflexively — it's for genuine scale, where default search visibly struggles.
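
If you are unsure whether your repo is in "genuine scale" territory, a quick line count is a reasonable first check. The sketch below is only a proxy: the extension list, the skipped directories, and the 300,000-line cutoff are all hypothetical choices, and the real signal is still default search visibly struggling.

```python
import os

SOURCE_EXTS = {".py", ".ts", ".tsx", ".js", ".go", ".java", ".rs"}  # assumed set
SKIP_DIRS = {".git", "node_modules", "dist", "build"}  # assumed set
LARGE_REPO_LINES = 300_000  # hypothetical cutoff for "hundreds of thousands"


def count_source_lines(repo_root: str) -> int:
    """Count lines in files whose extension is in SOURCE_EXTS."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTS:
                with open(os.path.join(dirpath, name), "rb") as handle:
                    total += sum(1 for _ in handle)
    return total


def worth_evaluating_semantic_search(repo_root: str) -> bool:
    """True when the repo crosses the assumed size cutoff."""
    return count_source_lines(repo_root) >= LARGE_REPO_LINES
```

A result below the cutoff is a good reason to stay with plain agentic search and skip the extra moving part.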

Explore before you edit

This is the highest-impact habit on unfamiliar large code:

  • Use the Explore subagent (or a custom one) for understanding. Explore reads in its own context and returns a summary — heavy file reads don't pollute your main session. It's read-only and can run on a cheaper model like Haiku to keep exploration cheap.
  • Plan mode for any non-trivial change. Press Shift+Tab until plan mode is active, let Claude lay out the change across files, and approve the plan yourself before it makes a single edit. Never let it edit on first contact with code it hasn't read.
  • Fresh-context review. After implementing, hand the diff to a review subagent that didn't write the code; it catches more than self-review does.

Context hygiene at scale

  • /clear between unrelated tasks — non-negotiable on big repos; context accumulates fast.
  • /compact keeps the root CLAUDE.md — after compaction Claude re-reads the project-root CLAUDE.md from disk, so your core instructions survive. Nested CLAUDE.md files reload the next time Claude reads that directory.
  • Let auto memory accumulate. Per-repo, Claude records build quirks and debugging insights to its own MEMORY.md — on a long-lived codebase this compounds. Run /memory to see what it's learned.

Legacy-specific discipline

  • Surgical changes only. Read callers and exports first; change what the task requires and nothing else.
  • Path-scoped rules for known traps. Put "this module uses a deprecated pattern — migrate via X" in a .claude/rules/ file scoped to that path, so the warning loads exactly when Claude touches it.
  • Debug what's loading. If instructions seem ignored, the InstructionsLoaded hook logs which instruction files loaded and when — useful for diagnosing path-scoped rules in a big tree.
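
For the debugging step, a hook command can be as simple as a script that appends whatever it receives to a log. The sketch below assumes only that the hook passes its event as JSON on stdin; it deliberately assumes no field names in the payload, the script name is hypothetical, and registering it under the InstructionsLoaded event in your settings is not shown.

```python
import json
import sys
from datetime import datetime, timezone


def record_event(raw_payload: str, log_path: str) -> dict:
    """Append one JSON line per hook event and return the logged entry."""
    try:
        payload = json.loads(raw_payload)
    except json.JSONDecodeError:
        # Keep unparseable input instead of dropping it.
        payload = {"unparsed": raw_payload}
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


# Hypothetical usage as a hook command:
#   python log_instructions.py /tmp/instructions-loaded.log
if __name__ == "__main__" and len(sys.argv) > 1:
    record_event(sys.stdin.read(), sys.argv[1])
```

Tailing the log while you open files in different packages shows which instruction files actually arrive, and in what order.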

Onboarding playbook for an existing repo

  1. Bootstrap with /init — generates a starting CLAUDE.md from what it can discover (and reads an existing AGENTS.md / .cursorrules). Then trim and customize.
  2. Codebase Q&A first, before any edits. Ask Claude to explain the unfamiliar parts: "where is auth handled?", "how is this class used?", "trace the git history of this function." Run the heavy reading in an Explore subagent so that only its summary lands in your main session.
  3. Plan, then implement, then verify. Force a plan for the first real task, give it a checkable success criterion (tests pass, lint clean), and review the diff with a fresh subagent.
  4. Capture as you go. Promote recurring corrections into CLAUDE.md or path-scoped rules; let auto memory hold the rest.
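
The "checkable success criterion" in step 3 works best as a single command Claude can run and read. Here is a minimal sketch of such a gate: it runs whatever commands you pass it and exits non-zero if any of them fail. The script name and the example commands in the usage note are hypothetical; substitute your repo's real test and lint commands.

```python
import shlex
import subprocess
import sys


def run_checks(commands: list[str]) -> dict[str, bool]:
    """Run each command line and map it to True when it exits with code 0."""
    results = {}
    for command in commands:
        try:
            completed = subprocess.run(shlex.split(command), capture_output=True)
            results[command] = completed.returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failed check rather than a crash.
            results[command] = False
    return results


def main(commands: list[str]) -> int:
    """Print one status line per check; return 0 only if every check passed."""
    outcome = run_checks(commands)
    for command, passed in outcome.items():
        print(f"{'ok    ' if passed else 'FAILED'}  {command}")
    return 0 if outcome and all(outcome.values()) else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

For example, python verify.py "pytest -q" "ruff check ." would gate on tests and lint together. Passing the commands as arguments keeps the script free of per-repo assumptions.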

Pitfalls

  • One giant root CLAUDE.md instead of per-directory files.
  • Editing before exploring on unfamiliar code.
  • Adding a semantic-search MCP when plain agentic search was fine (overhead with no payoff).
  • Skipping verification because "it's just a small change in a big repo."

FAQ

Does Claude Code index my repo automatically? No — it searches agentically (Grep/Glob/Read). For very large repos, add a code-search MCP like zilliztech/claude-context.

How do I keep CLAUDE.md lean in a monorepo? Use a lean root file, per-directory CLAUDE.md files (loaded on demand), path-scoped rules, and claudeMdExcludes for other teams' files.

What's the workflow for legacy code? Explore and trace first, use plan mode, make surgical edits, add path-scoped rules for known gotchas, and verify every change.

How big should the root CLAUDE.md be for 100k+ LOC? Still under 200 lines. Push detail into per-directory files and rules.
