Does Claude Code automatically index my large codebase, or do I need RAG/MCP?

By default Claude Code uses agentic search (Grep, Glob, and Read) to find what it needs on demand. That works well and needs no setup, but on very large repos (hundreds of thousands to millions of lines) it can burn tokens reading widely. For that scale, a code-search MCP server like zilliztech/claude-context adds vector/semantic indexing so Claude can locate relevant code without reading everything. Most repos don't need it; reach for it when agentic search is clearly the bottleneck.

How do I keep my CLAUDE.md from eating thousands of tokens in a monorepo?

Keep the root CLAUDE.md lean (under 200 lines) with only repo-wide essentials, and put package-specific instructions in per-directory CLAUDE.md files — they load on demand only when Claude reads files in that directory. Use path-scoped rules in .claude/rules/ for file-type-specific guidance, and claudeMdExcludes in your settings to skip other teams' CLAUDE.md files that aren't relevant to your work.

What's the workflow for a decades-old legacy codebase?

Explore before you edit. Have Claude (or an Explore subagent) read the relevant area and trace how things are used first — never let it edit on first contact with unfamiliar code. Use plan mode for any non-trivial change, insist on surgical edits (touch only what the task needs, read callers/exports first), and put known legacy gotchas in path-scoped rules. Verify every change with tests or a fresh-context review subagent.

How big should the root CLAUDE.md be for a 100k+ LOC project?

Still under 200 lines. Size discipline matters more on big repos, not less — a bloated root file loads every session and degrades adherence across the whole project. Push detail down into per-directory CLAUDE.md files and path-scoped .claude/rules/ so context is spent only where it's relevant.

Claude Code on Large & Existing Codebases in 2026: CLAUDE.md Layering, Semantic Search & Subagents That Scale

The "open a folder and start prompting" workflow that's magic on a fresh project falls apart on a 100k-line monorepo or a legacy codebase nobody fully understands. Not because Claude Code can't handle scale — because the default habits don't. Here's what changes.

This builds on the CLAUDE.md reference and the subagents guide — read those for the fundamentals; this is the scale layer.

Why standard workflows break at scale

Context isn't infinite, and it isn't free. Even with a large window, every file Claude reads costs tokens and crowds out what matters.
Default search is agentic. Claude finds code with Grep/Glob/Read on demand. Great for normal repos; on huge ones it can read widely and burn tokens before it finds the right place.
The silent killer: a bloated CLAUDE.md. On a big repo it's tempting to over-document. A long root file loads every session and reduces adherence across the whole project.

Hierarchical CLAUDE.md for monorepos

The fix is layering, not a bigger root file:

Lean root CLAUDE.md — repo-wide overview, shared commands, the handful of universal gotchas. Under 200 lines.
Per-directory CLAUDE.md — package- or module-specific rules in packages/foo/CLAUDE.md. These load on demand when Claude reads files in that directory, so irrelevant package rules never enter context.
Point at existing docs. If you already maintain architecture docs, import them instead of rewriting: @docs/architecture.md pulls them in. (This is exactly how a real project's CLAUDE.md should reference its own docs/.)
claudeMdExcludes — in a shared monorepo, ancestor CLAUDE.md files from other teams get picked up. Skip them by glob in .claude/settings.local.json:

{
  "claudeMdExcludes": ["**/monorepo/other-team/CLAUDE.md"]
}

The official large-codebases guide covers the full root + per-directory layout.
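The under-200-line guideline is easy to check mechanically. The following is a minimal sketch, not an official tool: the function names, the skipped directory list, and the choice to apply the same budget to nested files are all assumptions made for illustration.

```python
from pathlib import Path

# Root-file guideline from this article ("under 200 lines").
# Applying it to nested CLAUDE.md files as well is an assumption.
LINE_BUDGET = 200

# Hypothetical skip list; adjust to your repo's vendored directories.
SKIP_DIRS = {".git", "node_modules"}


def find_claude_md(root: Path) -> list[Path]:
    """Return every CLAUDE.md under root, ignoring skipped directories."""
    return sorted(
        p
        for p in root.rglob("CLAUDE.md")
        if not SKIP_DIRS.intersection(p.relative_to(root).parts)
    )


def line_count(path: Path) -> int:
    """Count the lines in one file, tolerating odd encodings."""
    return len(path.read_text(encoding="utf-8", errors="replace").splitlines())


def over_budget(root: Path, budget: int = LINE_BUDGET) -> list[tuple[Path, int]]:
    """List (relative path, line count) for files at or above the budget."""
    flagged = []
    for path in find_claude_md(root):
        n = line_count(path)
        if n >= budget:  # "under 200" means 200 itself is already too long
            flagged.append((path.relative_to(root), n))
    return flagged
```

Calling over_budget(Path(".")) from the repo root returns the files worth trimming or splitting into per-directory files; an empty list means every CLAUDE.md is within the budget.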

Semantic search for very large repos

When agentic Grep/Read is clearly the bottleneck (hundreds of thousands to millions of lines), add a code-search MCP server:

zilliztech/claude-context (~11.9k★): "make your entire codebase the context for Claude Code." It adds vector/semantic indexing so Claude locates relevant code by meaning, not just by matching literal text, which cuts the token cost of finding things in massive repos.

Wire it up like any MCP server. Don't add it reflexively — it's for genuine scale, where default search visibly struggles.
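Before reaching for a semantic-search server, it helps to know whether the repo is actually in the "hundreds of thousands to millions of lines" range. Here is a rough sketch for measuring that; the suffix list, the skipped directories, and the 300,000-line threshold are illustrative assumptions, not figures from the Claude Code docs.

```python
from pathlib import Path

# Hypothetical defaults; tune both sets for your stack.
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".go", ".java", ".rs"}
SKIP_DIRS = {".git", "node_modules", "dist", "build", "vendor"}


def count_source_lines(root: Path) -> int:
    """Sum the line counts of source files under root."""
    total = 0
    for path in root.rglob("*"):
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            # Read as bytes so unusual encodings never raise.
            with path.open("rb") as fh:
                total += sum(1 for _ in fh)
    return total


def is_very_large(total_lines: int, threshold: int = 300_000) -> bool:
    """Assumed cutoff for 'very large'; the article gives only a range."""
    return total_lines >= threshold
```

A line count is only a proxy. The real signal is still whether agentic search visibly struggles, so treat a high number as a prompt to watch token usage rather than an automatic reason to add the server.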

Explore before you edit

This is the highest-impact habit on unfamiliar large code:

Use the Explore subagent (or a custom one) for understanding. Explore reads in its own context and returns a summary, so heavy file reads don't pollute your main session. It's read-only and can run on a smaller model like Haiku to keep exploration cheap.
Plan mode for any non-trivial change. Shift+Tab into plan mode, let Claude lay out the change across files, and approve the plan yourself before a single edit. Never let it edit on first contact with code it hasn't read.
Fresh-context review. After implementing, a review subagent that didn't write the code catches more than self-review.

Context hygiene at scale

/clear between unrelated tasks — non-negotiable on big repos; context accumulates fast.
/compact keeps the root CLAUDE.md — after compaction Claude re-reads the project-root CLAUDE.md from disk, so your core instructions survive. Nested CLAUDE.md files reload the next time Claude reads that directory.
Let auto memory accumulate. Per-repo, Claude records build quirks and debugging insights to its own MEMORY.md — on a long-lived codebase this compounds. Run /memory to see what it's learned.

Legacy-specific discipline

Surgical changes only. Read callers and exports first; change what the task requires and nothing else.
Path-scoped rules for known traps. Put "this module uses a deprecated pattern — migrate via X" in a .claude/rules/ file scoped to that path, so the warning loads exactly when Claude touches it.
Debug what's loading. If instructions seem ignored, the InstructionsLoaded hook logs which instruction files loaded and when — useful for diagnosing path-scoped rules in a big tree.
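For the last item, a hook command can be as simple as a script that appends whatever it receives to a log file. The sketch below assumes the hook delivers a JSON payload on stdin, as Claude Code hooks generally do. It makes no assumption about which fields an InstructionsLoaded payload contains and stores the event as-is. The script name, the log-path argument, and the record layout are hypothetical.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def log_event(raw_payload: str, log_path: Path) -> bool:
    """Append one timestamped JSON line; return False if nothing usable arrived."""
    raw_payload = raw_payload.strip()
    if not raw_payload:
        return False
    try:
        event = json.loads(raw_payload)
    except json.JSONDecodeError:
        return False
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "event": event,  # stored untouched; field names are not assumed here
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return True


# Intended invocation (hypothetical): python log_instructions.py /path/to/log
# The script only reads stdin when a log path is given explicitly.
if __name__ == "__main__" and len(sys.argv) == 2:
    log_event(sys.stdin.read(), Path(sys.argv[1]))
```

Register the script as the command for the InstructionsLoaded hook in your settings, then read the log after a session to see which instruction files loaded and in what order. Check the hooks reference for the exact registration syntax and payload fields before relying on them.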

Onboarding playbook for an existing repo

Bootstrap with /init — generates a starting CLAUDE.md from what it can discover (and reads an existing AGENTS.md / .cursorrules). Then trim and customize.
Codebase Q&A first — before any edits. Ask Claude to explain the unfamiliar parts: "where is auth handled?", "how is this class used?", "trace the git history of this function." Let it build a mental model in an Explore subagent.
Plan, then implement, then verify. Force a plan for the first real task, give it a checkable success criterion (tests pass, lint clean), and review the diff with a fresh subagent.
Capture as you go. Promote recurring corrections into CLAUDE.md or path-scoped rules; let auto memory hold the rest.

Pitfalls

One giant root CLAUDE.md instead of per-directory files.
Editing before exploring on unfamiliar code.
Adding a semantic-search MCP when plain agentic search was fine (overhead with no payoff).
Skipping verification because "it's just a small change in a big repo."

Go deeper

CLAUDE.md fundamentals + template: CLAUDE.md templates that work.
The subagent mechanics: subagents & agent teams.
The MCP layer: best MCP servers. The broader workflow: advanced tips.

FAQ

Does Claude Code index my repo automatically? No — it searches agentically (Grep/Glob/Read). For very large repos, add a code-search MCP like zilliztech/claude-context.

Keeping CLAUDE.md lean in a monorepo? Lean root + per-directory CLAUDE.md (load on demand) + path-scoped rules + claudeMdExcludes for other teams' files.

Legacy workflow? Explore and trace first, plan mode, surgical edits, path-scoped gotcha rules, verify every change.

Root CLAUDE.md size for 100k+ LOC? Still under 200 lines — push detail to per-directory files and rules.
