Build a software factory with Claude Code.
Seven specialized agents. Three human checkpoints. Features that ship while you sleep.
You thought you were using AI to code. You were typing faster.
That's the gap most builders haven't named yet — and it's the entire reason Claude Code sessions plateau around month two. Here's the difference, and the seven-agent system that closes it.
Save this one. It's going to save you months.
The loop that feels productive but isn't.
Watch the shape of a normal Claude Code session. You ask Claude to build a feature. It generates code. Something breaks. You paste the error back. It patches it. Something else breaks. You ask again. By the end of the day you've shipped something — but you've also touched the same files seven times, the conventions you set up two weeks ago are quietly broken, and a feature in another part of the codebase is now subtly worse than it was this morning.
On day one this feels like magic. By day thirty you're spending more time supervising AI than you used to spend writing code. Same logic appears in three different places. Claude forgot the pattern you established. New features break old ones. Tests are missing or shallow.
You wake up and realize: the AI isn't failing. Your workflow is.
The shift: from vibe coding to a software factory.
Real engineering teams don't work in one big conversation. Different people own different jobs. Someone clarifies the user problem. Someone thinks about architecture. Someone builds the API. Someone builds the UI. Someone considers edge cases. Someone reviews.
When you collapse all of that into one AI session, mistakes compound silently. The fix is to split the work across specialized agents — each with one focused job, its own clean context window, only the tools it actually needs, and strict rules about what it cannot touch.
The result is a software factory. One developer plus seven focused agents equals a coordinated team.
The seven agents, at a glance.
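The deep-dive on each agent (inputs, outputs, and the rules they cannot break) comes later in this piece. Here's the working reference you can keep open while you build.
- 01 Codebase Researcher. Maps the code before anything is written. Tools: Read, Grep, Glob.
- 02 Story Writer. Turns the rough idea into a user story with acceptance criteria. Tools: Read only.
- 03 Spec Writer. Turns the approved story into a technical brief. Tools: Read, Grep, Glob.
- 04 Backend Builder. Builds the backend half and its unit tests. Tools: Read, Edit, Write, Bash, backend folders only.
- 05 Frontend Builder. Builds the UI half and its tests. Tools: Read, Edit, Write, Bash, frontend folders only.
- 06 Test Verifier. Writes acceptance tests against the story. Tools: Read, Edit, Write (test files only), Bash.
- 07 Implementation Validator. Compares the implementation to the story and brief and reports gaps. Tools: Read, Grep, Glob.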
How the chain actually runs.
You open Claude Code and type: "Build invoice reminders for invoices unpaid for more than 7 days." That's it. Here's what happens without you typing anything else.
- 01
Researcher maps the code.
Pulls the invoice, payment, and email files. Returns relevant patterns and risks. No code written.
- 02
Story Writer drafts the user story.
Acceptance criteria. Edge cases. Out-of-scope. Open questions.
- 03 (pause)
You approve the story.
First human checkpoint. The single approval that prevents 80% of downstream rework.
- 04
Spec Writer drafts the technical brief.
Data model, API, files that will change, risks. Every file that will be touched, named.
- 05 (pause)
You approve the brief.
Second human checkpoint. Highest-leverage approval in the whole chain. If you see "store IDs in memory" — catch it here.
- 06
Backend Builder ships the backend.
Service. API route. Background job. Unit tests. Returns: files changed, patterns reused, all tests green.
- 07
Frontend Builder ships the UI.
Reads the backend summary first. Builds the admin UI and the reminder button. Component tests. All green.
- 08
Test Verifier writes the acceptance tests.
One test per acceptance criterion. Reports: 7 passing, 1 failing — manual trigger doesn't check tenant ownership.
- 09
Validator catches it.
Reports as Critical with file path and line number. Chain loops back to Backend Builder. Fix applied. All 8 tests pass. Validator re-runs. Clean.
- 10 (pause)
You review and open the PR.
Third checkpoint. Three human approvals total. Everything else ran on its own.
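The ten steps above can be read as a small control flow: agents run in order, three steps block on a human, and Critical findings from the Validator send the chain back to a builder. Here is a minimal Python sketch of that flow, under loud assumptions: `run_agent`, `approve`, the step names, and the retry limit are hypothetical stand-ins for illustration, not Claude Code APIs, and the loop-back is hardcoded to the Backend Builder only because that is what the walkthrough shows.

```python
from typing import Callable, List

# Hypothetical signatures: an agent run returns its Critical findings
# (an empty list when clean); an approver is the human checkpoint.
AgentRunner = Callable[[str], List[str]]
Approver = Callable[[str], bool]


def run_chain(run_agent: AgentRunner, approve: Approver, max_fix_rounds: int = 3) -> List[str]:
    """Run the chain once and return a log of the steps that executed."""
    log: List[str] = []

    def step(name: str) -> List[str]:
        log.append(name)
        return run_agent(name)

    step("researcher")
    step("story_writer")
    if not approve("story"):            # first human checkpoint
        return log + ["stopped: story rejected"]

    step("spec_writer")
    if not approve("brief"):            # second human checkpoint
        return log + ["stopped: brief rejected"]

    step("backend_builder")
    step("frontend_builder")
    step("test_verifier")

    findings = step("validator")
    rounds = 0
    while findings:
        if rounds == max_fix_rounds:
            return log + ["stopped: critical findings remain"]
        rounds += 1
        step("backend_builder")         # assumption: the fix belongs to the backend
        step("test_verifier")
        findings = step("validator")

    if not approve("pull_request"):     # third human checkpoint
        return log + ["stopped: PR rejected"]
    return log + ["done"]


def demo() -> List[str]:
    """Replay the walkthrough: one Critical finding, then a clean re-run."""
    validator_runs = iter([["manual trigger does not check tenant ownership"], []])

    def fake_agent(name: str) -> List[str]:
        return next(validator_runs) if name == "validator" else []

    return run_chain(fake_agent, approve=lambda checkpoint: True)
```

In a real setup the loop-back would route to whichever builder owns the failing file; the sketch only shows that nothing reaches the third checkpoint while a Critical finding is open.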
The take.
The factory doesn't remove you from the process. It removes you from the parts that don't need you. You stay in the loop where your judgment actually matters: Is this the right problem? Is this the right design? Is this safe to ship?
The agents handle everything between. That's the difference between using AI as a faster keyboard — and using AI as a coordinated team.
The seven agents in full. Inputs, outputs, what they cannot do, why each rule exists.
- 01
The Codebase Researcher.
The single biggest mistake builders make with AI is asking for code as the first move. The AI accepts the prompt, fills the gaps with guesses, and starts generating. That's when bad designs sneak in.
The Researcher fixes this. Its only job is to inspect the codebase and explain how things work — before any code is written.
What it does. Maps the files relevant to the feature. Documents the existing patterns. Finds similar features already built. Flags risks (timezone handling, multi-tenant concerns, retry logic, anywhere the codebase has been weird in the past). Lists which tests will need updating.
What it cannot do. Edit any file. Run any command that modifies state. Make assumptions — it asks instead.
Tools. Read, Grep, Glob. Nothing else.
The rule is simple: explore before you build, every single time. The Researcher always runs first.
- 02
The Story Writer.
Most features fail not because the code was wrong, but because the problem was never clearly defined.
The Story Writer turns a rough feature idea into a real user story before any technical decisions get made.
Input. Your rough feature description plus the Researcher's findings.
Output. One user story in the canonical form — "As a [role], I want [behavior], so that [outcome]." A list of acceptance criteria (statements a test can verify directly). Edge cases. A clear out-of-scope section. A list of open questions (it never guesses).
What it cannot do. Invent business rules. Write code. Move forward when something is genuinely unclear.
Tools. Read only.
This is the first human checkpoint. You read the story and approve it. This single approval prevents 80% of the downstream rework that vibe coding produces.
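The Story Writer's output (canonical sentence, acceptance criteria, edge cases, out-of-scope, open questions) maps onto a small record type. The sketch below is one possible shape, not the author's format: the class name, field names, and the `ready_for_spec` rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserStory:
    """Hypothetical container for what the Story Writer hands to the human."""
    role: str
    behavior: str
    outcome: str
    acceptance_criteria: List[str]
    edge_cases: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)

    def sentence(self) -> str:
        # The canonical "As a ..., I want ..., so that ..." form.
        return f"As a {self.role}, I want {self.behavior}, so that {self.outcome}."

    def ready_for_spec(self) -> bool:
        # Assumption: the chain should not move on while questions are open
        # or while there is nothing a test could verify.
        return bool(self.acceptance_criteria) and not self.open_questions
```

A story with an unanswered question such as "Which timezone defines a day?" would report `ready_for_spec()` as false, which mirrors the rule that the Story Writer asks instead of guessing.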
- 03
The Spec Writer.
Once the story is approved, the Spec Writer turns it into a technical brief. This is the blueprint every build agent follows.
Input. The approved story, the Researcher's findings, your project's CLAUDE.md.
Output. Data model changes. Process flow. API changes. Frontend changes. Tests required: success, failure, edge cases. Every file that will change, named.
What it cannot do. Edit any file. Invent new infrastructure (it calls it out instead). Skip tenant isolation or timezone concerns. Leave questions unanswered.
Tools. Read, Grep, Glob.
Second human checkpoint. If you see something like "store IDs in memory", that's your red flag. Catch it now, not after ten files have changed.
- 04
The Backend Builder.
Now the building starts. The Backend Builder implements the backend half of the feature — and only the backend half.
What it builds. API routes. Services and business logic. Database access and migrations. Background jobs. Unit tests for everything it writes.
What it cannot do. Touch React components, pages, or hooks. Invent new dependencies. Modify files outside the agreed scope. Stop without running typecheck, lint, and the full test suite.
Tools. Read, Edit, Write, Bash, scoped to backend folders only.
After finishing, it returns a summary: every file added or edited, every helper or pattern reused, any CLAUDE.md rule that would have helped (so you can add it for next time).
The separation is the point. The Backend Builder cannot accidentally break the frontend. Ever.
- 05
The Frontend Builder.
The UI half — and only the UI half. Reads the Backend Builder's summary first, which matters: it consumes the API exactly as the backend produced it.
It does not invent endpoints. If the API shape is wrong for the UI, it surfaces the mismatch as feedback — not as a patch.
What it builds. React components and pages. Hooks and state. Loading and error states. Component and unit tests.
What it cannot do. Touch services, API routes, workers, or migrations. Invent endpoints or response shapes. Add dependencies without instruction.
Tools. Read, Edit, Write, Bash, scoped to frontend folders only.
Two builders. Two clean context windows. Zero chance one breaks the other's work.
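Folder scoping for the two builders can also be checked after the fact by comparing the paths an agent changed against an allowlist. The sketch below assumes a hypothetical folder layout; every glob in `SCOPES` is a placeholder to replace with your own repo's structure, and this is a verification aid rather than how Claude Code itself enforces anything.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Hypothetical folder layout. Note that fnmatch's "*" also matches "/",
# so "lib/services/*" covers nested files as well.
SCOPES: Dict[str, List[str]] = {
    "backend_builder": ["lib/services/*", "lib/db/*", "app/api/*", "prisma/*", "workers/*"],
    "frontend_builder": ["components/*", "hooks/*", "app/dashboard/*"],
}


def out_of_scope(agent: str, changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that fall outside the agent's allowed globs."""
    allowed = SCOPES[agent]
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, glob) for glob in allowed)
    ]
```

The list of changed paths could come from `git diff --name-only`; an empty result means the builder stayed inside its lane.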
- 06
The Test Verifier.
Both builders wrote unit tests for their own code. That's not enough.
The Test Verifier does one thing only: prove the feature actually does what the user story said it should. Acceptance tests, not unit tests. From the outside, the way a user would experience it.
Output. One acceptance test file covering every acceptance criterion. A report: passed / failed / not cleanly coverable.
What it cannot do. Modify any backend or frontend code. Invent workarounds for untestable criteria. Mark a criterion as covered if it genuinely isn't.
Tools. Read, Edit, Write (test files only), Bash.
If a test fails, the feature doesn't satisfy the story. The Verifier reports which criterion. It does not patch the code; that goes back to the right builder.
You don't have a feature until the acceptance tests pass.
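The Verifier's three-bucket report (passed / failed / not cleanly coverable) is easy to model per acceptance criterion. This is a sketch under assumptions: the `Status` enum, the criterion IDs, and the choice to treat an uncoverable criterion as "not done" are illustrative, not part of the original system.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_COVERABLE = "not cleanly coverable"


def summarize(results: Dict[str, Status]) -> str:
    """One-line report with a count for each bucket, in a fixed order."""
    counts = Counter(results.values())
    return ", ".join(f"{counts[status]} {status.value}" for status in Status)


def feature_is_done(results: Dict[str, Status]) -> bool:
    """Assumption: done means every criterion has a passing acceptance test."""
    return bool(results) and all(status is Status.PASSED for status in results.values())
```

With seven passing criteria and one failing, `summarize` yields a line like the "7 passing, 1 failing" report in the walkthrough, and `feature_is_done` stays false until the eighth passes.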
- 07
The Implementation Validator.
The agent that catches everything everyone else missed. The Validator compares the current implementation against the approved story and brief — and reports gaps. It never fixes anything. It just tells the truth.
Every check it runs. Acceptance criteria not yet implemented. Failure paths with no coverage. Security issues — missing auth, tenant isolation gaps, secrets in logs, raw errors to clients. Files changed outside agreed scope. Patterns inconsistent with CLAUDE.md. Duplicate logic that should reuse existing helpers. Timezone or multi-tenant concerns from the brief that got quietly skipped.
Output. Grouped by severity — Critical (must fix before merge), Important (should fix), Minor (opinion, reviewer's call). Every finding includes file path and line number. If there's nothing wrong, it says so plainly.
Tools. Read, Grep, Glob.
This agent is why the factory is trustworthy. A self-graded paper is worthless. A Validator that sees only what's on disk, not how it was written, is honest.
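The Validator's output format (findings grouped by severity, each with a file path and line number, and a plain statement when nothing is wrong) can be sketched as a small report renderer. The `Finding` type, the example path, and the exact wording are hypothetical; only the three severity names come from the description above.

```python
from dataclasses import dataclass
from typing import List

SEVERITIES = ("Critical", "Important", "Minor")


@dataclass(frozen=True)
class Finding:
    severity: str   # one of SEVERITIES
    path: str
    line: int
    message: str


def render_report(findings: List[Finding]) -> str:
    """Group findings by severity; say so plainly when there are none."""
    if not findings:
        return "No issues found."
    lines: List[str] = []
    for severity in SEVERITIES:
        group = [f for f in findings if f.severity == severity]
        if not group:
            continue
        lines.append(f"{severity} ({len(group)})")
        lines.extend(f"  {f.path}:{f.line} {f.message}" for f in group)
    return "\n".join(lines)


def blocks_merge(findings: List[Finding]) -> bool:
    # Critical means "must fix before merge".
    return any(f.severity == "Critical" for f in findings)
```

Keeping the renderer separate from the merge gate reflects the rule that the Validator reports and never fixes: the chain, not the Validator, decides what happens next.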
The foundation: CLAUDE.md.
Before any of the agents work well, you need this in place. Every time you open Claude Code, it starts with zero memory of your project. CLAUDE.md fixes that. It's a Markdown file at your repo root that loads automatically every session.
Where permanent project facts live: your stack (Next.js App Router, Node.js, Prisma, BullMQ, Resend) · your commands (npm run dev, npm test, npx prisma migrate dev) · architecture rules ("Business logic lives in services. API routes stay thin.") · what not to do ("Do not add cron — use BullMQ. Do not log raw payment payloads.") · pointers to deeper docs.
Keep it between 100 and 300 lines. Every time AI makes a mistake that surprises you, ask: would a rule in CLAUDE.md have prevented this? If yes, add it.
In a few weeks, your CLAUDE.md becomes a record of every assumption the AI got wrong — and your sessions get noticeably better.
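Two of the CLAUDE.md guidelines above are mechanical enough to check with a script: the 100 to 300 line budget, and the pointers to deeper docs. The sketch below is an assumption-heavy convenience, not part of Claude Code: it assumes pointers are written as backticked `docs/...md` paths, and both function names are invented here.

```python
import re
from typing import Iterable, List


def line_budget(text: str, low: int = 100, high: int = 300) -> str:
    """Classify a CLAUDE.md body against the suggested line range."""
    count = len(text.splitlines())
    if count < low:
        return "too short"
    if count > high:
        return "too long"
    return "ok"


def missing_doc_pointers(text: str, existing_paths: Iterable[str]) -> List[str]:
    """List backticked docs/*.md pointers that are not among the existing paths."""
    existing = set(existing_paths)
    pointers = re.findall(r"`(docs/[^`\s]+\.md)`", text)
    return [pointer for pointer in pointers if pointer not in existing]
```

In a repo, `existing_paths` could be built by listing the files under `docs/`; a first-draft file that comes back "too short" is expected, since the file grows as rules are added.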
Context drift — the silent killer.
Most Claude Code sessions don't fail dramatically. They drift.
A wrong assumption enters the context. The model keeps building on top of it. You ask Claude to build subscription management. It designs User → Subscription. You remember subscriptions belong to the company, not the user.
If you just say "no, subscriptions belong to companies" — Claude patches. Now you have both user.subscriptionId and company.subscriptionId floating around. Tests pass. The bug ships.
What changes once the factory is running.
Before. Vibe coding loop — prompt, generate, error, patch, repeat. Session context fills with noise. Wrong assumptions compound into broken features. One engineer can only do one thing at a time.
After. Structured chain — research, story, brief, build, verify, validate. Each agent gets a clean context window with only what it needs. Wrong assumptions get caught at the brief approval, not after ten files. One engineer ships a complete vertical slice.
The real shift comes when the agents start carrying expert knowledge. The payments specialist on your team builds a payments-integration agent. Now every engineer on the team can ship a feature that touches billing — without waiting, without a handoff. The frontend lead's component patterns live in the Frontend Builder. The DevOps engineer's CI checks live in a pre-commit hook. The QA lead's edge-case taxonomy lives in the Test Verifier.
Expert knowledge stops being trapped in availability. It becomes infrastructure.
Five symptoms with the fix that works.
№ 01 Chain stalls at the brief approval.
№ 02 Backend Builder broke the frontend.
№ 03 Validator misses obvious gaps.
№ 04 Sessions get slower over time.
… docs/ and reference them from CLAUDE.md by path. Start a fresh Claude Code session per feature.
№ 05 An agent committed something it shouldn't have.
… git commit with no guardrail. … .env, .key, .pem, or known secret patterns. Five minutes of setup. Prevents the disaster.
Eight steps. Two to three hours total. Then the factory tunes itself across three or four features.
- 01 · 5 min
Install Claude Code.
Get it from code.claude.com. Authenticate. Point it at the repo you want to upgrade.
- 02 · 10 min
Create the folder structure.
Three folders, two skill folders, one hook folder. The shape is the architecture.
.claude/
├── agents/
├── skills/
│   ├── feature-factory/
│   └── build-with-tests/
└── hooks/
- 03 · 30 min
Write your CLAUDE.md.
100 to 300 lines. Stack, commands, architecture rules, "don't do this" list. Don't overthink the first version — you'll iterate it as you catch the AI making mistakes.
# CLAUDE.md: project context for Claude Code

## Stack
- Next.js App Router (Node.js 22)
- Prisma + Postgres (Supabase)
- BullMQ for background jobs
- Resend for transactional email
- Vercel for hosting

## Commands you can run
- `npm run dev`: dev server
- `npm run build`: production build
- `npm test`: full test suite
- `npm run typecheck`: tsc --noEmit
- `npx prisma migrate dev`: new migration
- `npx prisma studio`: inspect data

## Architecture rules
- Business logic lives in `lib/services/*`. Routes stay thin.
- All DB access goes through Prisma. No raw SQL outside `lib/db/`.
- Tenant isolation is enforced in services, not routes.
- Use BullMQ for anything async. Do not add cron jobs.

## Don't do this
- Do not log raw payment payloads.
- Do not store secrets in code. Use env vars + Vercel encrypted storage.
- Do not write a fourth copy of the same helper. Grep first.

## Deeper docs
- `docs/architecture.md`: full service layout
- `docs/billing.md`: Stripe webhooks + retries
- `docs/tenancy.md`: how isolation works in services
- 04 · 45 min
Create the seven agents.
Use the /agents command in Claude Code. For each one: describe its role in plain language, list its tools, list what it cannot touch. Claude writes the agent file. You review and commit.
Start with the Researcher and Validator first. They're read-only, so they're safe to test on a real feature without risk.
- 05 · 20 min
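For a sense of what you will be reviewing in this step, the sketch below renders an agent definition as a Markdown file with YAML frontmatter (`name`, `description`, `tools`), which is my understanding of the subagent file format; confirm it against what `/agents` actually generates. The `AgentSpec` type, the agent name `codebase-researcher`, and the prompt wording are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentSpec:
    """Hypothetical description of one agent: role, tools, and hard limits."""
    name: str
    description: str
    tools: List[str]
    cannot: List[str]


def render_agent_file(spec: AgentSpec) -> str:
    """Render frontmatter plus a short prompt body listing the limits."""
    rules = "\n".join(f"- {rule}" for rule in spec.cannot)
    return (
        "---\n"
        f"name: {spec.name}\n"
        f"description: {spec.description}\n"
        f"tools: {', '.join(spec.tools)}\n"
        "---\n\n"
        f"You are the {spec.name} agent.\n\n"
        "You must not:\n"
        f"{rules}\n"
    )


# Values taken from the Researcher's description in this article.
RESEARCHER = AgentSpec(
    name="codebase-researcher",
    description="Maps relevant files, patterns, and risks before any code is written.",
    tools=["Read", "Grep", "Glob"],
    cannot=[
        "Edit any file.",
        "Run any command that modifies state.",
        "Make assumptions; ask instead.",
    ],
)
```

The rendered text would live under `.claude/agents/`. Keep descriptions free of `: ` sequences or quote them, since the frontmatter is YAML.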
Create the feature-factory orchestrator skill.
Ask Claude to write a skill that reads your seven agent files and wires the chain — Researcher first, then Story Writer, etc. This is what turns "seven separate agents" into "one coordinated factory."
- 06 · 15 min
Create the build-with-tests skill.
Describes how your team builds: match existing patterns, write tests alongside code, run typecheck before declaring done. The Backend Builder and Frontend Builder both invoke this skill.
- 07 · 5 min
Add a pre-commit hook.
Block commits that include .env, .key, .pem, or secrets.json. Five minutes of setup. Prevents the disaster where one of the agents accidentally commits a secret.
- 08 · 1–3 hrs
Run one real feature through the full chain.
Pick something small — small enough that you'd normally just bang it out. Watch where the chain stumbles. Add a rule to CLAUDE.md or to the relevant agent for every stumble.
After three or four features, the factory knows your codebase. You'll spend less time supervising and more time deciding what to build next.
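The pre-commit check from step 07 could look roughly like the Python sketch below. It is one possible implementation, not the author's hook: the function names are invented, the matching is by file name only, and how you wire it in (a Git `pre-commit` script, or a hook in your `.claude/hooks/` folder) is left as an assumption. It does not scan file contents for secret patterns.

```python
import subprocess
import sys
from pathlib import PurePosixPath
from typing import Iterable, List

# File name endings and exact names to block, taken from step 07.
BLOCKED_SUFFIXES = (".env", ".key", ".pem")
BLOCKED_NAMES = ("secrets.json",)


def is_blocked(path: str) -> bool:
    """True when the file name alone marks the path as a likely secret."""
    name = PurePosixPath(path).name
    return name in BLOCKED_NAMES or name.endswith(BLOCKED_SUFFIXES)


def blocked_paths(paths: Iterable[str]) -> List[str]:
    return [path for path in paths if is_blocked(path)]


def staged_files() -> List[str]:
    """Ask Git for added, copied, or modified files in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main() -> int:
    """Return 1 (block the commit) when any staged file is on the list."""
    bad = blocked_paths(staged_files())
    for path in bad:
        print(f"blocked: {path}", file=sys.stderr)
    return 1 if bad else 0
```

To use it as a Git hook, end the script with `raise SystemExit(main())` and install it as an executable `.git/hooks/pre-commit`. Variants such as `.env.local` are not matched by these suffixes, so extend the lists to fit your repo.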
Want this wired into your actual codebase — not a blank .claude/ folder? That's the kind of build the audit + retainer relationship was made for. austinaiguy.com.