Building a 5x PM agent on Claude Code

Full write-up on Medium: Building a 5x PM Agent. This is the condensed case-study version.

Context

Most PM work isn’t the thinking — it’s the production around the thinking. The judgment (“what should we build, and why”) takes minutes; writing the PRD, hunting for context, sketching a mockup, and filing the ticket take hours. And every new request starts that assembly over from scratch.

I built an autonomous PM agent to own that assembly layer, so I keep the judgment and the agent does the production.

The problem

The goal was to collapse the distance between having an insight and having a reviewable artifact — without giving up the parts of PM that actually require a human. That meant an agent that could take a raw, ambiguous Slack message and turn it into a ready-for-review PRD, mockup, and ticket, while staying inspectable enough that I’d trust it and improve it.

Approach

Architecture. The kernel is Claude Code in headless mode (claude -p) with product-agnostic routing logic. Capabilities are six on-demand skills — create-prd, create-story, update-spec, design-mockup, ask-pm, and action-request — loaded only when needed.

The pipeline, in five stages:

Capture — a Slack Socket Mode listener (always-on, no public webhook) picks up the message.
Classify — a two-tier classifier (cheap regex first, model only when ambiguous) routes to the right skill.
Research in parallel — sub-agents simultaneously pull context from Notion, Linear, and Fireflies, run competitive research over web + knowledge base, and mine call transcripts; an orchestrator collects the results.
Generate — a structured PRD (problem traced to evidence, requirements as tables, testable acceptance criteria) plus an interactive JSX mockup rendered to PNG via headless Chrome.
Publish — a Linear ticket, the PRD in Notion, the mockup image, and a tidy cross-linked Slack reply. Thread memory lets follow-ups update existing artifacts instead of regenerating them.

Knowledge & cost. Per-product versioned knowledge packs (surfaces, vocabulary, integration IDs, a screenshot index) keep mockups consistent with the real product. A stronger model orchestrates while a cheaper model runs the parallel research sub-agents, under a hard per-task budget cap.

Impact

~5x lift in product velocity — the agent absorbs the hours of assembly so the cycle from insight to reviewable artifact runs in minutes.
One Slack message now yields a complete, cross-linked PRD + mockup + Linear ticket, ready for review.

Reflection & tradeoffs

Human stays in the loop by design. Everything lands as ready-for-review — the PM owns the what and the why; the agent owns the how.
Observability was the unlock. Every run is inspectable (repo logs + a Notion work log), so the agent improved continuously — no black box.
Phased rollout earned trust. Solo testing in a private channel → manual trace inspection → team access only after it proved reliable.
Designed for the failure modes: a two-tier classifier for ambiguous intent, budget caps for cost, knowledge packs treated as living docs to fight staleness, a screenshot index to stop the agent reinventing UI, and explicit “report what broke” handling for flaky MCP connections.
What’s next: turning the captured traces into formal eval harnesses (Braintrust), and generalizing from drafting into operations (metrics monitoring via PostHog, support triage via Intercom).