Re-architecting benefits verification into an agentic system

Context

Alongside the PLG scribe, I own an enterprise B2B benefits-verification product at Supanote — a 0-to-1 line I launched and then scaled. Benefits verification is a high-stakes, high-variance workflow: lots of edge cases, lots of ways to be subtly wrong, and enterprise customers who feel every incident.

The problem

The initial implementation was rule-based. Rules are predictable but brittle: every new payer quirk or edge case meant another branch, and the system couldn’t generalize. As coverage grew, reliability suffered, and the system could no longer hold the bar an enterprise product demands.

Approach

I re-architected the workflow into an agentic, harness-based orchestration:

Parallel agents handling sub-tasks concurrently instead of one monolithic flow.
Skill hierarchies so capabilities compose rather than duplicate.
Tool calling to reach the systems of record the verification depends on.
Context/memory loops so the system carries state across steps.
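
As a rough illustration of how these pieces can fit together, here is a minimal Python sketch, not the production design. Every name in it (Context, Agent, lookup_eligibility, lookup_copay, the member ID) is hypothetical, and the two "tools" are stubs standing in for calls to real systems of record. It shows sub-task agents running concurrently, each reaching a tool, and writing findings into a shared context that later steps could read. Skill hierarchies are left out to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Tuple


# Hypothetical tools: stubs standing in for calls to systems of record.
async def lookup_eligibility(member_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"member_id": member_id, "active": True}


async def lookup_copay(member_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"member_id": member_id, "copay_usd": 25}


@dataclass
class Context:
    """Shared state carried across steps (the context/memory loop)."""
    member_id: str
    findings: Dict[str, dict] = field(default_factory=dict)


@dataclass
class Agent:
    """One sub-task agent: a name plus the tool it is allowed to call."""
    name: str
    tool: Callable[[str], Awaitable[dict]]

    async def run(self, ctx: Context) -> Tuple[str, dict]:
        return self.name, await self.tool(ctx.member_id)


async def verify(member_id: str) -> Context:
    ctx = Context(member_id)
    agents = [
        Agent("eligibility", lookup_eligibility),
        Agent("copay", lookup_copay),
    ]
    # Run all sub-task agents concurrently instead of one monolithic flow.
    results = await asyncio.gather(*(agent.run(ctx) for agent in agents))
    # Merge each agent's result into the shared context for later steps.
    for name, finding in results:
        ctx.findings[name] = finding
    return ctx


result = asyncio.run(verify("M-001"))
print(result.findings)
```

In this sketch the agents only read from the context while running, and results are merged after all of them finish, which avoids concurrent writes to shared state.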

The reliability work was inseparable from the architecture work. I ran architectural audits to find failure surfaces and built a custom eval harness so we could catch regressions before they reached production rather than discovering them as incidents.
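
To make "evals as a regression gate" concrete, here is a minimal Python sketch under stated assumptions. It is not the actual harness: EvalCase, run_evals, gate, the two stub systems, and the sample cases are all hypothetical. The idea it shows is simply that a fixed set of cases with known expected outputs is replayed against a candidate build, and the build is blocked if the pass rate drops below a threshold.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One golden case: an input payload and the output we expect."""
    name: str
    payload: dict
    expected: dict


def run_evals(system: Callable[[dict], dict], cases: List[EvalCase]) -> List[str]:
    """Run every case through the system and return the names of failures."""
    failures = []
    for case in cases:
        try:
            actual = system(case.payload)
        except Exception:
            # A crash counts as a failure rather than aborting the run.
            failures.append(case.name)
            continue
        if actual != case.expected:
            failures.append(case.name)
    return failures


def gate(failures: List[str], total: int, min_pass_rate: float = 1.0) -> bool:
    """Return True only if the pass rate meets the (assumed) threshold."""
    if total == 0:
        return False  # no evidence means no release
    return (total - len(failures)) / total >= min_pass_rate


# Hypothetical stand-ins for the system under test.
def verify_stub(payload: dict) -> dict:
    return {"covered": payload["active"]}


def regressed_stub(payload: dict) -> dict:
    return {"covered": True}  # simulated regression: ignores the input


CASES = [
    EvalCase("active_member", {"active": True}, {"covered": True}),
    EvalCase("inactive_member", {"active": False}, {"covered": False}),
]

good = run_evals(verify_stub, CASES)
bad = run_evals(regressed_stub, CASES)
print(gate(good, len(CASES)), gate(bad, len(CASES)), bad)
```

Run as a pre-release check, a gate like this turns a would-be production incident into a failed build with the offending case named.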

Impact

Grew ARR 2.2x within 3 months of the 0-to-1 launch.
Hardened reliability to an enterprise bar — a custom eval harness plus architectural audits caught regressions before they reached production.

Reflection & tradeoffs

The throughline: an agentic system buys you generalization, but it only earns enterprise trust if reliability is engineered in deliberately — evals as a gate, audits as a habit, and clear decisions about where a human stays in the loop. (Expand: the specific human-in-the-loop and guardrail choices you made, and where you chose control over full automation.)