
Agent: one durable, tool-using process driven by skills

How CitationBench's single generic agent loads skills, calls tools, pauses for approval, and persists every decision — the three-layer mental model for autonomous SEO ops.

CitationBench has one agent. It's a generic, durable, tool-using process that runs whatever task you hand it. The variety comes from skills — packaged capabilities the agent loads on demand (e.g., bootstrap_brand, rank_monitor, link_hunter, refresh_stale).

You invoke the agent the same way you call any other endpoint — but instead of returning a single result, the agent runs a graph of tool calls, can pause for human approval, can spawn child runs, and persists every decision it makes.

This page explains how we think about agents: what they are, what they aren't, and how they relate to the rest of the platform.

The mental model

CitationBench has three layers. Most platforms only have one.

Layer 3  Agent + Skills      ← one generic agent. agent.invoke(skill, input).
                               Skills are named capabilities (bootstrap_brand,
                               link_hunter, …). The agent loads them, plans,
                               calls tools, pauses for approval, finishes.

Layer 2  Tools (REST + MCP)  ← composable primitives. research.keyword.research(...),
                               produce.blog_post.create(...), indexing.gsc.submit(...)

Layer 1  Resources           ← persistent objects. Keyword, BlogPost, LandingPage,
                               LinkBuildingRelationship, AgentInvocation, ...

The split matters because the agent is not magic. It's just a program that loads a skill, asks the LLM which tool to call, calls it, appends the result, and repeats until done. Every step is observable, replayable, and human-approvable.
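
The loop described above can be sketched in a few lines. This is a minimal illustration, not CitationBench's implementation: the `Skill`, `Planner`, and `ToolImpl` types, the `runAgent` function, and the `keyword.count` tool are all hypothetical, and the planner is a synchronous stub standing in for a real (asynchronous) LLM call.

```typescript
// Hypothetical shapes for illustration; none of these names come from the CitationBench API.
type ToolCall = { tool: string; args: Record<string, unknown> };
type Decision = { kind: "call"; call: ToolCall } | { kind: "done"; result: unknown };
type LogEntry = { step: number; call: ToolCall; output: unknown };

interface Skill {
  name: string;
  prompt: string;
  tools: string[]; // tool names this skill is allowed to call
}

type Planner = (skill: Skill, log: LogEntry[]) => Decision; // stands in for the LLM
type ToolImpl = (args: Record<string, unknown>) => unknown;

function runAgent(
  skill: Skill,
  planner: Planner,
  tools: Record<string, ToolImpl>,
  maxLlmCalls: number,
): { result: unknown; log: LogEntry[] } {
  const log: LogEntry[] = [];
  for (let llmCalls = 0; llmCalls < maxLlmCalls; llmCalls++) {
    const decision = planner(skill, log); // ask the "LLM" what to do next
    if (decision.kind === "done") return { result: decision.result, log };
    const { call } = decision;
    if (!skill.tools.includes(call.tool) || !(call.tool in tools)) {
      throw new Error(`tool not available to skill: ${call.tool}`);
    }
    const output = tools[call.tool](call.args); // call the tool
    log.push({ step: log.length, call, output }); // append the result, then repeat
  }
  throw new Error("LLM call budget exhausted");
}

// Usage with a scripted planner and one stub tool.
const demoSkill: Skill = { name: "demo", prompt: "count keywords", tools: ["keyword.count"] };
const demoTools: Record<string, ToolImpl> = {
  "keyword.count": (args) => (args.keywords as string[]).length,
};
const scriptedPlanner: Planner = (_skill, log) =>
  log.length === 0
    ? { kind: "call", call: { tool: "keyword.count", args: { keywords: ["seo", "geo"] } } }
    : { kind: "done", result: { total: log[0].output } };

const demoRun = runAgent(demoSkill, scriptedPlanner, demoTools, 5);
```

Because every tool call lands in `log` before the next planning step, the log doubles as the replay record, and the `maxLlmCalls` bound plays the role of the budget guardrail.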

You can:

  • Skip the agent entirely — call tools directly from your own code. Use CitationBench as a data layer.
  • Use a built-in skill: bootstrap_brand, rank_monitor, link_hunter, citation_hunter, content_factory, refresh_stale, keyword_manager, link_swap_evaluator. These are battle-tested skill definitions you invoke via the generic agent.
  • Bring your own agent loop — run your own LLM (Claude, GPT, Gemini) and use CitationBench tools via MCP. CitationBench becomes your tool layer; the agent loop runs wherever you want.

Why we modeled it this way

Three design constraints shaped this.

1. Durability beats brilliance. Agents that run for 20 minutes and silently lose state are useless. Every CitationBench agent invocation is a durable record: it survives restarts, can be queried at any time, and produces an immutable replay log. If a step fails, you see exactly which step failed, with what input, and how the LLM responded.

2. Approval is a first-class state. Agencies don't want autonomous publishing or autonomous outreach. They want speed plus the ability to gate any outside-world action behind a human approval. So every skill step can declare requiresApproval: true. The agent literally stops, the invocation state moves to WAITING_APPROVAL, and resumes only when an approver acts.

3. One agent, many skills — not many agents. Earlier designs had a registry of named agents. We collapsed that into one generic agent + a skills registry because:

  • Skills compose. The agent running bootstrap_brand may load and apply research.keyword and research.competitor skills mid-run. Treating them all as peer skills made composition obvious.
  • One agent has one lifecycle, one observability surface, one budget model. No per-agent special cases.
  • Users author new skills (prompt templates + tool lists) without us shipping a new "agent."

The data model

Three things back every agent invocation. You'll see them in API responses.

Invocation (the run)

One invocation = one agent run for one skill (which may chain into child invocations for sub-skills).

| Field | Notes |
| --- | --- |
| invocationId | inv_*** (CUID) |
| agentId | agt_*** (CUID). The specific agent instance that ran this invocation. Useful for audit / reproducibility / linking to debug traces. |
| skill | The skill that was invoked (bootstrap_brand, link_hunter, ...) |
| skillsUsed | All skills the agent actually loaded during the run (often more than just the primary) |
| parentInvocationId | Null for root; set for skills the agent chained into |
| rootInvocationId | Stable across the whole graph |
| depth | 0 at root, 1 for children, 2 for grandchildren, ... |
| brief | Plain-language summary of what this invocation is doing |
| mode | FOREGROUND (synchronous wait) or BACKGROUND (fire and forget) |
| status | PENDING, RUNNING, WAITING_INPUT, WAITING_APPROVAL, WAITING_CHILDREN, SUCCEEDED, FAILED, CANCELLED |
| result | Final structured output (when status = SUCCEEDED), shape defined by the skill's outputSchema |
| joinPolicy | ALL (wait for every child) or ANY (resume when any child finishes) |
| maxLlmCalls / llmCallsUsed | Budget guardrails |
| lastHeartbeatAt | Liveness signal for stuck-invocation detection |
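
As a rough sketch, the Invocation fields can be modeled as a TypeScript type, with small helpers showing how `joinPolicy` and `lastHeartbeatAt` might be interpreted. The field types, the choice of terminal statuses, and the helper names `parentCanResume` and `looksStuck` are assumptions made for illustration; only the field names and enum values come from the table.

```typescript
type InvocationStatus =
  | "PENDING" | "RUNNING" | "WAITING_INPUT" | "WAITING_APPROVAL"
  | "WAITING_CHILDREN" | "SUCCEEDED" | "FAILED" | "CANCELLED";

type JoinPolicy = "ALL" | "ANY";

// Field names follow the table; the TypeScript types are assumed.
interface Invocation {
  invocationId: string;
  agentId: string;
  skill: string;
  skillsUsed: string[];
  parentInvocationId: string | null;
  rootInvocationId: string;
  depth: number;
  brief: string;
  mode: "FOREGROUND" | "BACKGROUND";
  status: InvocationStatus;
  result?: unknown;
  joinPolicy: JoinPolicy;
  maxLlmCalls: number;
  llmCallsUsed: number;
  lastHeartbeatAt: string; // assumed to be an ISO 8601 timestamp
}

// Assumption: these three statuses are the terminal ones.
const TERMINAL = new Set<InvocationStatus>(["SUCCEEDED", "FAILED", "CANCELLED"]);
const isTerminal = (s: InvocationStatus): boolean => TERMINAL.has(s);

// ALL waits for every child to finish; ANY resumes as soon as one child has finished.
function parentCanResume(policy: JoinPolicy, children: InvocationStatus[]): boolean {
  if (children.length === 0) return true; // nothing to wait for
  return policy === "ALL" ? children.every(isTerminal) : children.some(isTerminal);
}

// A RUNNING invocation whose heartbeat is older than timeoutMs is treated as stuck.
function looksStuck(
  inv: Pick<Invocation, "status" | "lastHeartbeatAt">,
  now: Date,
  timeoutMs: number,
): boolean {
  return inv.status === "RUNNING" && now.getTime() - Date.parse(inv.lastHeartbeatAt) > timeoutMs;
}
```

For example, `parentCanResume("ANY", ["SUCCEEDED", "RUNNING"])` is true while the same children under `"ALL"` keep the parent waiting.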

Session (the conversation)

A session is a series of related invocations. Multi-turn chats live in one session. Useful for skills you talk to (keyword_manager, custom conversational skills).

| Field | Notes |
| --- | --- |
| sessionId | sess_*** |
| title | Human-readable label |
| messages | Full conversation log (system, user, assistant turns) |
| loadedSkills | Which skills the agent had access to in this session |

Approval (the gate)

When a step pauses, an Approval record is created. Approving resumes the agent; rejecting kills the invocation.

| Field | Notes |
| --- | --- |
| approvalId | appr_*** |
| invocationId | Links to the paused invocation |
| approver | Email or user ID |
| decision | APPROVED or REJECTED |
| decidedAt | When the human acted |
| note | Free-text reason / edit notes |
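
The approve/reject behavior can be pictured as a small state transition. The sketch below is illustrative only: `applyDecision` and `PausedRun` are hypothetical names, and mapping a rejection to `CANCELLED` is an assumption, since the text above says only that rejecting kills the invocation.

```typescript
type RunStatus = "RUNNING" | "WAITING_APPROVAL" | "CANCELLED";

// Field names follow the Approval table; the types are assumed.
interface Approval {
  approvalId: string;
  invocationId: string;
  approver: string;
  decision: "APPROVED" | "REJECTED";
  decidedAt: string;
  note?: string;
}

interface PausedRun {
  invocationId: string;
  status: RunStatus;
}

function applyDecision(run: PausedRun, approval: Approval): PausedRun {
  if (run.status !== "WAITING_APPROVAL") {
    throw new Error(`invocation ${run.invocationId} is not waiting for approval`);
  }
  if (approval.invocationId !== run.invocationId) {
    throw new Error("approval belongs to a different invocation");
  }
  // Assumption: a rejected run ends in CANCELLED; an approved run goes back to RUNNING.
  return { ...run, status: approval.decision === "APPROVED" ? "RUNNING" : "CANCELLED" };
}

// Usage: one paused run, resolved both ways.
const pausedRun: PausedRun = { invocationId: "inv_demo", status: "WAITING_APPROVAL" };
const approvedRun = applyDecision(pausedRun, {
  approvalId: "appr_demo_1",
  invocationId: "inv_demo",
  approver: "editor@example.com",
  decision: "APPROVED",
  decidedAt: "2024-01-01T00:00:00Z",
});
const rejectedRun = applyDecision(pausedRun, {
  approvalId: "appr_demo_2",
  invocationId: "inv_demo",
  approver: "editor@example.com",
  decision: "REJECTED",
  decidedAt: "2024-01-01T00:00:00Z",
  note: "Tone is off-brand",
});
```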

The universal response envelope

Every terminal invocation response carries five fields you'll see across the entire API:

| Field | What it gives you |
| --- | --- |
| invocationId | Stable handle for this run; query, replay, cancel, audit |
| agentId | agt_***, the specific agent instance that ran |
| result | The typed structured output, shape defined by the skill's outputSchema |
| raw | The agent's raw text: its narration, reasoning, what it was about to do next |
| files | Array of file paths the agent wrote during the run: scratch notes, intermediate artifacts, final outputs. Read with Agent · files. |

Treat result as the contract; raw + files are the audit trail. When you need the why behind a decision (and the structured result doesn't carry it), read raw and the files.
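
On the client side, the envelope might be typed and checked along these lines. This is a sketch under assumptions: the generic `InvocationEnvelope<T>` type and the `isEnvelope` guard are hypothetical helpers, and the sample values are made up; only the five field names come from the table above.

```typescript
interface InvocationEnvelope<T> {
  invocationId: string;
  agentId: string;
  result: T;       // the contract: shape defined by the skill's outputSchema
  raw: string;     // audit trail: the agent's narration and reasoning
  files: string[]; // audit trail: paths the agent wrote during the run
}

// Runtime check that an unknown payload carries all five envelope fields.
function isEnvelope(value: unknown): value is InvocationEnvelope<unknown> {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.invocationId === "string" &&
    typeof v.agentId === "string" &&
    "result" in v &&
    typeof v.raw === "string" &&
    Array.isArray(v.files) &&
    v.files.every((f) => typeof f === "string")
  );
}

// Usage with made-up sample payloads.
const goodPayload: unknown = {
  invocationId: "inv_demo",
  agentId: "agt_demo",
  result: { keywords: ["seo audit"] },
  raw: "Loaded skill, ran keyword research, finished.",
  files: ["notes/plan.md"],
};
const badPayload: unknown = { invocationId: "inv_demo", result: {} };

const goodIsEnvelope = isEnvelope(goodPayload);
const badIsEnvelope = isEnvelope(badPayload);
```

Downstream code would then depend only on `result`, and reach for `raw` and `files` when it needs the reasoning behind a decision.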

How agents fit with everything else

| Concept | Relationship to Agent |
| --- | --- |
| Workspaces | Every invocation is scoped to one workspace. Cross-workspace runs spawn one child per workspace. |
| Tools | The skills' "actions." Skills are built out of CitationBench tools; custom skills can use yours too. |
| Approval Workflows | Each skill step can declare requiresApproval: true. State machine described above. |
| Durability | Invocations are durable: they survive restarts and produce an immutable replay log under the hood. You don't manage the orchestrator yourself. |
| Prompt templates | Skills are defined as prompt templates with a tool-access list. You can read, fork, or override them. |
| Files | The agent can read uploaded files and write its own workspace files during a run. |

The built-in skill catalog

Eight built-in skills at v1. Each is fully observable, fully approvable, fully replayable.

| Skill | What it does | Calls tools from |
| --- | --- | --- |
| bootstrap_brand | URL → full SEO+GEO operating plan in 20 min | produce.crawl, research.icp, research.keyword, research.competitor, research.discuss, produce.blog_post (planning) → produce.landing_page (briefs) |
| rank_monitor | Recurring rank checks with conditional follow-ups | distribute.track_rank (cron) + optional refresh_stale on drop |
| link_hunter | End-to-end link building | link_building.serp_outreach, link_building.crm.contact.discover, link_building.campaign.send_email |
| citation_hunter | Daily AI search citation tracking + reclamation | research.ai_citation.check → on drop, produce.refine |
| content_factory | Keyword → research → draft → refine → publish | research.discuss, produce.blog_post, produce.refine, produce.publish |
| refresh_stale | Rank drop or citation drop → content audit → updated draft | distribute.track_rank, produce.evaluate, produce.refine |
| keyword_manager | Conversational keyword DB management | research.keyword.list/update/relabel |
| link_swap_evaluator | Score a partner's link-swap proposal | research.competitor.backlinks + Ahrefs DR lookups |

You can also:

  • Fork a built-in skill (agent.skills.fork(slug)) to make a custom workspace-scoped version with different defaults.
  • Define your own by registering a new prompt template with the available tool list — no code deploy needed.
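
To make "prompt template plus tool list" concrete, here is a local model of what forking a skill with different defaults could amount to. It is not the `agent.skills.fork` API: the `SkillDefinition` shape, the `forkSkill` helper, the placeholder prompt text, and the `strict-rank-monitor` slug are all hypothetical.

```typescript
// Hypothetical skill-definition shape: a prompt template, a tool list, and default inputs.
interface SkillDefinition {
  slug: string;
  promptTemplate: string;
  tools: string[];
  defaults: Record<string, unknown>;
}

type SkillOverrides = Partial<Omit<SkillDefinition, "slug">>;

// Returns a new definition; the base is left untouched.
function forkSkill(
  base: SkillDefinition,
  slug: string,
  overrides: SkillOverrides = {},
): SkillDefinition {
  return {
    slug,
    promptTemplate: overrides.promptTemplate ?? base.promptTemplate,
    tools: [...(overrides.tools ?? base.tools)],
    defaults: { ...base.defaults, ...overrides.defaults }, // shallow merge of default inputs
  };
}

// Usage: a stricter variant of a rank-monitoring skill.
const baseRankMonitor: SkillDefinition = {
  slug: "rank_monitor",
  promptTemplate: "<prompt template text>", // placeholder, not the real template
  tools: ["distribute.track_rank"],
  defaults: { alertOn: { drop: 5 } },
};

const strictRankMonitor = forkSkill(baseRankMonitor, "strict-rank-monitor", {
  defaults: { alertOn: { drop: 3 } },
});
```

The merge is shallow, so an override for `alertOn` replaces the whole object rather than patching individual keys inside it.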

Code samples

REST

# Invoke a skill
curl -X POST https://api.citationbench.com/v1/agent/invoke \
  -H "Authorization: Bearer sk_live_***" \
  -H "X-Workspace-Id: ws_acme" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "bootstrap_brand",
    "input": { "domain": "acme.com", "depth": "thorough" },
    "approval": { "required": true },
    "mode": "BACKGROUND"
  }'

# → 202 Accepted
# {
#   "invocationId": "inv_01HVZ...",
#   "agentId":      "agt_01HVZ...",
#   "skill":        "bootstrap_brand",
#   "status":       "PENDING",
#   "links": { ... }
# }

MCP (natural language)

> Bootstrap acme.com — full SEO and GEO research. Pause at each step for me to approve.

Claude calls agent.invoke with skill: "bootstrap_brand" and the right input. The MCP server streams progress as notifications.

Common patterns

1. Fire-and-forget at agency scale

For agencies, the common pattern is BACKGROUND mode with cross-workspace fan-out. One call kicks off the same skill across every client workspace.

curl -X POST https://api.citationbench.com/v1/workspaces/bulk-action \
  -H "Authorization: Bearer sk_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "agent.invoke",
    "workspaces": "all",
    "config": {
      "skill": "rank_monitor",
      "input": { "alertOn": { "drop": 5 } }
    }
  }'

2. Foreground with streaming

When a human is watching (Claude Code, CLI), use FOREGROUND + SSE event stream.

INVOCATION=$(curl -sf -X POST .../agent/invoke -d '{...}' | jq -r '.invocationId')
curl -N -H "Authorization: Bearer $KEY" \
  "https://api.citationbench.com/v1/agent/invocations/$INVOCATION/events"

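If you consume the event stream from code rather than curl, you need to split the response body into Server-Sent Events frames. The parser below is a minimal sketch of the standard SSE wire format (`event:` and `data:` fields, blank-line separators, `:` comment lines). The event name `status` and the JSON payload in the sample are hypothetical; the actual event names and payloads are not specified on this page.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parses as many complete SSE frames as the buffer contains and returns the
// unfinished tail so the caller can prepend it to the next network chunk.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // trailing partial frame, kept for the next chunk
  for (const frame of frames) {
    let event = "message"; // default event type in the SSE format
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // skip blanks and comments
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Usage with a made-up chunk: one named event, one keepalive comment,
// one multi-line default event, and an incomplete trailing frame.
const sampleChunk =
  'event: status\ndata: {"status":"RUNNING"}\n\n' +
  ": keepalive\n\n" +
  "data: line one\ndata: line two\n\n" +
  "event: par";
const parsedChunk = parseSse(sampleChunk);
```

Returning `rest` matters because network chunks rarely align with frame boundaries; the caller concatenates it with the next chunk before parsing again.
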
3. Approval everywhere outbound

Set approval.required: true on any skill whose steps touch the outside world (publishing, outreach, indexing). The agent pauses; you decide.
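
One way to enforce this consistently is to wrap request construction in a helper. The sketch below is a client-side convention, not a platform feature: `withApprovalGate` is a hypothetical name, and the list of outbound skills is a local policy choice, picked here from the catalog entries that publish content or send email.

```typescript
// Request shape mirrors the REST example; all fields except skill and input are optional here.
interface InvokeRequest {
  skill: string;
  input: Record<string, unknown>;
  approval?: { required: boolean };
  mode?: "FOREGROUND" | "BACKGROUND";
}

// Assumption: a local policy list of skills that touch the outside world.
const OUTBOUND_SKILLS = new Set<string>(["content_factory", "link_hunter"]);

// Forces approval.required to true for outbound skills; leaves other requests unchanged.
function withApprovalGate(req: InvokeRequest): InvokeRequest {
  if (!OUTBOUND_SKILLS.has(req.skill)) return req;
  return { ...req, approval: { required: true } };
}

// Usage
const gatedRequest = withApprovalGate({ skill: "link_hunter", input: {}, mode: "BACKGROUND" });
const ungatedRequest = withApprovalGate({ skill: "keyword_manager", input: {} });
```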

4. Compose your own skill

For workflows that don't match a built-in, register a custom skill via the prompt-template API. Then invoke it like any other skill.

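curl -X POST https://api.citationbench.com/v1/agent/invoke \
  -H "Authorization: Bearer sk_live_***" \
  -H "X-Workspace-Id: ws_acme" \
  -H "Content-Type: application/json" \
  -d '{
    "skill": "custom:my-weekly-audit",
    "input": { "workspaceId": "ws_acme" }
  }'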

The custom: prefix tells the system to load from your workspace-scoped skill registry instead of the built-in registry.
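
The prefix rule can be pictured as a small resolver. This is a sketch of the idea only: `resolveSkillRef` and the `SkillRef` shape are hypothetical, and how the platform actually resolves names is not documented here.

```typescript
type SkillRef = { registry: "builtin" | "workspace"; slug: string };

const CUSTOM_PREFIX = "custom:";

// Names starting with "custom:" resolve to the workspace-scoped registry;
// everything else resolves to the built-in registry.
function resolveSkillRef(skill: string): SkillRef {
  if (skill.startsWith(CUSTOM_PREFIX)) {
    const slug = skill.slice(CUSTOM_PREFIX.length);
    if (slug.length === 0) throw new Error("custom skill name is empty");
    return { registry: "workspace", slug };
  }
  return { registry: "builtin", slug: skill };
}

// Usage
const customRef = resolveSkillRef("custom:my-weekly-audit");
const builtinRef = resolveSkillRef("rank_monitor");
```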

5. Multi-turn conversation

For conversational skills like keyword_manager, pass sessionId to continue.

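// Assumes `cb` is an already-initialized CitationBench client.
// First turn: no sessionId, so a new session is started.
const first = await cb.agent.invoke({
  skill: "keyword_manager",
  input: {
    message: "Show me PROBLEM_SOLUTION keywords missing landing pages.",
  },
});

// Follow-up turn: pass the sessionId so the agent continues the same conversation.
const second = await cb.agent.invoke({
  skill: "keyword_manager",
  sessionId: first.sessionId,
  input: { message: "Drop the ones with KD > 40." },
});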
