Architecture

How Agent Orchestrator fits together — plugin slots, session lifecycle, event bus, prompt assembly, and activity detection.

Agent Orchestrator (AO) is a Node.js orchestrator that spawns and manages parallel AI coding agents across isolated git worktrees. Every moving part is a plugin; the core provides the state machine, event bus, and prompt assembly that tie them together.

The 8 Plugin Slots

Each abstraction in AO is a named interface defined in packages/core/src/types.ts. Seven of the eight slots are pluggable at runtime; the eighth (Lifecycle) is built into core and cannot be replaced.

| Slot | Default | Purpose | Interface |
| --- | --- | --- | --- |
| Runtime | tmux | Where agent sessions execute (tmux, process, docker, k8s) | Runtime |
| Agent | claude-code | Which AI coding tool is launched | Agent |
| Workspace | worktree | Code isolation — each session gets its own git worktree or clone | Workspace |
| Tracker | github | Issue tracking (GitHub Issues, Linear, GitLab) | Tracker |
| SCM | github | PR lifecycle, CI checks, and code reviews | SCM |
| Notifier | desktop | Push notifications to the human (desktop, Slack, webhook) | Notifier |
| Terminal | iterm2 | How humans view and interact with running sessions | Terminal |
| Lifecycle | core (non-pluggable) | State machine, poll loop, and reaction engine | LifecycleManager |

The Lifecycle slot is not pluggable. It is instantiated by core and wired to all other plugins automatically. You configure its behaviour (poll interval, reactions, thresholds) through agent-orchestrator.yaml rather than by replacing the implementation.
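
As a rough illustration of the slot table, the following TypeScript sketch models the seven pluggable slots and their defaults. The names `PluggableSlot`, `DEFAULT_PLUGINS`, and `resolvePlugins` are hypothetical and are not taken from packages/core/src/types.ts; only the slot names and default values come from the table.

```typescript
// Hypothetical names for illustration only; the real interfaces live in
// packages/core/src/types.ts and may look quite different.
type PluggableSlot =
  | "runtime"
  | "agent"
  | "workspace"
  | "tracker"
  | "scm"
  | "notifier"
  | "terminal";

// Default plugin per slot, copied from the slot table.
const DEFAULT_PLUGINS: Record<PluggableSlot, string> = {
  runtime: "tmux",
  agent: "claude-code",
  workspace: "worktree",
  tracker: "github",
  scm: "github",
  notifier: "desktop",
  terminal: "iterm2",
};

// Merge user overrides over the defaults. Lifecycle is deliberately absent:
// it is built into core and cannot be replaced.
function resolvePlugins(
  overrides: Partial<Record<PluggableSlot, string>> = {},
): Record<PluggableSlot, string> {
  return { ...DEFAULT_PLUGINS, ...overrides };
}
```

Calling `resolvePlugins({ runtime: "docker" })` would swap only the runtime and leave the other six defaults in place.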


Session Status Lifecycle

Every session moves through a well-defined set of statuses. The values are defined by the SESSION_STATUS constant in packages/core/src/types.ts.

spawning
   │
   ▼
working ──────────────────────────────────────────────► stuck
   │                                                     ▲
   ▼                                                     │
pr_open ──────────────────────────────────────────────► stuck
   │
   ├──► ci_failed
   │
   ├──► review_pending
   │
   ├──► changes_requested
   │
   └──► approved
            │
            ▼
         mergeable
            │
            ▼
          merged ──► cleanup ──► done

Terminal statuses (session is dead and will no longer be polled): killed, terminated, done, cleanup, errored, merged.

| Status | Description |
| --- | --- |
| spawning | Session is being created — worktree, branch, and tmux window are initialising |
| working | Agent is active; no PR yet |
| pr_open | Agent has pushed a PR; CI and reviews are pending |
| ci_failed | One or more CI checks on the PR are failing |
| review_pending | PR has been submitted for review; waiting for a decision |
| changes_requested | Reviewer(s) have requested changes |
| approved | PR is approved but not yet mergeable (e.g. still behind base) |
| mergeable | PR is approved, CI is green, and it can be merged |
| merged | PR has been merged (terminal) |
| cleanup | Post-merge cleanup in progress (terminal) |
| done | Session completed cleanly (terminal) |
| needs_input | Agent is waiting for a permission prompt or human input |
| stuck | Agent has been idle beyond the configured agent-stuck threshold |
| errored | Unexpected error — session is dead (terminal) |
| killed | Session was explicitly killed or the PR was closed (terminal) |
| idle | Agent process is alive but has not produced activity for an extended period |
| terminated | Session was terminated externally (terminal) |
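
The status set and its terminal subset can be sketched in TypeScript as follows. This is an assumption about shape only: the real SESSION_STATUS constant in packages/core/src/types.ts may be an object rather than an array, and `isTerminalStatus` is a hypothetical helper. The values themselves are the ones listed in this section.

```typescript
// All statuses listed in the table, in table order.
const SESSION_STATUSES = [
  "spawning", "working", "pr_open", "ci_failed", "review_pending",
  "changes_requested", "approved", "mergeable", "merged", "cleanup",
  "done", "needs_input", "stuck", "errored", "killed", "idle", "terminated",
] as const;

type SessionStatus = (typeof SESSION_STATUSES)[number];

// Terminal statuses: the session is dead and is no longer polled.
const TERMINAL_STATUSES: ReadonlySet<SessionStatus> = new Set<SessionStatus>([
  "killed", "terminated", "done", "cleanup", "errored", "merged",
]);

// Hypothetical helper: true when the poll loop can skip this session.
function isTerminalStatus(status: SessionStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}
```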

How transitions are determined

The lifecycle manager calls determineStatus(session) on every poll cycle. The logic follows this cascade:

  1. Runtime liveness — If the runtime reports the session is not alive, return killed.
  2. Agent activity — getActivityState() is called; waiting_input maps to needs_input, exited maps to killed, and idle beyond the configured threshold maps to stuck.
  3. PR auto-detection — If no PR is recorded and the agent has a branch, scm.detectPR() is called once per cycle to catch PRs created without a metadata hook.
  4. PR state — If a PR exists, the SCM plugin provides CI status, review decision, and merge readiness to determine ci_failed, review_pending, changes_requested, approved, mergeable, or merged.
  5. Default — Fall back to working (or preserve stuck/needs_input).
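
The cascade above can be approximated with a pure function over a snapshot of already-collected facts. Everything here is a sketch under stated assumptions: the `Snapshot` shape and the name `determineStatusSketch` are hypothetical, step 3 (PR auto-detection) is assumed to have already populated `pr`, the ordering of the PR checks is a guess, and stuck/needs_input are preserved only when no activity data is available, since the rule for preserving them is not spelled out here.

```typescript
type Activity = "active" | "ready" | "idle" | "waiting_input" | "blocked" | "exited";
type Status =
  | "working" | "needs_input" | "stuck" | "killed" | "ci_failed"
  | "review_pending" | "changes_requested" | "approved" | "mergeable" | "merged";

// Hypothetical input shape: facts the real code gathers from plugins.
interface Snapshot {
  runtimeAlive: boolean;
  activity: Activity | null;
  idleMs: number;
  stuckThresholdMs: number;
  previous: Status;
  pr: null | {
    merged: boolean;
    ciFailing: boolean;
    review: "pending" | "changes_requested" | "approved";
    mergeable: boolean;
  };
}

function determineStatusSketch(s: Snapshot): Status {
  // 1. Runtime liveness
  if (!s.runtimeAlive) return "killed";
  // 2. Agent activity
  if (s.activity === "waiting_input") return "needs_input";
  if (s.activity === "exited") return "killed";
  if (s.activity === "idle" && s.idleMs > s.stuckThresholdMs) return "stuck";
  // 3. PR auto-detection is assumed to have run before this function.
  // 4. PR state (check order is an assumption)
  if (s.pr) {
    if (s.pr.merged) return "merged";
    if (s.pr.ciFailing) return "ci_failed";
    if (s.pr.review === "changes_requested") return "changes_requested";
    if (s.pr.review === "approved") return s.pr.mergeable ? "mergeable" : "approved";
    return "review_pending";
  }
  // 5. Default (preservation rule is an assumption)
  if (s.activity === null && (s.previous === "stuck" || s.previous === "needs_input")) {
    return s.previous;
  }
  return "working";
}
```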

Event Bus

After each status transition, the lifecycle manager constructs a typed OrchestratorEvent and fans it out to all configured notifiers and reaction handlers. Events have four priority levels: urgent, action, warning, and info.

Priority is inferred by inferPriority() in lifecycle-manager.ts:

  • urgent — events containing stuck, needs_input, or errored
  • action — events containing approved, ready, merged, or completed
  • warning — events containing fail, changes_requested, or conflicts
  • info — everything else, including all summary.* events

| event.type | Priority | When emitted |
| --- | --- | --- |
| session.spawned | info | Session transitions out of spawning |
| session.working | info | Session enters working |
| session.exited | info | Agent process exits |
| session.killed | info | Session is killed |
| session.idle | info | Session enters idle |
| session.stuck | urgent | Session exceeds the agent-stuck threshold |
| session.needs_input | urgent | Agent is waiting on a permission prompt |
| session.errored | urgent | Session enters errored |
| pr.created | info | Session transitions to pr_open |
| pr.updated | info | PR title or state changes |
| pr.merged | action | PR is merged |
| pr.closed | info | PR is closed without merging |
| ci.passing | action | CI checks recover from failing to passing |
| ci.failing | warning | Session enters ci_failed |
| ci.fix_sent | info | CI fix message sent to agent |
| ci.fix_failed | warning | CI fix attempt failed |
| review.pending | info | Session enters review_pending |
| review.approved | action | Session enters approved |
| review.changes_requested | warning | Session enters changes_requested |
| review.comments_sent | info | Review comments forwarded to agent |
| review.comments_unresolved | warning | Unresolved review comments still present |
| automated_review.found | warning | Bot/automated review comments detected |
| automated_review.fix_sent | info | Automated review fix sent to agent |
| merge.ready | action | Session enters mergeable |
| merge.conflicts | warning | PR has merge conflicts |
| merge.completed | action | Session enters merged |
| reaction.triggered | info | A configured reaction fired |
| reaction.escalated | urgent | A reaction exceeded its retry/escalation threshold |
| summary.all_complete | info | All sessions have reached terminal statuses |
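
The four substring rules listed for inferPriority() can be sketched as below. This is not the code in lifecycle-manager.ts: the name `inferPrioritySketch` is hypothetical, and the sketch implements only the rules as stated. A few rows in the event table (ci.passing, review.comments_unresolved, automated_review.found, reaction.escalated) are not explained by those rules alone, so the real function presumably handles additional cases.

```typescript
type EventPriority = "urgent" | "action" | "warning" | "info";

// Implements only the four documented substring rules, checked in order.
function inferPrioritySketch(eventType: string): EventPriority {
  // All summary.* events are info, regardless of other substrings.
  if (eventType.startsWith("summary.")) return "info";

  const has = (...keys: string[]): boolean =>
    keys.some((key) => eventType.includes(key));

  if (has("stuck", "needs_input", "errored")) return "urgent";
  if (has("approved", "ready", "merged", "completed")) return "action";
  if (has("fail", "changes_requested", "conflicts")) return "warning";
  return "info";
}
```

Because the match is by substring, both ci.failing and ci.fix_failed fall under the "fail" rule.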

For the webhook wire format, see Webhook Notifier. For configuring which events trigger automated reactions, see Reactions.


Poll Loop

The lifecycle manager runs a recurring poll loop. The default interval is 30 seconds (configurable via start(intervalMs)). Each cycle:

  1. Lists all active sessions via sessionManager.list().
  2. Batch-fetches PR enrichment data — a single GraphQL query retrieves CI status, review decision, and merge readiness for all open PRs at once, replacing N×3 individual REST calls with one request.
  3. Checks each session concurrently — checkSession(session) calls determineStatus(), detects transitions, fires events, and evaluates reactions.
  4. Prunes stale tracker entries for sessions that no longer exist.
  5. Checks whether all sessions are complete and fires summary.all_complete if so (emitted once per batch, not repeatedly).
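
One poll cycle, as described in the five steps above, might be structured like this. The `PollDeps` interface and `createPoller` are hypothetical stand-ins for the session manager, SCM plugin, and event bus; the sketch only shows the ordering, the concurrent per-session check, and the once-per-batch guard on summary.all_complete.

```typescript
interface Session { id: string; status: string }

// Hypothetical dependencies; the real lifecycle manager wires these internally.
interface PollDeps {
  listSessions(): Promise<Session[]>;
  fetchPrDataBatch(sessions: Session[]): Promise<Map<string, unknown>>;
  checkSession(session: Session, prData: unknown): Promise<void>;
  pruneTrackers(activeIds: Set<string>): void;
  isTerminal(status: string): boolean;
  emit(eventType: string): void;
}

function createPoller(deps: PollDeps): () => Promise<void> {
  let allCompleteEmitted = false; // guards against repeated summary events

  return async function pollOnce(): Promise<void> {
    // 1. List sessions
    const sessions = await deps.listSessions();
    // 2. One batched fetch for all PR enrichment data
    const prData = await deps.fetchPrDataBatch(sessions);
    // 3. Check every session concurrently
    await Promise.all(sessions.map((s) => deps.checkSession(s, prData.get(s.id))));
    // 4. Prune tracker entries for sessions that no longer exist
    deps.pruneTrackers(new Set(sessions.map((s) => s.id)));
    // 5. Emit summary.all_complete once per batch
    const allDone = sessions.length > 0 && sessions.every((s) => deps.isTerminal(s.status));
    if (allDone && !allCompleteEmitted) {
      deps.emit("summary.all_complete");
      allCompleteEmitted = true;
    } else if (!allDone) {
      allCompleteEmitted = false; // a new batch has started
    }
  };
}
```

With `Promise.all`, one rejected check fails the whole cycle; whether the real loop isolates per-session failures is not stated here.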

The dashboard then receives these state changes via SSE at a 5-second cadence. The poll loop and SSE cadence are independent — the dashboard may show state that is up to 5 seconds behind the last poll cycle.


Prompt Assembly (3 Layers)

Every agent session is launched with a composed prompt built by buildPrompt() in packages/core/src/prompt-builder.ts. The three layers are always concatenated in order:

Layer 1 — Base prompt (fixed)

BASE_AGENT_PROMPT provides identity, session lifecycle rules, git workflow guidance, and PR best practices. It is identical across all sessions. For projects without a remote repository, a trimmed variant (BASE_AGENT_PROMPT_NO_REPO) is used instead — it omits PR and CI instructions that do not apply.

Layer 2 — Config context (per-project)

Built from the project configuration. Includes:

  • Project name and ID
  • Repository (owner/repo)
  • Default branch
  • Tracker plugin name
  • Issue ID and issue body (when spawning from a tracker issue)
  • Reaction hints — lists which events will auto-send instructions back to the agent

Layer 3 — User rules (per-project)

Loaded from agentRules (inline string in agent-orchestrator.yaml) and/or agentRulesFile (path to a file, relative to the project root). Both are concatenated when present. If neither is provided, this layer is omitted.

An explicit userPrompt is appended after Layer 3 as "Additional Instructions" — it has the highest precedence and overrides anything above it.
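
The layering can be pictured with a small sketch. This is not the code in packages/core/src/prompt-builder.ts: `PromptInput`, the placeholder prompt strings, the blank-line separator, and the exact "Additional Instructions" heading format are all assumptions made for illustration. Only the ordering and the omission rules come from the description above.

```typescript
// Hypothetical input shape; the real buildPrompt() takes project config objects.
interface PromptInput {
  hasRepo: boolean;
  configContext: string;          // Layer 2, already rendered
  agentRules?: string;            // inline rules from agent-orchestrator.yaml
  agentRulesFileContent?: string; // contents of agentRulesFile, already read
  userPrompt?: string;
}

// Placeholder text standing in for the real constants.
const BASE_AGENT_PROMPT = "<base prompt>";
const BASE_AGENT_PROMPT_NO_REPO = "<base prompt without PR/CI instructions>";

function buildPromptSketch(input: PromptInput): string {
  const sections: string[] = [];

  // Layer 1: fixed base prompt, trimmed variant when there is no remote repo.
  sections.push(input.hasRepo ? BASE_AGENT_PROMPT : BASE_AGENT_PROMPT_NO_REPO);

  // Layer 2: per-project config context.
  sections.push(input.configContext);

  // Layer 3: user rules, both sources concatenated, omitted when empty.
  const rules = [input.agentRules, input.agentRulesFileContent].filter(
    (r): r is string => typeof r === "string" && r.trim().length > 0,
  );
  if (rules.length > 0) sections.push(rules.join("\n\n"));

  // Explicit user prompt goes last so it takes precedence.
  if (input.userPrompt) {
    sections.push(`Additional Instructions\n\n${input.userPrompt}`);
  }

  return sections.join("\n\n");
}
```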

Orchestrator rules

The orchestratorRules field in ProjectConfig is reserved for orchestrator-role sessions but is not applied by buildPrompt(). Orchestrator sessions receive a completely different prompt generated by generateOrchestratorPrompt() — see the next section.


Orchestrator Prompt

Orchestrator sessions do not receive the standard three-layer prompt. Instead, generateOrchestratorPrompt() in packages/core/src/orchestrator-prompt.ts builds a standalone prompt that provides:

  • Role rules — read-only investigations only; never own a PR; never use tmux send-keys directly; always use ao send / ao spawn to delegate.
  • Project info — name, repo, default branch, session prefix, local path, dashboard port.
  • Quick-start commands — ao status, ao spawn, ao batch-spawn, ao session claim-pr, ao send, ao open.
  • Available ao commands table — full reference adapted to whether a repo is configured.
  • Session management workflows — spawning, monitoring, PR takeover, investigation workflow, cleanup.
  • Dashboard info — URL and feature summary.
  • Automated reactions — lists configured reactions so the orchestrator knows what the system will handle automatically.
  • Common workflows — bulk issue processing, handling stuck agents, PR review flow, manual intervention.
  • Project-specific rules — content of orchestratorRules from ProjectConfig, appended last.

For a guide on per-role agents, see Per-Role Agents.


Activity Detection

Every agent plugin must implement getActivityState(session, readyThresholdMs?). This is the most critical method in the agent plugin — the dashboard, lifecycle manager, and stuck-detection all depend on it.

The 6 activity states

| State | Meaning | When |
| --- | --- | --- |
| active | Agent is processing — thinking, writing code, running tools | Activity within the last 30 seconds |
| ready | Agent finished its turn and is alive, waiting for input | 30 seconds – 5 minutes since last activity |
| idle | Agent has been quiet for an extended period | More than 5 minutes since last activity (default threshold) |
| waiting_input | Agent is at a permission prompt or asking a question | Permission request detected |
| blocked | Agent hit an error it cannot recover from on its own | Error state detected |
| exited | Agent process is no longer running | isProcessRunning returns false |

The getActivityState cascade

Every agent plugin must implement this cascade in order:

1. PROCESS CHECK
   └─ isProcessRunning() → false → return { state: "exited" }

2. ACTIONABLE STATES
   └─ checkActivityLogState() → waiting_input or blocked → return immediately

3. NATIVE SIGNAL (agent-specific)
   └─ session list API, native JSONL timestamp, etc.
   └─ classify by age: active (<30s) / ready (30s–threshold) / idle (>threshold)

4. JSONL ENTRY FALLBACK (mandatory)
   └─ getActivityFallbackState(activityResult, activeWindowMs, threshold)
   └─ age-based decay: active→ready→idle (never promotes)
   └─ staleness cap: waiting_input/blocked entries expire after 5 minutes

5. Return null only when there is genuinely no data at all

Step 4 (the JSONL entry fallback) is mandatory. Skipping it means getActivityState returns null whenever the native API fails — the dashboard shows no activity state and stuck-detection breaks for the entire session lifetime. This was a real bug in the OpenCode plugin.
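
To show why the fallback matters, here is a sketch of the cascade order with each step injected as a function. The `Probes` interface and `getActivityStateSketch` are hypothetical; real plugins call helpers such as checkActivityLogState() and getActivityFallbackState() with their own arguments. The point of the sketch is that a failing native signal falls through to step 4 instead of returning null.

```typescript
type ActivityState = "active" | "ready" | "idle" | "waiting_input" | "blocked" | "exited";

interface ActivityDetection { state: ActivityState; timestamp?: Date }

// Hypothetical probes, one per cascade step.
interface Probes {
  isProcessRunning(): boolean;
  checkActivityLogState(): ActivityDetection | null; // waiting_input / blocked only
  readNativeSignal(): ActivityDetection | null;      // may throw if the native API fails
  fallbackFromActivityLog(): ActivityDetection | null;
}

function getActivityStateSketch(p: Probes): ActivityDetection | null {
  // 1. Process check
  if (!p.isProcessRunning()) return { state: "exited" };

  // 2. Actionable states win immediately
  const actionable = p.checkActivityLogState();
  if (actionable && (actionable.state === "waiting_input" || actionable.state === "blocked")) {
    return actionable;
  }

  // 3. Native signal; a failure here must not end the cascade
  try {
    const native = p.readNativeSignal();
    if (native) return native;
  } catch {
    // fall through to the mandatory fallback
  }

  // 4. JSONL entry fallback (mandatory)
  const fallback = p.fallbackFromActivityLog();
  if (fallback) return fallback;

  // 5. Genuinely no data
  return null;
}
```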

Two JSONL patterns

| Pattern | Used by | How it works |
| --- | --- | --- |
| Agent-native JSONL | Claude Code, Codex | The agent writes its own JSONL with rich state entries (permission_request, tool_call, error, etc.). getActivityState reads the last entry and maps it to activity states. |
| AO activity JSONL | Aider, OpenCode, new agents | The agent implements recordActivity, which calls recordTerminalActivity() → classifyTerminalActivity() → appendActivityEntry() to write to {workspacePath}/.ao/activity.jsonl. getActivityState reads from this file. |

Thresholds

| Constant | Value | Purpose |
| --- | --- | --- |
| DEFAULT_ACTIVE_WINDOW_MS | 30 seconds | Activity newer than this is active; older is ready |
| DEFAULT_READY_THRESHOLD_MS | 5 minutes | ready sessions older than this become idle |
| ACTIVITY_INPUT_STALENESS_MS | 5 minutes | waiting_input / blocked JSONL entries expire after this duration |
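
The thresholds translate into a simple age classification plus a decay rule that never promotes a recorded state. The constant names and values below match the table; the helper names (`classifyByAge`, `decayRecordedState`, `isInputEntryStale`) and the handling of exact boundary values are assumptions for illustration.

```typescript
const DEFAULT_ACTIVE_WINDOW_MS = 30 * 1000;
const DEFAULT_READY_THRESHOLD_MS = 5 * 60 * 1000;
const ACTIVITY_INPUT_STALENESS_MS = 5 * 60 * 1000;

type AgeState = "active" | "ready" | "idle";

// Higher rank means "quieter"; decay may only move toward a higher rank.
const RANK: Record<AgeState, number> = { active: 0, ready: 1, idle: 2 };

// Classify purely by time since the last activity.
function classifyByAge(
  ageMs: number,
  activeWindowMs: number = DEFAULT_ACTIVE_WINDOW_MS,
  readyThresholdMs: number = DEFAULT_READY_THRESHOLD_MS,
): AgeState {
  if (ageMs < activeWindowMs) return "active";
  if (ageMs <= readyThresholdMs) return "ready";
  return "idle";
}

// Age-based decay for a recorded entry: active -> ready -> idle, never the reverse.
function decayRecordedState(recorded: AgeState, ageMs: number): AgeState {
  const byAge = classifyByAge(ageMs);
  return RANK[byAge] > RANK[recorded] ? byAge : recorded;
}

// waiting_input / blocked entries older than the staleness cap are ignored.
function isInputEntryStale(ageMs: number): boolean {
  return ageMs > ACTIVITY_INPUT_STALENESS_MS;
}
```

For example, an entry recorded as idle one second ago stays idle, while an entry recorded as active one minute ago decays to ready.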

PATH Wrappers

When an agent creates a PR or switches a branch, AO needs to update the session metadata (e.g. write pr=https://... or branch=feat/INT-123) so the dashboard and lifecycle manager stay in sync. Two mechanisms exist:

Claude Code — PostToolUse hooks

Claude Code writes .claude/settings.json with a PostToolUse hook that fires after every gh pr create or git checkout command. The hook script calls update_ao_metadata directly.

All other agents — PATH wrappers

Agents without a native hook system (Codex, Aider, OpenCode, custom agents) use ~/.ao/bin/gh and ~/.ao/bin/git shell wrappers. These wrappers are installed to ~/.ao/bin/ by setupPathWrapperWorkspace(workspacePath) from packages/core/src/agent-workspace-hooks.ts. The function also writes session context to {workspacePath}/.ao/AGENTS.md (gitignored — does not touch tracked files).

The wrappers intercept:

  • gh pr create — captures the PR URL from stdout and writes pr=<url> and status=pr_open
  • gh pr merge — writes status=merged
  • git checkout -b <branch> / git switch -c <branch> — writes branch=<name>

All other commands pass through transparently via exec "$real_gh" "$@" or exec "$real_git" "$@".
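
The actual wrappers are shell scripts, but their interception rules can be modelled as a small TypeScript function for clarity. `metadataUpdatesFor` is a hypothetical name, and the URL extraction via a simple regular expression is an assumption; the sketch only encodes which commands produce which metadata keys.

```typescript
type Metadata = Record<string, string>;

// Returns the metadata a wrapper would write for a given command.
// An empty object means the command passes through untouched.
function metadataUpdatesFor(tool: "gh" | "git", args: string[], stdout: string = ""): Metadata {
  if (tool === "gh" && args[0] === "pr" && args[1] === "create") {
    // Capture the first URL printed by the real gh binary.
    const url = stdout.match(/https:\/\/\S+/)?.[0];
    return url ? { pr: url, status: "pr_open" } : {};
  }

  if (tool === "gh" && args[0] === "pr" && args[1] === "merge") {
    return { status: "merged" };
  }

  if (tool === "git") {
    const createsBranch =
      (args[0] === "checkout" && args[1] === "-b") ||
      (args[0] === "switch" && args[1] === "-c");
    if (createsBranch && args[2]) return { branch: args[2] };
  }

  return {};
}
```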

For storage details, see Storage — PATH Wrappers.


Observability

The lifecycle manager, session manager, and plugin registry emit structured telemetry using project observers created by createProjectObserver(). Each running process writes a JSON snapshot to:

~/.agent-orchestrator/{hash}-observability/processes/{component}-{pid}.json

The hash is the first 12 characters of the SHA-256 of the config directory path. The {component} segment matches the internal observer name (e.g. lifecycle-manager, session-manager).
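
As a sketch of how such a path could be derived, the following uses Node's built-in crypto module. It assumes the digest is hex-encoded and that the config directory path is hashed as given, without normalisation; neither detail is confirmed here. `configHash` and `snapshotPath` are hypothetical helper names, and the home directory is passed in explicitly to keep the example self-contained.

```typescript
import { createHash } from "node:crypto";

// First 12 characters of the SHA-256 of the config directory path
// (hex encoding is an assumption).
function configHash(configDir: string): string {
  return createHash("sha256").update(configDir).digest("hex").slice(0, 12);
}

// Builds the per-process snapshot path described above.
function snapshotPath(homeDir: string, configDir: string, component: string, pid: number): string {
  const hash = configHash(configDir);
  return `${homeDir}/.agent-orchestrator/${hash}-observability/processes/${component}-${pid}.json`;
}
```

For instance, `snapshotPath("/home/me", "/work/project", "lifecycle-manager", 4242)` yields a path ending in `processes/lifecycle-manager-4242.json`.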

The dashboard's /api/observability route reads and merges these per-process snapshots to produce a live observability view.

Feedback reports from the agent's bug_report and improvement_suggestion tools are written as flat key-value files at:

~/.agent-orchestrator/{hash}-{projectId}/feedback-reports/*.kv

Data Flow Summary

agent-orchestrator.yaml ──► Config Loader (Zod) ──► Plugin Registry
                                                          │
                                          ┌───────────────┘
                                          │
                                          ▼
                                    Session Manager ◄─── ao spawn / ao session
                                          │
                                          ▼
                                  Lifecycle Manager ────► Events ────► Notifiers
                                          │                 │              │
                                          │           Reactions       Webhook
                                          │
                                          ▼
                                    Dashboard API
                                  (Next.js App Router)
                                          │
                              ┌───────────┴──────────────┐
                              │                           │
                              ▼                           ▼
                         SSE (5s)                  WebSocket (terminal)
                              │                           │
                              ▼                           ▼
                          React UI                   xterm.js
