Architecture

How Agent Orchestrator fits together — plugin slots, session lifecycle, event bus, prompt assembly, and activity detection.

Agent Orchestrator (AO) is a Node.js orchestrator that spawns and manages parallel AI coding agents across isolated git worktrees. Almost every moving part is a plugin; the core provides the state machine, event bus, and prompt assembly that tie them together.

The 8 Plugin Slots

Each abstraction in AO is a named interface defined in packages/core/src/types.ts. Seven of the eight slots are pluggable at runtime; the eighth (Lifecycle) is built into core and cannot be replaced.

Slot	Default	Purpose	Interface
Runtime	tmux	Where agent sessions execute (tmux, process, docker, k8s)	`Runtime`
Agent	claude-code	Which AI coding tool is launched	`Agent`
Workspace	worktree	Code isolation — each session gets its own git worktree or clone	`Workspace`
Tracker	github	Issue tracking (GitHub Issues, Linear, GitLab)	`Tracker`
SCM	github	PR lifecycle, CI checks, and code reviews	`SCM`
Notifier	desktop	Push notifications to the human (desktop, Slack, webhook)	`Notifier`
Terminal	iterm2	How humans view and interact with running sessions	`Terminal`
Lifecycle	core (non-pluggable)	State machine, poll loop, and reaction engine	`LifecycleManager`

The Lifecycle slot is not pluggable. It is instantiated by core and wired to all other plugins automatically. You configure its behaviour (poll interval, reactions, thresholds) through agent-orchestrator.yaml rather than by replacing the implementation.
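As a rough illustration of the slot model, the sketch below types the eight slot names and records the defaults from the table. `PluggableSlot`, `DEFAULT_PLUGINS`, `isPluggable`, and `resolvePlugin` are hypothetical names invented for this example; they are not the definitions in packages/core/src/types.ts.

```typescript
// Hypothetical sketch of the slot model. All names here are illustrative only.
type PluggableSlot =
  | "runtime"
  | "agent"
  | "workspace"
  | "tracker"
  | "scm"
  | "notifier"
  | "terminal";
type SlotName = PluggableSlot | "lifecycle";

// Defaults copied from the slot table.
const DEFAULT_PLUGINS: Record<PluggableSlot, string> = {
  runtime: "tmux",
  agent: "claude-code",
  workspace: "worktree",
  tracker: "github",
  scm: "github",
  notifier: "desktop",
  terminal: "iterm2",
};

// Lifecycle is built into core, so it is the only slot that cannot be swapped.
function isPluggable(slot: SlotName): slot is PluggableSlot {
  return slot !== "lifecycle";
}

// Resolve the plugin for a slot: an explicit override wins, then the default.
function resolvePlugin(
  slot: PluggableSlot,
  overrides: Partial<Record<PluggableSlot, string>> = {},
): string {
  return overrides[slot] ?? DEFAULT_PLUGINS[slot];
}
```

With this shape, `resolvePlugin("runtime", { runtime: "docker" })` picks the override, while `resolvePlugin("runtime")` falls back to `tmux`.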

Session Status Lifecycle

Every session moves through a well-defined set of statuses. The values are defined by the SESSION_STATUS constant in packages/core/src/types.ts.

spawning
   │
   ▼
working ──────────────────────────────────────────────► stuck
   │                                                     ▲
   ▼                                                     │
pr_open ──────────────────────────────────────────────► stuck
   │
   ├──► ci_failed
   │
   ├──► review_pending
   │
   ├──► changes_requested
   │
   └──► approved
            │
            ▼
         mergeable
            │
            ▼
          merged ──► cleanup ──► done

Terminal statuses (session is dead and will no longer be polled): killed, terminated, done, cleanup, errored, merged.

Status	Description
`spawning`	Session is being created — worktree, branch, and tmux window are initialising
`working`	Agent is active; no PR yet
`pr_open`	Agent has opened a PR; CI and reviews are pending
`ci_failed`	One or more CI checks on the PR are failing
`review_pending`	PR has been submitted for review; waiting for a decision
`changes_requested`	Reviewer(s) have requested changes
`approved`	PR is approved but not yet mergeable (e.g. still behind base)
`mergeable`	PR is approved, CI is green, and it can be merged
`merged`	PR has been merged (terminal)
`cleanup`	Post-merge cleanup in progress (terminal)
`done`	Session completed cleanly (terminal)
`needs_input`	Agent is waiting for a permission prompt or human input
`stuck`	Agent has been idle beyond the configured `agent-stuck` threshold
`errored`	Unexpected error — session is dead (terminal)
`killed`	Session was explicitly killed or the PR was closed (terminal)
`idle`	Agent process is alive but has produced no activity for longer than the ready threshold (5 minutes by default)
`terminated`	Session was terminated externally (terminal)
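The status list and the terminal subset can be modelled as follows. This is a sketch only: the real values come from the SESSION_STATUS constant, whose exact shape is not shown here, so `SESSION_STATUSES`, `TERMINAL_STATUSES`, `isTerminal`, and `sessionsToPoll` are assumed names.

```typescript
// Hypothetical reconstruction of the status values listed in the table.
const SESSION_STATUSES = [
  "spawning", "working", "pr_open", "ci_failed", "review_pending",
  "changes_requested", "approved", "mergeable", "merged", "cleanup",
  "done", "needs_input", "stuck", "errored", "killed", "idle", "terminated",
] as const;
type SessionStatus = (typeof SESSION_STATUSES)[number];

// Terminal statuses: the session is dead and is no longer polled.
const TERMINAL_STATUSES: ReadonlySet<SessionStatus> = new Set<SessionStatus>([
  "killed", "terminated", "done", "cleanup", "errored", "merged",
]);

function isTerminal(status: SessionStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Only non-terminal sessions need to be visited by the poll loop.
function sessionsToPoll<T extends { status: SessionStatus }>(sessions: T[]): T[] {
  return sessions.filter((session) => !isTerminal(session.status));
}
```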

How transitions are determined

The lifecycle manager calls determineStatus(session) on every poll cycle. The logic follows this cascade:

Runtime liveness — If the runtime reports the session is not alive, return killed.
Agent activity — getActivityState() is called; waiting_input maps to needs_input, exited maps to killed, and idle beyond the configured threshold maps to stuck.
PR auto-detection — If no PR is recorded and the agent has a branch, scm.detectPR() is called once per cycle to catch PRs created without a metadata hook.
PR state — If a PR exists, the SCM plugin provides CI status, review decision, and merge readiness to determine ci_failed, review_pending, changes_requested, approved, mergeable, or merged.
Default — Fall back to working (or preserve stuck/needs_input).
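The cascade above can be sketched as a pure function over data that has already been fetched. This is not the actual determineStatus(): the `PollInput` and `PrState` shapes, the synchronous `detectPr` callback, and the order in which PR signals are checked (merged, then CI, then review decision) are all assumptions made for the example.

```typescript
type ActivityState = "active" | "ready" | "idle" | "waiting_input" | "blocked" | "exited";
type SessionStatus =
  | "spawning" | "working" | "pr_open" | "ci_failed" | "review_pending"
  | "changes_requested" | "approved" | "mergeable" | "merged"
  | "needs_input" | "stuck" | "killed";

// Hypothetical summary of what the SCM plugin reports for a PR.
interface PrState {
  merged: boolean;
  ciFailing: boolean;
  review: "none" | "pending" | "changes_requested" | "approved";
  mergeable: boolean;
}

// Hypothetical snapshot of everything one poll cycle knows about a session.
interface PollInput {
  previous: SessionStatus;
  runtimeAlive: boolean;
  activity: ActivityState | null;
  idleMs: number;
  stuckThresholdMs: number;
  recordedPr: PrState | null;
  branch: string | null;
  detectPr: (branch: string) => PrState | null;
}

function determineStatusSketch(input: PollInput): SessionStatus {
  // 1. Runtime liveness
  if (!input.runtimeAlive) return "killed";

  // 2. Agent activity
  if (input.activity === "waiting_input") return "needs_input";
  if (input.activity === "exited") return "killed";
  if (input.activity === "idle" && input.idleMs > input.stuckThresholdMs) return "stuck";

  // 3. PR auto-detection: only when no PR is recorded and a branch exists
  const pr = input.recordedPr ?? (input.branch ? input.detectPr(input.branch) : null);

  // 4. PR state (check order is an assumption)
  if (pr) {
    if (pr.merged) return "merged";
    if (pr.ciFailing) return "ci_failed";
    if (pr.review === "changes_requested") return "changes_requested";
    if (pr.review === "approved") return pr.mergeable ? "mergeable" : "approved";
    if (pr.review === "pending") return "review_pending";
    return "pr_open";
  }

  // 5. Default: preserve stuck / needs_input, otherwise working
  return input.previous === "stuck" || input.previous === "needs_input"
    ? input.previous
    : "working";
}
```

Keeping the decision pure makes each branch of the cascade easy to unit-test; the real implementation has to call the runtime, agent, and SCM plugins to gather the same inputs.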

Event Bus

After each status transition, the lifecycle manager constructs a typed OrchestratorEvent and fans it out to all configured notifiers and reaction handlers. Events have four priority levels: urgent, action, warning, and info.

Priority is inferred by inferPriority() in lifecycle-manager.ts:

urgent — events containing stuck, needs_input, or errored
action — events containing approved, ready, merged, or completed
warning — events containing fail, changes_requested, or conflicts
info — everything else, including all summary.* events
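A minimal sketch of these substring rules is shown below. The order in which the checks run is an assumption, and the function is named `inferPrioritySketch` to make clear it is not the code in lifecycle-manager.ts.

```typescript
type EventPriority = "urgent" | "action" | "warning" | "info";

// True when the event type contains at least one of the given keywords.
function includesAny(eventType: string, keywords: string[]): boolean {
  return keywords.some((keyword) => eventType.includes(keyword));
}

// Assumed check order: urgent, summary.*, action, warning, then info.
function inferPrioritySketch(eventType: string): EventPriority {
  if (includesAny(eventType, ["stuck", "needs_input", "errored"])) return "urgent";
  if (eventType.startsWith("summary.")) return "info";
  if (includesAny(eventType, ["approved", "ready", "merged", "completed"])) return "action";
  if (includesAny(eventType, ["fail", "changes_requested", "conflicts"])) return "warning";
  return "info";
}
```

Because the rules match on substrings, a new event type picks up a priority from its name alone; for example, any type containing `fail` is treated as a warning.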

`event.type`	Priority	When emitted
`session.spawned`	info	Session transitions out of `spawning`
`session.working`	info	Session enters `working`
`session.exited`	info	Agent process exits
`session.killed`	info	Session is killed
`session.idle`	info	Session enters `idle`
`session.stuck`	urgent	Session exceeds the `agent-stuck` threshold
`session.needs_input`	urgent	Agent is waiting on a permission prompt
`session.errored`	urgent	Session enters `errored`
`pr.created`	info	Session transitions to `pr_open`
`pr.updated`	info	PR title or state changes
`pr.merged`	action	PR is merged
`pr.closed`	info	PR is closed without merging
`ci.passing`	action	CI checks recover from failing to passing
`ci.failing`	warning	Session enters `ci_failed`
`ci.fix_sent`	info	CI fix message sent to agent
`ci.fix_failed`	warning	CI fix attempt failed
`review.pending`	info	Session enters `review_pending`
`review.approved`	action	Session enters `approved`
`review.changes_requested`	warning	Session enters `changes_requested`
`review.comments_sent`	info	Review comments forwarded to agent
`review.comments_unresolved`	warning	Unresolved review comments still present
`automated_review.found`	warning	Bot/automated review comments detected
`automated_review.fix_sent`	info	Automated review fix sent to agent
`merge.ready`	action	Session enters `mergeable`
`merge.conflicts`	warning	PR has merge conflicts
`merge.completed`	action	Session enters `merged`
`reaction.triggered`	info	A configured reaction fired
`reaction.escalated`	urgent	A reaction exceeded its retry/escalation threshold
`summary.all_complete`	info	All sessions have reached terminal statuses

For the webhook wire format, see Webhook Notifier. For configuring which events trigger automated reactions, see Reactions.

Poll Loop

The lifecycle manager runs a recurring poll loop. The default interval is 30 seconds (configurable via start(intervalMs)). Each cycle:

Lists all active sessions via sessionManager.list().
Batch-fetches PR enrichment data — a single GraphQL query retrieves CI status, review decision, and merge readiness for all open PRs at once, replacing N×3 individual REST calls with one request.
Checks each session concurrently — checkSession(session) calls determineStatus(), detects transitions, fires events, and evaluates reactions.
Prunes stale tracker entries for sessions that no longer exist.
Checks whether all sessions are complete and fires summary.all_complete if so (emitted once per batch, not repeatedly).

The dashboard then receives these state changes via SSE at a 5-second cadence. The poll loop and SSE cadence are independent — the dashboard may show state that is up to 5 seconds behind the last poll cycle.
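The five steps of a poll cycle can be sketched as below. The `PollDeps` interface is a hypothetical stand-in for the session manager, SCM plugin, and reaction engine, and the guard against overlapping cycles is an assumption rather than documented behaviour.

```typescript
interface Session { id: string; prUrl?: string }

// Hypothetical dependency surface, injected so the loop can be tested in isolation.
interface PollDeps {
  listSessions(): Promise<Session[]>;
  fetchPrEnrichment(prUrls: string[]): Promise<Map<string, unknown>>;
  checkSession(session: Session, enrichment: Map<string, unknown>): Promise<void>;
  pruneStale(activeIds: Set<string>): void;
  allComplete(sessions: Session[]): boolean;
  emitAllComplete(): void;
}

function createPollLoop(deps: PollDeps) {
  let cycleRunning = false;
  let allCompleteEmitted = false;

  async function pollOnce(): Promise<void> {
    if (cycleRunning) return; // assumption: never run two cycles at once
    cycleRunning = true;
    try {
      // 1. List sessions
      const sessions = await deps.listSessions();
      // 2. One batched fetch for every PR
      const prUrls = sessions
        .map((s) => s.prUrl)
        .filter((url): url is string => typeof url === "string");
      const enrichment = await deps.fetchPrEnrichment(prUrls);
      // 3. Check sessions concurrently; one failure must not abort the cycle
      await Promise.all(
        sessions.map((s) => deps.checkSession(s, enrichment).catch(() => undefined)),
      );
      // 4. Prune tracker entries for sessions that no longer exist
      deps.pruneStale(new Set(sessions.map((s) => s.id)));
      // 5. Emit the summary once per batch, not on every cycle
      const complete = deps.allComplete(sessions);
      if (complete && !allCompleteEmitted) deps.emitAllComplete();
      allCompleteEmitted = complete;
    } finally {
      cycleRunning = false;
    }
  }

  // Default interval of 30 seconds, matching the text above.
  function start(intervalMs: number = 30000): () => void {
    const timer = setInterval(() => {
      pollOnce().catch(() => undefined); // a real loop would log this error
    }, intervalMs);
    return () => clearInterval(timer);
  }

  return { pollOnce, start };
}
```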

Prompt Assembly (3 Layers)

Every agent session is launched with a composed prompt built by buildPrompt() in packages/core/src/prompt-builder.ts. The three layers are always concatenated in order:

Layer 1 — Base prompt (fixed)

BASE_AGENT_PROMPT provides identity, session lifecycle rules, git workflow guidance, and PR best practices. It is identical across all sessions. For projects without a remote repository, a trimmed variant (BASE_AGENT_PROMPT_NO_REPO) is used instead — it omits PR and CI instructions that do not apply.

Layer 2 — Config context (per-project)

Built from the project configuration. Includes:

Project name and ID
Repository (owner/repo)
Default branch
Tracker plugin name
Issue ID and issue body (when spawning from a tracker issue)
Reaction hints — lists which events will auto-send instructions back to the agent

Layer 3 — User rules (per-project)

Loaded from agentRules (inline string in agent-orchestrator.yaml) and/or agentRulesFile (path to a file, relative to the project root). Both are concatenated when present. If neither is provided, this layer is omitted.

An explicit userPrompt is appended after Layer 3 as "Additional Instructions" — it has the highest precedence and overrides anything above it.
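The layering can be illustrated with a small sketch. It is not the real buildPrompt(): the `PromptInput` shape, the placeholder base-prompt strings, and the section headings other than "Additional Instructions" are assumptions chosen for the example.

```typescript
// Placeholder text standing in for the real base prompts.
const BASE_AGENT_PROMPT = "[base prompt: identity, lifecycle rules, git workflow, PR practices]";
const BASE_AGENT_PROMPT_NO_REPO = "[base prompt without PR and CI instructions]";

// Hypothetical input shape for the example.
interface PromptInput {
  project: { name: string; id: string; repo?: string; defaultBranch: string; tracker?: string };
  issue?: { id: string; body: string };
  reactionHints?: string[];
  agentRules?: string;            // inline rules from agent-orchestrator.yaml
  agentRulesFileContent?: string; // contents already read from agentRulesFile
  userPrompt?: string;
}

function buildPromptSketch(input: PromptInput): string {
  const { project, issue } = input;

  // Layer 1: fixed base prompt, trimmed when there is no remote repository.
  const base = project.repo ? BASE_AGENT_PROMPT : BASE_AGENT_PROMPT_NO_REPO;

  // Layer 2: per-project config context.
  const contextLines = [
    `Project: ${project.name} (${project.id})`,
    project.repo ? `Repository: ${project.repo}` : "",
    `Default branch: ${project.defaultBranch}`,
    project.tracker ? `Tracker: ${project.tracker}` : "",
    issue ? `Issue ${issue.id}:\n${issue.body}` : "",
    ...(input.reactionHints ?? []).map((hint) => `Reaction: ${hint}`),
  ].filter((line) => line !== "");

  // Layer 3: user rules, concatenated when both sources are present.
  const rules = [input.agentRules, input.agentRulesFileContent]
    .filter((rule): rule is string => Boolean(rule && rule.trim()))
    .join("\n\n");

  const sections = [
    base,
    `## Project Context\n${contextLines.join("\n")}`,
    rules ? `## Project Rules\n${rules}` : "",
    input.userPrompt ? `## Additional Instructions\n${input.userPrompt}` : "",
  ];
  // Empty layers are omitted; the rest are joined in a fixed order.
  return sections.filter((section) => section !== "").join("\n\n");
}
```

Placing the explicit user prompt last mirrors the precedence described above: later instructions are read after, and can override, everything before them.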

Orchestrator rules

The orchestratorRules field in ProjectConfig is reserved for orchestrator-role sessions but is not applied by buildPrompt(). Orchestrator sessions receive a completely different prompt generated by generateOrchestratorPrompt() — see the next section.

Orchestrator Prompt

Orchestrator sessions do not receive the standard three-layer prompt. Instead, generateOrchestratorPrompt() in packages/core/src/orchestrator-prompt.ts builds a standalone prompt that provides:

Role rules — read-only investigations only; never own a PR; never use tmux send-keys directly; always use ao send / ao spawn to delegate.
Project info — name, repo, default branch, session prefix, local path, dashboard port.
Quick-start commands — ao status, ao spawn, ao batch-spawn, ao session claim-pr, ao send, ao open.
Available ao commands table — full reference adapted to whether a repo is configured.
Session management workflows — spawning, monitoring, PR takeover, investigation workflow, cleanup.
Dashboard info — URL and feature summary.
Automated reactions — lists configured reactions so the orchestrator knows what the system will handle automatically.
Common workflows — bulk issue processing, handling stuck agents, PR review flow, manual intervention.
Project-specific rules — content of orchestratorRules from ProjectConfig, appended last.

For a guide on per-role agents, see Per-Role Agents.

Activity Detection

Every agent plugin must implement getActivityState(session, readyThresholdMs?). This is the most critical method in the agent plugin — the dashboard, lifecycle manager, and stuck-detection all depend on it.

The 6 activity states

State	Meaning	When
`active`	Agent is processing — thinking, writing code, running tools	Activity within the last 30 seconds
`ready`	Agent finished its turn and is alive, waiting for input	30 seconds – 5 minutes since last activity
`idle`	Agent has been quiet for an extended period	More than 5 minutes since last activity (default threshold)
`waiting_input`	Agent is at a permission prompt or asking a question	Permission request detected
`blocked`	Agent hit an error it cannot recover from on its own	Error state detected
`exited`	Agent process is no longer running	`isProcessRunning` returns false

The `getActivityState` cascade

Every agent plugin must implement this cascade in order:

1. PROCESS CHECK
   └─ isProcessRunning() → false → return { state: "exited" }

2. ACTIONABLE STATES
   └─ checkActivityLogState() → waiting_input or blocked → return immediately

3. NATIVE SIGNAL (agent-specific)
   └─ session list API, native JSONL timestamp, etc.
   └─ classify by age: active (<30s) / ready (30s–threshold) / idle (>threshold)

4. JSONL ENTRY FALLBACK (mandatory)
   └─ getActivityFallbackState(activityResult, activeWindowMs, threshold)
   └─ age-based decay: active→ready→idle (never promotes)
   └─ staleness cap: waiting_input/blocked entries expire after 5 minutes

5. Return null only when there is genuinely no data at all

Step 4 (the JSONL entry fallback) is mandatory. Skipping it means getActivityState returns null whenever the native API fails — the dashboard shows no activity state and stuck-detection breaks for the entire session lifetime. This was a real bug in the OpenCode plugin.
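The ordering of the cascade, including the mandatory fallback, can be sketched with injected probes. The `Probes` interface and its method names other than `isProcessRunning` and `checkActivityLogState` are hypothetical; a real plugin would call its own helpers at each step.

```typescript
type ActivityState = "active" | "ready" | "idle" | "waiting_input" | "blocked" | "exited";

// Hypothetical probe surface so the ordering can be shown without a real agent.
interface Probes {
  isProcessRunning(): Promise<boolean>;
  checkActivityLogState(): Promise<ActivityState | null>;
  readNativeSignal(): Promise<ActivityState | null>; // may throw if the native API fails
  readJsonlFallback(): Promise<ActivityState | null>;
}

async function getActivityStateSketch(
  probes: Probes,
): Promise<{ state: ActivityState } | null> {
  // 1. Process check
  if (!(await probes.isProcessRunning())) return { state: "exited" };

  // 2. Actionable states win immediately
  const logState = await probes.checkActivityLogState();
  if (logState === "waiting_input" || logState === "blocked") return { state: logState };

  // 3. Native signal; failures fall through instead of returning null
  try {
    const native = await probes.readNativeSignal();
    if (native) return { state: native };
  } catch {
    // ignore and continue to the fallback
  }

  // 4. JSONL entry fallback (mandatory)
  const fallback = await probes.readJsonlFallback();
  if (fallback) return { state: fallback };

  // 5. Genuinely no data
  return null;
}
```

The important detail is that a failing native signal drops through to step 4 rather than ending the function, which is the behaviour the paragraph above describes as mandatory.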

Two JSONL patterns

Pattern	Used by	How it works
Agent-native JSONL	Claude Code, Codex	The agent writes its own JSONL with rich state entries (`permission_request`, `tool_call`, `error`, etc.). `getActivityState` reads the last entry and maps it to activity states.
AO activity JSONL	Aider, OpenCode, new agents	The agent implements `recordActivity`, which calls `recordTerminalActivity()` → `classifyTerminalActivity()` → `appendActivityEntry()` to write to `{workspacePath}/.ao/activity.jsonl`. `getActivityState` reads from this file.
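For the AO activity JSONL pattern, the file handling might look like the sketch below. The entry fields (`ts`, `state`, `source`) and the helper names are assumptions; only the file location `{workspacePath}/.ao/activity.jsonl` comes from the table.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical entry shape; the real schema is not documented here.
interface ActivityEntry {
  ts: string; // ISO 8601 timestamp
  state: "active" | "ready" | "idle" | "waiting_input" | "blocked";
  source: string;
}

function activityLogPath(workspacePath: string): string {
  return path.join(workspacePath, ".ao", "activity.jsonl");
}

// Append one JSON object per line, creating the .ao directory if needed.
function appendEntry(workspacePath: string, entry: ActivityEntry): void {
  const file = activityLogPath(workspacePath);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(entry) + "\n", "utf8");
}

// Return the newest parseable entry, skipping blank or corrupt lines.
function readLastEntry(workspacePath: string): ActivityEntry | null {
  const file = activityLogPath(workspacePath);
  if (!fs.existsSync(file)) return null;
  const lines = fs.readFileSync(file, "utf8").split("\n").filter((l) => l.trim() !== "");
  for (const line of lines.reverse()) {
    try {
      return JSON.parse(line) as ActivityEntry;
    } catch {
      // corrupt line, try the previous one
    }
  }
  return null;
}

// Usage against a throwaway workspace directory.
const demoWorkspace = fs.mkdtempSync(path.join(os.tmpdir(), "ao-demo-"));
appendEntry(demoWorkspace, { ts: new Date().toISOString(), state: "active", source: "terminal" });
appendEntry(demoWorkspace, { ts: new Date().toISOString(), state: "waiting_input", source: "terminal" });
const lastEntry = readLastEntry(demoWorkspace);
```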

Thresholds

Constant	Value	Purpose
`DEFAULT_ACTIVE_WINDOW_MS`	30 seconds	Activity newer than this is `active`; older is `ready`
`DEFAULT_READY_THRESHOLD_MS`	5 minutes	`ready` sessions older than this become `idle`
`ACTIVITY_INPUT_STALENESS_MS`	5 minutes	`waiting_input` / `blocked` JSONL entries expire after this duration
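The thresholds combine with the "never promotes" decay rule roughly as follows. This is a sketch under assumptions: `decayEntryState` is a hypothetical stand-in for the fallback helper, the `ActivityEntry` shape is invented, and treating an expired `waiting_input` or `blocked` entry purely by its age is a guess at the intended behaviour.

```typescript
type ActivityState = "active" | "ready" | "idle" | "waiting_input" | "blocked" | "exited";
type DecayState = "active" | "ready" | "idle";

// Values from the thresholds table.
const DEFAULT_ACTIVE_WINDOW_MS = 30 * 1000;
const DEFAULT_READY_THRESHOLD_MS = 5 * 60 * 1000;
const ACTIVITY_INPUT_STALENESS_MS = 5 * 60 * 1000;

// Hypothetical shape of the last recorded JSONL entry.
interface ActivityEntry { state: ActivityState; timestampMs: number }

const DECAY_ORDER: readonly DecayState[] = ["active", "ready", "idle"];

function classifyByAge(ageMs: number, activeWindowMs: number, thresholdMs: number): DecayState {
  if (ageMs < activeWindowMs) return "active";
  if (ageMs <= thresholdMs) return "ready";
  return "idle";
}

function decayEntryState(
  entry: ActivityEntry | null,
  activeWindowMs: number = DEFAULT_ACTIVE_WINDOW_MS,
  thresholdMs: number = DEFAULT_READY_THRESHOLD_MS,
  nowMs: number = Date.now(),
): ActivityState | null {
  if (!entry) return null;
  const ageMs = Math.max(0, nowMs - entry.timestampMs);
  const byAge = classifyByAge(ageMs, activeWindowMs, thresholdMs);
  const recorded = entry.state;

  if (recorded === "exited") return "exited";
  if (recorded === "waiting_input" || recorded === "blocked") {
    // Staleness cap: actionable entries expire, then only age counts (assumption).
    return ageMs <= ACTIVITY_INPUT_STALENESS_MS ? recorded : byAge;
  }
  // Decay only: take whichever of the recorded and age-based states is "older".
  return DECAY_ORDER.indexOf(recorded) >= DECAY_ORDER.indexOf(byAge) ? recorded : byAge;
}
```

A recorded `ready` entry that is only one second old therefore stays `ready`; age can move a state toward `idle` but never back toward `active`.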

PATH Wrappers

When an agent creates a PR or switches a branch, AO needs to update the session metadata (e.g. write pr=https://... or branch=feat/INT-123) so the dashboard and lifecycle manager stay in sync. Two mechanisms exist:

Claude Code — PostToolUse hooks

For Claude Code sessions, AO writes .claude/settings.json with a PostToolUse hook that fires after every gh pr create or git checkout command. The hook script calls update_ao_metadata directly.

All other agents — PATH wrappers

Agents without a native hook system (Codex, Aider, OpenCode, custom agents) use ~/.ao/bin/gh and ~/.ao/bin/git shell wrappers. These wrappers are installed to ~/.ao/bin/ by setupPathWrapperWorkspace(workspacePath) from packages/core/src/agent-workspace-hooks.ts. The function also writes session context to {workspacePath}/.ao/AGENTS.md (gitignored — does not touch tracked files).

The wrappers intercept:

gh pr create — captures the PR URL from stdout and writes pr=<url> and status=pr_open
gh pr merge — writes status=merged
git checkout -b <branch> / git switch -c <branch> — writes branch=<name>

All other commands pass through transparently via exec "$real_gh" "$@" or exec "$real_git" "$@".
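The real wrappers are shell scripts, but their decision logic can be modelled in a few lines. The sketch below only decides which metadata keys to write for a given command line; `interceptGh`, `interceptGit`, and the URL pattern used to find the PR link in stdout are assumptions for illustration.

```typescript
// Key/value pairs to write into the session metadata.
type MetadataUpdate = Record<string, string>;

// Decide what a `gh` invocation should record. Returns null for pass-through.
function interceptGh(args: string[], stdout: string): MetadataUpdate | null {
  const [group, action] = args;
  if (group !== "pr") return null;
  if (action === "create") {
    // Assumption: the PR URL appears in stdout in the usual .../pull/<number> form.
    const match = stdout.match(/https:\/\/\S+\/pull\/\d+/);
    return match ? { pr: match[0], status: "pr_open" } : null;
  }
  if (action === "merge") return { status: "merged" };
  return null;
}

// Decide what a `git` invocation should record. Returns null for pass-through.
function interceptGit(args: string[]): MetadataUpdate | null {
  const [command, flag, branch] = args;
  const createsBranch =
    (command === "checkout" && flag === "-b") || (command === "switch" && flag === "-c");
  return createsBranch && branch ? { branch } : null;
}
```

For example, `interceptGit(["switch", "-c", "feat/INT-123"])` yields `{ branch: "feat/INT-123" }`, while `interceptGit(["status"])` yields `null` and the command would simply be passed through.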

For storage details, see Storage — PATH Wrappers.

Observability

The lifecycle manager, session manager, and plugin registry emit structured telemetry using project observers created by createProjectObserver(). Each running process writes a JSON snapshot to:

~/.agent-orchestrator/{hash}-observability/processes/{component}-{pid}.json

The hash is the first 12 characters of the SHA-256 of the config directory path. The {component} segment matches the internal observer name (e.g. lifecycle-manager, session-manager).
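Deriving the snapshot path could look like the sketch below. It assumes the digest is hex-encoded and that the config directory path is hashed as-is, without normalisation; `configHash` and `snapshotPath` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";
import * as os from "node:os";
import * as path from "node:path";

// First 12 characters of the SHA-256 of the config directory path (hex assumed).
function configHash(configDir: string): string {
  return createHash("sha256").update(configDir).digest("hex").slice(0, 12);
}

// ~/.agent-orchestrator/{hash}-observability/processes/{component}-{pid}.json
function snapshotPath(configDir: string, component: string, pid: number): string {
  return path.join(
    os.homedir(),
    ".agent-orchestrator",
    `${configHash(configDir)}-observability`,
    "processes",
    `${component}-${pid}.json`,
  );
}
```

Because the hash depends only on the config directory, every process started from the same configuration writes into the same observability folder.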

The dashboard's /api/observability route reads and merges these per-process snapshots to produce a live observability view.

Feedback reports from the agent's bug_report and improvement_suggestion tools are written as flat key-value files at:

~/.agent-orchestrator/{hash}-{projectId}/feedback-reports/*.kv

Data Flow Summary

agent-orchestrator.yaml ──► Config Loader (Zod) ──► Plugin Registry
                                                          │
                                          ┌───────────────┘
                                          │
                                          ▼
                                    Session Manager ◄─── ao spawn / ao session
                                          │
                                          ▼
                                  Lifecycle Manager ────► Events ────► Notifiers
                                          │                 │              │
                                          │           Reactions       Webhook
                                          │
                                          ▼
                                    Dashboard API
                                  (Next.js App Router)
                                          │
                              ┌───────────┴──────────────┐
                              │                           │
                              ▼                           ▼
                         SSE (5s)                  WebSocket (terminal)
                              │                           │
                              ▼                           ▼
                          React UI                   xterm.js