Build a production-grade governance layer for AI agent systems with cryptographic identity, trust-mediated delegation, policy enforcement, audit trails, LLM threat detection, and SIEM integration.
Modern AI agent systems present unique governance challenges. Unlike monolithic applications, agents can autonomously delegate tasks to other agents, invoke external tools, write to persistent memory, and access sensitive data. Without a comprehensive governance framework, organizations face:
- **Privilege escalation via delegation.** A low-trust agent can delegate to higher-trust agents, bypassing security boundaries. Delegation chains compound risk as each hop loses visibility into the original intent.
- **Ungoverned memory writes.** Agents writing to memory without classification checks can persist restricted data in shared collections, and without provenance tracking there is no way to audit who wrote what, or when.
- **Unbounded delegation.** Without breadth and depth limits, a single agent could spawn hundreds of child tasks, creating infinite loops or resource-exhaustion attacks.
- **Audit blind spots.** Traditional logging captures tool invocations but misses the context: which agent acted, under what authority, with what data classification, and through what delegation chain?
- **Prompt injection.** LLMs are vulnerable to adversarial inputs that override system instructions. Without runtime detection, an attacker can manipulate agent behavior through crafted prompts.
- **Output leakage.** Tool outputs can inadvertently expose governance internals, manifest structures, or system instructions; without post-tool scanning, leaks go undetected.
The governance framework addresses these challenges through six core subsystems working in concert: manifest identity, trust broker, policy engine, audit bus, memory governor, and LLM threat detector. Together they provide end-to-end governance from session start through tool execution to memory persistence.
The governance framework is implemented as a Claude Code plugin with hook-based enforcement. All components share a unified audit bus for event logging and a centralized policy file for configuration.

1. **Session start:** load the root agent manifest, generate a session ID, initialize the audit bus, emit a MANIFEST_LOADED event, purge stale delegation-registry entries, and apply environment-aware policy overrides.
2. **Pre-tool check:** scan input for prompt injection, classify the tool's risk tier, check manifest permissions, evaluate the policy gate (allow/deny/human_gate), and emit a POLICY_CHECK or LLM_THREAT event.
3. **Delegation:** the trust broker validates breadth/depth limits, checks the classification ceiling, derives a child manifest under parent constraints, issues a delegation token, registers the child manifest, and emits a DELEGATION_EVENT.
4. **Memory write:** the memory governor classifies content, enforces the agent ceiling, blocks restricted data, queues confidential writes for review, adds provenance tags (9 fields), and emits a MEMORY_WRITE event.
5. **Output validation:** the output validator scans for system-prompt leakage, sensitive-data disclosure, and governance artifacts, emits LLM_THREAT events for critical/high-severity findings, and records metrics.
6. **Session end:** flush the audit queue, export session events to JSONL, send session summary metrics, optionally archive old events per the retention policy, and deregister session manifests.
Agent identity documents define trust level, data classification, permitted tools, and delegation rules.
Manifests are YAML files stored in state/manifests/ with cryptographic signing for tamper evidence.
```yaml
agent_id: security-analyst
manifest_id: gov-sec-analyst-v2
manifest_version: "2.1.0"
trust_level: 4                      # 1-5 scale
data_classification: confidential   # public | internal | confidential | restricted
permitted_tools:
  - "Read"
  - "Grep"
  - "Bash"
  - "mcp__*"                        # fnmatch wildcards supported
permitted_delegations:
  - "pentest-agent"
  - "compliance-*"
human_required: false
max_autonomy_depth: 3               # Delegation depth budget
max_delegation_count: 5             # Breadth limit per session
model_id: claude-opus-4-6
model_version: "4.6"
```
Parent ceiling enforcement is the security foundation: a child can never exceed its parent's trust level, data classification, or autonomy depth. This creates a monotonically decreasing privilege chain.
| Field | Type | Ceiling Rule |
|---|---|---|
| trust_level | Integer (1-5) | min(static, parent) |
| data_classification | Enum | Lower classification wins |
| max_autonomy_depth | Integer | min(static, parent - 1) |
| permitted_tools | List[pattern] | Union (additive) |
| permitted_delegations | List[pattern] | Union (additive) |
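The ceiling rules can be sketched as a small resolution function. This is a minimal illustration, not the plugin's exact code: field names follow the manifest schema above, and `CLASSIFICATION_ORDER` is an assumed helper mirroring the classification ordering used elsewhere in this document.

```python
# Sketch of parent-ceiling enforcement over the manifest fields shown above.
CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]

def resolve_child_manifest(static: dict, parent: dict) -> dict:
    """Derive a child manifest that can never exceed its parent's privileges."""
    child = dict(static)
    # Trust: the lower of the child's static level and the parent's level.
    child["trust_level"] = min(static["trust_level"], parent["trust_level"])
    # Classification: the less sensitive of the two wins.
    child["data_classification"] = min(
        static["data_classification"], parent["data_classification"],
        key=CLASSIFICATION_ORDER.index)
    # Depth: each hop spends one unit of the parent's remaining budget.
    child["max_autonomy_depth"] = min(
        static["max_autonomy_depth"], parent["max_autonomy_depth"] - 1)
    return child
```

Because every field is computed with `min`, the resulting privilege chain is monotonically decreasing regardless of what the static child manifest requests.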
Manifests are signed with HMAC-SHA256 on load. The signing key is a 32-byte secret stored in
state/.signing-key with 0600 permissions. Signature verification happens during
manifest validation; tampered manifests fail validation and fall back to a default-restrictive manifest.
```python
import hashlib
import hmac
import json

# Canonical manifest for signing (excludes volatile fields)
canonical = json.dumps(
    {k: v for k, v in manifest.items()
     if k not in {"manifest_signature", "manifest_hash",
                  "audit_session_id", "audit_parent_id"}},
    sort_keys=True, separators=(",", ":"))
signature = hmac.new(signing_key, canonical.encode(), hashlib.sha256).hexdigest()
```
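Verification recomputes the signature over the same canonical form and compares in constant time. A self-contained sketch (function names are illustrative, not necessarily the plugin's exact API):

```python
import hashlib
import hmac
import json

VOLATILE_FIELDS = {"manifest_signature", "manifest_hash",
                   "audit_session_id", "audit_parent_id"}

def sign_manifest(manifest: dict, signing_key: bytes) -> str:
    """HMAC-SHA256 over the canonical (volatile-field-free) JSON form."""
    canonical = json.dumps(
        {k: v for k, v in manifest.items() if k not in VOLATILE_FIELDS},
        sort_keys=True, separators=(",", ":"))
    return hmac.new(signing_key, canonical.encode(), hashlib.sha256).hexdigest()

def verify_manifest_signature(manifest: dict, signing_key: bytes) -> bool:
    """Recompute and compare in constant time to resist timing attacks."""
    expected = sign_manifest(manifest, signing_key)
    return hmac.compare_digest(expected, manifest.get("manifest_signature", ""))
```

`hmac.compare_digest` avoids early-exit string comparison, so an attacker cannot learn signature prefixes from response timing.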
Mediates inter-agent delegation by validating breadth/depth limits, classification boundaries, and permitted targets. Issues delegation tokens for forensic linkage and registers resolved child manifests in a session-scoped registry.
Query audit bus for previous DELEGATION_EVENT events from this agent in this session. If count >= max_delegation_count, deny with TRUST_DENY event. Prevents agent from spawning unlimited child tasks.
Check parent's max_autonomy_depth. If <= 0, escalate to human gate with TRUST_DENY. Child receives parent.depth - 1, creating a monotonically decreasing budget.
Child's data_classification cannot exceed parent's. If target manifest requests higher classification, deny delegation. Prevents data exfiltration via upward delegation.
Child's trust_level cannot exceed parent's. Defense-in-depth check — should be caught by ceiling enforcement, but trust broker validates independently.
Target agent ID must match a pattern in parent's permitted_delegations list (supports fnmatch wildcards). Prevents delegation to arbitrary agents.
SHA-256 hash of session_id:parent_manifest_id:child_manifest_id:timestamp:nonce. Stored in DELEGATION_EVENT detail for forensic chain reconstruction.
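Token issuance can be sketched as follows. The 24-character truncation follows the token rule stated in the requirements later in this document; using `secrets.token_hex` for the nonce is an assumption.

```python
import hashlib
import secrets
from datetime import datetime, timezone

def issue_delegation_token(session_id: str, parent_manifest_id: str,
                           child_manifest_id: str) -> str:
    """SHA-256 over the colon-joined tuple, truncated to 24 hex characters."""
    timestamp = datetime.now(timezone.utc).isoformat()
    nonce = secrets.token_hex(8)  # makes every issuance unique
    material = ":".join(
        [session_id, parent_manifest_id, child_manifest_id, timestamp, nonce])
    return hashlib.sha256(material.encode()).hexdigest()[:24]
```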
A JSON file (state/manifest-registry.json) with file-lock-based concurrency control.
Stores active child manifests keyed by session_id:agent_id. TTL purging (1 hour)
removes stale entries. SessionStart hook purges on startup.
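A minimal sketch of such a file-locked, TTL-purged registry (Unix-only via `fcntl`; the entry layout here is an assumption, not the plugin's exact on-disk format):

```python
import fcntl
import json
import time
from pathlib import Path

REGISTRY_TTL_SECONDS = 3600  # 1 hour, per the purge policy above

def update_registry(path: Path, key: str, manifest: dict) -> None:
    """Insert/refresh one entry under an exclusive lock, purging stale ones."""
    path.touch(exist_ok=True)
    with open(path, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # block concurrent writers
        raw = fh.read()
        registry = json.loads(raw) if raw else {}
        now = time.time()
        # Drop entries older than the TTL before writing the new one.
        registry = {k: v for k, v in registry.items()
                    if now - v.get("registered_at", 0) < REGISTRY_TTL_SECONDS}
        registry[key] = {"manifest": manifest, "registered_at": now}
        fh.seek(0)
        fh.truncate()
        json.dump(registry, fh)
        fcntl.flock(fh, fcntl.LOCK_UN)
```

Purging inside the same locked critical section keeps the purge and the insert atomic with respect to other hook processes.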
Classifies tools into risk tiers (exempt/standard/elevated) and enforces manifest permissions + conductor tier matrix to decide allow/deny/human_gate. Unknown tools default to elevated (fail toward scrutiny).
| Tier | Examples | Audit Strategy | Gate Logic |
|---|---|---|---|
| exempt | Read, Glob, Grep, TaskList | Async emit (nowait queue) | Always allow, skip policy checks |
| standard | Edit, Write, Task, Bash | Sync audit if tier >= STANDARD | Check manifest permissions, depth, human_required |
| elevated | memory_store, NotebookEdit, MCP tools | Always sync audit | Check + tier matrix (MAJOR + elevated = human_gate) |
Conductor plugin exports current task tier (TRIVIAL/MINOR/STANDARD/MAJOR) in conductor-state.json.
Policy engine loads this and applies tier-specific gates:
```python
# MAJOR tier + elevated tool = always human gate
if tool_tier == "elevated" and conductor_tier == "MAJOR":
    return PolicyDecision("human_gate",
                          "MAJOR task + elevated tool requires human approval")

# Synchronous audit for elevated or STANDARD+
if tool_tier == "elevated" or conductor_tier in ("STANDARD", "MAJOR"):
    audit_bus.emit(EventType.POLICY_CHECK, manifest, tool_name=tool_name)
else:
    audit_bus.emit_nowait(EventType.TOOL_INVOKED, manifest, tool_name=tool_name)
```
SQLite database with WAL mode for concurrent writes. Bounded async queue (256 depth) for low-risk events, synchronous writes for critical events, JSON buffer fallback for database failures. Supports JSONL export and retention-based archival.
```sql
CREATE TABLE audit_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT UNIQUE NOT NULL,
    timestamp TEXT NOT NULL,
    audit_session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    agent_id TEXT NOT NULL,
    manifest_id TEXT,
    manifest_version TEXT,
    manifest_hash TEXT,
    trust_level INTEGER,
    data_classification TEXT,
    autonomy_depth_remaining INTEGER,
    tool_name TEXT,
    task_id TEXT,
    target_agent_id TEXT,
    context_hash TEXT,
    detail TEXT,   -- JSON-encoded event-specific fields
    outcome TEXT   -- allow | deny | escalate | warn
);
CREATE INDEX idx_audit_session ON audit_events(audit_session_id);
CREATE INDEX idx_audit_timestamp ON audit_events(timestamp);
CREATE INDEX idx_audit_agent ON audit_events(agent_id);
CREATE INDEX idx_audit_type ON audit_events(event_type);
```
When SQLite writes fail (locked, disk full, corrupted), events are appended to
state/audit-buffer.jsonl. On next startup, buffer is renamed to
.replaying, events are replayed to database, then buffer is deleted.
This ensures zero event loss even during database failures.
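The replay path can be sketched as follows. This is a simplified illustration; `insert_event` stands in for whatever callable performs the SQLite insert. Renaming first makes the replay crash-safe: a failure mid-replay leaves a `.replaying` file to retry rather than a half-consumed buffer.

```python
import json
from pathlib import Path

def replay_buffer(buffer_path: Path, insert_event) -> int:
    """Replay JSONL-buffered events into the database, then delete the buffer."""
    if not buffer_path.exists():
        return 0
    replaying = buffer_path.with_suffix(".replaying")
    buffer_path.rename(replaying)  # claim the buffer before touching the DB
    count = 0
    with open(replaying) as fh:
        for line in fh:
            line = line.strip()
            if line:
                insert_event(json.loads(line))
                count += 1
    replaying.unlink()  # only removed once every event is persisted
    return count
```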
Intercepts memory writes via PreToolUse hook on mcp__claude-memory__memory_store.
Classifies content using regex patterns, enforces agent classification ceiling, blocks restricted
data, queues confidential writes for review, adds 9-field provenance tags.
```yaml
restricted:
  - '\b(password|secret|api[_-]?key|private[_-]?key|token|credential)\s*[:=]\s*\S+'
  - '\b\d{3}-\d{2}-\d{4}\b'                         # SSN pattern
  - '-----BEGIN\s+(RSA|EC|PRIVATE)\s+KEY-----'
  - '\b(bearer\s+[a-zA-Z0-9\-._~+/]+=*)\b'
confidential:
  - '\b(internal[_-]?only|do[_-]?not[_-]?share|proprietary|confidential)\b'
  - '\bCVE-\d{4}-\d{4,7}\b'                         # Vulnerability IDs
  - '\b(salary|compensation|revenue|profit)\s*[:=$]'
  - '\b(ssn|social[_-]?security|tax[_-]?id)\b'
internal:
  - '\b(prod(uction)?|staging)\s+\b(server|host|endpoint|cluster)\b'
  - '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'        # IP addresses
  - '\b[a-zA-Z0-9\-]+\.(internal|corp|local)\b'
```
| Classification | Agent Ceiling Check | Action | Audit Event |
|---|---|---|---|
| public | N/A | Allow with provenance tags | MEMORY_WRITE (allow) |
| internal | Agent must be internal+ | Allow with provenance tags | MEMORY_WRITE (allow) |
| confidential | Agent must be confidential+ | Queue for review, persist with pending_review tag | HUMAN_GATE (escalate) |
| restricted | Always exceeds ceiling | Block (do not persist) | POLICY_DENY (deny) |
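The classification scan itself is straightforward: check the most sensitive tier first, and let the first match win. A sketch using a small illustrative subset of the patterns above:

```python
import re

# Illustrative subset of the pattern file; the real lists are longer.
CLASSIFICATION_PATTERNS = {
    "restricted": [r'\b(password|api[_-]?key|token)\s*[:=]\s*\S+'],
    "confidential": [r'\bCVE-\d{4}-\d{4,7}\b'],
    "internal": [r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'],
}

def classify_content(content: str) -> str:
    """Scan from most to least sensitive; first matching tier wins."""
    for tier in ("restricted", "confidential", "internal"):
        for pattern in CLASSIFICATION_PATTERNS[tier]:
            if re.search(pattern, content, re.IGNORECASE):
                return tier
    return "public"  # unmatched content defaults to public
```

Scanning in descending sensitivity order guarantees that content matching both a restricted and an internal pattern is treated as restricted.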
```python
provenance = {
    "gov_manifest_id": manifest["manifest_id"],
    "gov_agent_id": manifest["agent_id"],
    "gov_manifest_version": manifest["manifest_version"],
    "gov_manifest_hash": manifest["manifest_hash"],
    "gov_trust_level": manifest["trust_level"],
    "gov_classification": manifest["data_classification"],
    "gov_session_id": manifest["audit_session_id"],
    "gov_task_id": manifest.get("task_id"),
    "gov_timestamp": datetime.now(timezone.utc).isoformat(),
}
```
These tags are merged into the metadata field of the memory_store tool input,
persisting alongside the content in Qdrant. This enables provenance-based memory queries
(e.g., "show me all memories written by security-analyst agent in session X").
Scans tool inputs for prompt injection attempts and tool outputs for system prompt leakage and sensitive data disclosure. Uses 30 prompt injection patterns and 24 system leakage patterns across three severity levels (critical/high/medium).
Injection patterns cover delimiter tokens such as `</system>`, `<|im_start|>`, and `[INST]`, as well as base64-encoded overrides like `aWdub3JlIHByZXZpb3Vz` ("ignore previous"). Leakage patterns are grouped by severity:

```python
SYSTEM_LEAKAGE_PATTERNS = {
    "critical": [
        r'governance/lib/\w+\.py',   # File paths
        r'state/manifests/',
        r'\bmanifest_hash\s*[:=]',   # Manifest internals
        r'\btrust_level\s*[:=]\s*\d+',
        r'\bdata_classification\s*[:=]\s*(public|internal|confidential|restricted)',
        r'\bdelegation_token\s*[:=]',
        r'agent_id\s*:\s*\w+',       # YAML structure
        r'permitted_tools\s*:',
    ],
    "high": [
        r'\bgovernance\.lib\.',      # Module references
        r'\bpolicy_engine\b',
        r'\btrust_broker\b',
        r'gov-[a-z]+-[0-9a-f]{8}',   # Session IDs
    ],
    "medium": [
        r'\bgovernance\s+plugin\b',  # Generic terms
        r'\bmanifest\s+registry\b',
    ],
}
```
| Severity | Input Scan Action | Output Scan Action | Audit Event |
|---|---|---|---|
| critical | Block tool execution | Block output, emit alert | LLM_THREAT (block) |
| high | Block tool execution | Block output, emit alert | LLM_THREAT (block) |
| medium | Log warning, allow | Log warning, allow | LLM_THREAT (warn) |
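The input-scan and gating logic can be sketched as follows, using a small illustrative subset of the injection patterns (the real set has 30):

```python
import re

# Illustrative subset; patterns here are examples, not the full list.
INJECTION_PATTERNS = {
    "critical": [r'\bignore\s+all\s+previous\s+instructions\b',
                 r'</system>', r'<\|im_start\|>'],
    "high": [r'\bdisregard\s+your\s+guidelines\b'],
    "medium": [r'\bpretend\s+to\s+be\b'],
}

def scan_input(text: str):
    """Return (severity, pattern) for the first match, critical tier first."""
    for severity in ("critical", "high", "medium"):
        for pattern in INJECTION_PATTERNS[severity]:
            if re.search(pattern, text, re.IGNORECASE):
                return severity, pattern
    return None

def gate_decision(finding) -> str:
    """Critical/high findings block execution; medium only warns."""
    if finding is None:
        return "allow"
    return "block" if finding[0] in ("critical", "high") else "warn"
```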
Sends governance security events to external monitoring systems via webhook (n8n compatible) and syslog (Wazuh compatible). Alerting is fail-open — failures never block governance operations.
Five event types trigger alerts: policy_deny, trust_deny,
circuit_break, human_gate, llm_threat. All other events
are audit-only.
```json
{
  "source": "governance",
  "timestamp": "2026-03-17T14:23:45.123456Z",
  "event_type": "llm_threat",
  "agent_id": "security-analyst",
  "tool_name": "Bash",
  "outcome": "block",
  "detail": {
    "threat_type": "prompt_injection",
    "severity": "critical",
    "pattern_matched": "\\bignore\\s+all\\s+previous\\s+instructions\\b",
    "scan_type": "input",
    "detail": "Detected prompt injection pattern in content"
  },
  "session_id": "gov-sess-a4f8d2c1",
  "manifest_id": "gov-sec-analyst-v2",
  "trust_level": 4
}
```
```
<131>1 2026-03-17T14:23:45.123456Z governance claude-code - - - \
  event_type=llm_threat agent_id=security-analyst tool_name=Bash \
  outcome=block session_id=gov-sess-a4f8d2c1
```
PRI calculation: facility * 8 + severity. Facility defaults to local0 (16).
Severity is 3 (error) for deny/block, 4 (warning) for escalate/warn. Wazuh can parse these
via custom decoder rules.
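The PRI math and message framing can be sketched as below. The hostname `governance` and app-name `claude-code` follow the example message above; the UDP send is fail-open by design.

```python
import socket

FACILITY_LOCAL0 = 16

def syslog_priority(outcome: str) -> int:
    """PRI = facility * 8 + severity: 3 (error) for deny/block, else 4."""
    severity = 3 if outcome in ("deny", "block") else 4
    return FACILITY_LOCAL0 * 8 + severity

def format_rfc5424(timestamp: str, outcome: str, fields: dict) -> str:
    """RFC 5424 header (hostname 'governance', app-name 'claude-code')."""
    body = " ".join(f"{k}={v}" for k, v in fields.items())
    return (f"<{syslog_priority(outcome)}>1 {timestamp} "
            f"governance claude-code - - - {body}")

def send_syslog(message: str, host: str, port: int = 514) -> None:
    """Fire-and-forget UDP send; alerting failures never block governance."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(message.encode(), (host, port))
    except OSError:
        pass  # fail-open: log-and-continue in the real implementation
```

With facility local0 and severity 3, PRI is 16 * 8 + 3 = 131, matching the `<131>` in the example.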
Separate SQLite database (state/governance-metrics.db) tracks operational metrics:
agent success/failure rates, gate trigger frequency, confidence scores, delegation depth,
circuit breaker activations. Enables drift detection and performance monitoring.
```sql
CREATE TABLE governance_metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    session_id TEXT NOT NULL,
    metric_type TEXT NOT NULL,  -- agent_success | gate_trigger | delegation_depth
    agent_id TEXT,
    value REAL NOT NULL,
    metadata TEXT               -- JSON for additional context
);
```
Detects environment from hostname or GOVERNANCE_ENV env var, loads base policy
from governance-policy.yaml, then applies environment-specific overrides from
state/env-overrides/<env>.yaml. Enables different retention policies,
gate enforcement modes, and alerting configs per environment.
```yaml
# state/env-overrides/production.yaml
retention:
  audit_events:
    retention_days: 365   # Override from base 90 days
  metrics:
    retention_days: 730
alerting:
  enabled: true
  webhook:
    enabled: true
    url: "https://n8n.example.com/webhook/governance"
  syslog:
    enabled: true
    host: "siem.example.com"
    port: 514
```
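The override behavior is a standard recursive dictionary merge where override values win on conflicts; a minimal sketch:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override values win on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # descend into sub-dicts
        else:
            merged[key] = value  # scalars and lists are replaced wholesale
    return merged
```

Applied to the production override above, only `retention.audit_events.retention_days` changes; sibling keys from the base policy survive untouched.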
All configuration is centralized in state/governance-policy.yaml; there are no scattered config files.
Tool tiers, classification patterns, gate rules, the tier matrix, and retention policies all live in one place.
Every agent manifest MUST include: agent_id, manifest_id, manifest_version, trust_level (1-5), data_classification (public/internal/confidential/restricted), permitted_tools (list of fnmatch patterns), permitted_delegations (list), human_required (bool), max_autonomy_depth (int), max_delegation_count (int).
Manifests MUST be signed with HMAC-SHA256 using a 32-byte signing key stored in
state/.signing-key. Signatures MUST be verified on manifest load. Tampered
manifests MUST fail validation and fall back to default-restrictive manifest.
Every manifest MUST include a SHA-256 hash of canonicalized JSON content (excluding manifest_signature and manifest_hash fields). Hash MUST be recomputed on each load to detect tampering.
When resolving a child manifest with parent context, the system MUST enforce: trust_level = min(static, parent), data_classification = lower(static, parent), max_autonomy_depth = min(static, parent - 1). Child capabilities MUST be monotonically decreasing along delegation chains.
Manifests MUST track model_id and model_version fields. SessionStart hook MUST populate these from runtime context. Audit events MUST include manifest_version for model drift analysis.
Trust broker MUST query audit bus for DELEGATION_EVENT count from source agent in current session. If count >= max_delegation_count, MUST deny with TRUST_DENY event and reason "delegation_count_exceeded".
Trust broker MUST check parent max_autonomy_depth. If <= 0, MUST escalate to human_gate with TRUST_DENY event. Child MUST receive parent.depth - 1 in resolved manifest.
Trust broker MUST deny delegation if target data_classification exceeds parent data_classification (using CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]). MUST emit TRUST_DENY with reason "classification_boundary_violation".
Trust broker MUST deny delegation if target trust_level exceeds parent trust_level. MUST emit TRUST_DENY with reason "trust_escalation_attempt".
Trust broker MUST check if target agent_id matches any pattern in parent's permitted_delegations list (using fnmatch). If no match, MUST deny with TRUST_DENY and reason "delegation_target_not_permitted".
Trust broker MUST issue a unique delegation token for each approved delegation, computed as SHA-256(session_id:parent_manifest_id:target_manifest_id:timestamp:nonce)[:24]. Token MUST be stored in DELEGATION_EVENT detail field.
Trust broker MUST maintain a session-scoped registry of active child manifests at
state/manifest-registry.json. Registry operations MUST use file locking
(fcntl.flock) for concurrency safety. TTL purging (1 hour) MUST remove stale entries.
Policy engine MUST classify tools into exempt/standard/elevated tiers using
state/tool-tiers.yaml. Unknown tools MUST default to "elevated" tier
(fail toward scrutiny).
Exempt tools (Read, Glob, Grep, TaskList, TaskGet) MUST always be allowed with async audit emission (nowait queue). Policy checks MUST be skipped for exempt tools.
For standard and elevated tools, policy engine MUST verify tool_name matches a pattern in manifest.permitted_tools (using fnmatch). If no match, MUST deny with POLICY_DENY and reason "tool_not_permitted".
Policy engine MUST check manifest.max_autonomy_depth. If <= 0, MUST emit CIRCUIT_BREAK and return human_gate decision with reason "autonomy_depth_exhausted".
Policy engine MUST check manifest.human_required. If true, MUST emit HUMAN_GATE and return human_gate decision with reason "agent manifest requires human approval".
Policy engine MUST load conductor_tier from conductor-state.json. If tier
is "MAJOR" and tool tier is "elevated", MUST emit HUMAN_GATE and return human_gate
decision with reason "MAJOR task + elevated tool requires human approval".
Policy engine MUST emit synchronous POLICY_CHECK events for elevated tools or when conductor_tier is STANDARD or MAJOR. All other allowed tools MUST use async emit_nowait.
Audit bus MUST use SQLite with WAL (Write-Ahead Logging) mode and NORMAL synchronous
mode for concurrent write safety and performance. Database MUST be at
state/audit.db.
Audit bus MUST support 15 event types: TOOL_INVOKED, DELEGATION_EVENT, CONTEXT_PRESSURE, MEMORY_WRITE, MEMORY_READ, POLICY_CHECK, POLICY_DENY, HUMAN_GATE, MANIFEST_LOADED, MANIFEST_DERIVED, TRUST_CHECK, TRUST_DENY, CIRCUIT_BREAK, BUFFER_REPLAY, LLM_THREAT.
Audit bus MUST implement a bounded queue (256 depth) with single daemon worker thread for async event emission. If queue is full, MUST fall back to synchronous emit.
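The bounded-queue requirement can be sketched as follows. Names are illustrative; `write_fn` stands in for the synchronous SQLite insert.

```python
import queue
import threading

class AsyncEmitter:
    """Bounded queue drained by a daemon worker; falls back to sync when full."""

    def __init__(self, write_fn, maxsize: int = 256):
        self._write = write_fn
        self._queue = queue.Queue(maxsize=maxsize)
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            self._write(self._queue.get())
            self._queue.task_done()

    def emit_nowait(self, event: dict) -> None:
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self._write(event)  # degrade to synchronous, never drop an event

    def flush(self) -> None:
        self._queue.join()  # block until all queued events are persisted
```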
When SQLite writes fail, audit bus MUST append events to state/audit-buffer.jsonl.
On next SessionStart, MUST replay buffered events to database and delete buffer file.
Every audit event MUST include: event_id (UUID), timestamp (ISO 8601), audit_session_id, event_type, agent_id, manifest_id, manifest_version, manifest_hash, trust_level, data_classification, autonomy_depth_remaining, tool_name, task_id, target_agent_id, context_hash, detail (JSON), outcome (allow/deny/escalate/warn).
Audit bus MUST support retention-based purging with configurable retention_days
(default 90). Old events MUST be archived to JSONL before deletion. Archive path
defaults to state/archive/audit-archive-YYYYMMDD.jsonl.
Memory governor MUST classify content using regex patterns from
state/classification-patterns.yaml. MUST scan restricted patterns first,
then confidential, then internal. Highest match wins. Unmatched content defaults to "public".
Memory governor MUST deny writes if content classification exceeds agent data_classification ceiling (using CLASSIFICATION_ORDER). MUST emit POLICY_DENY with reason "classification_ceiling_exceeded".
Memory governor MUST block (not persist) all restricted content with POLICY_DENY event and reason "restricted_content_blocked". Human approval via /governance-review required before storage.
Memory governor MUST allow confidential writes to proceed but tag with gov_approval_status="pending_review". MUST emit HUMAN_GATE event with reason "confidential_write_queued".
Memory governor MUST add 9 provenance fields to all memory writes: gov_manifest_id, gov_agent_id, gov_manifest_version, gov_manifest_hash, gov_trust_level, gov_classification, gov_session_id, gov_task_id, gov_timestamp.
LLM threat detector MUST scan all Task, Bash, and Skill tool inputs for prompt injection patterns. MUST check critical patterns first (30 total), then high, then medium. First match determines severity.
LLM threat detector MUST scan Write, Edit, Bash, NotebookEdit outputs for system prompt leakage (24 patterns) and sensitive data disclosure (using classification patterns). MUST emit LLM_THREAT events for all detections.
For critical/high severity threats, MUST block tool execution (input scan) or block output (output scan) and emit LLM_THREAT with outcome="block". For medium severity, MUST log warning with outcome="warn" and allow execution.
Threat detector MUST address OWASP LLM01 (Prompt Injection), LLM06 (Sensitive Information Disclosure), and LLM09 (Overreliance) through pattern-based detection and output validation.
Alerting service MUST send alerts for: policy_deny, trust_deny, circuit_break, human_gate, llm_threat. All other events are audit-only.
Webhook alerts MUST POST JSON with fields: source="governance", timestamp, event_type, agent_id, tool_name, outcome, detail (parsed from JSON), session_id, manifest_id, trust_level. Timeout MUST be 5 seconds.
Syslog alerts MUST use RFC 5424 format with facility=local0 (16) and severity=3 (error) for deny/block, severity=4 (warning) for escalate/warn. MUST send via UDP.
Alerting failures (network timeout, unreachable host) MUST NOT block governance operations. MUST silently continue after logging error.
Metrics collector MUST use separate SQLite database at state/governance-metrics.db
with schema: timestamp, session_id, metric_type, agent_id, value (float), metadata (JSON).
MUST support metric types: agent_success, gate_trigger, delegation_depth, circuit_break, confidence_score. Additional types MAY be added without schema migration.
Metrics collector MUST provide session summary aggregation: count, average, and max value per metric_type for a given session_id.
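The summary aggregation is a single grouped query against the metrics schema above; a sketch (the function name follows the build prompt later in this document, though the exact signature is an assumption):

```python
import sqlite3

def get_session_summary(db_path: str, session_id: str) -> dict:
    """Count, average, and max per metric_type for one session."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            """SELECT metric_type, COUNT(*), AVG(value), MAX(value)
               FROM governance_metrics
               WHERE session_id = ?
               GROUP BY metric_type""", (session_id,)).fetchall()
    finally:
        conn.close()
    return {mt: {"count": c, "avg": a, "max": m} for mt, c, a, m in rows}
```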
MUST detect environment from GOVERNANCE_ENV env var, falling back to hostname detection. MUST support: local, staging, production, c2. Default to "local" if unresolvable.
All governance components MUST reference state/governance-policy.yaml for:
tool_tiers, classification_patterns, gate rules, tier_gate_matrix, retention policies.
NO scattered config files.
MUST support environment-specific policy overrides at
state/env-overrides/<env>.yaml. Overrides MUST deep-merge into base
policy. Override values win on conflicts.
MUST load root agent manifest, generate session ID, initialize audit bus, emit MANIFEST_LOADED event, purge stale registry entries, apply retention policy archival.
MUST scan for prompt injection (threat detector), classify tool tier (policy engine), check manifest permissions, evaluate delegation (trust broker for Task tool), emit appropriate audit events.
MUST scan output for leakage (threat detector on Write/Edit/Bash/NotebookEdit), deregister completed tasks (trust broker), record metrics, emit LLM_THREAT events if detected.
All hooks MUST have timeout limits: SessionStart 10s, PreToolUse 10s, PostToolUse 10s. Timeout failures MUST fail-open (allow operation to proceed).
Copy this prompt and give it to Claude Code in a fresh project directory. It will generate the entire governance framework with all components, hooks, config files, and test fixtures.
I need you to build a complete Agent Governance & Trust Framework for Claude Code as a plugin.
## Architecture
Create a plugin at `governance/` with the following structure:
```
governance/
├── __init__.py
├── lib/
│ ├── __init__.py
│ ├── manifest.py # Agent identity, signing, resolution
│ ├── trust_broker.py # Delegation mediation, manifest registry
│ ├── policy_engine.py # Tool classification, gate logic
│ ├── audit_bus.py # SQLite event store with WAL
│ ├── memory_governor.py # Content classification, provenance
│ ├── llm_threat_detector.py # OWASP LLM Top 10 patterns
│ ├── alerting.py # Webhook + syslog
│ ├── metrics_collector.py # Performance monitoring
│ ├── env_policy.py # Environment detection
│ └── manifest_signing.py # HMAC-SHA256 signing
├── hooks/
│ ├── hooks.json # Hook registration
│ ├── session_start.py
│ ├── pre_tool_check.py
│ ├── approval_gate_hook.py
│ ├── post_task_cleanup.py
│ └── output_validator.py
├── state/
│ ├── governance-policy.yaml
│ ├── tool-tiers.yaml
│ ├── classification-patterns.yaml
│ ├── manifests/
│ │ ├── root.yaml
│ │ └── security-analyst.yaml
│ └── env-overrides/
│ └── production.yaml
└── tests/
└── test_governance.py
```
## Core Components
### 1. Manifest System (lib/manifest.py)
Implement:
- Load static YAML manifests from state/manifests/
- HMAC-SHA256 signing with key at state/.signing-key
- SHA-256 content hash for tamper detection
- Parent ceiling enforcement: trust_level, data_classification, autonomy_depth
- Derive restrictive child manifests for unknown agents
- Validation against required fields
Required manifest fields:
- agent_id, manifest_id, manifest_version
- trust_level (1-5), data_classification (public/internal/confidential/restricted)
- permitted_tools (list of fnmatch patterns)
- permitted_delegations (list)
- human_required (bool)
- max_autonomy_depth (int), max_delegation_count (int)
- model_id, model_version
### 2. Trust Broker (lib/trust_broker.py)
Implement:
- Delegation validation: breadth limit (query audit bus for session count)
- Depth budget check (parent.depth - 1)
- Classification boundary (child cannot exceed parent)
- Trust escalation prevention
- Permitted delegation target check (fnmatch)
- Delegation token generation (SHA-256 hash)
- ManifestRegistry with file locking (fcntl), TTL purging (1 hour)
### 3. Policy Engine (lib/policy_engine.py)
Implement:
- Tool classification: exempt/standard/elevated from tool-tiers.yaml
- Unknown tools default to "elevated"
- Manifest permission check (fnmatch)
- Autonomy depth gate
- Human_required gate
- Conductor tier matrix: MAJOR + elevated = human_gate
- Sync audit for elevated/STANDARD+, async for others
### 4. Audit Bus (lib/audit_bus.py)
Implement:
- SQLite with WAL mode at state/audit.db
- 15 event types (EventType enum)
- Bounded queue (256 depth) with daemon worker
- Buffer fallback to state/audit-buffer.jsonl
- Buffer replay on SessionStart
- Schema with all required fields + indexes
- query() method with filterable columns
- export_jsonl() for session export
- purge_old_events() with archival to state/archive/
- Alerting callback integration
- Metrics callback integration
### 5. Memory Governor (lib/memory_governor.py)
Implement:
- Content classification via regex from classification-patterns.yaml
- Scan restricted first, then confidential, then internal
- Ceiling check: block if content classification exceeds agent ceiling
- Restricted = block (emit POLICY_DENY)
- Confidential = queue with pending_review tag (emit HUMAN_GATE)
- Public/internal = allow with provenance tags
- 9 provenance fields: gov_manifest_id, gov_agent_id, gov_manifest_version,
gov_manifest_hash, gov_trust_level, gov_classification, gov_session_id,
gov_task_id, gov_timestamp
### 6. LLM Threat Detector (lib/llm_threat_detector.py)
Implement:
- 30 prompt injection patterns (critical/high/medium)
- 24 system leakage patterns (critical/high/medium)
- Input scanning for Task, Bash, Skill tools
- Output scanning for Write, Edit, Bash, NotebookEdit
- Sensitive disclosure using classification patterns
- Emit LLM_THREAT events
- Block on critical/high, warn on medium
Patterns:
- Critical injection: delimiter injection, direct override, role hijacking
- High injection: indirect override, base64 encoding, instruction leakage
- Critical leakage: governance file paths, manifest internals, YAML structure
- High leakage: module references, session IDs
### 7. Alerting (lib/alerting.py)
Implement:
- Webhook POST (n8n compatible) with 5s timeout
- Syslog UDP RFC 5424 format (Wazuh compatible)
- Alert on: policy_deny, trust_deny, circuit_break, human_gate, llm_threat
- Fail-open design (never block on alerting failure)
- Config from governance-policy.yaml alerting section
### 8. Metrics Collector (lib/metrics_collector.py)
Implement:
- Separate SQLite at state/governance-metrics.db
- Schema: timestamp, session_id, metric_type, agent_id, value, metadata
- record() method (never raises)
- query_metrics() with filters
- get_session_summary() with count/avg/max per metric_type
### 9. Environment Policy (lib/env_policy.py)
Implement:
- detect_environment() from GOVERNANCE_ENV or hostname
- load_governance_policy() with deep-merge of env overrides
- Deep merge function (override wins on conflicts)
### 10. Manifest Signing (lib/manifest_signing.py)
Implement:
- generate_signing_key() → 32 bytes to state/.signing-key (0600 perms)
- load_signing_key() → bytes or None
- sign_manifest() → HMAC-SHA256 hex string
- verify_manifest_signature() → bool
- _canonicalize_manifest() → sorted JSON excluding volatile fields
## Hooks
### hooks.json
Register:
- SessionStart: session_start.py (timeout 10s)
- PreToolUse: approval_gate_hook.py, pre_tool_check.py (timeout 10s each)
- PostToolUse: post_task_cleanup.py (Task matcher), output_validator.py (Write|Edit|Bash matcher)
### session_start.py
1. Load root manifest (agent_id from env or default "root")
2. Generate session_id (gov-sess-{8 hex chars})
3. Initialize audit bus with alerting and metrics callbacks
4. Emit MANIFEST_LOADED event
5. Purge stale registry entries
6. Run retention policy archival if configured
7. Print session summary to stderr
### pre_tool_check.py
1. Load manifest from registry or resolve
2. If Task tool: trust_broker.evaluate_delegation()
3. LLM threat detector: scan_input()
4. Policy engine: evaluate()
5. If decision is deny: exit 1 with error message
6. If decision is human_gate: print gate prompt to stderr, wait for approval
7. If allow: exit 0
### approval_gate_hook.py
Check for external communication gate and data classification gate from
governance-policy.yaml gates section. Inject approval prompts per skill config.
### post_task_cleanup.py
For Task tool completion: deregister child manifest from registry.
### output_validator.py
1. Read tool output from stdin JSON
2. LLM threat detector: scan_output()
3. If critical/high threat detected: emit LLM_THREAT, exit 1
4. If medium: emit warning, exit 0
5. Record metrics
## Configuration Files
### state/governance-policy.yaml
Include sections:
- version, effective_from
- tool_tiers: exempt, standard, elevated, elevated_patterns
- classification_patterns: restricted, confidential, internal (regex lists)
- gates: external_communication, data_classification
- tier_gate_matrix: TRIVIAL/MINOR/STANDARD/MAJOR with gate modes
- retention: audit_events, metrics (retention_days, archive_path)
- environments: default, env_var
- alerting: enabled, webhook (url, enabled), syslog (host, port, facility, enabled)
### state/tool-tiers.yaml
List exempt, standard, elevated tools and elevated_patterns.
### state/classification-patterns.yaml
Regex patterns for restricted, confidential, internal classifications.
### state/manifests/root.yaml
Root agent with trust_level=5, data_classification=restricted, full permissions.
### state/manifests/security-analyst.yaml
Example agent with trust_level=4, confidential, limited tools and delegations.
## Tests
Write comprehensive pytest tests in tests/test_governance.py covering:
- Manifest loading, signing, ceiling enforcement
- Trust broker delegation validation (breadth, depth, classification)
- Policy engine tool classification and gate logic
- Audit bus emit, query, buffer fallback, retention
- Memory governor classification, ceiling, provenance
- LLM threat detector injection and leakage patterns
- Alerting webhook and syslog (mocked)
- Metrics recording and session summary
- Environment detection and policy merging
## Implementation Guidelines
- Use only the Python standard library (no external dependencies except PyYAML, which ships with Claude Code)
- All paths relative to plugin root via GOVERNANCE_PLUGIN_ROOT env var
- Never raise exceptions from audit, alerting, metrics (fail-open)
- Use WAL mode for all SQLite databases
- File locking (fcntl) for manifest registry
- Hooks read stdin JSON, write stderr for messages, exit 0/1 for allow/deny
- All timestamps in ISO 8601 UTC format
- All hashes in hexadecimal lowercase
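The WAL and file-locking guidelines above can be sketched as two small helpers (paths and names are illustrative):

```python
import fcntl
import sqlite3
from contextlib import contextmanager
from pathlib import Path

def open_db(path: Path) -> sqlite3.Connection:
    """Open a SQLite database in WAL mode for concurrent readers."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn

@contextmanager
def locked(lock_path: Path):
    """Exclusive advisory lock (fcntl) guarding manifest registry writes."""
    with open(lock_path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```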
Build the complete framework with all files. Make it production-ready.
## Failure Semantics
Hook failures fail-open: If a hook times out or crashes, the tool execution proceeds. Rationale: Governance failures should not lock up the entire system. Users can still make progress while governance is degraded.
Trust/policy failures fail-closed: If trust broker or policy engine denies a delegation or tool, the operation is blocked. Rationale: Privilege escalation and unauthorized tool use are security-critical failures that must prevent execution.
Audit/alerting failures fail-open: If audit bus or alerting service fails, the operation continues. Events fall back to buffer. Rationale: Observability failures should not block functional operations.
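One way to implement the fail-open guarantee for audit, alerting, and metrics paths is a decorator that swallows all exceptions; a sketch with hypothetical names:

```python
import functools
import sys

def fail_open(default=None):
    """Swallow all exceptions from observability code paths (audit, alerting,
    metrics) so they can never block tool execution."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:  # deliberately broad: fail-open
                print(f"governance degraded: {fn.__name__}: {exc}", file=sys.stderr)
                return default
        return inner
    return wrap

@fail_open(default=False)
def emit_event(event: dict) -> bool:
    # Simulated backend failure: the caller still gets a value, never an exception.
    raise ConnectionError("audit backend unreachable")
```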
## Design Decisions
### SQLite for Audit Storage
Rationale: Governance framework must work on single-machine deployments without infrastructure dependencies. SQLite provides ACID guarantees, concurrent reads (via WAL), and zero operational overhead. For distributed deployments, audit events can be exported to external SIEM via webhook/syslog.
Trade-offs: SQLite limits to ~1M events before performance degradation. Retention policy archival mitigates this. Cannot query across multiple agent hosts without centralized aggregation. SIEM integration provides distributed visibility.
### HMAC-SHA256 Manifest Signing
Rationale: HMAC-SHA256 is in Python stdlib (hashlib + hmac). Ed25519 requires PyNaCl or the cryptography library, adding external dependencies. Manifest signing provides tamper evidence, not authenticity proof (shared secret vs asymmetric keys). HMAC is sufficient for this use case.
Trade-offs: HMAC cannot prove manifest authorship (anyone with the key can sign). Ed25519 would enable signature verification without access to signing key. For multi-party governance (untrusted manifest sources), Ed25519 would be preferred. For single-operator systems, HMAC is simpler.
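A minimal sketch of HMAC-SHA256 signing over a canonical serialization (function names are illustrative; the actual implementation lives in governance/lib/manifest_signing):

```python
import hashlib
import hmac
import json

def canonical(manifest: dict) -> bytes:
    """Deterministic serialization so signatures are stable across key order."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()

def sign_manifest(manifest: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical form; lowercase hex per the guidelines."""
    return hmac.new(key, canonical(manifest), hashlib.sha256).hexdigest()

def verify_manifest(manifest: dict, key: bytes, signature: str) -> bool:
    """Constant-time comparison guards against timing attacks."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```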
### Regex-Based Threat Detection
Rationale: Regex patterns are deterministic, auditable, and have zero inference latency. ML models (e.g., prompt injection classifiers) require GPU, model hosting, and introduce non-deterministic false positives. For governance hooks with 10s timeout, regex is the only viable approach.
Trade-offs: Regex patterns can be evaded (obfuscation, encoding, paraphrasing). ML models adapt to novel attacks. Future enhancement: add ML-based scanning as async post-analysis (emit events, train on patterns, update regex rules).
Pattern Maintenance: OWASP LLM Top 10 patterns require periodic updates. Governance policy YAML allows operators to add custom patterns without code changes.
### Hook-Based Enforcement
Rationale: Hooks integrate natively with Claude Code's tool execution lifecycle. No need to wrap every tool or modify the core runtime. Hooks receive full tool context (name, input, output) and can block or transform execution.
Trade-offs: Hooks run in separate processes (command-type hooks), adding 10-50ms overhead per tool invocation. In-process hooks (if Claude Code supported Python plugins) would be faster but less isolated. Current design prioritizes safety (hook crash doesn't crash agent) over performance.
### Tiered Gate Enforcement
Rationale: Not all governance violations warrant hard blocks. TRIVIAL tasks can skip most gates. MINOR tasks get advisory prompts. STANDARD tasks get blocking gates on critical operations. MAJOR tasks require human approval for all elevated tools.
Configurability: Tier-to-gate matrix in governance-policy.yaml allows operators to customize enforcement per task tier. Example: PRE_RELEASE gate is "skip" for TRIVIAL/MINOR, "advisory" for STANDARD, "blocking" for MAJOR.
Human-in-the-Loop: human_gate decision prints approval prompt to stderr and waits for operator input. Enables oversight without breaking agent autonomy.
### Linear Classification Hierarchy
Rationale: CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"] is a total order (linear hierarchy). Simpler to reason about and enforce than a lattice model (e.g., medical vs financial vs PII classifications that don't strictly order).
Trade-offs: Linear order doesn't capture orthogonal sensitivity dimensions. For complex data governance (HIPAA + PCI-DSS + GDPR), would need multi-dimensional classification with intersection logic. Current design optimizes for 80% use case (corporate data tiers).
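The total-order ceiling check reduces to a list-index comparison; a sketch (function name is illustrative):

```python
CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]

def within_ceiling(data_class: str, agent_ceiling: str) -> bool:
    """An agent may handle data at or below its classification ceiling."""
    order = CLASSIFICATION_ORDER.index
    return order(data_class) <= order(agent_ceiling)
```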
### Session-Scoped Delegation Registry
Rationale: Each agent session gets isolated delegation tracking. session_id scoping prevents breadth limit circumvention via session restarts. TTL purging (1 hour) removes stale entries without manual cleanup.
Trade-offs: Session restart resets delegation count. Malicious agent could restart session to bypass breadth limit. Mitigation: SessionStart hook logs all manifest loads for forensic analysis. Future enhancement: persistent delegation ledger across sessions.
### Embedded Memory Provenance
Rationale: Embedding 9 provenance fields in memory_store metadata (merged into the Qdrant payload) keeps data and provenance together. No separate index to maintain or sync. Memory queries can filter by provenance (e.g., "gov_trust_level >= 4").
Trade-offs: Increases payload size by ~200 bytes per memory chunk. For 1M chunks, adds ~200MB storage. Alternative: separate provenance collection with foreign key links. Current design optimizes for query simplicity over storage efficiency.
### Single Policy File
Rationale: All governance components reference one file. Eliminates scattered config files, version skew, and partial updates. Git-tracked policy enables version control, diffs, and rollback. Environment overrides provide per-env customization without duplicating the base policy.
Trade-offs: Single file becomes large (current: ~120 lines, projected: 500+ with all patterns). YAML parsing on every SessionStart adds ~10ms overhead. Alternative: compiled policy cache (pickle) with invalidation on YAML mtime change. Current design optimizes for simplicity.
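The environment-override mechanism can be a recursive dict overlay onto the base policy; a sketch, assuming dicts merge key-by-key and scalars/lists replace wholesale:

```python
def merge_policy(base: dict, override: dict) -> dict:
    """Recursively overlay an environment override onto the base policy.
    Nested dicts merge key-by-key; any other value is replaced outright."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policy(merged[key], value)
        else:
            merged[key] = value
    return merged
```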
## Claude Code Integration
The governance framework is a Claude Code plugin at ~/.claude/plugins/local/governance
(a symlink from the project directory). The plugin manifest declares hooks:
```json
{
  "description": "Governance enforcement hooks",
  "hooks": {
    "SessionStart": [{
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/session_start.py",
        "timeout": 10
      }]
    }],
    "PreToolUse": [{
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/pre_tool_check.py",
        "timeout": 10
      }]
    }],
    "PostToolUse": [{
      "matcher": { "tool_name": "Write|Edit|Bash|NotebookEdit" },
      "hooks": [{
        "type": "command",
        "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/output_validator.py",
        "timeout": 10
      }]
    }]
  }
}
```
Hooks receive tool context as JSON on stdin and control execution via exit codes: exit 0 = allow, exit 1 = deny. Stderr messages are shown to the user.
## Conductor Integration
Every agent receives a manifest at delegation time. The conductor plugin exports the current task tier
to conductor-state.json for policy engine consumption:
```json
{
  "active_task": "task-a4f8d2c1",
  "agent_stack": ["root", "security-analyst"],
  "governance": {
    "conductor_tier": "STANDARD",
    "intent": {
      "sensitivity": "medium",
      "external_visibility": false
    }
  }
}
```
Trust broker reads this file during Task tool validation to determine if MAJOR + elevated tool requires human gate.
## Memory Governance
The memory governor intercepts the mcp__claude-memory__memory_store tool via a PreToolUse hook.
Provenance tags are merged into the tool input before the MCP tool executes:
Original tool input:

```json
{
  "content": "Agent completed security analysis...",
  "collection": "claude_memories",
  "metadata": {}
}
```

After the memory governor runs:

```json
{
  "content": "Agent completed security analysis...",
  "collection": "claude_memories",
  "metadata": {
    "gov_manifest_id": "gov-sec-analyst-v2",
    "gov_agent_id": "security-analyst",
    "gov_trust_level": 4,
    "gov_classification": "confidential",
    "gov_session_id": "gov-sess-a4f8d2c1",
    "gov_timestamp": "2026-03-17T14:23:45.123456Z"
  }
}
```
Memory system (Qdrant) stores these tags in the payload. Queries can filter by provenance:
```python
memory_recall("security analysis",
              filter={"gov_trust_level": {"$gte": 4}})
```
## Context Management Integration
Governance events (especially CONTEXT_PRESSURE and CIRCUIT_BREAK) inform context management decisions. When autonomy depth is exhausted, the context manager can trigger human escalation or task decomposition instead of failing silently.
## SIEM Integration
The audit bus sends events to an external SIEM via webhook or syslog. Example n8n workflow:
```
# n8n webhook trigger receives governance events
# Filter: event_type in [policy_deny, trust_deny, llm_threat]
# Route to Slack/PagerDuty/Wazuh based on severity
Webhook → Filter → Switch (severity):
  - critical → PagerDuty incident
  - high → Slack security channel
  - medium → Wazuh log aggregation
```
## Metrics and Analytics
The metrics database enables trend analysis and anomaly detection. Example queries:
```sql
-- Agent success rate over last 30 days
SELECT agent_id,
       AVG(value) AS success_rate,
       COUNT(*) AS invocations
FROM governance_metrics
WHERE metric_type = 'agent_success'
  AND timestamp > datetime('now', '-30 days')
GROUP BY agent_id
ORDER BY success_rate ASC;

-- Gate trigger frequency by agent
SELECT agent_id,
       COUNT(*) AS gate_count
FROM governance_metrics
WHERE metric_type = 'gate_trigger'
  AND timestamp > datetime('now', '-7 days')
GROUP BY agent_id
ORDER BY gate_count DESC;
```
## Development
Governance framework development uses standard Python tooling:
```bash
# Install in development mode
cd ~/.claude/plugins/local/governance
pip install -e .

# Run tests
pytest tests/ -v

# Lint
ruff check governance/

# Type check
mypy governance/lib/

# Generate signing key
python3 -c "from governance.lib.manifest_signing import generate_signing_key; \
generate_signing_key()"

# Export session audit trail
python3 -c "from governance.lib.audit_bus import AuditBus; \
from pathlib import Path; \
bus = AuditBus(Path('state/audit.db'), Path('state/audit-buffer.jsonl')); \
bus.export_jsonl('gov-sess-a4f8d2c1', Path('audit-export.jsonl'))"
```