Guardrails
Validation & Safety · OpenAI SDK-inspired
Parallel validation system that applies input, output, and continuous guardrails to ensure agent outputs meet quality, security, and compliance standards.
Overview
The Guardrails Agent implements a validation framework modeled on the guardrails pattern in OpenAI's Agents SDK. It runs validation checks in parallel with agent execution to:
- Validate Inputs: Screen requests before processing begins
- Validate Outputs: Check results before delivery to the user
- Continuous Monitoring: Watch for violations during execution
- Policy Enforcement: Apply configurable rules consistently
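The parallel pattern described above can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not the Guardrails Agent's actual implementation: `check_input`, `run_agent`, `guarded_run`, and `GuardrailTripped` are hypothetical names, and the agent call is simulated with a short sleep.

```python
import asyncio


class GuardrailTripped(Exception):
    """Hypothetical exception raised when a guardrail rejects a request."""


async def check_input(prompt: str) -> None:
    # Placeholder check (assumption): a real guardrail would do far more.
    await asyncio.sleep(0.01)
    if "ignore previous instructions" in prompt.lower():
        raise GuardrailTripped("possible prompt injection")


async def run_agent(prompt: str) -> str:
    # Stand-in for the real agent call, simulated with a short sleep.
    await asyncio.sleep(0.05)
    return f"result for: {prompt}"


async def guarded_run(prompt: str) -> str:
    # Start the agent and the guardrail together so the check adds no latency.
    agent_task = asyncio.create_task(run_agent(prompt))
    guard_task = asyncio.create_task(check_input(prompt))
    try:
        await guard_task  # raises if the guardrail trips
    except GuardrailTripped:
        agent_task.cancel()  # stop the agent as soon as a check fails
        await asyncio.gather(agent_task, return_exceptions=True)
        raise
    return await agent_task
```

Running the check concurrently means a passing request pays no extra latency, while a failing one cancels agent work that has already started.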
Guardrail Types
Input Guardrails
Run before agent processing:
- Prompt Injection Detection: Identifies manipulation attempts
- Scope Validation: Ensures the request is within the agent's capabilities
- Rate Limiting: Prevents resource exhaustion
- Content Screening: Filters inappropriate requests
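As one example of an input guardrail, rate limiting can be approximated with a sliding window. The sketch below is an assumption about how such a check might look; the class name `SlidingWindowRateLimiter` and its parameters are hypothetical and do not come from the Guardrails Agent.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per `window_seconds` (illustrative values)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # request times still inside the window

    def allow(self, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False  # over the limit: the guardrail would reject here
        self._timestamps.append(now)
        return True
```

A caller would run `limiter.allow()` before handing the request to the agent and map a `False` result to whatever action the guardrail is configured with.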
Output Guardrails
Run after agent produces output:
- PII Detection: Catches accidental data exposure
- Code Safety: Scans for dangerous patterns
- Factual Grounding: Verifies claims against sources
- Format Validation: Ensures output meets schema
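PII detection and code safety scanning can both be sketched as pattern matching over the output text. The regular expressions below are deliberately simple and assume US-style phone and SSN formats; `find_pii` and `find_forbidden` are hypothetical helper names, not part of the Guardrails Agent.

```python
import re
from typing import List

# Illustrative patterns only (assumption): real detectors need broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Example dangerous substrings to look for in generated code.
FORBIDDEN_CODE = ("eval(", "exec(", "rm -rf")


def find_pii(text: str) -> List[str]:
    """Return the names of the PII patterns that match the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def find_forbidden(text: str) -> List[str]:
    """Return the forbidden substrings that appear in the text."""
    return [needle for needle in FORBIDDEN_CODE if needle in text]
```

Plain substring matching is cheap but prone to false positives (for instance, `retrieval(` contains `eval(`), so a production check would likely parse the code rather than scan it.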
Continuous Guardrails
Monitor throughout execution:
- Resource Usage: Track token/time consumption
- Scope Creep: Detect drift from original task
- Error Patterns: Identify repeated failures
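Resource tracking during execution might look like the following sketch. `BudgetMonitor` is a hypothetical class, and the limits passed to it are whatever the caller chooses; how the real agent samples token and time usage is not specified here.

```python
import time
from typing import Optional


class BudgetMonitor:
    """Track token and wall-clock usage against fixed limits."""

    def __init__(self, max_tokens: int, max_seconds: float) -> None:
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.tokens_used = 0
        self._started = time.monotonic()

    def record(self, tokens: int) -> None:
        # Call after each model response with the tokens it consumed.
        self.tokens_used += tokens

    def violation(self, now: Optional[float] = None) -> Optional[str]:
        # Returns the name of the exceeded limit, or None if within budget.
        elapsed = (time.monotonic() if now is None else now) - self._started
        if self.tokens_used > self.max_tokens:
            return "token_limit"
        if elapsed > self.max_seconds:
            return "execution_time"
        return None
```

The agent loop would call `record()` after each step and check `violation()` before starting the next one.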
Validation Results
| Result | Action | Example |
|---|---|---|
| PASS | Continue execution | All checks passed |
| WARN | Log and continue | Minor style issue |
| BLOCK | Halt and report | Security violation detected |
| TRIPWIRE | Immediate stop + alert | Prompt injection attempt |
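When several guardrails run on the same request, their results need to be combined. One plausible approach, assuming the four results are ordered by severity as in the table, is to let the most severe result win. The `Result` enum and both helper functions are illustrative names only.

```python
from enum import IntEnum
from typing import Iterable


class Result(IntEnum):
    # Numeric order encodes severity (assumption based on the table order).
    PASS = 0
    WARN = 1
    BLOCK = 2
    TRIPWIRE = 3


def combine(results: Iterable[Result]) -> Result:
    # The most severe individual result decides the overall outcome.
    return max(results, default=Result.PASS)


def should_continue(result: Result) -> bool:
    # PASS and WARN let execution proceed; BLOCK and TRIPWIRE halt it.
    return result <= Result.WARN
```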
Configuration
# guardrails.yaml
input_guardrails:
  - name: prompt_injection
    enabled: true
    action: tripwire
  - name: scope_validation
    enabled: true
    allowed_domains:
      - code_generation
      - code_review
      - documentation
    action: block
output_guardrails:
  - name: pii_detection
    enabled: true
    patterns:
      - email
      - phone
      - ssn
    action: block
  - name: code_safety
    enabled: true
    forbidden_patterns:
      - "eval("
      - "exec("
      - "rm -rf"
    action: block
continuous_guardrails:
  - name: token_limit
    max_tokens: 50000
    action: warn_then_block
  - name: execution_time
    max_seconds: 300
    action: block
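A configuration like this is easier to trust if it is validated at load time. The sketch below assumes the file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`). The set of valid actions is inferred from the values shown in this document, not from a published schema, and `validate_config` is a hypothetical helper.

```python
# Actions inferred from this document's examples (assumption, not a schema).
VALID_ACTIONS = {"warn", "block", "tripwire", "warn_then_block"}
SECTIONS = ("input_guardrails", "output_guardrails", "continuous_guardrails")


def validate_config(config: dict) -> list:
    """Return a list of readable problems; an empty list means no issues found."""
    problems = []
    for section in SECTIONS:
        for i, rule in enumerate(config.get(section, [])):
            label = f"{section}[{i}]"
            if "name" not in rule:
                problems.append(f"{label}: missing 'name'")
            if rule.get("action") not in VALID_ACTIONS:
                problems.append(f"{label}: unknown action {rule.get('action')!r}")
    return problems
```

Collecting every problem instead of raising on the first one lets an operator fix a misconfigured file in a single pass.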
Commands
/guardrails status
/guardrails status
Active Guardrails:
  Input: 4 enabled (prompt_injection, scope, rate_limit, content)
  Output: 3 enabled (pii, code_safety, format)
  Continuous: 2 enabled (tokens, time)
Recent Events:
  - 2 min ago: PASS - Input validation for code review request
  - 5 min ago: WARN - Output contained commented credentials (redacted)
  - 12 min ago: PASS - All checks passed for documentation task
/guardrails test
/guardrails test "Write code to delete all files"
Testing input guardrails...
Result: BLOCK
Triggered: scope_validation
Reason: Destructive file operations not in allowed scope
Recommendation: Rephrase request or expand allowed_domains
Integration Points
| System | Integration |
|---|---|
| All Agents | Input/output validation wrapper |
| Conductor | Phase transition validation |
| CISO | Security-specific guardrail rules |
| Tracing | Guardrail events logged to traces |
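The "input/output validation wrapper" integration can be pictured as a decorator around an agent function. This is a sketch under assumptions: `with_guardrails`, `GuardrailViolation`, and `echo_agent` are hypothetical names, and checks are modeled as plain functions that return `True` when the text is acceptable.

```python
from functools import wraps
from typing import Callable, List


class GuardrailViolation(Exception):
    """Hypothetical exception for a failed check."""


Check = Callable[[str], bool]  # returns True when the text is acceptable


def with_guardrails(input_checks: List[Check], output_checks: List[Check]):
    def decorator(agent: Callable[[str], str]) -> Callable[[str], str]:
        @wraps(agent)
        def wrapper(request: str) -> str:
            # Input guardrails run before the agent sees the request.
            if not all(check(request) for check in input_checks):
                raise GuardrailViolation("input rejected")
            response = agent(request)
            # Output guardrails run before the response is returned.
            if not all(check(response) for check in output_checks):
                raise GuardrailViolation("output rejected")
            return response
        return wrapper
    return decorator


@with_guardrails(
    input_checks=[lambda text: len(text) < 1000],
    output_checks=[lambda text: "rm -rf" not in text],
)
def echo_agent(request: str) -> str:
    # Toy agent used only to exercise the wrapper.
    return f"echo: {request}"
```

Wrapping at the function boundary keeps agent code unaware of the guardrails, so the same checks can be applied uniformly to every agent.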