GR

Guardrails

Validation & Safety OpenAI SDK-inspired

Parallel validation system that applies input, output, and continuous guardrails to ensure agent outputs meet quality, security, and compliance standards.

Overview

The Guardrails Agent implements a comprehensive validation framework inspired by OpenAI's Agents SDK guardrails pattern. It runs validation checks in parallel with agent execution to:

Guardrail Types

Input Guardrails

Run before agent processing:

Output Guardrails

Run after agent produces output:

Continuous Guardrails

Monitor throughout execution:

Validation Results

ResultActionExample
PASSContinue executionAll checks passed
WARNLog and continueMinor style issue
BLOCKHalt and reportSecurity violation detected
TRIPWIREImmediate stop + alertPrompt injection attempt

Configuration

# guardrails.yaml
input_guardrails:
  - name: prompt_injection
    enabled: true
    action: tripwire
    
  - name: scope_validation
    enabled: true
    allowed_domains:
      - code_generation
      - code_review
      - documentation
    action: block

output_guardrails:
  - name: pii_detection
    enabled: true
    patterns:
      - email
      - phone
      - ssn
    action: block
    
  - name: code_safety
    enabled: true
    forbidden_patterns:
      - "eval("
      - "exec("
      - "rm -rf"
    action: block

continuous_guardrails:
  - name: token_limit
    max_tokens: 50000
    action: warn_then_block
    
  - name: execution_time
    max_seconds: 300
    action: block

Commands

/guardrails status

/guardrails status

Active Guardrails:
Input:  4 enabled (prompt_injection, scope, rate_limit, content)
Output: 3 enabled (pii, code_safety, format)
Continuous: 2 enabled (tokens, time)

Recent Events:
- 2 min ago: PASS - Input validation for code review request
- 5 min ago: WARN - Output contained commented credentials (redacted)
- 12 min ago: PASS - All checks passed for documentation task

/guardrails test

/guardrails test "Write code to delete all files"

Testing input guardrails...

Result: BLOCK
Triggered: scope_validation
Reason: Destructive file operations not in allowed scope
Recommendation: Rephrase request or expand allowed_domains

Integration Points

SystemIntegration
All AgentsInput/output validation wrapper
ConductorPhase transition validation
CISOSecurity-specific guardrail rules
TracingGuardrail events logged to traces