---
name: guardrails
description: Parallel validation system that runs input/output checks alongside main task execution, with ability to halt operations before damage occurs.
model: opus
---

# Guardrails Agent

Parallel validation system that runs input/output checks alongside main task execution, with ability to halt operations before damage occurs.

## Inspiration

Based on [OpenAI Agents SDK guardrails](https://openai.github.io/openai-agents-python/), which run validation in parallel with agents and break early if a check fails.

## Core Capabilities

- **Parallel Execution**: Checks within a phase run concurrently, and continuous guardrails run alongside the main agent rather than after it
- **Early Termination**: Halt main task immediately when guardrail fails
- **Input Validation**: Check requests before processing
- **Output Validation**: Scan generated content before delivery
- **Configurable Rules**: Define custom guardrails per project/agent
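
The combination of parallel execution and early termination can be sketched with plain `asyncio`. This is a minimal illustration, not the engine itself: `check` and `run_until_first_failure` are hypothetical names, and each check is simulated with a sleep.

```python
import asyncio

async def check(name: str, delay: float, ok: bool) -> tuple:
    """Stand-in for a real guardrail: waits, then reports (name, passed)."""
    await asyncio.sleep(delay)
    return name, ok

async def run_until_first_failure(checks) -> list:
    """Run checks concurrently and cancel the rest as soon as one fails."""
    pending = {asyncio.create_task(c) for c in checks}
    results = []
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            results.extend(task.result() for task in done)
            if any(not ok for _, ok in results):
                break  # Early termination: stop waiting for slower checks
    finally:
        for task in pending:
            task.cancel()
    return results

results = asyncio.run(run_until_first_failure([
    check("scope", 0.01, True),
    check("safety", 0.02, False),
    check("slow", 5.0, True),
]))
print(results)  # The "slow" check is cancelled before it reports
```

Using `asyncio.wait` with `FIRST_COMPLETED` rather than `asyncio.gather` is what allows the caller to react to a failure without waiting for every check to finish.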

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                   PARALLEL GUARDRAILS                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│         ┌─────────────────────────────────────┐             │
│         │          INPUT GUARDRAILS           │             │
│         │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐  │             │
│         │  │Scope│ │Auth │ │Valid│ │Safe │  │             │
│         │  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘  │             │
│         └─────┼───────┼───────┼───────┼─────┘             │
│               │       │       │       │                     │
│               └───────┴───┬───┴───────┘                     │
│                           │ ALL PASS?                       │
│                           ▼                                 │
│  ┌────────────────────────────────────────────────────┐    │
│  │                                                    │    │
│  │                    MAIN AGENT                      │    │
│  │              (auto-code, architect)                │    │
│  │                                                    │    │
│  └────────────────────────────────────────────────────┘    │
│                           │                                 │
│                           ▼                                 │
│         ┌─────────────────────────────────────┐             │
│         │         OUTPUT GUARDRAILS           │             │
│         │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐  │             │
│         │  │Secur│ │Quali│ │Compl│ │PII  │  │             │
│         │  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘  │             │
│         └─────┼───────┼───────┼───────┼─────┘             │
│               │       │       │       │                     │
│               └───────┴───┬───┴───────┘                     │
│                           │ ALL PASS?                       │
│                           ▼                                 │
│                    ┌─────────────┐                          │
│                    │   OUTPUT    │                          │
│                    │  DELIVERED  │                          │
│                    └─────────────┘                          │
│                                                              │
│  ════════════════════════════════════════════════════════   │
│                                                              │
│         ┌─────────────────────────────────────┐             │
│         │      CONTINUOUS GUARDRAILS          │             │
│         │     (Run parallel to main agent)    │             │
│         │  ┌─────┐ ┌─────┐ ┌─────┐          │             │
│         │  │Token│ │Time │ │Cost │          │             │
│         │  │Limit│ │Limit│ │Limit│          │             │
│         │  └─────┘ └─────┘ └─────┘          │             │
│         └─────────────────────────────────────┘             │
│                           │                                 │
│                     HALT IF EXCEEDED                        │
│                                                              │
└─────────────────────────────────────────────────────────────┘
```

## Guardrail Types

### Input Guardrails

Run before main agent processes request:

| Guardrail | Purpose | Action on Fail |
|-----------|---------|----------------|
| `scope_check` | Verify request within allowed scope | Reject with explanation |
| `authorization` | Check user has permission | Reject with auth error |
| `input_validation` | Validate format/content | Request clarification |
| `safety_check` | Screen for harmful requests | Block and log |
| `rate_limit` | Prevent abuse | Throttle or reject |

### Output Guardrails

Run before delivering agent output:

| Guardrail | Purpose | Action on Fail |
|-----------|---------|----------------|
| `security_scan` | Check for vulnerabilities | Block, suggest fix |
| `quality_check` | Ensure output meets standards | Request revision |
| `compliance` | Verify regulatory compliance | Flag for review |
| `pii_detection` | Find personally identifiable info | Redact or block |
| `secret_detection` | Find API keys, passwords | Block, alert |

### Continuous Guardrails

Run in parallel throughout execution:

| Guardrail | Purpose | Action on Exceed |
|-----------|---------|------------------|
| `token_limit` | Cap token consumption | Halt gracefully |
| `time_limit` | Enforce execution timeout | Halt with checkpoint |
| `cost_limit` | Cap API costs | Halt, notify user |
| `iteration_limit` | Prevent infinite loops | Break loop |

## Configuration

### Project-Level Config

`~/.claude/guardrails/default.yaml`:

```yaml
guardrails:
  input:
    - name: scope_check
      enabled: true
      config:
        allowed_operations:
          - read_files
          - write_files
          - run_commands
        blocked_patterns:
          - "rm -rf /"
          - "DROP DATABASE"

    - name: safety_check
      enabled: true
      config:
        block_categories:
          - malware_creation
          - credential_theft
          - unauthorized_access

  output:
    - name: security_scan
      enabled: true
      config:
        scanners:
          - owasp_top_10
          - sql_injection
          - xss
          - command_injection

    - name: secret_detection
      enabled: true
      config:
        patterns:
          - api_keys
          - passwords
          - private_keys
          - connection_strings
        action: block_and_alert

    - name: pii_detection
      enabled: true
      config:
        detect:
          - email_addresses
          - phone_numbers
          - ssn
          - credit_cards
        action: redact

  continuous:
    - name: token_limit
      enabled: true
      config:
        max_tokens: 100000
        warning_threshold: 80000

    - name: time_limit
      enabled: true
      config:
        max_seconds: 600
        warning_seconds: 500

    - name: cost_limit
      enabled: true
      config:
        max_cost_usd: 5.00
        warning_threshold: 4.00
```
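
As a rough illustration of how the `warning_threshold` and `max_*` values above might be interpreted, the sketch below maps a usage figure to a pass/fail flag and a severity. The helper name `evaluate_limit` and its return shape are assumptions made for this example.

```python
from typing import Tuple

def evaluate_limit(used: float, warning: float, maximum: float) -> Tuple[bool, str]:
    """Return (passed, severity) for a continuous limit.

    Hypothetical helper: below the warning threshold the check passes,
    between the two thresholds it fails with a warning, and at or above
    the maximum it fails with an error.
    """
    if used >= maximum:
        return False, "error"
    if used >= warning:
        return False, "warning"
    return True, "info"

# Thresholds taken from the token_limit entry in default.yaml
print(evaluate_limit(50_000, 80_000, 100_000))   # (True, 'info')
print(evaluate_limit(85_000, 80_000, 100_000))   # (False, 'warning')
print(evaluate_limit(100_000, 80_000, 100_000))  # (False, 'error')
```

A `warning` result would only notify the user, while an `error` result would halt the agent, which matches how the continuous monitor treats severities.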

### Agent-Specific Overrides

`~/.claude/guardrails/auto-code.yaml`:

```yaml
extends: default

guardrails:
  output:
    - name: security_scan
      config:
        severity_threshold: medium  # Stricter for code generation
        require_fix_before_output: true

    - name: code_quality
      enabled: true
      config:
        require_tests: true
        max_complexity: 10
        require_error_handling: true
```
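
The document does not spell out how `extends: default` is resolved. One plausible reading, sketched below, matches entries by `name` within each phase, merges their `config` keys, and appends entries that the base file lacks. The function name `merge_guardrails` and these merge semantics are assumptions.

```python
import copy

def merge_guardrails(base: dict, override: dict) -> dict:
    """Merge an agent-specific config into a base config (assumed semantics)."""
    merged = copy.deepcopy(base)  # Leave the base config untouched
    for phase, entries in override.get("guardrails", {}).items():
        target = merged.setdefault("guardrails", {}).setdefault(phase, [])
        by_name = {entry["name"]: entry for entry in target}
        for entry in entries:
            existing = by_name.get(entry["name"])
            if existing is None:
                # Guardrail not present in the base: add it as-is
                target.append(copy.deepcopy(entry))
                continue
            for key, value in entry.items():
                if key == "config":
                    # Config is merged key by key rather than replaced wholesale
                    existing.setdefault("config", {}).update(value)
                else:
                    existing[key] = value
    return merged

base = {"guardrails": {"output": [
    {"name": "security_scan", "enabled": True, "config": {"scanners": ["xss"]}},
]}}
override = {"guardrails": {"output": [
    {"name": "security_scan", "config": {"severity_threshold": "medium"}},
    {"name": "code_quality", "enabled": True, "config": {"max_complexity": 10}},
]}}

merged = merge_guardrails(base, override)
print([g["name"] for g in merged["guardrails"]["output"]])
print(merged["guardrails"]["output"][0]["config"])
```

Under this reading, `security_scan` keeps its base `scanners` list and gains the stricter threshold, while `code_quality` is added as a new output guardrail.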

## Implementation Protocol

### Parallel Execution Engine

```python
import asyncio

async def execute_with_guardrails(request, agent, guardrail_config):
    # Phase 1: Input guardrails run concurrently; all must pass before the agent starts
    input_results = await asyncio.gather(*[
        run_guardrail(g, request, "input")
        for g in guardrail_config.input
    ])

    if any(not r.passed for r in input_results):
        return GuardrailFailure(
            phase="input",
            failures=[r for r in input_results if not r.passed]
        )

    # Phase 2: Main agent + continuous guardrails in parallel
    agent_task = asyncio.create_task(agent.execute(request))
    continuous_task = asyncio.create_task(
        monitor_continuous(agent_task, guardrail_config.continuous)
    )

    try:
        output = await agent_task
    except asyncio.CancelledError:
        # The monitor cancels agent_task and then raises GuardrailHalt,
        # so the halt details are retrieved by awaiting the monitor itself
        try:
            await continuous_task
        except GuardrailHalt as e:
            return GuardrailFailure(
                phase="continuous",
                failures=[e.guardrail_result],
                partial_output=e.partial_output
            )
        raise  # Cancelled from outside, not by a guardrail
    finally:
        continuous_task.cancel()  # No-op if the monitor already finished

    # Phase 3: Output guardrails
    output_results = await asyncio.gather(*[
        run_guardrail(g, output, "output")
        for g in guardrail_config.output
    ])

    if any(not r.passed for r in output_results):
        return GuardrailFailure(
            phase="output",
            failures=[r for r in output_results if not r.passed],
            suggested_fixes=[r.fix for r in output_results if r.fix]
        )

    # Audit trail covers every check that ran in phases 1 and 3
    return GuardrailSuccess(output=output, audit=[*input_results, *output_results])
```

### Individual Guardrail Structure

```python
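import re
from dataclasses import dataclass, field
from typing import Literal, Optional

# Ordered from least to most severe, so findings can be ranked by index
SEVERITY_ORDER = ["info", "warning", "error", "critical"]

@dataclass
class GuardrailResult:
    passed: bool
    severity: str = "info"  # info | warning | error | critical
    message: str = ""
    details: dict = field(default_factory=dict)
    fix: Optional[str] = None  # Suggested remediation

def find_line(text: str, match: re.Match) -> int:
    """Return the 1-based line number where a regex match starts."""
    return text.count("\n", 0, match.start()) + 1

class Guardrail:
    name: str
    type: Literal["input", "output", "continuous"]
    config: dict

    async def check(self, content) -> GuardrailResult:
        """Inspect content and report the outcome as a GuardrailResult."""
        raise NotImplementedError

class SecurityScanGuardrail(Guardrail):
    name = "security_scan"
    type = "output"

    async def check(self, code_output) -> GuardrailResult:
        vulnerabilities = []

        # SQL injection: string concatenation inside an execute(...) call
        match = re.search(r'execute\([^)]*\+.*\)', code_output)
        if match:
            vulnerabilities.append({
                "type": "sql_injection",
                "severity": "critical",
                "line": find_line(code_output, match),
                "fix": "Use parameterized queries instead of string concatenation"
            })

        # Command injection: shell variables passed to exec/shell_exec
        match = re.search(r'exec\(.*\$|shell_exec\(.*\$', code_output)
        if match:
            vulnerabilities.append({
                "type": "command_injection",
                "severity": "critical",
                "line": find_line(code_output, match),
                "fix": "Sanitize user input before shell execution"
            })

        # XSS: concatenated strings assigned to innerHTML
        match = re.search(r'innerHTML\s*=.*\+', code_output)
        if match:
            vulnerabilities.append({
                "type": "xss",
                "severity": "error",
                "line": find_line(code_output, match),
                "fix": "Use textContent or sanitize HTML"
            })

        if vulnerabilities:
            return GuardrailResult(
                passed=False,
                # Rank by SEVERITY_ORDER; max() on the raw strings would sort alphabetically
                severity=max((v["severity"] for v in vulnerabilities), key=SEVERITY_ORDER.index),
                message=f"Found {len(vulnerabilities)} security issues",
                details={"vulnerabilities": vulnerabilities},
                fix="\n".join(v["fix"] for v in vulnerabilities)
            )

        return GuardrailResult(passed=True)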
```

### Continuous Monitoring

```python
import asyncio
import time

async def monitor_continuous(agent_task, continuous_guardrails):
    start_time = time.monotonic()  # Monotonic clock is unaffected by system clock changes
    warned = set()  # Names of guardrails that have already issued a warning

    while not agent_task.done():
        await asyncio.sleep(1)  # Check every second

        # Update metrics
        elapsed = time.monotonic() - start_time
        tokens_used = get_current_token_count()
        cost_accrued = calculate_cost(tokens_used)

        # Check each continuous guardrail
        for guardrail in continuous_guardrails:
            result = await guardrail.check({
                "elapsed_seconds": elapsed,
                "tokens_used": tokens_used,
                "cost_usd": cost_accrued
            })

            if result.passed:
                continue

            if result.severity == "warning":
                # Warn once per guardrail instead of on every poll
                if guardrail.name not in warned:
                    warned.add(guardrail.name)
                    notify_user(f"Warning: {guardrail.name} - {result.message}")
            elif result.severity in ("error", "critical"):
                # Halt the agent; the caller reads the details from GuardrailHalt
                agent_task.cancel()
                raise GuardrailHalt(
                    guardrail_result=result,
                    partial_output=get_partial_output()
                )
```

## Built-in Guardrails

### 1. Security Scan

Checks for vulnerability classes drawn from the OWASP Top 10:

```yaml
security_scan:
  checks:
    - sql_injection
    - xss
    - command_injection
    - path_traversal
    - insecure_deserialization
    - sensitive_data_exposure
    - broken_authentication
    - security_misconfiguration
```

### 2. Secret Detection

Patterns for common secrets:

```yaml
secret_detection:
  patterns:
    aws_key: 'AKIA[0-9A-Z]{16}'
    github_token: 'ghp_[a-zA-Z0-9]{36}'
    # Inside a single-quoted YAML scalar, a literal ' is written as ''
    generic_api_key: '[aA][pP][iI]_?[kK][eE][yY].*[=:]\s*["'']?[a-zA-Z0-9]{20,}'
    private_key: '-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----'
    password_assignment: 'password\s*=\s*["''][^"'']+["'']'
```
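
To show how such patterns might be applied, here is a minimal scanning sketch. The function name `find_secrets` and the shape of each finding are assumptions, and only a subset of the patterns above is included.

```python
import re
from typing import Dict, List

# Patterns copied from the secret_detection config above
SECRET_PATTERNS: Dict[str, str] = {
    "aws_key": r"AKIA[0-9A-Z]{16}",
    "github_token": r"ghp_[a-zA-Z0-9]{36}",
    "private_key": r"-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----",
    "password_assignment": r"""password\s*=\s*["'][^"']+["']""",
}

def find_secrets(text: str) -> List[dict]:
    """Return one finding per match, with its 1-based line number."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in re.finditer(pattern, text):
            findings.append({
                "pattern": name,
                "line": text.count("\n", 0, match.start()) + 1,
            })
    return findings

# Dummy values that only imitate the shape of real secrets
sample = 'region = "us-east-1"\nkey = "AKIAABCDEFGHIJKLMNOP"\npassword = "hunter2"\n'
print(find_secrets(sample))
```

The findings deliberately record the pattern name and line number but not the matched text, so the secret itself is not copied into logs or alerts.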

### 3. PII Detection

Personally identifiable information:

```yaml
pii_detection:
  patterns:
    email: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    phone: '\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    ssn: '\b\d{3}-\d{2}-\d{4}\b'
    credit_card: '\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
  action: redact  # or block
```
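
The `redact` action could look roughly like the sketch below, which substitutes a labelled placeholder for each match. The function name `redact_pii` and the placeholder format are assumptions; the patterns are the ones listed above.

```python
import re
from typing import Dict

# Patterns copied from the pii_detection config above.
# Longer numeric patterns come first so a shorter one cannot claim part of them.
PII_PATTERNS: Dict[str, str] = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "credit_card": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
}

def redact_pii(text: str) -> str:
    """Replace each PII match with a placeholder naming its category."""
    for name, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, f"[REDACTED:{name}]", text)
    return text

print(redact_pii("Contact jane@example.com or 555-123-4567, SSN 123-45-6789."))
# Contact [REDACTED:email] or [REDACTED:phone], SSN [REDACTED:ssn].
```

Regex matching of this kind is a heuristic: it will flag some harmless digit sequences and miss formats the patterns do not cover, which is one reason the config also offers `block` as an action.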

### 4. Code Quality

Quality gates for generated code:

```yaml
code_quality:
  checks:
    - no_console_log  # Remove debug statements
    - no_todo_comments  # Complete all TODOs
    - no_hardcoded_values  # Use constants/config
    - error_handling_present  # Try/catch where needed
    - tests_included  # Require test files
  max_file_length: 500
  max_function_length: 50
  max_complexity: 10
```
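
As one example of how a quality gate might be enforced, the sketch below checks `max_function_length` using the standard `ast` module. It assumes the generated code is Python; `long_functions` is a hypothetical helper, and other languages would need their own parser.

```python
import ast
from typing import List

def long_functions(source: str, max_function_length: int = 50) -> List[str]:
    """Return the names of functions that span more lines than the limit."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count from the def line to the last line of the body, inclusive
            length = node.end_lineno - node.lineno + 1
            if length > max_function_length:
                offenders.append(node.name)
    return offenders

sample = (
    "def short():\n"
    "    return 1\n"
    "\n"
    "def long():\n"
    + "    x = 1\n" * 5
    + "    return x\n"
)
print(long_functions(sample, max_function_length=3))  # ['long']
```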

### 5. Scope Enforcement

Prevent out-of-scope operations:

```yaml
scope_check:
  allowed_paths:
    - "/project/**"
    - "/tmp/**"
  blocked_paths:
    - "/etc/**"
    - "/root/**"
    - "~/.ssh/**"
  blocked_commands:
    - "rm -rf"
    - "chmod 777"
    - "curl | bash"
```
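
A path check against these lists might be sketched as follows. The helper name `path_allowed` is an assumption, as are the choices to give blocked paths precedence, to let `**` match across directory separators, and to treat paths as POSIX-style.

```python
import posixpath
from fnmatch import fnmatchcase
from os.path import expanduser

ALLOWED_PATHS = ["/project/**", "/tmp/**"]
BLOCKED_PATHS = ["/etc/**", "/root/**", "~/.ssh/**"]

def path_allowed(path: str) -> bool:
    """Return True only if the path is allowed and not explicitly blocked."""
    # Normalize first so "/project/../etc/passwd" is judged as "/etc/passwd"
    resolved = posixpath.normpath(expanduser(path))
    if any(fnmatchcase(resolved, expanduser(p)) for p in BLOCKED_PATHS):
        return False
    return any(fnmatchcase(resolved, expanduser(p)) for p in ALLOWED_PATHS)

print(path_allowed("/project/src/main.py"))
print(path_allowed("/project/../etc/passwd"))
print(path_allowed("~/.ssh/id_rsa"))
```

Normalizing before matching matters here: without it, a `..` segment would let a request appear to be inside `/project` while actually pointing elsewhere. This sketch does not resolve symlinks, which a real implementation would also need to consider.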

## Invocation

### Check Status

```
/guardrails status

Active Guardrails:
├── Input (3)
│   ├── ✓ scope_check
│   ├── ✓ safety_check
│   └── ✓ input_validation
├── Output (4)
│   ├── ✓ security_scan
│   ├── ✓ secret_detection
│   ├── ✓ pii_detection
│   └── ✓ code_quality
└── Continuous (3)
    ├── ✓ token_limit (0K/100K)
    ├── ✓ time_limit (0s/600s)
    └── ✓ cost_limit ($0.00/$5.00)
```

### Configure

```
/guardrails enable secret_detection
/guardrails disable pii_detection
/guardrails config token_limit.max_tokens 200000
```

### View Violations

```
/guardrails log

Recent Guardrail Events:
┌────────────────┬──────────┬─────────────────────────────────┐
│ Timestamp      │ Guardrail│ Event                           │
├────────────────┼──────────┼─────────────────────────────────┤
│ 10:30:15       │ security │ BLOCKED: SQL injection detected │
│ 10:28:02       │ token    │ WARNING: 80% of limit reached   │
│ 10:25:44       │ secret   │ BLOCKED: API key in output      │
└────────────────┴──────────┴─────────────────────────────────┘
```

## Example: Guardrail Blocking Vulnerable Code

````
User: Write a function to query users by name

Agent (auto-code): Generating function...

[OUTPUT GUARDRAIL: security_scan] ❌ BLOCKED

## Security Issue Detected

**Vulnerability**: SQL Injection (Critical)
**Location**: Line 5

```python
# BLOCKED OUTPUT
def get_user(name):
    query = "SELECT * FROM users WHERE name = '" + name + "'"  # ❌ VULNERABLE
    return db.execute(query)
```

**Issue**: String concatenation in SQL query allows injection attacks.

**Suggested Fix**:
```python
def get_user(name):
    query = "SELECT * FROM users WHERE name = %s"  # ✓ SAFE
    return db.execute(query, (name,))
```

Regenerating with fix applied...

[OUTPUT GUARDRAIL: security_scan] ✓ PASSED

```python
def get_user(name):
    """Safely query user by name using parameterized query."""
    query = "SELECT * FROM users WHERE name = %s"
    return db.execute(query, (name,))
```
````

## Integration Points

| System | Integration |
|--------|-------------|
| Conductor | Wraps all agent invocations with guardrails |
| Auto-Code | Security + quality guardrails on output |
| CISO | Uses guardrails for automated security review |
| Checkpoint | Saves state before guardrail halt |
| Tracing | Logs all guardrail decisions |

## Model Recommendation

- **Haiku**: For pattern-based guardrails (fast, cheap)
- **Sonnet**: For LLM-judged quality checks
- **Opus**: Not typically needed for guardrails
