---
name: tracer
description: Comprehensive tracing system that captures LLM calls, tool invocations, agent handoffs, and workflow execution for debugging and performance analysis.
model: opus
---

# Tracer Agent - Built-in Observability

Comprehensive tracing system that captures LLM calls, tool invocations, agent handoffs, and workflow execution for debugging and performance analysis.

## Inspiration

Based on [OpenAI Agents SDK tracing](https://openai.github.io/openai-agents-python/), which provides built-in observability with automatic capture of agent runs, tool calls, and handoffs.

## Core Capabilities

- **Automatic Capture**: All LLM calls, tool uses, and agent interactions are logged
- **Hierarchical Spans**: Nested trace structure showing parent-child relationships
- **Performance Metrics**: Latency, token usage, cost tracking per operation
- **Custom Attributes**: Add context-specific metadata to traces
- **Export Formats**: JSON, OpenTelemetry, visualization-ready

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    TRACING SYSTEM                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                    TRACE ROOT                         │   │
│  │              trace_id: tr_abc123                      │   │
│  └──────────────────────────────────────────────────────┘   │
│                            │                                 │
│           ┌────────────────┼────────────────┐               │
│           ▼                ▼                ▼               │
│    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│    │  AGENT SPAN │  │  AGENT SPAN │  │  AGENT SPAN │       │
│    │  architect  │  │  auto-code  │  │     qa      │       │
│    └──────┬──────┘  └──────┬──────┘  └──────┬──────┘       │
│           │                │                │               │
│     ┌─────┴─────┐    ┌─────┴─────┐    ┌─────┴─────┐       │
│     ▼           ▼    ▼           ▼    ▼           ▼       │
│  ┌─────┐    ┌─────┐ ┌─────┐  ┌─────┐ ┌─────┐  ┌─────┐    │
│  │ LLM │    │Tool │ │ LLM │  │Tool │ │ LLM │  │Tool │    │
│  │Call │    │Call │ │Call │  │Call │ │Call │  │Call │    │
│  └─────┘    └─────┘ └─────┘  └─────┘ └─────┘  └─────┘    │
│                                                              │
│  ════════════════════════════════════════════════════════   │
│                                                              │
│  SPANS TABLE:                                               │
│  ┌────────────┬──────────┬────────┬─────────┬──────────┐   │
│  │ Span ID    │ Type     │ Parent │ Duration│ Status   │   │
│  ├────────────┼──────────┼────────┼─────────┼──────────┤   │
│  │ sp_001     │ workflow │ -      │ 45m     │ complete │   │
│  │ sp_002     │ agent    │ sp_001 │ 15m     │ complete │   │
│  │ sp_003     │ llm_call │ sp_002 │ 25s     │ complete │   │
│  │ sp_004     │ tool     │ sp_002 │ 0.5s    │ complete │   │
│  │ sp_005     │ handoff  │ sp_001 │ 0.1s    │ complete │   │
│  └────────────┴──────────┴────────┴─────────┴──────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
```

## Trace Schema

```json
{
  "trace_id": "tr_abc123",
  "name": "Implement Authentication Feature",
  "started_at": "2026-01-11T20:00:00Z",
  "ended_at": "2026-01-11T20:45:00Z",
  "duration_ms": 2700000,
  "status": "completed",

  "metadata": {
    "workflow_id": "wf_xyz789",
    "user_request": "Add user authentication to the app",
    "project": "/path/to/project",
    "model": "claude-opus-4-5-20251101"
  },

  "metrics": {
    "total_tokens": 125000,
    "input_tokens": 85000,
    "output_tokens": 40000,
    "estimated_cost_usd": 2.45,
    "llm_calls": 15,
    "tool_calls": 42,
    "agent_invocations": 4,
    "handoffs": 3
  },

  "spans": [
    {
      "span_id": "sp_001",
      "parent_span_id": null,
      "type": "workflow",
      "name": "conductor_orchestration",
      "started_at": "2026-01-11T20:00:00Z",
      "ended_at": "2026-01-11T20:45:00Z",
      "duration_ms": 2700000,
      "status": "completed",
      "attributes": {
        "phases_completed": 4,
        "total_phases": 4
      }
    },
    {
      "span_id": "sp_002",
      "parent_span_id": "sp_001",
      "type": "agent",
      "name": "architect",
      "started_at": "2026-01-11T20:00:05Z",
      "ended_at": "2026-01-11T20:15:00Z",
      "duration_ms": 895000,
      "status": "completed",
      "attributes": {
        "phase": "design",
        "spec_file": "/TODO/auth-module.md"
      }
    },
    {
      "span_id": "sp_003",
      "parent_span_id": "sp_002",
      "type": "llm_call",
      "name": "generate_specification",
      "started_at": "2026-01-11T20:00:10Z",
      "ended_at": "2026-01-11T20:00:35Z",
      "duration_ms": 25000,
      "status": "completed",
      "attributes": {
        "model": "claude-opus-4-5-20251101",
        "input_tokens": 2500,
        "output_tokens": 1800,
        "temperature": 0.7
      }
    },
    {
      "span_id": "sp_004",
      "parent_span_id": "sp_002",
      "type": "tool",
      "name": "Write",
      "started_at": "2026-01-11T20:00:36Z",
      "ended_at": "2026-01-11T20:00:37Z",
      "duration_ms": 450,
      "status": "completed",
      "attributes": {
        "file_path": "/TODO/auth-module.md",
        "bytes_written": 3500
      }
    },
    {
      "span_id": "sp_005",
      "parent_span_id": "sp_001",
      "type": "handoff",
      "name": "architect_to_auto-code",
      "started_at": "2026-01-11T20:15:00Z",
      "ended_at": "2026-01-11T20:15:01Z",
      "duration_ms": 150,
      "status": "completed",
      "attributes": {
        "handoff_id": "ho_def456",
        "context_size_bytes": 8500,
        "artifacts_transferred": 2
      }
    }
  ],

  "events": [
    {
      "timestamp": "2026-01-11T20:00:00Z",
      "type": "workflow_start",
      "message": "Starting conductor orchestration"
    },
    {
      "timestamp": "2026-01-11T20:15:00Z",
      "type": "handoff",
      "message": "Handoff from architect to auto-code"
    },
    {
      "timestamp": "2026-01-11T20:35:00Z",
      "type": "guardrail_triggered",
      "message": "Security scan detected potential SQL injection",
      "severity": "warning"
    },
    {
      "timestamp": "2026-01-11T20:45:00Z",
      "type": "workflow_complete",
      "message": "All phases completed successfully"
    }
  ],

  "errors": []
}
```
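
The flat `spans` list encodes the hierarchy only through `parent_span_id`. The following is a minimal sketch of rebuilding and printing that hierarchy; `build_span_tree` and `render_tree` are hypothetical helper names, not part of the tracer described here, and the sort assumes all timestamps share the same ISO 8601 UTC format.

```python
from collections import defaultdict


def build_span_tree(spans):
    """Group spans by parent id; the key None holds the root spans."""
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_span_id")].append(span)
    # Same-format ISO 8601 UTC strings sort chronologically as plain text.
    for siblings in children.values():
        siblings.sort(key=lambda s: s.get("started_at", ""))
    return children


def render_tree(children, parent_id=None, depth=0):
    """Return one indented line per span, depth-first."""
    lines = []
    for span in children.get(parent_id, []):
        indent = "  " * depth
        lines.append(f"{indent}{span['type']}: {span['name']} ({span['duration_ms']} ms)")
        lines.extend(render_tree(children, span["span_id"], depth + 1))
    return lines


# Trimmed copies of the five spans from the schema example.
spans = [
    {"span_id": "sp_001", "parent_span_id": None, "type": "workflow",
     "name": "conductor_orchestration", "started_at": "2026-01-11T20:00:00Z", "duration_ms": 2700000},
    {"span_id": "sp_002", "parent_span_id": "sp_001", "type": "agent",
     "name": "architect", "started_at": "2026-01-11T20:00:05Z", "duration_ms": 895000},
    {"span_id": "sp_003", "parent_span_id": "sp_002", "type": "llm_call",
     "name": "generate_specification", "started_at": "2026-01-11T20:00:10Z", "duration_ms": 25000},
    {"span_id": "sp_004", "parent_span_id": "sp_002", "type": "tool",
     "name": "Write", "started_at": "2026-01-11T20:00:36Z", "duration_ms": 450},
    {"span_id": "sp_005", "parent_span_id": "sp_001", "type": "handoff",
     "name": "architect_to_auto-code", "started_at": "2026-01-11T20:15:00Z", "duration_ms": 150},
]
print("\n".join(render_tree(build_span_tree(spans))))
```

Keeping the stored list flat and deriving the tree on read keeps export simple, at the cost of one pass over the spans per view.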

## Span Types

### Workflow Span

Top-level span encompassing the entire task:

```json
{
  "type": "workflow",
  "attributes": {
    "workflow_id": "wf_xyz789",
    "phases": ["design", "implement", "test", "review"],
    "agents_involved": ["architect", "auto-code", "qa", "ciso"]
  }
}
```

### Agent Span

Span for each agent invocation:

```json
{
  "type": "agent",
  "attributes": {
    "agent_name": "auto-code",
    "agent_id": "code_abc123",
    "phase": "implement",
    "task": "Implement authentication module",
    "handoff_id": "ho_def456"
  }
}
```

### LLM Call Span

Individual LLM API calls:

```json
{
  "type": "llm_call",
  "attributes": {
    "model": "claude-opus-4-5-20251101",
    "input_tokens": 2500,
    "output_tokens": 1800,
    "total_tokens": 4300,
    "temperature": 0.7,
    "max_tokens": 4096,
    "stop_reason": "end_turn",
    "latency_ms": 2300
  }
}
```

### Tool Span

Tool/function invocations:

```json
{
  "type": "tool",
  "attributes": {
    "tool_name": "Write",
    "parameters": {
      "file_path": "/src/auth/jwt.ts"
    },
    "result_summary": "success",
    "bytes_affected": 2500
  }
}
```

### Handoff Span

Agent-to-agent transfers:

```json
{
  "type": "handoff",
  "attributes": {
    "handoff_id": "ho_abc123",
    "from_agent": "architect",
    "to_agent": "auto-code",
    "context_items": 5,
    "expectations_count": 3
  }
}
```

### Guardrail Span

Guardrail evaluations:

```json
{
  "type": "guardrail",
  "attributes": {
    "guardrail_name": "security_scan",
    "guardrail_type": "output",
    "passed": false,
    "severity": "critical",
    "issues_found": 2
  }
}
```
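
The six span types above can be checked mechanically before a span is stored. This is a sketch under assumptions: `SpanType` and `check_span` are hypothetical names, and the only rule enforced beyond the type itself is the one visible in the LLM call example, where `total_tokens` equals `input_tokens + output_tokens`.

```python
from enum import Enum


class SpanType(str, Enum):
    WORKFLOW = "workflow"
    AGENT = "agent"
    LLM_CALL = "llm_call"
    TOOL = "tool"
    HANDOFF = "handoff"
    GUARDRAIL = "guardrail"


def check_span(span):
    """Return a list of problems found in a span dict (empty if none)."""
    try:
        span_type = SpanType(span.get("type"))
    except ValueError:
        return [f"unknown span type: {span.get('type')!r}"]

    problems = []
    attrs = span.get("attributes", {})
    if span_type is SpanType.LLM_CALL and "total_tokens" in attrs:
        expected = attrs.get("input_tokens", 0) + attrs.get("output_tokens", 0)
        if attrs["total_tokens"] != expected:
            problems.append(
                f"total_tokens {attrs['total_tokens']} != input + output {expected}"
            )
    return problems


llm_span = {
    "type": "llm_call",
    "attributes": {"input_tokens": 2500, "output_tokens": 1800, "total_tokens": 4300},
}
print(check_span(llm_span))
```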

## Implementation Protocol

### Automatic Instrumentation

```python
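import json


class Tracer:
    def __init__(self, trace_id=None):
        self.trace_id = trace_id or generate_id("tr")
        self.spans = []
        self.events = []
        self.errors = []  # add_error() and export() both use this list
        self.current_span = None
        self.start_time = now_iso()  # ISO string, matching calculate_duration() inputs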

    def start_span(self, name, span_type, attributes=None):
        span = Span(
            span_id=generate_id("sp"),
            parent_span_id=self.current_span.span_id if self.current_span else None,
            type=span_type,
            name=name,
            started_at=now_iso(),
            attributes=attributes or {}
        )
        self.spans.append(span)
        self.current_span = span
        return span

    def end_span(self, status="completed", attributes=None):
        if self.current_span:
            self.current_span.ended_at = now_iso()
            self.current_span.duration_ms = calculate_duration(
                self.current_span.started_at,
                self.current_span.ended_at
            )
            self.current_span.status = status
            if attributes:
                self.current_span.attributes.update(attributes)

            # Return to the parent span (None when the ended span was a root)
            parent_id = self.current_span.parent_span_id
            self.current_span = next(
                (s for s in self.spans if s.span_id == parent_id), None
            )

    def add_event(self, event_type, message, severity=None):
        event = {
            "timestamp": now_iso(),
            "type": event_type,
            "message": message
        }
        if severity:
            event["severity"] = severity
        self.events.append(event)

    def add_error(self, error_type, message, stack_trace=None):
        error = {
            "timestamp": now_iso(),
            "type": error_type,
            "message": message
        }
        if stack_trace:
            error["stack_trace"] = stack_trace
        self.errors.append(error)

    def export(self, format="json"):
        ended_at = now_iso()  # capture once so ended_at and duration_ms agree
        trace = {
            "trace_id": self.trace_id,
            "started_at": self.start_time,
            "ended_at": ended_at,
            "duration_ms": calculate_duration(self.start_time, ended_at),
            "metrics": self.calculate_metrics(),
            "spans": [s.to_dict() for s in self.spans],
            "events": self.events,
            "errors": self.errors
        }

        if format == "json":
            return json.dumps(trace, indent=2)
        elif format == "otel":
            return convert_to_opentelemetry(trace)
        raise ValueError(f"Unsupported export format: {format}")

    def calculate_metrics(self):
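        return {
            # Fall back to input + output when a span has no total_tokens attribute
            "total_tokens": sum(
                s.attributes.get(
                    "total_tokens",
                    s.attributes.get("input_tokens", 0) + s.attributes.get("output_tokens", 0)
                )
                for s in self.spans if s.type == "llm_call"
            ),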
            "llm_calls": len([s for s in self.spans if s.type == "llm_call"]),
            "tool_calls": len([s for s in self.spans if s.type == "tool"]),
            "agent_invocations": len([s for s in self.spans if s.type == "agent"]),
            "handoffs": len([s for s in self.spans if s.type == "handoff"])
        }
```
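
The class above relies on helpers such as `Span`, `generate_id`, `now_iso`, and `calculate_duration` that this document does not define. Below is one possible sketch of them. The ID format, the `"running"` default status, and the millisecond timestamp precision are all assumptions made for illustration.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


def generate_id(prefix):
    """Return an id such as 'sp_1a2b3c4d' (the format is an assumption)."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def now_iso():
    """Current UTC time as an ISO 8601 string ending in 'Z'."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


def calculate_duration(started_at, ended_at):
    """Whole milliseconds between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    end = datetime.fromisoformat(ended_at.replace("Z", "+00:00"))
    return (end - start) // timedelta(milliseconds=1)


@dataclass
class Span:
    span_id: str
    parent_span_id: Optional[str]
    type: str
    name: str
    started_at: str
    attributes: dict = field(default_factory=dict)
    ended_at: Optional[str] = None
    duration_ms: Optional[int] = None
    status: str = "running"  # assumed value for a span that has not ended

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def to_dict(self):
        return asdict(self)


print(calculate_duration("2026-01-11T20:00:05Z", "2026-01-11T20:15:00Z"))
```

Dividing by `timedelta(milliseconds=1)` keeps the duration in integer arithmetic, which avoids float rounding on sub-second spans.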

### Context Manager Usage

```python
# Automatic span management with context managers
with tracer.span("architect_design", "agent") as span:
    span.set_attribute("phase", "design")

    with tracer.span("generate_spec", "llm_call") as llm_span:
        response = llm.generate(prompt)
        llm_span.set_attribute("total_tokens", response.usage.total_tokens)

    with tracer.span("write_spec", "tool") as tool_span:
        result = write_file(spec_path, content)
        tool_span.set_attribute("file_path", spec_path)
```
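
The `Tracer` class listed earlier exposes `start_span` and `end_span` but no `span` method, so the usage above needs a thin wrapper. The sketch below shows one way to build it with `contextlib.contextmanager`. `MiniTracer` and `SpanHandle` are hypothetical, reduced stand-ins so the example runs on its own, and the `"error"` status is an assumed value.

```python
from contextlib import contextmanager


class SpanHandle:
    """Hypothetical stand-in for the Span object used above."""

    def __init__(self, name, span_type, parent):
        self.name = name
        self.type = span_type
        self.parent = parent
        self.attributes = {}
        self.status = "running"

    def set_attribute(self, key, value):
        self.attributes[key] = value


class MiniTracer:
    """Reduced tracer holding only what the context manager needs."""

    def __init__(self):
        self.spans = []
        self.current_span = None

    def start_span(self, name, span_type):
        span = SpanHandle(name, span_type, self.current_span)
        self.spans.append(span)
        self.current_span = span
        return span

    def end_span(self, status="completed"):
        if self.current_span is not None:
            self.current_span.status = status
            self.current_span = self.current_span.parent

    @contextmanager
    def span(self, name, span_type):
        span = self.start_span(name, span_type)
        try:
            yield span
        except Exception:
            # Close the span as failed, then let the error propagate.
            self.end_span(status="error")
            raise
        else:
            self.end_span()


tracer = MiniTracer()
with tracer.span("architect_design", "agent") as agent_span:
    agent_span.set_attribute("phase", "design")
    with tracer.span("generate_spec", "llm_call") as llm_span:
        llm_span.set_attribute("total_tokens", 4300)

print([(s.name, s.status) for s in tracer.spans])
```

Ending the span in both the success and failure paths keeps `current_span` pointing at the right parent even when a traced operation raises.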

### Storing Traces

```python
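import json


def store_trace(trace):
    """Store a trace in the memory system for later analysis."""
    metadata = trace.get("metadata", {})  # Tracer.export() does not add metadata
    tags = ["trace", trace["trace_id"]]
    if metadata.get("workflow_id"):
        tags.append(metadata["workflow_id"])
    memory_store({
        "type": "context",
        "content": json.dumps(trace),
        "tags": tags,
        "project": metadata.get("project")
    })


def recall_traces(workflow_id=None, time_range=None):
    """Retrieve traces for analysis.

    Note: time_range is accepted but not applied here; filter the results
    by started_at after recall if a time window is needed.
    """
    query = f"trace workflow {workflow_id}" if workflow_id else "trace"
    return memory_recall({"query": query, "limit": 100})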
```
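
`recall_traces` accepts a `time_range` argument, but the listing never uses it. A minimal sketch of applying it after recall follows. It assumes the recalled items have already been decoded into trace dicts with a `started_at` field and that `time_range` is a `(start, end)` pair of ISO 8601 strings; `filter_by_time_range` is a hypothetical helper.

```python
from datetime import datetime


def _parse(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def filter_by_time_range(traces, time_range):
    """Keep traces whose started_at falls within [start, end], inclusive."""
    start, end = (_parse(t) for t in time_range)
    return [t for t in traces if start <= _parse(t["started_at"]) <= end]


traces = [
    {"trace_id": "tr_abc123", "started_at": "2026-01-11T20:00:00Z"},
    {"trace_id": "tr_def456", "started_at": "2026-01-12T09:30:00Z"},
]
window = ("2026-01-11T00:00:00Z", "2026-01-11T23:59:59Z")
print([t["trace_id"] for t in filter_by_time_range(traces, window)])
```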

## Commands

### `/trace status`

Show current trace:

```
User: /trace status

## Active Trace: tr_abc123

**Workflow**: Implement Authentication
**Duration**: 23m 45s (ongoing)
**Current Agent**: auto-code
**Current Phase**: implement

### Metrics So Far
| Metric | Value |
|--------|-------|
| LLM Calls | 8 |
| Tool Calls | 24 |
| Tokens Used | 45,230 |
| Est. Cost | $0.89 |

### Recent Events
- 2 min ago: Tool call (Write /src/auth/jwt.ts)
- 5 min ago: LLM call (generate implementation)
- 8 min ago: Handoff (architect → auto-code)
```

### `/trace view`

View completed trace:

```
User: /trace view tr_abc123

## Trace: tr_abc123

**Name**: Implement Authentication Feature
**Duration**: 45m 12s
**Status**: Completed ✓

### Summary
- Agents: architect → auto-code → qa → ciso
- Files Created: 8
- Files Modified: 3
- Tests: 12 passed

### Timeline
20:00:00 ┬─ workflow_start
         │
20:00:05 ├─ agent:architect (15m)
         │  ├─ llm_call: analyze_requirements (2.3s)
         │  ├─ llm_call: design_architecture (4.1s)
         │  └─ tool: Write /TODO/auth.md (0.5s)
         │
20:15:01 ├─ handoff: architect → auto-code
         │
20:15:02 ├─ agent:auto-code (20m)
         │  ├─ llm_call: plan_implementation (3.2s)
         │  ├─ tool: Write /src/auth/jwt.ts (0.3s)
         │  ├─ tool: Write /src/auth/middleware.ts (0.4s)
         │  ├─ guardrail: security_scan ⚠️ (1.2s)
         │  └─ llm_call: fix_security_issue (2.8s)
         │
20:35:05 ├─ handoff: auto-code → qa
...
20:45:12 └─ workflow_complete ✓

### Cost Breakdown
| Agent | Tokens | Cost |
|-------|--------|------|
| architect | 15,200 | $0.30 |
| auto-code | 65,800 | $1.29 |
| qa | 28,400 | $0.56 |
| ciso | 15,600 | $0.30 |
| **Total** | **125,000** | **$2.45** |
```
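
The per-agent rows in the cost breakdown can be derived by attributing each `llm_call` span to its nearest enclosing `agent` span. The sketch below illustrates that rollup. `tokens_by_agent` and `estimate_cost` are hypothetical helpers, and the per-million-token rates are caller-supplied placeholders rather than real prices.

```python
def tokens_by_agent(spans):
    """Sum input + output tokens of llm_call spans under each agent span."""
    by_id = {s["span_id"]: s for s in spans}

    def owning_agent(span):
        # Walk up the parent chain until an agent span is found.
        current = span
        while current is not None:
            if current["type"] == "agent":
                return current["name"]
            current = by_id.get(current.get("parent_span_id"))
        return None

    totals = {}
    for span in spans:
        if span["type"] != "llm_call":
            continue
        attrs = span.get("attributes", {})
        tokens = attrs.get("input_tokens", 0) + attrs.get("output_tokens", 0)
        agent = owning_agent(span) or "(no agent)"
        totals[agent] = totals.get(agent, 0) + tokens
    return totals


def estimate_cost(input_tokens, output_tokens, input_rate_per_mtok, output_rate_per_mtok):
    """Cost in USD given rates per million tokens (rates are placeholders)."""
    total = input_tokens * input_rate_per_mtok + output_tokens * output_rate_per_mtok
    return total / 1_000_000


spans = [
    {"span_id": "sp_001", "parent_span_id": None, "type": "workflow", "name": "conductor_orchestration"},
    {"span_id": "sp_002", "parent_span_id": "sp_001", "type": "agent", "name": "architect"},
    {"span_id": "sp_003", "parent_span_id": "sp_002", "type": "llm_call", "name": "generate_specification",
     "attributes": {"input_tokens": 2500, "output_tokens": 1800}},
]
print(tokens_by_agent(spans))
```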

### `/trace search`

Find traces by criteria:

```
User: /trace search workflow_id:wf_xyz789

## Traces for Workflow wf_xyz789

| Trace ID | Name | Duration | Status | Cost |
|----------|------|----------|--------|------|
| tr_abc123 | Implement Auth | 45m | ✓ | $2.45 |
| tr_def456 | Fix Auth Bug | 32m | ✓ | $1.74 |
| tr_ghi789 | Add OAuth | 38m | ✗ | $1.80 |
```

### `/trace export`

Export trace data:

```
User: /trace export tr_abc123 --format json

Exported trace to: /traces/tr_abc123.json (125KB)

User: /trace export tr_abc123 --format otel

Exported trace to: /traces/tr_abc123.otel.json
Compatible with: Jaeger, Zipkin, Grafana Tempo
```
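
The `otel` export depends on a `convert_to_opentelemetry` function that is not shown in this document. The sketch below is one possible shape for it, based on my understanding of the OTLP/JSON trace layout (`resourceSpans` → `scopeSpans` → `spans`, hex-encoded IDs, nanosecond timestamps as strings); verify the field names against the OTLP specification before relying on it. Deriving hex IDs by hashing the `tr_`/`sp_` identifiers and the `span.type` attribute key are assumptions made here.

```python
import hashlib
from datetime import datetime


def _hex_id(value, length):
    """Derive a fixed-length hex id from an arbitrary string id."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def _unix_nano(ts):
    """ISO 8601 timestamp to nanoseconds since the epoch, as a string."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return str(int(dt.timestamp()) * 1_000_000_000 + dt.microsecond * 1000)


def _attr(key, value):
    """Wrap a Python value in a typed OTLP-style attribute entry."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        typed = {"boolValue": value}
    elif isinstance(value, int):
        typed = {"intValue": str(value)}
    elif isinstance(value, float):
        typed = {"doubleValue": value}
    else:
        typed = {"stringValue": str(value)}
    return {"key": key, "value": typed}


def convert_to_opentelemetry(trace, service_name="tracer"):
    trace_id = _hex_id(trace["trace_id"], 32)  # 16 bytes as hex
    otel_spans = []
    for span in trace["spans"]:
        attributes = [_attr("span.type", span["type"])]
        attributes += [_attr(k, v) for k, v in span.get("attributes", {}).items()]
        otel_span = {
            "traceId": trace_id,
            "spanId": _hex_id(span["span_id"], 16),  # 8 bytes as hex
            "name": span["name"],
            "kind": 1,  # SPAN_KIND_INTERNAL
            "startTimeUnixNano": _unix_nano(span["started_at"]),
            "endTimeUnixNano": _unix_nano(span["ended_at"]),
            "attributes": attributes,
        }
        if span.get("parent_span_id"):
            otel_span["parentSpanId"] = _hex_id(span["parent_span_id"], 16)
        otel_spans.append(otel_span)
    return {
        "resourceSpans": [{
            "resource": {"attributes": [_attr("service.name", service_name)]},
            "scopeSpans": [{"scope": {"name": "tracer"}, "spans": otel_spans}],
        }]
    }
```

Hashing keeps the mapping deterministic, so re-exporting the same trace yields the same OpenTelemetry IDs and parent links stay intact.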

### `/trace compare`

Compare multiple traces:

```
User: /trace compare tr_abc123 tr_def456

## Trace Comparison

| Metric | tr_abc123 | tr_def456 | Diff |
|--------|-----------|-----------|------|
| Duration | 45m | 32m | -29% |
| LLM Calls | 15 | 11 | -27% |
| Tokens | 125K | 89K | -29% |
| Cost | $2.45 | $1.74 | -29% |
| Tool Calls | 42 | 28 | -33% |

### Insights
- tr_def456 was more efficient due to:
  - Fewer agent handoffs (2 vs 3)
  - No guardrail failures
  - Reused cached specifications
```
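
The `Diff` column in the comparison is the relative change from the first trace to the second. A short sketch of that arithmetic follows, using the figures from the table; `percent_diff` and `compare_metrics` are hypothetical helper names.

```python
def percent_diff(baseline, other):
    """Relative change from baseline to other, rounded to a whole percent."""
    if baseline == 0:
        return None  # undefined when the baseline is zero
    return round((other - baseline) / baseline * 100)


def compare_metrics(a, b):
    """Return {metric: (a_value, b_value, diff_percent)} for shared keys."""
    return {k: (a[k], b[k], percent_diff(a[k], b[k])) for k in a if k in b}


baseline = {"duration_min": 45, "llm_calls": 15, "tokens": 125_000, "cost_usd": 2.45, "tool_calls": 42}
other = {"duration_min": 32, "llm_calls": 11, "tokens": 89_000, "cost_usd": 1.74, "tool_calls": 28}

for metric, (a, b, diff) in compare_metrics(baseline, other).items():
    print(f"{metric}: {a} -> {b} ({diff:+d}%)")
```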

## Visualization

### ASCII Timeline

```
/trace timeline tr_abc123

20:00                                                          20:45
  │                                                              │
  ├──────────[architect]──────────┤                              │
  │                               │                              │
  │                               ├─────────────[auto-code]──────┤
  │                               │                              │
  │                               │                    ├──[qa]───┤
  │                               │                    │         │
  │                               │                    │  ├[ciso]┤
  ▼                               ▼                    ▼  ▼      ▼
```
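
A timeline like the one above can be produced by scaling each span's start and end offsets to a fixed column width. The sketch below draws simplified `#` bars rather than the box-drawing layout shown; `render_timeline` is a hypothetical function, and the agent times are rounded for the example.

```python
from datetime import datetime, timedelta


def _parse(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _ms(delta):
    """Whole milliseconds in a timedelta (integer arithmetic, no float drift)."""
    return delta // timedelta(milliseconds=1)


def render_timeline(trace_start, trace_end, spans, width=60):
    """Return one line per span with a bar proportional to its duration."""
    start = _parse(trace_start)
    total_ms = _ms(_parse(trace_end) - start)
    if total_ms <= 0:
        return []
    lines = []
    for span in spans:
        left = _ms(_parse(span["started_at"]) - start) * width // total_ms
        right = _ms(_parse(span["ended_at"]) - start) * width // total_ms
        bar = "#" * max(right - left, 1)  # always show at least one column
        lines.append(f"{' ' * left}{bar} {span['name']}")
    return lines


agent_spans = [
    {"name": "architect", "started_at": "2026-01-11T20:00:00Z", "ended_at": "2026-01-11T20:15:00Z"},
    {"name": "auto-code", "started_at": "2026-01-11T20:15:00Z", "ended_at": "2026-01-11T20:35:00Z"},
]
for line in render_timeline("2026-01-11T20:00:00Z", "2026-01-11T20:45:00Z", agent_spans, width=45):
    print(line)
```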

### Flame Graph (Conceptual)

```
/trace flame tr_abc123

┌──────────────────────────workflow (45m)──────────────────────────┐
│┌────────architect (15m)────────┐┌────────auto-code (20m)────────┐│
││┌─llm (2s)─┐┌─llm (4s)─┐┌tool┐ ││┌llm┐┌tool┐┌tool┐┌─llm (3s)──┐ ││
│└───────────────────────────────┘└───────────────────────────────┘│
└──────────────────────────────────────────────────────────────────┘
```

## Integration Points

| System | Integration |
|--------|-------------|
| Conductor | Automatic workflow span creation |
| All Agents | Agent span instrumentation |
| Handoff | Handoff span capture |
| Guardrails | Guardrail span and events |
| Checkpoint | Checkpoint events in trace |
| Memory | Trace storage and retrieval |
| Episode | Episode recording includes trace |

## Performance Impact

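Tracing is designed to add minimal overhead. Approximate per-operation costs (illustrative figures; measure in your own environment):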
- Span creation: ~0.1ms
- Event logging: ~0.05ms
- Export: ~10ms per 1000 spans
- Storage: ~1KB per span average
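
Figures like these depend on hardware and on how spans are stored, so they are worth measuring locally. The following is a rough micro-benchmark sketch; `measure_overhead` is a hypothetical helper, and `create_span` is a simplified stand-in for `Tracer.start_span`, not the real method.

```python
import time


def measure_overhead(operation, iterations=10_000):
    """Average wall-clock cost of `operation` in milliseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    return (time.perf_counter() - start) / iterations * 1000


spans = []


def create_span():
    # Stand-in for Tracer.start_span: build a small record and append it.
    spans.append({"span_id": len(spans), "attributes": {}})


print(f"span creation: {measure_overhead(create_span):.4f} ms per call")
```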

## Configuration

`~/.claude/tracer/config.yaml`:

```yaml
tracing:
  enabled: true

  capture:
    llm_calls: true
    tool_calls: true
    handoffs: true
    guardrails: true
    checkpoints: true

  sampling:
    rate: 1.0  # 100% capture, reduce for high-volume

  export:
    auto_export: true
    format: json
    destination: ~/.claude/traces/
    retention_days: 30

  performance:
    max_spans_per_trace: 10000
    max_events_per_trace: 1000

  privacy:
    redact_pii: true
    exclude_file_contents: true
    hash_sensitive_params: true
```
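
Two of these settings, `sampling.rate` and `privacy.hash_sensitive_params`, imply runtime decisions that the document does not spell out. The sketch below shows one way each could work. `should_sample`, `redact_params`, and the `SENSITIVE_KEYS` set are hypothetical, and hashing the trace ID to make sampling deterministic is a design assumption.

```python
import hashlib

# Hypothetical list of parameter names treated as sensitive.
SENSITIVE_KEYS = {"password", "api_key", "auth_token", "secret"}


def should_sample(trace_id, rate):
    """Decide deterministically whether a trace is captured at `rate` (0.0 to 1.0)."""
    if rate >= 1.0:
        return True
    if rate <= 0.0:
        return False
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


def redact_params(params):
    """Replace sensitive parameter values with a short SHA-256 fingerprint."""
    redacted = {}
    for key, value in params.items():
        if key.lower() in SENSITIVE_KEYS:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
            redacted[key] = f"sha256:{digest}"
        else:
            redacted[key] = value
    return redacted


print(should_sample("tr_abc123", 1.0))
print(redact_params({"file_path": "/src/auth/jwt.ts", "password": "hunter2"}))
```

Sampling on a hash of the trace ID, rather than a random draw, means every component that sees the same trace makes the same keep-or-drop decision.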

## Model Recommendation

- **Haiku**: For trace operations (fast, minimal overhead)
- **Sonnet**: For trace analysis and comparison
- **Opus**: For deep trace debugging and insights
