Operations Agent

Observability Agent

Elite Site Reliability Engineer for production monitoring, logging, alerting, distributed tracing, SLO/SLI definition, incident response, and runbook creation.

Overview

The Observability agent covers monitoring, incident response, and operational excellence. It uses metrics, logging, and tracing to keep production systems visible and reliable, and to make them quickly recoverable when issues occur.

Core Capabilities

When to Use

Monitoring Methodologies

THE FOUR GOLDEN SIGNALS
  Latency      Time to service a request
  Traffic      Demand on your system (req/sec)
  Errors       Rate of failed requests
  Saturation   How "full" your service is (CPU, memory)

THE RED METHOD (Request-focused)
  Rate         Requests per second
  Errors       Failed requests per second
  Duration     Distribution of request latencies
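
As an illustration of the RED method, the following is a minimal Python sketch that summarizes one observation window of requests. The `Request` record, the `red_metrics` and `percentile` helpers, and the choice of nearest-rank percentiles (p50, p99) to describe the duration distribution are assumptions made for this example, not part of the agent itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""
    duration_ms: float
    failed: bool


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; None if the list is empty."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(p * len(sorted_values)))
    return sorted_values[rank - 1]


def red_metrics(requests, window_seconds):
    """Summarize Rate, Errors, and Duration for one observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    durations = sorted(r.duration_ms for r in requests)
    failures = sum(1 for r in requests if r.failed)
    return {
        "rate_per_s": len(requests) / window_seconds,   # Rate
        "errors_per_s": failures / window_seconds,      # Errors
        "p50_ms": percentile(durations, 0.50),          # Duration (median)
        "p99_ms": percentile(durations, 0.99),          # Duration (tail)
    }
```

In practice these values would usually come from a metrics backend rather than from in-memory records; the sketch only shows how the three signals relate to raw request data.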

THE USE METHOD (Resource-focused)
  Utilization  % of time the resource is busy
  Saturation   Queue length, waiting work
  Errors       Count of error events
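
The USE method can be sketched the same way for a single resource. This example assumes the resource is polled at regular intervals; the `ResourceSample` fields and the `use_metrics` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """Hypothetical periodic sample of one resource (a disk, CPU, or pool)."""
    busy: bool          # was the resource doing work when sampled?
    queue_length: int   # work items waiting for the resource
    errors: int         # error events since the previous sample


def use_metrics(samples):
    """Summarize Utilization, Saturation, and Errors over a set of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)
    return {
        # Utilization: share of samples in which the resource was busy
        "utilization_pct": 100.0 * sum(1 for s in samples if s.busy) / n,
        # Saturation: how much work was waiting, on average and at worst
        "avg_queue_length": sum(s.queue_length for s in samples) / n,
        "max_queue_length": max(s.queue_length for s in samples),
        # Errors: total error events across the window
        "error_count": sum(s.errors for s in samples),
    }
```

Reporting both the average and the maximum queue length is a design choice for this sketch: a short burst of queued work can be hidden by the average alone.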

Related Agents