# Toxiproxy - Chaos Engineering

A guide to chaos engineering and resilience testing with Toxiproxy, covering how to simulate degraded network conditions and outright failures between an application and its dependencies.

## Overview

**Container**: `ghcr.io/shopify/toxiproxy:latest`
**Category**: Chaos Engineering
**API Port**: 8474
**URL**: http://localhost:8474

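Toxiproxy is a TCP proxy for simulating network conditions. It sits between an application and an upstream service so you can test how the application behaves under adverse conditions such as latency, throttled bandwidth, and dropped connections.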

## Quick Start

```bash
# Check Toxiproxy status
curl http://localhost:8474/version

# List all proxies
curl http://localhost:8474/proxies

# Create a proxy
curl -X POST http://localhost:8474/proxies \
  -H "Content-Type: application/json" \
  -d '{"name": "redis", "listen": "0.0.0.0:26379", "upstream": "redis:6379"}'

# Add latency toxic
curl -X POST http://localhost:8474/proxies/redis/toxics \
  -H "Content-Type: application/json" \
  -d '{"name": "latency", "type": "latency", "attributes": {"latency": 1000, "jitter": 500}}'
```

## Core Concepts

| Concept | Description |
|---------|-------------|
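| **Proxy** | Named TCP listener that forwards client connections to an upstream service |
| **Toxic** | Failure condition (latency, bandwidth limit, timeout, and so on) attached to a proxy |
| **Upstream** | Backend service being proxied |
| **Stream** | Direction a toxic affects: `upstream` (client to service) or `downstream` (service to client, the default) |
| **Toxicity** | Probability (0.0 to 1.0) that a toxic is applied to a connection; defaults to 1.0 |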
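
To make the vocabulary above concrete, here is a minimal Python sketch that assembles the JSON body for a toxic. The helper name `build_toxic` is hypothetical and not part of any Toxiproxy client. The field names match the curl examples in this guide, and the defaults it fills in (`downstream`, toxicity `1.0`, a name of `<type>_<stream>`) mirror what Toxiproxy applies when those fields are omitted.

```python
import json

VALID_STREAMS = ("upstream", "downstream")


def build_toxic(type_, attributes, name=None, stream="downstream", toxicity=1.0):
    """Return the JSON-serialisable body for POST /proxies/<proxy>/toxics.

    Hypothetical helper: it only builds the payload, it does not send it.
    """
    if stream not in VALID_STREAMS:
        raise ValueError(f"stream must be one of {VALID_STREAMS}, got {stream!r}")
    if not 0.0 <= toxicity <= 1.0:
        raise ValueError("toxicity is a probability between 0.0 and 1.0")
    return {
        "name": name or f"{type_}_{stream}",
        "type": type_,
        "stream": stream,
        "toxicity": toxicity,
        "attributes": dict(attributes),
    }


body = build_toxic("latency", {"latency": 1000, "jitter": 500})
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to `curl -d` as in the Quick Start.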

## Proxy Management

### Create Proxy

```bash
# Create MySQL proxy
curl -X POST http://localhost:8474/proxies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql",
    "listen": "0.0.0.0:23306",
    "upstream": "mysql:3306",
    "enabled": true
  }'

# Create Redis proxy
curl -X POST http://localhost:8474/proxies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "redis",
    "listen": "0.0.0.0:26379",
    "upstream": "redis:6379"
  }'

# Create HTTP API proxy
curl -X POST http://localhost:8474/proxies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "api",
    "listen": "0.0.0.0:28080",
    "upstream": "api:8080"
  }'
```

### Manage Proxies

```bash
# List all proxies
curl http://localhost:8474/proxies

# Get specific proxy
curl http://localhost:8474/proxies/mysql

# Disable proxy (simulate complete failure)
curl -X POST http://localhost:8474/proxies/mysql \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

# Enable proxy
curl -X POST http://localhost:8474/proxies/mysql \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

# Delete proxy
curl -X DELETE http://localhost:8474/proxies/mysql

# Reset: re-enable all proxies and remove all toxics
curl -X POST http://localhost:8474/reset
```
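
The same calls can be made from a test harness without shelling out to curl. The following is a minimal sketch using only the Python standard library. The helper names (`build_request`, `send`, `set_enabled`) are hypothetical; the endpoints and the API URL are the ones used in this guide.

```python
import json
import urllib.request

API = "http://localhost:8474"  # API URL from the Overview section


def build_request(method, path, body=None, api=API):
    """Build (but do not send) a urllib request for the Toxiproxy HTTP API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(api + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req, timeout=5):
    """Send a request and return the decoded JSON body, or None if it is empty."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def set_enabled(name, enabled, sender=send):
    """Enable or disable a proxy; disabling simulates a complete outage."""
    return sender(build_request("POST", f"/proxies/{name}", {"enabled": enabled}))
```

With Toxiproxy running, `set_enabled("mysql", False)` corresponds to the "Disable proxy" curl call. The `sender` parameter exists only so the helper can be exercised without a live server.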

## Toxic Types

### Latency

Adds a delay of `latency` ± `jitter` milliseconds to all data passing through the proxy:

```bash
# Add latency (1 second with 500ms jitter)
curl -X POST http://localhost:8474/proxies/redis/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "latency_downstream",
    "type": "latency",
    "stream": "downstream",
    "toxicity": 1.0,
    "attributes": {
      "latency": 1000,
      "jitter": 500
    }
  }'
```
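
When choosing values, it helps to compare the injected delay with the client's own timeout. The sketch below assumes the delay is `latency` ± `jitter` and ignores the baseline round-trip time of your network. The function names are hypothetical.

```python
def latency_bounds_ms(latency, jitter=0):
    """Return the (min, max) injected delay in milliseconds for a latency toxic."""
    if latency < 0 or jitter < 0:
        raise ValueError("latency and jitter must be non-negative")
    return (max(0, latency - jitter), latency + jitter)


def timeout_survives(client_timeout_ms, latency, jitter=0):
    """True if the client timeout is longer than the worst-case injected delay."""
    _, worst_case = latency_bounds_ms(latency, jitter)
    return client_timeout_ms > worst_case


print(latency_bounds_ms(1000, 500))
print(timeout_survives(2000, 1000, 500))
```

For the toxic above, a client timeout at or below 1500 ms may trip on some requests and not others, which is useful when you want to test intermittent timeouts.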

### Bandwidth

Limits throughput to `rate` KB/s:

```bash
# Limit to 1KB/s
curl -X POST http://localhost:8474/proxies/api/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "bandwidth",
    "type": "bandwidth",
    "stream": "downstream",
    "attributes": {
      "rate": 1
    }
  }'
```
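
A quick estimate of how long a transfer takes under a given `rate` helps you pick test timeouts. This sketch assumes 1 KB means 1000 bytes and ignores protocol overhead. Both are simplifications, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes, rate_kb_per_s, bytes_per_kb=1000):
    """Estimate how long size_bytes takes to pass a bandwidth toxic.

    bytes_per_kb is an assumption; adjust it if your measurements disagree.
    """
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / (rate_kb_per_s * bytes_per_kb)


# A 10 kB response through the 1 KB/s toxic above
print(transfer_seconds(10_000, 1))
```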

### Slow Close

Delays the TCP socket from closing until `delay` milliseconds have elapsed:

```bash
# Delay close by 2 seconds
curl -X POST http://localhost:8474/proxies/mysql/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "slow_close",
    "type": "slow_close",
    "stream": "downstream",
    "attributes": {
      "delay": 2000
    }
  }'
```

### Timeout

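Stops all data from getting through and closes the connection after `timeout` milliseconds. With a `timeout` of 0, the connection stays open and data is dropped until the toxic is removed: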

```bash
# Drop all data, then close the connection after 5 seconds
curl -X POST http://localhost:8474/proxies/redis/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "timeout",
    "type": "timeout",
    "stream": "upstream",
    "attributes": {
      "timeout": 5000
    }
  }'
```

### Slicer

Slices TCP data into smaller chunks, optionally delaying each chunk:

```bash
# Slice data into chunks averaging 10 bytes (±5); note that slicer delay is in microseconds
curl -X POST http://localhost:8474/proxies/api/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "slicer",
    "type": "slicer",
    "stream": "downstream",
    "attributes": {
      "average_size": 10,
      "size_variation": 5,
      "delay": 100
    }
  }'
```

### Limit Data

Closes the connection once `bytes` bytes have been transmitted:

```bash
# Close after 1KB transferred
curl -X POST http://localhost:8474/proxies/api/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "limit_data",
    "type": "limit_data",
    "stream": "downstream",
    "attributes": {
      "bytes": 1024
    }
  }'
```

### Reset Peer

Simulates a TCP RESET (connection reset by peer), either immediately or after `timeout` milliseconds:

```bash
# Reset connection after 3 seconds
curl -X POST http://localhost:8474/proxies/mysql/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "reset_peer",
    "type": "reset_peer",
    "stream": "downstream",
    "attributes": {
      "timeout": 3000
    }
  }'
```

## Toxic Management

```bash
# List toxics for proxy
curl http://localhost:8474/proxies/redis/toxics

# Get specific toxic
curl http://localhost:8474/proxies/redis/toxics/latency_downstream

# Update toxic
curl -X POST http://localhost:8474/proxies/redis/toxics/latency_downstream \
  -H "Content-Type: application/json" \
  -d '{
    "attributes": {
      "latency": 2000,
      "jitter": 1000
    }
  }'

# Remove toxic
curl -X DELETE http://localhost:8474/proxies/redis/toxics/latency_downstream

# Reset: remove all toxics on every proxy and re-enable all proxies
curl -X POST http://localhost:8474/reset
```

## Testing Scenarios

### Database Failover Test

```bash
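#!/bin/bash
# test-db-failover.sh
set -euo pipefail

# Restore a clean state on exit, even if a test step fails
trap 'curl -s -X POST http://localhost:8474/reset' EXIT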

# Create MySQL proxy
curl -X POST http://localhost:8474/proxies \
  -d '{"name":"mysql","listen":"0.0.0.0:23306","upstream":"mysql:3306"}'

# Run baseline test
npm run test:integration

# Simulate high latency
curl -X POST http://localhost:8474/proxies/mysql/toxics \
  -d '{"name":"latency","type":"latency","attributes":{"latency":5000}}'

# Run test - should timeout gracefully
npm run test:integration:timeout

# Simulate complete failure
curl -X POST http://localhost:8474/proxies/mysql -d '{"enabled":false}'

# Run test - should handle failure
npm run test:integration:failure

# Reset for next test
curl -X POST http://localhost:8474/reset
```
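
"Should timeout gracefully" can be turned into a concrete assertion: the call must fail with an expected error, and it must do so within a deadline. A minimal sketch, where `fails_within` is a hypothetical helper and the operation passed in stands for your own database call through the proxy port:

```python
import time


def fails_within(operation, deadline_s, expected=(Exception,)):
    """Run operation and return the elapsed seconds if it raises in time.

    Raises AssertionError if the operation succeeds or fails too slowly.
    """
    start = time.monotonic()
    try:
        operation()
    except expected:
        elapsed = time.monotonic() - start
        if elapsed > deadline_s:
            raise AssertionError(
                f"failed, but only after {elapsed:.2f}s (deadline {deadline_s}s)"
            )
        return elapsed
    raise AssertionError("operation succeeded; expected a failure")


def simulated_query():
    # Stand-in for a query that hits the 5s latency toxic with a short client timeout
    raise TimeoutError("query timed out")


print(f"failed after {fails_within(simulated_query, 1.0, expected=(TimeoutError,)):.3f}s")
```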

### Redis Connection Pool Test

```bash
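#!/bin/bash
# test-redis-pool.sh
set -euo pipefail

# Restore a clean state on exit, even if a test step fails
trap 'curl -s -X POST http://localhost:8474/reset' EXIT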

# Create Redis proxy
curl -X POST http://localhost:8474/proxies \
  -d '{"name":"redis","listen":"0.0.0.0:26379","upstream":"redis:6379"}'

# Simulate intermittent failures (50% toxicity)
curl -X POST http://localhost:8474/proxies/redis/toxics \
  -d '{"name":"reset","type":"reset_peer","toxicity":0.5,"attributes":{"timeout":100}}'

# Run load test
k6 run --vus 10 --duration 1m redis-test.js

# Check connection pool recovery
npm run test:redis:pool
```

### API Degradation Test

```bash
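#!/bin/bash
# test-api-degradation.sh
set -euo pipefail

# Restore a clean state on exit, even if a test step fails
trap 'curl -s -X POST http://localhost:8474/reset' EXIT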

# Create API proxy
curl -X POST http://localhost:8474/proxies \
  -d '{"name":"api","listen":"0.0.0.0:28080","upstream":"backend:8080"}'

# Simulate slow responses
curl -X POST http://localhost:8474/proxies/api/toxics \
  -d '{"name":"latency","type":"latency","attributes":{"latency":3000,"jitter":1000}}'

# Run test - should show graceful degradation
npm run test:api:slow

# Simulate bandwidth throttling
curl -X DELETE http://localhost:8474/proxies/api/toxics/latency
curl -X POST http://localhost:8474/proxies/api/toxics \
  -d '{"name":"bandwidth","type":"bandwidth","attributes":{"rate":10}}'

# Run test - should handle slow downloads
npm run test:api:throttled
```

## Programmatic Usage (Node.js)

```javascript
// toxiproxy-test.js
// runTests() is a placeholder for your own test entry point.
const { Toxiproxy } = require('toxiproxy-node-client');

const toxiproxy = new Toxiproxy('http://localhost:8474');

async function runChaosTest() {
  // Create proxy
  const proxy = await toxiproxy.createProxy({
    name: 'redis',
    listen: '0.0.0.0:26379',
    upstream: 'redis:6379'
  });

  // Run baseline test
  console.log('Running baseline test...');
  await runTests();

  // Add latency
  const latencyToxic = await proxy.addToxic({
    name: 'latency',
    type: 'latency',
    attributes: { latency: 1000, jitter: 500 }
  });

  // Run test with latency
  console.log('Running test with 1s latency...');
  await runTests();

  // Update latency
  await latencyToxic.update({ attributes: { latency: 5000 } });

  // Run test with higher latency
  console.log('Running test with 5s latency...');
  await runTests();

  // Remove toxic
  await latencyToxic.remove();

  // Simulate failure
  await proxy.disable();
  console.log('Running test with proxy disabled...');
  await runTests();

  // Cleanup
  await proxy.remove();
}

runChaosTest().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

## Programmatic Usage (Python)

```python
# toxiproxy_test.py
# run_tests() is a placeholder for your own test entry point.
from toxiproxy import Toxiproxy

toxiproxy = Toxiproxy()
toxiproxy.create_proxy(
    name="redis",
    listen="0.0.0.0:26379",
    upstream="redis:6379"
)

proxy = toxiproxy.get_proxy("redis")

# Add latency
proxy.add_toxic(
    name="latency",
    type="latency",
    attributes={"latency": 1000, "jitter": 500}
)

# Run tests
run_tests()

# Remove latency and add timeout
proxy.toxics.get("latency").destroy()
proxy.add_toxic(
    name="timeout",
    type="timeout",
    attributes={"timeout": 5000}
)

# Run tests
run_tests()

# Cleanup
proxy.destroy()
```

## CI/CD Integration

### GitHub Actions

```yaml
name: Chaos Testing
on: [push]

jobs:
  chaos-tests:
    runs-on: ubuntu-latest
    services:
      toxiproxy:
        image: ghcr.io/shopify/toxiproxy:latest
        ports:
          - 8474:8474
          - 26379:26379  # proxy listen port, so tests on the runner can reach it
      redis:
        image: redis:7
        ports:
          - 6379:6379

    steps:
      - uses: actions/checkout@v3

      - name: Setup Toxiproxy
        run: |
          # Inside the Toxiproxy container, localhost is not Redis; use the service name
          curl -X POST http://localhost:8474/proxies \
            -d '{"name":"redis","listen":"0.0.0.0:26379","upstream":"redis:6379"}'

      - name: Run baseline tests
        run: npm run test:chaos:baseline

      - name: Run latency tests
        run: |
          curl -X POST http://localhost:8474/proxies/redis/toxics \
            -d '{"name":"latency","type":"latency","attributes":{"latency":2000}}'
          npm run test:chaos:latency
          curl -X DELETE http://localhost:8474/proxies/redis/toxics/latency

      - name: Run failure tests
        run: |
          curl -X POST http://localhost:8474/proxies/redis -d '{"enabled":false}'
          npm run test:chaos:failure
          curl -X POST http://localhost:8474/proxies/redis -d '{"enabled":true}'
```
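
Service containers may still be starting when the first step runs, so it can help to wait for the API before creating proxies. The sketch below polls `GET /version` (the status check from the Quick Start). The function names, attempt count, and delay are assumptions to adapt to your pipeline.

```python
import time
import urllib.error
import urllib.request


def toxiproxy_is_up(api="http://localhost:8474", timeout=2):
    """Probe GET /version; a successful response counts as 'up'."""
    try:
        with urllib.request.urlopen(api + "/version", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False


def wait_until(probe, attempts=30, delay_s=1.0, sleep=time.sleep):
    """Call probe() until it returns True; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay_s)
    raise TimeoutError(f"not ready after {attempts} attempts")
```

In a workflow step this could be invoked as `wait_until(toxiproxy_is_up)` from a small script that runs before "Setup Toxiproxy".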

## Testing Patterns

### Pattern 1: Circuit Breaker Validation

```bash
# Test circuit breaker opens on failures
curl -X POST http://localhost:8474/proxies/api/toxics \
  -d '{"name":"timeout","type":"timeout","attributes":{"timeout":100}}'

# Run requests - circuit should open after threshold
for i in {1..10}; do
  curl -w "%{http_code}\n" http://localhost:28080/api/health
done

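# Circuit should be open now
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:28080/api/health  # Should fail fast

# Remove the toxic so the next probe can succeed
curl -X POST http://localhost:8474/reset

# Wait for the circuit to half-open (30s is an example; match your breaker's reset timeout)
sleep 30

# Verify recovery
curl http://localhost:28080/api/health  # Should succeed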
```
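
For reference, the behavior this pattern checks (closed, then open after a failure threshold, then half-open after a timeout) can be sketched as a small state machine. This is an illustrative model, not the implementation of any particular circuit breaker library. The class name, threshold, and timeout are assumptions.

```python
import time


class CircuitBreaker:
    """Minimal illustrative breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"
        return "open"

    def call(self, operation):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed half-open probe, or too many failures, (re)opens the circuit
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

If the toxic is still active when the breaker half-opens, the probe fails and the circuit reopens. That is why the toxic needs to be removed before recovery can be observed.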

### Pattern 2: Retry Mechanism Test

```bash
# Simulate 50% failures
curl -X POST http://localhost:8474/proxies/api/toxics \
  -d '{"name":"reset","type":"reset_peer","toxicity":0.5,"attributes":{"timeout":100}}'

# Run test - should succeed with retries
npm run test:api:retry
```
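
How many retries are enough depends on the toxicity. As a rough sketch, assuming each retry opens a new connection and the toxic hits connections independently, the chance that every attempt fails is `toxicity ** attempts`. The helper names and backoff values below are hypothetical.

```python
import time


def all_attempts_fail_probability(toxicity, attempts):
    """Chance that every attempt hits the toxic, assuming independent connections."""
    return toxicity ** attempts


def retry(operation, attempts=3, base_delay_s=0.1, sleep=time.sleep,
          retry_on=(ConnectionError,)):
    """Call operation, retrying with exponential backoff on the given errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** attempt)


print(all_attempts_fail_probability(0.5, 3))
```

With 50% toxicity and three attempts, roughly one call in eight would still fail under these assumptions, so a retry test that demands 100% success may be flaky by design.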

### Pattern 3: Graceful Degradation

```bash
# Simulate cache failure
curl -X POST http://localhost:8474/proxies/redis -d '{"enabled":false}'

# App should fall back to database
npm run test:cache:fallback
```
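
The fallback behavior this test expects can be sketched as follows. `get_with_fallback`, `cache_get`, and `db_get` are hypothetical names standing in for your application's own cache and database accessors.

```python
def get_with_fallback(key, cache_get, db_get,
                      cache_errors=(ConnectionError, TimeoutError)):
    """Read from the cache; on a miss or a cache outage, read from the database.

    Returns (value, source) so a test can assert where the value came from.
    """
    try:
        value = cache_get(key)
        if value is not None:
            return value, "cache"
    except cache_errors:
        pass  # cache unreachable, e.g. the Redis proxy is disabled
    return db_get(key), "database"


def dead_cache(key):
    # Simulates what a client sees when the proxy is disabled
    raise ConnectionRefusedError("connection refused")


print(get_with_fallback("user:1", dead_cache, {"user:1": "alice"}.get))
```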

## Best Practices

1. **Start Simple** - Begin with basic latency tests
2. **Measure Baseline** - Know normal performance first
3. **Incremental Chaos** - Gradually increase failure severity
4. **Monitor Everything** - Track metrics during chaos
5. **Automate Tests** - Include in CI/CD pipeline
6. **Document Expectations** - Define acceptable degradation
7. **Reset After Tests** - Always clean up toxics
8. **Test Recovery** - Verify systems recover properly

## Integration with Stack

- Test application resilience alongside k6 load tests
- Validate timeouts and retries in Playwright E2E tests
- Monitor with Falco during chaos experiments
- Track issues in DefectDojo
- Include results in Allure reports

## Troubleshooting

### Common Issues

**Issue**: Proxy not accessible
```bash
# Check proxy is listening
curl http://localhost:8474/proxies

# Verify network connectivity
docker network inspect testing-security-stack_default
```

**Issue**: Toxics not applied
```bash
# Verify toxic exists
curl http://localhost:8474/proxies/redis/toxics

# Check toxicity level (1.0 = 100%)
# Check stream direction (upstream/downstream)
```

**Issue**: Tests pass despite toxics
```bash
# Verify application uses proxy port
# Check toxicity is 1.0
# Verify correct stream direction
```
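
The checks above can be partly automated by inspecting the output of `GET /proxies`. The sketch below assumes that response is a JSON object keyed by proxy name, with each proxy carrying `enabled` and a `toxics` list. `diagnose` is a hypothetical helper and the sample data is made up for illustration.

```python
def diagnose(proxies):
    """Return findings for the parsed JSON of GET /proxies (a dict keyed by name)."""
    findings = []
    for name, proxy in proxies.items():
        if not proxy.get("enabled", True):
            findings.append(f"{name}: proxy is disabled")
        toxics = proxy.get("toxics") or []
        if not toxics:
            findings.append(f"{name}: no toxics attached")
        for toxic in toxics:
            label = f"{name}/{toxic.get('name', '?')}"
            toxicity = toxic.get("toxicity", 1.0)
            if toxicity < 1.0:
                findings.append(f"{label}: toxicity {toxicity} affects only some connections")
            if toxic.get("stream") == "upstream":
                findings.append(f"{label}: applies to client-to-service traffic only")
    return findings


# Illustrative sample, shaped like a GET /proxies response
sample = {
    "redis": {
        "name": "redis", "listen": "[::]:26379", "upstream": "redis:6379",
        "enabled": True,
        "toxics": [{"name": "latency_downstream", "type": "latency",
                    "stream": "downstream", "toxicity": 0.5,
                    "attributes": {"latency": 1000, "jitter": 500}}],
    },
    "mysql": {"name": "mysql", "listen": "[::]:23306", "upstream": "mysql:3306",
              "enabled": False, "toxics": []},
}

for finding in diagnose(sample):
    print(finding)
```

None of these findings is necessarily a problem. They are the settings most likely to explain a test that passes when you expected it to fail.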
