Checkpoint
State Persistence | LangGraph-inspired
Enables durable execution through state checkpointing, allowing workflows to resume from any saved checkpoint after an interruption, failure, or planned pause.
Overview
The Checkpoint Agent implements a checkpointing system inspired by LangGraph's durable execution model. It enables long-running workflows to persist state at defined points, allowing for:
- Fault Recovery: Resume from last checkpoint after crashes or failures
- Human-in-the-Loop: Pause for approval, then continue seamlessly
- Time-Travel Debugging: Restore any previous state for investigation
- Session Continuity: Pick up work across conversation boundaries
Key Concepts
Checkpoint Types
| Type | When Created | Use Case |
|---|---|---|
| Phase Checkpoint | At workflow phase transitions | Major milestone recovery |
| Step Checkpoint | Before/after significant operations | Fine-grained recovery |
| User Checkpoint | Manually triggered by user | Intentional save points |
| Auto Checkpoint | Periodic (configurable interval) | Background protection |
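The four types above, and the interval check implied by the Auto Checkpoint row, could be modeled roughly as follows. This is a minimal sketch: `CheckpointType` and `auto_checkpoint_due` are illustrative names rather than part of the agent's documented interface, and the interval is whatever the configuration supplies.

```python
from enum import Enum


class CheckpointType(Enum):
    """The four checkpoint types from the table (values are assumed labels)."""
    PHASE = "phase"   # created at workflow phase transitions
    STEP = "step"     # created before/after significant operations
    USER = "user"     # created manually by the user
    AUTO = "auto"     # created periodically in the background


def auto_checkpoint_due(last_saved_at: float, now: float, interval_seconds: float) -> bool:
    """Return True once at least `interval_seconds` have elapsed since the last save."""
    return now - last_saved_at >= interval_seconds
```

A scheduler would call `auto_checkpoint_due` with monotonic timestamps and create an `AUTO` checkpoint whenever it returns `True`.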
Checkpoint Data
Each checkpoint captures:
- Workflow State: Current phase, step, and decision history
- Agent Context: Active agent, pending tasks, memory references
- File State: Hashes of modified files, used for integrity verification on restore
- Metadata: Timestamps, checkpoint ID, parent checkpoint reference
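One way to picture the captured data is as a single serializable record. The sketch below is an assumption about shape only: the field names, the use of JSON, and the choice of SHA-256 for file hashes are illustrative and are not specified by this page.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    # Metadata
    checkpoint_id: str
    parent_id: Optional[str]          # None for the first checkpoint in a workflow
    created_at: float                 # Unix timestamp
    # Workflow state
    phase: str
    step: int
    decisions: List[str] = field(default_factory=list)
    # Agent context
    active_agent: str = ""
    pending_tasks: List[str] = field(default_factory=list)
    memory_refs: List[str] = field(default_factory=list)
    # File state: relative path -> content hash
    file_hashes: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the record with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)


def hash_bytes(content: bytes) -> str:
    """Hex SHA-256 digest of file content (hash algorithm is an assumption)."""
    return hashlib.sha256(content).hexdigest()
```

Keeping `parent_id` on every record is what lets later checkpoints be traced back to earlier ones.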
Commands
/checkpoint save
Manually create a checkpoint:
    /checkpoint save "Before database migration"

    Checkpoint Created: chk_abc123
    - Phase: Implementation
    - Step: 12 of 25
    - Files tracked: 8
    - Resume command: /checkpoint restore chk_abc123
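For a checkpoint to be durable, a save must not leave a half-written file behind if the process dies mid-write. The sketch below shows one common approach, writing to a temporary file and then renaming it into place. The function name `save_checkpoint`, the one-JSON-file-per-checkpoint layout, and the `chk_` plus six hex characters ID format are assumptions made for illustration; the page does not describe how the command stores data.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def save_checkpoint(directory: Path, label: str, state: dict) -> str:
    """Write `state` as a JSON checkpoint file and return the new checkpoint ID."""
    checkpoint_id = f"chk_{uuid.uuid4().hex[:6]}"  # hypothetical ID scheme
    payload = {"id": checkpoint_id, "label": label, "state": state}
    directory.mkdir(parents=True, exist_ok=True)

    # Write to a temporary file in the same directory, flush it to disk,
    # then rename it so readers never observe a partial checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, directory / f"{checkpoint_id}.json")
    return checkpoint_id
```

`os.replace` overwrites the destination in a single step on the same filesystem, which is why the temporary file is created inside the target directory.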
/checkpoint list
View available checkpoints:
    /checkpoint list

    Checkpoints for workflow: wf_xyz789

    ID          | Type  | Phase          | Created    | Size
    ------------|-------|----------------|------------|------
    chk_abc123  | user  | Implementation | 10 min ago | 2.4KB
    chk_auto_45 | auto  | Implementation | 25 min ago | 2.1KB
    chk_phase_2 | phase | Architecture   | 1 hour ago | 1.8KB
/checkpoint restore
Resume from a checkpoint:
    /checkpoint restore chk_abc123

    Restoring checkpoint chk_abc123...
    - Phase: Implementation
    - Step: 12 of 25
    - Verifying file integrity... OK
    - Loading agent context... OK

    Ready to continue from: "Implement user authentication"
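The "Verifying file integrity" step can be understood as comparing the hashes recorded at save time with the files currently on disk. A minimal sketch, assuming SHA-256 hashes keyed by path relative to a project root; `sha256_of` and `verify_files` are hypothetical helpers, not the agent's actual functions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's current content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_files(recorded: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that are missing or whose content has changed.

    An empty list means every tracked file still matches its recorded hash.
    """
    mismatched = []
    for rel_path, expected in sorted(recorded.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatched.append(rel_path)
    return mismatched
```

A restore could proceed when the returned list is empty and otherwise report the listed paths so the user can decide whether to continue.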
Integration Points
| System | Integration |
|---|---|
| Conductor | Automatic phase checkpoints at workflow transitions |
| Time-Travel | Uses checkpoints as replay starting points |
| Handoff | Checkpoint before agent handoffs for rollback |
| Memory | Stores checkpoint metadata in memory system |
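Because each checkpoint stores a parent checkpoint reference, a consumer such as Time-Travel could reconstruct the history leading up to any checkpoint by following those references. The sketch below assumes the references are available as a plain mapping from checkpoint ID to parent ID; the `lineage` function and the mapping format are illustrative, not a documented interface.

```python
from typing import Dict, List, Optional


def lineage(checkpoint_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return checkpoint IDs from `checkpoint_id` back to the root, newest first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = checkpoint_id
    while current is not None:
        if current in seen:
            # Guard against corrupted metadata that loops back on itself.
            raise ValueError(f"cycle in parent references at {current}")
        if current not in parents:
            raise KeyError(f"unknown checkpoint: {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain
```

Reversing the returned list gives the order in which the states would be replayed.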
Use Cases
- Long Implementation Sessions: Save progress during multi-hour coding sessions
- Risky Operations: Checkpoint before migrations, refactors, or security changes
- Approval Gates: Pause workflow pending human review, resume after approval
- Context Overflow: When the context window fills, checkpoint and continue in a fresh session
- A/B Testing: Checkpoint, try approach A, restore, try approach B
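The A/B pattern in the last item depends on each attempt starting from an identical, isolated copy of the saved state. A minimal in-memory sketch of that idea; `InMemoryCheckpoints` and `compare_approaches` are hypothetical stand-ins, and a real workflow would use `/checkpoint save` and `/checkpoint restore` instead.

```python
import copy
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]


class InMemoryCheckpoints:
    """Toy checkpoint store that keeps deep copies of saved state."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, State] = {}

    def save(self, checkpoint_id: str, state: State) -> None:
        # Deep copy so later mutations of `state` cannot alter the snapshot.
        self._snapshots[checkpoint_id] = copy.deepcopy(state)

    def restore(self, checkpoint_id: str) -> State:
        # Deep copy again so each restore yields an independent working state.
        return copy.deepcopy(self._snapshots[checkpoint_id])


def compare_approaches(
    state: State,
    approach_a: Callable[[State], State],
    approach_b: Callable[[State], State],
) -> Tuple[State, State]:
    """Checkpoint, run approach A, restore, run approach B, and return both results."""
    store = InMemoryCheckpoints()
    store.save("before_experiment", state)
    result_a = approach_a(store.restore("before_experiment"))
    result_b = approach_b(store.restore("before_experiment"))
    return result_a, result_b
```

Copying on both save and restore is the design choice that keeps approach A's changes from leaking into approach B.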