Build intelligent multi-agent systems with tier-based workflows, quality gates, BRD-driven development, and intent engineering
The orchestration challenge: Single-agent AI systems fail at complex software development because:
Multi-agent orchestration solves this by decomposing work into specialized agents with formal handoffs, quality gates, and requirement tracing. Instead of one overwhelmed agent, you get a coordinated team where each agent has a clear role, deliverables, and verification criteria.
Every task is classified into one of four tiers (TRIVIAL, MINOR, STANDARD, or MAJOR) using a weighted 5-signal matrix:
| Signal | Weight | What It Measures | 1 (Low) | 4 (High) |
|---|---|---|---|---|
| scope | 30% | How many components affected | Single file tweak | Multi-service platform |
| type | 25% | Nature of work | Bug fix | Greenfield system |
| risk | 25% | Blast radius of failure | Dev-only change | Production auth system |
| ambiguity | 20% | Clarity of requirements | Exact spec provided | Vague description |
| intent_sensitivity | 25% | How closely task touches intent objectives or hard limits | Cosmetic change | Core security decision |
Calculation: score = (scope × 0.30) + (type × 0.25) + (risk × 0.25) + (ambiguity × 0.20) + (intent_sensitivity × 0.25)
```python
# Example: "Build a SaaS dashboard with Stripe integration"
scope = 3.5               # Multi-page app, API, database, payment integration
type_ = 3.0               # New feature in existing codebase
risk = 3.5                # Payment processing (PCI compliance, fraud risk)
ambiguity = 2.5           # Some details provided, but UX/design unclear
intent_sensitivity = 3.0  # Core product feature, revenue-critical

score = (scope * 0.30) + (type_ * 0.25) + (risk * 0.25) + (ambiguity * 0.20) + (intent_sensitivity * 0.25)
#     = 1.05 + 0.75 + 0.875 + 0.5 + 0.75
#     = 3.925  → MAJOR tier (3.3-4.0)
```

Why intent_sensitivity matters: A simple CSS change (scope=1) normally scores TRIVIAL. But if it changes the color of a security warning that users must notice, intent_sensitivity=4, which escalates the tier to ensure proper review.
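The scoring step can be sketched in Python as below, assuming each signal is rated on the 1-4 scale from the table. The weights are copied exactly as listed; note that they sum to 1.25, so a task rated 4 on every signal scores 5.0 rather than 4.0. Only the MAJOR cutoff (3.3) comes from the text; the MINOR and STANDARD cutoffs in `TIER_CUTOFFS` are hypothetical placeholders, as are the function names.

```python
# Weights exactly as listed in the signal table (they sum to 1.25).
WEIGHTS = {
    "scope": 0.30,
    "type": 0.25,
    "risk": 0.25,
    "ambiguity": 0.20,
    "intent_sensitivity": 0.25,
}

# Lower bound of each tier, highest first. Only MAJOR (3.3) comes from the
# text; the STANDARD and MINOR cutoffs are hypothetical placeholders.
TIER_CUTOFFS = [(3.3, "MAJOR"), (2.5, "STANDARD"), (1.75, "MINOR")]


def weighted_score(signals: dict[str, float]) -> float:
    """Return the weighted sum of the five signals, each rated 1-4."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    for name, value in signals.items():
        if not 1.0 <= value <= 4.0:
            raise ValueError(f"{name}={value} is outside the 1-4 scale")
    # Round to 3 decimals to hide floating-point noise in the sum
    return round(sum(signals[name] * weight for name, weight in WEIGHTS.items()), 3)


def classify(signals: dict[str, float]) -> tuple[str, float]:
    """Map the weighted score to a tier using the cutoffs above."""
    score = weighted_score(signals)
    for lower_bound, tier in TIER_CUTOFFS:
        if score >= lower_bound:
            return tier, score
    return "TRIVIAL", score
```

With these placeholder cutoffs, the dashboard example classifies as `("MAJOR", 3.925)`, and the CSS example (assuming every other signal is 1) moves from TRIVIAL at 1.25 to MINOR at 2.0 when intent_sensitivity rises to 4.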
Each tier uses a different workflow. Higher tiers add more phases and stricter gates.
**TRIVIAL**

```text
analyze-codebase → conductor-builder(plan-and-implement) → verify
```

Characteristics: Single agent, no critic gates, no BRD extraction. For quick fixes and cosmetic changes.
**MINOR**

```text
analyze-codebase → conductor-builder(plan) → conductor-builder(implement)
  → conductor-ciso(advisory) → conductor-critic(advisory) → verify
  → conductor-completeness-validator(advisory)
```

Characteristics: Split planning/implementation, advisory-only gates (log findings but don't block).
**STANDARD**

```text
conductor-project-setup → conductor-research → conductor-ciso(requirements)
  → CRITIC(post-ciso, advisory) → BRD-EXTRACTION → CRITIC(post-extraction, advisory)
  → [conductor-architect + api-design + database] → CRITIC(post-architect, advisory)
  → conductor-qa → CRITIC(post-qa, advisory) → conductor-builder(implement)
  → conductor-ciso(code-review) → [code-reviewer + qa + performance + compliance]
  → CRITIC(post-implementation, advisory) → FINAL-BRD-VERIFICATION
  → pentest-coordinator → CRITIC(post-pentest, BLOCKING)
  → CRITIC(pre-release, BLOCKING) → conductor-doc-gen → api-docs
  → devops → observability → conductor-completeness-validator(BLOCKING)
```

Characteristics: Full workflow with most gates advisory; the POST-PENTEST, PRE-RELEASE, and COMPLETENESS gates are BLOCKING. Pentest required.
**MAJOR**

Same as STANDARD, but ALL gates are BLOCKING.

Characteristics: Every checkpoint must pass before progression. Maximum scrutiny.
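Quality gates are checkpoints where the conductor-critic agent validates deliverables. Depending on the tier, a gate is BLOCKING (the workflow cannot progress until it passes), advisory (findings are logged but don't block), or skipped: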
| Gate | TRIVIAL | MINOR | STANDARD | MAJOR |
|---|---|---|---|---|
| POST-CISO | skip | advisory | advisory | BLOCKING |
| POST-BRD-EXTRACTION | skip | advisory | advisory | BLOCKING |
| POST-ARCHITECT | skip | advisory | advisory | BLOCKING |
| POST-QA | skip | skip | advisory | BLOCKING |
| POST-IMPLEMENTATION | skip | advisory | advisory | BLOCKING |
| PRE-RELEASE | skip | skip | BLOCKING | BLOCKING |
| POST-PENTEST | skip | skip | BLOCKING | BLOCKING |
| COMPLETENESS | skip | advisory | BLOCKING | BLOCKING |
Critical insight: PRE-RELEASE and COMPLETENESS gates are ALWAYS blocking in STANDARD+ tiers. This ensures no half-finished implementations or broken deployments.
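Because the mode matrix is a lookup table, it can live as data that the conductor consults whenever the critic reports a result. A minimal sketch, assuming a failed gate halts progression only when its mode is BLOCKING; `gate_mode`, `may_proceed`, and `blocking_gates` are illustrative names, not part of the plugin.

```python
TIERS = ("TRIVIAL", "MINOR", "STANDARD", "MAJOR")

# Gate mode per tier, copied row by row from the table above.
GATE_MODES = {
    "POST-CISO":           ("skip", "advisory", "advisory", "BLOCKING"),
    "POST-BRD-EXTRACTION": ("skip", "advisory", "advisory", "BLOCKING"),
    "POST-ARCHITECT":      ("skip", "advisory", "advisory", "BLOCKING"),
    "POST-QA":             ("skip", "skip", "advisory", "BLOCKING"),
    "POST-IMPLEMENTATION": ("skip", "advisory", "advisory", "BLOCKING"),
    "PRE-RELEASE":         ("skip", "skip", "BLOCKING", "BLOCKING"),
    "POST-PENTEST":        ("skip", "skip", "BLOCKING", "BLOCKING"),
    "COMPLETENESS":        ("skip", "advisory", "BLOCKING", "BLOCKING"),
}


def gate_mode(gate: str, tier: str) -> str:
    """Return "skip", "advisory", or "BLOCKING" for a gate in a given tier."""
    return GATE_MODES[gate][TIERS.index(tier)]


def may_proceed(gate: str, tier: str, passed: bool) -> bool:
    """A failed gate halts the workflow only when its mode is BLOCKING."""
    return passed or gate_mode(gate, tier) != "BLOCKING"


def blocking_gates(tier: str) -> list[str]:
    """List the gates that must pass before the workflow can progress."""
    return [gate for gate in GATE_MODES if gate_mode(gate, tier) == "BLOCKING"]
```

`blocking_gates("STANDARD")` returns `["PRE-RELEASE", "POST-PENTEST", "COMPLETENESS"]`, which makes the STANDARD-tier blocking set easy to audit against the table.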
Every project starts with a Business Requirements Document (BRD). The workflow ensures 100% BRD traceability:
```json
{
  "requirements": [
    {
      "id": "REQ-001",
      "description": "User can log in with email/password",
      "category": "functional",
      "priority": "critical",
      "acceptance_criteria": ["...", "..."],
      "status": "pending",
      "todo_file": null,
      "is_placeholder": false
    }
  ]
}
```

Final verification requires every requirement to reach `status: "complete"` and pass `is_placeholder: false` validation.
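A sketch of the status lifecycle and the final traceability check, assuming statuses advance strictly in the order pending → implemented → tested → complete and that a requirement only counts toward 100% when it is complete and not a placeholder. `advance` and `traceability` are hypothetical helpers, and the trimmed tracker below omits fields the check does not use.

```python
import json

STATUS_ORDER = ["pending", "implemented", "tested", "complete"]


def advance(requirement: dict) -> dict:
    """Move a requirement one status forward; never skip or go past complete."""
    index = STATUS_ORDER.index(requirement["status"])
    if index == len(STATUS_ORDER) - 1:
        raise ValueError(f"{requirement['id']} is already complete")
    requirement["status"] = STATUS_ORDER[index + 1]
    return requirement


def traceability(tracker: dict) -> tuple[float, list[str]]:
    """Return the percentage of verified requirements and the IDs still open."""
    requirements = tracker["requirements"]
    open_ids = [
        r["id"]
        for r in requirements
        if r["status"] != "complete" or r["is_placeholder"]
    ]
    done = len(requirements) - len(open_ids)
    # An empty tracker is treated as 0%, so it can never pass verification
    percent = 100.0 * done / len(requirements) if requirements else 0.0
    return percent, open_ids


tracker = json.loads("""
{"requirements": [
  {"id": "REQ-001", "status": "tested", "is_placeholder": false},
  {"id": "REQ-002", "status": "complete", "is_placeholder": false}
]}
""")
advance(tracker["requirements"][0])  # REQ-001: tested → complete
print(traceability(tracker))  # (100.0, [])
```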
The conductor uses a capability matrix to route tasks to the right agent. Each agent declares what it accepts, produces, and requires, plus its constraints and intent_constraints:
```yaml
conductor-builder:
  accepts:
    - specification
    - bug_fix_request
    - implementation_task
  produces:
    - code
    - tests
    - updated_brd_tracker
  requires:
    - TODO_spec_file
    - BRD-tracker.json
  constraints:
    - "No stub implementations"
    - "Must update BRD-tracker status"
  intent_constraints:
    - "Must respect trade-off resolutions when making implementation decisions"
    - "Must check delegation_boundaries before executing"
    - "Must never violate hard_limits"
```

Handoff validation: Before dispatching, the conductor checks:
```python
# a <= b on Python sets means "a is a subset of b"
if not set(source.produces) <= set(target.accepts):
    raise ValueError("Handoff invalid: source doesn't produce what target accepts")
missing = set(target.requires) - set(available_artifacts)
if missing:
    raise ValueError(f"Missing dependencies: {missing}")
```

This turns agent orchestration into a type-checked workflow.
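A self-contained variant of the same check, assuming capability entries are loaded into a small dataclass. It collects every problem instead of stopping at the first, which gives the conductor a complete error message. `Agent`, `validate_handoff`, and the `reviewer` agent are illustrative, not entries from capabilities.yaml.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    accepts: set[str] = field(default_factory=set)
    produces: set[str] = field(default_factory=set)
    requires: set[str] = field(default_factory=set)


def validate_handoff(source: Agent, target: Agent, available: set[str]) -> list[str]:
    """Return all problems with a handoff; an empty list means it may be dispatched."""
    problems = []
    unaccepted = source.produces - target.accepts
    if unaccepted:
        problems.append(f"{target.name} does not accept {sorted(unaccepted)} from {source.name}")
    missing = target.requires - available
    if missing:
        problems.append(f"{target.name} is missing dependencies {sorted(missing)}")
    return problems


builder = Agent(
    "conductor-builder",
    accepts={"specification", "bug_fix_request", "implementation_task"},
    produces={"code", "tests", "updated_brd_tracker"},
    requires={"TODO_spec_file", "BRD-tracker.json"},
)
# Hypothetical downstream agent, for illustration only
reviewer = Agent(
    "reviewer",
    accepts={"code", "tests", "updated_brd_tracker"},
    requires={"BRD-tracker.json"},
)

print(validate_handoff(builder, reviewer, {"BRD-tracker.json"}))  # []
print(validate_handoff(builder, reviewer, set()))
```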
Intent engineering solves the "agent guesses wrong trade-off" problem. Instead of hoping the agent picks the right balance (speed vs security, simplicity vs features), you declare intent upfront.
The intent block in conductor-state.json has four sections:

```json
{
  "objectives": [
    "Production-ready authentication with MFA",
    "WCAG AA accessibility compliance",
    "Sub-200ms API response times"
  ],
  "trade_offs": [
    {
      "decision": "Security over speed",
      "rationale": "Financial data - compliance is non-negotiable",
      "implications": ["May sacrifice some UX convenience for MFA"]
    },
    {
      "decision": "Simplicity over features",
      "rationale": "MVP launch in 6 weeks",
      "implications": ["Defer advanced analytics to v2"]
    }
  ],
  "delegation_boundaries": {
    "autonomous": [
      "Code implementation within approved specs",
      "Test generation",
      "Documentation"
    ],
    "human_in_loop": [
      "Architecture decisions affecting >3 components",
      "Third-party API selection",
      "Database schema changes"
    ]
  },
  "hard_limits": [
    "No GPL dependencies in proprietary code",
    "No API keys in code or config files",
    "No unauthenticated routes to PII",
    "Max bundle size 500KB (gzip)",
    "Zero OWASP Top 10 violations"
  ]
}
```

Agents with intent_constraints in the capability matrix check the intent block before making decisions and log their rationale.
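Because every agent consults this block, malformed intent is worth rejecting before the workflow starts. A minimal structural check, assuming all four sections are required; `validate_intent` is a hypothetical helper, and it checks shape only, not whether the wording of a hard limit makes sense.

```python
def validate_intent(intent: dict) -> list[str]:
    """Return structural errors in an intent block; an empty list means well-formed."""
    errors = []

    def is_str_list(value) -> bool:
        return isinstance(value, list) and all(isinstance(v, str) for v in value)

    for key in ("objectives", "hard_limits"):
        if not is_str_list(intent.get(key)):
            errors.append(f"{key} must be a list of strings")

    trade_offs = intent.get("trade_offs")
    if not isinstance(trade_offs, list):
        errors.append("trade_offs must be a list")
    else:
        required = {"decision", "rationale", "implications"}
        for i, entry in enumerate(trade_offs):
            if not isinstance(entry, dict) or not required.issubset(entry):
                errors.append(f"trade_offs[{i}] needs decision, rationale and implications")

    boundaries = intent.get("delegation_boundaries")
    if not isinstance(boundaries, dict):
        errors.append("delegation_boundaries must be an object")
    else:
        for key in ("autonomous", "human_in_loop"):
            if not is_str_list(boundaries.get(key)):
                errors.append(f"delegation_boundaries.{key} must be a list of strings")

    return errors
```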
At every checkpoint, conductor-critic validates:
```python
# POST-IMPLEMENTATION gate. uses_gpl_library, uses_weak_hash,
# architectural_change and has_approval stand in for the real checks.
findings = []

# Check hard_limit violations (always BLOCKING)
if uses_gpl_library(code) and any("No GPL" in limit for limit in hard_limits):
    findings.append({
        "severity": "CRITICAL",
        "type": "HARD_LIMIT_VIOLATION",
        "message": "GPL library detected",
        "blocking": True,  # regardless of tier
    })

# Check trade-off compliance
security_first = any(t["decision"] == "Security over speed" for t in trade_offs)
if security_first and uses_weak_hash(code):
    findings.append({
        "severity": "HIGH",
        "type": "TRADE_OFF_VIOLATION",
        "message": "Weak hashing contradicts security priority",
    })

# Check delegation boundary violations
if architectural_change and not has_approval:
    findings.append({
        "severity": "HIGH",
        "type": "DELEGATION_VIOLATION",
        "message": "Architecture change requires human approval",
    })
```

Hard limit violations escalate the tier. A TRIVIAL task (score=1.2) that touches a hard limit (e.g., "No unauthenticated routes to PII") is auto-escalated to at least STANDARD tier so that blocking gates apply.
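The escalation rule amounts to a floor on the tier. A minimal sketch, assuming tiers are ordered TRIVIAL < MINOR < STANDARD < MAJOR, that any finding marked `blocking` or typed `HARD_LIMIT_VIOLATION` fails a gate regardless of tier, and that on a BLOCKING gate any other finding fails it too. Both function names are hypothetical.

```python
TIER_ORDER = ["TRIVIAL", "MINOR", "STANDARD", "MAJOR"]


def effective_tier(classified: str, touches_hard_limit: bool) -> str:
    """Raise the tier to at least STANDARD when a hard limit is in play."""
    if not touches_hard_limit:
        return classified
    floor = TIER_ORDER.index("STANDARD")
    return TIER_ORDER[max(TIER_ORDER.index(classified), floor)]


def gate_blocked(findings: list[dict], gate_is_blocking: bool) -> bool:
    """Hard-limit findings block unconditionally; others block only on BLOCKING gates."""
    for finding in findings:
        if finding.get("blocking") or finding.get("type") == "HARD_LIMIT_VIOLATION":
            return True
    return gate_is_blocking and bool(findings)
```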
The workflow survives session restarts via conductor-state.json. Schema excerpt:
```json
{
  "project_name": "saas-dashboard",
  "initiated_at": "2026-03-17T10:00:00Z",
  "last_updated": "2026-03-17T14:32:15Z",
  "tier": "MAJOR",
  "tier_score": 3.925,
  "tier_signals": {
    "scope": 3.5,
    "type": 3.0,
    "risk": 3.5,
    "ambiguity": 2.5,
    "intent_sensitivity": 3.0
  },
  "current_phase": {
    "number": 3,
    "name": "Implementation",
    "started_at": "2026-03-17T12:00:00Z"
  },
  "current_step": {
    "number": 11,
    "name": "Code Generation",
    "assigned_agent": "conductor-builder",
    "status": "in_progress"
  },
  "intent": {
    "objectives": ["..."],
    "trade_offs": [...],
    "delegation_boundaries": {...},
    "hard_limits": [...]
  },
  "task_queue": [
    {
      "id": "task-042",
      "agent": "conductor-builder",
      "prompt": "Implement TODO/feature-payment.md",
      "status": "pending"
    }
  ],
  "completed_tasks": [...],
  "verification_status": {
    "extraction_complete": true,
    "specs_complete": true,
    "post_ciso_passed": true,
    "post_architect_passed": false,
    "gate_failures": [
      {
        "gate": "POST-ARCHITECT",
        "reason": "Missing API error handling spec",
        "timestamp": "2026-03-17T11:45:00Z"
      }
    ]
  }
}
```

Recovery: `/conduct resume` reads the state, verifies that no steps were skipped, and continues from current_step.
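A sketch of the resume check. The excerpt above elides `completed_tasks`, so this assumes (hypothetically) that each completed task records the `step` it belonged to; resume is refused if any step before current_step has no completed task. `load_state` and `resume_point` are illustrative names.

```python
import json
from pathlib import Path


def load_state(path: str = "conductor-state.json") -> dict:
    return json.loads(Path(path).read_text())


def resume_point(state: dict) -> dict:
    """Return where to continue, refusing to resume over skipped steps."""
    current = state["current_step"]["number"]
    # Assumes each completed task carries the step number it belonged to
    done = {task["step"] for task in state.get("completed_tasks", [])}
    skipped = sorted(set(range(1, current)) - done)
    if skipped:
        raise RuntimeError(f"cannot resume: steps {skipped} were never completed")
    pending = [t for t in state.get("task_queue", []) if t["status"] == "pending"]
    return {"step": state["current_step"], "next_task": pending[0] if pending else None}
```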
The conductor-completeness-validator agent runs exhaustive checks across 12 domains:
| Domain | What It Checks |
|---|---|
| Dependencies | Every import resolves, no missing packages |
| Dead Code | No orphan files, unused functions |
| Configuration | All env vars defined, no hardcoded secrets |
| Links | All internal links resolve, external links reachable |
| Assets | All referenced images/fonts/files exist |
| Build | Build succeeds with zero errors |
| Tests | Full test suite passes |
| Routes | Every route returns valid response (not 500) |
| API | Every endpoint responds correctly |
| UI | Pages load without console errors (if applicable) |
| Containers | Health checks pass (if containerized) |
| BRD Traceability | 100% requirements marked complete |
Output: completeness-report-<timestamp>.json with verdict (PASS/FAIL) and findings per domain.
When it runs: Phase 7, after all code changes. In STANDARD and MAJOR tiers it is a BLOCKING gate: the workflow cannot complete until the verdict is PASS.
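One way the per-domain results could roll up into the report, assuming each domain check yields PASS, FAIL, or N/A (UI and Containers only, when they don't apply) and that the verdict is PASS only when no domain fails. The result shape and `build_report` are hypothetical.

```python
from datetime import datetime, timezone

DOMAINS = [
    "Dependencies", "Dead Code", "Configuration", "Links", "Assets", "Build",
    "Tests", "Routes", "API", "UI", "Containers", "BRD Traceability",
]
OPTIONAL = {"UI", "Containers"}  # marked "if applicable" in the domain table


def build_report(results: dict[str, dict]) -> tuple[str, dict]:
    """Roll per-domain results up into (filename, report)."""
    domains = {}
    for domain in DOMAINS:
        if domain in results:
            domains[domain] = results[domain]
        elif domain in OPTIONAL:
            domains[domain] = {"status": "N/A"}
        else:
            # A mandatory check that never ran counts as a failure, not a pass
            domains[domain] = {"status": "FAIL", "findings": ["check did not run"]}
    failed = any(r["status"] == "FAIL" for r in domains.values())
    report = {"verdict": "FAIL" if failed else "PASS", "domains": domains}
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"completeness-report-{stamp}.json", report
```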
The conductor-qa-review agent runs multi-model consensus reviews at checkpoints:
```python
# Example: code quality review
claude_findings = ["Weak input validation in auth.js", "No rate limiting"]
gemini_findings = ["Weak input validation in auth.js", "Missing error logging"]
codex_findings = ["Weak input validation in auth.js"]

# Consensus: 3/3 agree on input validation → CRITICAL
# Split: 1/3 on rate limiting → escalate to user decision
# Split: 1/3 on error logging → escalate to user decision
consensus_report = {
    "critical": ["Weak input validation in auth.js"],
    "escalated": [
        {"finding": "No rate limiting", "votes": 1, "requires_review": True},
        {"finding": "Missing error logging", "votes": 1, "requires_review": True},
    ],
}
```

Escalation rule: If only 1/3 models flag a finding as CRITICAL, escalate it to the user. Never auto-dismiss.
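The vote counting behind `consensus_report` can be sketched as follows. It assumes findings from different models are matched by exact string, which real reviews would need to loosen, and because the text only defines the 3/3 and 1/3 cases, it conservatively escalates anything short of unanimous rather than dismissing it. `build_consensus` is a hypothetical name.

```python
from collections import Counter


def build_consensus(findings_by_model: dict[str, list[str]]) -> dict:
    """Unanimous findings become CRITICAL; everything else is escalated to the user."""
    model_count = len(findings_by_model)
    # Count each distinct finding at most once per model
    votes = Counter(
        finding
        for findings in findings_by_model.values()
        for finding in set(findings)
    )
    report = {"critical": [], "escalated": []}
    # Sorting keeps the report order stable across runs
    for finding, count in sorted(votes.items()):
        if count == model_count:
            report["critical"].append(finding)
        else:
            # Never auto-dismiss: anything short of unanimous goes to the user
            report["escalated"].append(
                {"finding": finding, "votes": count, "requires_review": True}
            )
    return report
```

Passing the three finding lists from the example reproduces `consensus_report`, with escalated items sorted alphabetically.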
Review profiles (quick, standard, thorough) are selected by tier.
After implementation, the code-hardener agent runs automated security fixes:
Example hardener output:

```json
{
  "auto_fixed": [
    {"file": "auth.js", "issue": "MD5 hash", "fix": "Replaced with bcrypt"},
    {"file": "config.js", "issue": "Hardcoded API key", "fix": "Moved to .env"}
  ],
  "requires_review": [
    {
      "file": "payment.js",
      "issue": "SQL injection risk in dynamic query",
      "todo_file": "TODO/security-payment-sqli.md",
      "severity": "CRITICAL"
    }
  ]
}
```

Integration point: The hardener runs in Phase 3 after conductor-builder and before the conductor-critic POST-IMPLEMENTATION gate, so security issues are caught before the final review.
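A sketch of routing the hardener's `requires_review` items back to conductor-builder as task_queue entries in the same shape conductor-state.json uses. The `task-NNN` ID scheme, the severity ranking, and the choice to queue CRITICAL items first are assumptions; `route_hardener_output` is a hypothetical name.

```python
def route_hardener_output(output: dict, state: dict) -> list[dict]:
    """Queue one conductor-builder task per issue the hardener could not fix safely."""
    severity_rank = {"CRITICAL": 0, "HIGH": 1}  # anything else sorts last
    items = sorted(
        output.get("requires_review", []),
        key=lambda item: severity_rank.get(item["severity"], 2),
    )
    queue = state.setdefault("task_queue", [])
    new_tasks = []
    for item in items:
        task = {
            # Hypothetical ID scheme; real IDs must not collide with earlier tasks
            "id": f"task-{len(queue) + 1:03d}",
            "agent": "conductor-builder",
            "prompt": f"Implement {item['todo_file']}",
            "status": "pending",
        }
        queue.append(task)
        new_tasks.append(task)
    return new_tasks
```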
Build a multi-agent orchestration system for software development with tier-based workflows, quality gates, and intent engineering. Create a "conductor" plugin with the following:
**1. Tier Classification System**
- 5-signal weighted matrix: scope (30%), type (25%), risk (25%), ambiguity (20%), intent_sensitivity (25%)
- Score range 1.0-4.0 maps to TRIVIAL/MINOR/STANDARD/MAJOR tiers
- Auto-escalate tier when task touches hard_limits
- Create tier-classifier.py that accepts task description, returns tier + signal breakdown
**2. Workflow Templates**
- Create workflow-templates.yaml with phase sequences for each tier
- TRIVIAL: single agent, no gates
- MINOR: split plan/implement, advisory gates
- STANDARD: full workflow, PRE-RELEASE and COMPLETENESS blocking
- MAJOR: all gates blocking
- Each template defines: phases, agents, gates (with mode: blocking/advisory/skip)
**3. BRD-Driven Development**
- Create BRD-tracker.json schema: id, description, category, status, todo_file, is_placeholder
- conductor-research agent generates BRD with numbered requirements
- BRD extraction (MANDATORY BLOCKING GATE) extracts all requirements to BRD-tracker.json
- conductor-architect creates TODO specs, links in BRD-tracker
- conductor-builder updates status: pending → implemented → tested → complete
- Final verification gate: 100% requirements must be "complete"
**4. Intent Engineering**
- Extend conductor-state.json schema with intent block:
  - objectives (array of strings)
  - trade_offs (array of {decision, rationale, implications})
  - delegation_boundaries ({autonomous: [...], human_in_loop: [...]})
  - hard_limits (array of strings)
- conductor-critic validates at every checkpoint:
  - hard_limit violations → always BLOCKING
  - trade_off compliance → log findings
  - delegation_boundary violations → escalate
- Agents with intent_constraints check before decisions, log rationale
**5. Agent Capability Matrix**
- Create capabilities.yaml with 14 core agents:
  - conductor (orchestrator, model: opus[1m])
  - conductor-research (requirements, model: sonnet)
  - conductor-ciso (security, model: opus)
  - conductor-architect (design, model: opus)
  - conductor-builder (implementation, model: opus)
  - conductor-qa (testing, model: sonnet)
  - conductor-critic (validation, model: opus)
  - conductor-code-reviewer (quality, model: sonnet)
  - conductor-completeness-validator (artifact checks, model: opus)
  - conductor-doc-gen (documentation, model: sonnet)
  - conductor-devops (CI/CD, model: sonnet)
  - conductor-performance (load tests, model: sonnet)
  - conductor-compliance (SBOM, licenses, model: sonnet)
  - conductor-qa-review (adversarial review, model: opus)
- Each agent defines: accepts, produces, requires, constraints, intent_constraints
- Handoff validation: source.produces ⊆ target.accepts, target.requires ⊆ available_artifacts
**6. Quality Gate System**
- Create quality-gates.yaml defining 8 gates:
  - POST-CISO (STRIDE, OWASP coverage)
  - POST-BRD-EXTRACTION (100% requirements captured)
  - POST-ARCHITECT (100% BRD-to-spec mapping)
  - POST-QA (test coverage validation)
  - POST-IMPLEMENTATION (no placeholders)
  - PRE-RELEASE (comprehensive readiness check)
  - POST-PENTEST (findings remediated)
  - COMPLETENESS (12-domain artifact validation)
- Each gate has mode matrix (tier → blocking/advisory/skip)
- conductor-critic agent executes gates, returns verdict + findings
**7. Completeness Validator**
- conductor-completeness-validator agent checks 12 domains:
  - Dependencies (all imports resolve)
  - Dead code (no orphan files)
  - Configuration (all env vars defined)
  - Links (internal resolve, external reachable)
  - Assets (all referenced files exist)
  - Build (succeeds with 0 errors)
  - Tests (full suite passes)
  - Routes (all return valid responses)
  - API (all endpoints respond)
  - UI (pages load without console errors)
  - Containers (health checks pass)
  - BRD traceability (100% complete)
- Output: `completeness-report-<timestamp>.json` with verdict + findings
**8. Adversarial Review**
- conductor-qa-review agent with multi-model consensus:
  - Claude Opus 4.6 (primary)
  - Google Gemini 2.0 (adversarial)
  - OpenAI GPT-4o (tie-breaker if available)
- Consensus logic: 3/3 agree → CRITICAL, 1/3 → escalate to user
- Profile selection by tier: quick/standard/thorough
- Never auto-dismiss 1/3 CRITICAL findings
**9. State Persistence**
- conductor-state.json schema with:
  - project_name, tier, tier_score, tier_signals
  - current_phase, current_step
  - task_queue, completed_tasks
  - verification_status (gates passed/failed)
  - intent block
  - BRD progress
- SessionStart hook injects status if state exists
- PostToolUse hook validates state against schema
**10. Commands**
- /conduct command with argument routing:
  - new → tier classification, create state, begin workflow
  - resume → read state, continue from current_step
  - status → comprehensive status display
  - reset → delete state
  - validate → run completeness-validator
**Deliverables:**
- Complete conductor plugin with all agents, skills, commands
- Tier classification system with 5-signal matrix
- BRD-tracker.json schema and extraction workflow
- Intent engineering with 4-section intent block
- Quality gate system with mode matrix
- Completeness validator checking 12 domains
- Adversarial review with multi-model consensus
- State persistence with recovery
- Working /conduct command with full workflow orchestration
A CSS color change and a payment processing system need different rigor levels. One-size-fits-all means:
Tiered workflows solve this by matching rigor to risk. The 5-signal matrix ensures objective classification.
Without explicit intent, agents guess trade-offs:
Intent engineering declares upfront what matters. Agents don't guess; they consult the intent block. Misalignment is detected at gates, not in production.
Without forced extraction, agents:
The blocking gate ensures 100% of requirements are captured before any design work begins. Nothing progresses until BRD-tracker.json is complete.
Without capability validation, you get:
The capability matrix turns orchestration into a type-checked workflow. Handoffs are validated at dispatch rather than discovered at failure.
Single-model review has blind spots:
Multi-model consensus provides defense in depth. If all three models agree the code is safe, confidence is high. If even one flags a CRITICAL issue, escalate it to the user; never auto-dismiss.
Agents claim "done" but:
The completeness validator is the "does it actually work" gate. It runs after all code changes, checks 12 domains, and is blocking in STANDARD and MAJOR tiers.
Software projects take days/weeks. Without state:
State persistence in conductor-state.json makes it possible to resume from any step. The workflow survives network failures, session timeouts, and even machine reboots.
Conductor is a Claude Code plugin. It uses:
Conductor integrates with memory plugins for:
```python
# After successful workflow completion
memory_store(
    type="procedure",
    content=f"STANDARD tier workflow for {project_type}",
    metadata={"tier": "STANDARD", "phases": 7, "duration_hours": 18},
)
```

Governance plugins enforce compliance during orchestration:
The code hardener runs in Phase 3, after conductor-builder:

```text
# Phase 3 sequence
conductor-builder(implement) → code-hardener → conductor-ciso(code-review)
  → [code-reviewer + qa] → CRITIC(post-implementation)
```

The hardener auto-fixes safe issues (weak crypto → strong crypto). Complex issues generate TODO specs that are routed back to conductor-builder.
Testing runs at multiple checkpoints:
Tests are stored in git and run in a Docker container (testing-security-stack) for isolation.
Phase 6 (deployment) integrates with CI/CD:
Multi-agent orchestration transforms software development from a single overwhelmed agent into a coordinated specialist team:
The result: production-ready code with no placeholders, 100% test coverage, security validated, and every requirement proven complete.