ADR-046: Streaming Deliberation
Status: Implemented 2026-07-03 (epic #412, v0.30.0)
Date: 2026-07-03
Decision Makers: Chris Joseph, LLM Council
Council Review: 2026-07-03 (4 models, balanced) — feedback incorporated: event schemas, per-phase DoD, ADR-045 fallback
Related: ADR-045 (MCP Tasks/progress — complementary), ADR-023 (gateway SSE, v0.27.1), ADR-012 (progress callbacks), ADR-044 (early consensus — event source)
Context
A council deliberation takes 30–600s and today renders as a spinner. The HTTP
server has an SSE endpoint emitting coarse stage-completion events
(council.stage1.complete, …), the gateway grew true token streaming in
v0.27.1 (complete_stream), and Stage 2 already observes per-reviewer
completions (ADR-040 Option D / ADR-044 P2 incremental path) — but none of
this reaches users as live content. Survey (2026-07): no implementation in
the llm-council ecosystem streams; this is the most visible differentiation
available, and perceived latency is the product's felt weakness.
Enabler: council.py (~104K) exceeds the Council review cap, so streaming
changes to it cannot be self-reviewed. Split it first (precedent: #380 split
verification/api.py 90K → 39K with back-compat re-exports).
Decision
Stream deliberation progressively at three depths, each opt-in per request
(stream=true on HTTP; MCP progress/Tasks per ADR-045). Non-streaming paths
remain byte-identical.
Phase 0 — Enabler: split council.py below the review cap
Extract cohesive units (stage functions, prompts, aggregation) into submodules with verbatim moves + back-compat re-exports, per the #380 playbook.
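A size check along these lines could guard the split in CI. This is a minimal sketch, not the project's actual tooling: the function name `oversized_modules` is hypothetical, and it assumes the "50K chars" cap from the Definition of Done means 50,000 characters.

```python
import tempfile
from pathlib import Path

CAP_CHARS = 50_000  # assumption: "50K chars" in the P0 Definition of Done means 50,000


def oversized_modules(root: Path, cap: int = CAP_CHARS) -> list:
    """Return the .py files under root whose character count is at or above the cap."""
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*.py")
        if len(path.read_text(encoding="utf-8")) >= cap
    )


# Demonstration on a throwaway directory: one small module, one at the cap.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "small.py").write_text("x = 1\n", encoding="utf-8")
    (root / "big.py").write_text("#" * CAP_CHARS, encoding="utf-8")
    offenders = oversized_modules(root)
```

Here `offenders` lists only `big.py`; a CI step could fail the build whenever the list is non-empty.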
Phase 1 — Rich stage events
Extend the SSE endpoint from coarse stage events to per-model events:
stage1.response (model, full text as each lands — the as_completed path
already observes this), stage2.review (reviewer + parsed ranking),
consensus.early_termination (ADR-044 P2 event), stage3.start.
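As a rough illustration of the per-model path, the sketch below emits one stage1.response event per model in completion order using `asyncio.as_completed`. The `query_model` coroutine, the model names, and the simulated delays are hypothetical stand-ins; the real council call path is not shown in this ADR, and the `usage` field is omitted for brevity.

```python
import asyncio
import time


async def query_model(model: str, delay: float) -> dict:
    """Hypothetical stand-in for a real model call; sleeps instead of calling an API."""
    started = time.monotonic()
    await asyncio.sleep(delay)
    return {
        "model": model,
        "response": f"answer from {model}",
        "latency_ms": int((time.monotonic() - started) * 1000),
    }


async def stream_stage1(models: dict):
    """Yield one stage1.response event per model, in the order the models finish."""
    tasks = [asyncio.create_task(query_model(m, d)) for m, d in models.items()]
    for finished in asyncio.as_completed(tasks):
        payload = await finished
        yield {"event": "stage1.response", "data": payload}


async def collect(models: dict) -> list:
    """Drain the event stream into a list so it can be inspected."""
    return [event async for event in stream_stage1(models)]


# The slower model is listed first, but the faster one is emitted first.
events = asyncio.run(collect({"slow-model": 0.15, "fast-model": 0.01}))
```

The point of the design is that the consumer sees each response as soon as it lands instead of waiting for the slowest model.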
Phase 2 — Chairman token streaming
Stage 3 optionally uses the gateway complete_stream; SSE emits
synthesis.delta tokens, then the final structured result event (verdict,
usage/cost per ADR-011). Non-stream fallback identical to today.
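One way to keep the streamed and non-streamed paths from drifting apart is to route both through a single result constructor. The sketch below assumes that approach; `fake_complete_stream`, `build_result`, and the usage numbers are hypothetical, and the gateway's real complete_stream signature is not reproduced here. The terminal event uses the council.complete wire name from the P1 implementation note.

```python
import asyncio


def build_result(text: str, usage: dict) -> dict:
    """Single constructor shared by both paths, so the two results cannot diverge."""
    return {"synthesis": text, "usage": usage}


async def fake_complete_stream(chunks: list):
    """Hypothetical stand-in for the gateway's streaming call."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield chunk


async def synthesize_streaming(chunks: list, usage: dict):
    """Emit synthesis.delta events, then one terminal event carrying the full result."""
    parts = []
    async for chunk in fake_complete_stream(chunks):
        parts.append(chunk)
        yield {"event": "synthesis.delta", "data": {"text": chunk}}
    yield {"event": "council.complete", "data": build_result("".join(parts), usage)}


def synthesize_blocking(chunks: list, usage: dict) -> dict:
    """Non-streamed path: same text, same constructor."""
    return build_result("".join(chunks), usage)


async def run(chunks: list, usage: dict) -> list:
    return [event async for event in synthesize_streaming(chunks, usage)]


CHUNKS = ["The council ", "recommends ", "option B."]
USAGE = {"total_tokens": 42}  # illustrative numbers only
streamed = asyncio.run(run(CHUNKS, USAGE))
```

An equality test then only needs to compare the data of the last streamed event with the return value of `synthesize_blocking`.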
Phase 3 — MCP surface
Map the same event stream onto MCP progress notifications (and Task progress when ADR-045 P1 lands) so Claude Code/Cursor users see live deliberation. ADR-045 fallback (council feedback): P3 does not depend on ADR-045 — plain MCP progress notifications carry the coarse events today; Task-based delivery is an upgrade when available.
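The mapping might look roughly like the sketch below, which wraps a council event in a JSON-RPC `notifications/progress` message as defined by the MCP specification. The function name `to_mcp_progress` is hypothetical, and two choices here are assumptions rather than anything this ADR fixes: the envelope fields are taken to live inside `data`, and `seq` is reused as the progress value because MCP requires progress to increase with each notification.

```python
from typing import Optional


def to_mcp_progress(event: dict, progress_token: str, total: Optional[int] = None) -> dict:
    """Wrap one council event in an MCP progress notification (JSON-RPC 2.0)."""
    params = {
        "progressToken": progress_token,   # token the client supplied with its request
        "progress": event["data"]["seq"],  # assumption: seq doubles as the progress value
        "message": event["event"],         # human-readable hint; here just the event name
    }
    if total is not None:
        params["total"] = total            # optional in MCP; omitted when unknown
    return {"jsonrpc": "2.0", "method": "notifications/progress", "params": params}


# Illustrative event; field values are made up for the example.
review_event = {
    "event": "stage2.review",
    "data": {"seq": 7, "reviewer": "model-a", "ranking": ["B", "A"], "parse_ok": True},
}
notification = to_mcp_progress(review_event, progress_token="tok-1")
```

Because a council run has no fixed event count up front, leaving `total` unset is the natural default; a client then shows indeterminate progress with the latest message.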
Event schema (council feedback)
All SSE events are {"event": <name>, "data": {…}} with a shared envelope
(session_id, ts, seq). Payloads:
- stage1.response: {model, response, latency_ms, usage}
- stage2.review: {reviewer, ranking: [labels], parse_ok}
- consensus.early_termination: the ADR-044 P2 event payload verbatim
- stage3.start: {chairman}
- synthesis.delta: {text}
- result: the full CouncilResponse (incl. ADR-011 usage) — terminal
- error: {stage, error_status, error_detail} (#403 semantics) — terminal
Schema is versioned (v field); additive changes only.
P1 implementation note (2026-07-03): the terminal events keep their
existing ADR-025 wire names council.complete / council.error — renaming
to result/error would break existing SSE consumers, violating this ADR's
own additive-only rule. The envelope (v/session_id/ts/seq) and all
new per-model events ship exactly as specified above.
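To make the envelope concrete, here is a minimal sketch of stamping events and serializing them as standard SSE frames. The `EventStream` class and `to_sse` helper are hypothetical, the schema version value of 1 is assumed (the ADR only says a v field exists), and placing the envelope fields inside `data` is one possible reading of the schema above.

```python
import itertools
import json
import time

SCHEMA_VERSION = 1  # assumed value; the ADR only states that a `v` field exists


class EventStream:
    """Stamps each event with the shared envelope: v, session_id, ts, seq."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self._seq = itertools.count(1)  # per-session, strictly increasing

    def envelope(self, name: str, payload: dict) -> dict:
        data = {
            "v": SCHEMA_VERSION,
            "session_id": self.session_id,
            "ts": time.time(),
            "seq": next(self._seq),
            **payload,
        }
        return {"event": name, "data": data}


def to_sse(event: dict) -> str:
    """Serialize one event as an SSE frame: event line, data line, blank line."""
    # json.dumps emits a single line by default, so one data: line is enough.
    return f"event: {event['event']}\ndata: {json.dumps(event['data'])}\n\n"


stream = EventStream("session-123")
first = stream.envelope("stage3.start", {"chairman": "model-a"})
second = stream.envelope("synthesis.delta", {"text": "Hello"})
frame = to_sse(first)
```

Keeping the counter on the stream object gives consumers a cheap way to detect dropped or reordered events.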
Consequences
Positive: the spinner becomes a live deliberation view; ecosystem differentiation; the event vocabulary also feeds observability (ADR-030).
Negative / risks:
- Streaming paths double some code routes. Mitigation: the stream is assembled from the same primitives; the delta path constructs the same final result object, asserted by tests.
- Stage-1 responses streamed before anonymization must not leak into Stage-2 context. They do not, because Stage 2 uses its own anonymized prompt, but tests must pin this.
- Token streams and partial usage require the ADR-011 accounting to stay correct on cancelled or failed streams (the gateway already raises on stream HTTP errors, #375).
Definition of Done (per phase — council feedback: concrete)
- P0: every extracted module < 50K chars; back-compat re-exports; suite count identical; ruff clean.
- P1: SSE emits all Phase-1 events with the schema above; non-stream responses byte-identical (test); event-order invariants tested.
- P2: streamed synthesis assembles to the SAME final result object as the non-streamed path (equality test); usage/cost correct on cancelled streams.
- P3: MCP progress visible in a real client session; tool descriptions document progress semantics.
- All phases: README streaming section, CLAUDE.md, CHANGELOG.
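The event-order invariants named for P1 are not enumerated in this ADR. The sketch below checks three plausible ones, all of them assumptions: seq strictly increases, exactly one terminal event exists and comes last, and stage3.start precedes any synthesis.delta. The helper name `check_event_order` is hypothetical.

```python
TERMINAL = {"council.complete", "council.error"}  # wire names from the P1 note


def check_event_order(events: list) -> list:
    """Return the violated invariants; an empty list means the stream is well ordered."""
    problems = []
    names = [e["event"] for e in events]
    seqs = [e["data"]["seq"] for e in events]

    if any(later <= earlier for earlier, later in zip(seqs, seqs[1:])):
        problems.append("seq must be strictly increasing")

    terminal_positions = [i for i, name in enumerate(names) if name in TERMINAL]
    if terminal_positions != [len(names) - 1]:
        problems.append("exactly one terminal event, and it must be last")

    if "synthesis.delta" in names:
        started = "stage3.start" in names
        if not started or names.index("stage3.start") > names.index("synthesis.delta"):
            problems.append("stage3.start must precede synthesis.delta")

    return problems


def make(name: str, seq: int) -> dict:
    """Build a bare event with only the fields the checker reads."""
    return {"event": name, "data": {"seq": seq}}


good = [make("stage1.response", 1), make("stage3.start", 2),
        make("synthesis.delta", 3), make("council.complete", 4)]
bad = [make("synthesis.delta", 2), make("stage3.start", 1)]
good_problems = check_event_order(good)
bad_problems = check_event_order(bad)
```

Returning a list of messages rather than raising on the first failure lets one test run report every broken invariant at once.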
References
docs/roadmap-2026-h2.md item 3; gateway SSE (v0.27.1, #375); #380 split playbook